R300 rumours..
Just read on an Italian gaming website (http://www.alternative-reality.com/videogames/news/4504.htm):
ATI will present the new R300 GPU at the next Comdex (here they say Comdex will be held in Taipei in June, but that's not true: Computex will be held in Taipei this June. The next Comdex is in Canada, this July)
Rumoured specs:
0.15 micron technology
350 MHz core speed
future 0.13 micron manufacturing process for 400 MHz core speed
8 rendering pipelines
textures processed per pipeline: 2 or 4 (??)
800 MHz DDR memory speed
12.6 GB/s bandwidth
2 TruForm processing units
HydraVision support
DirectX 9 support (PixelShader 2.0, displacement mapping...)
HyperZ 3
256 MB of memory
Pixel fillrate 2.8 mil (?? I know, it doesn't make sense :))
Texel fillrate 8.4 mil (3 TMUs?)
Support for 256 Bit memory. (Pins count??)
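For what it's worth, the two fillrate figures do hang together if "mil" is read as billions per second. A minimal Python sketch of that arithmetic, assuming the usual rule of thumb that peak pixel fillrate is pipelines times core clock; the function names are made up for illustration and nothing here comes from ATI:

```python
def pixel_fillrate(pipes: int, core_mhz: float) -> float:
    """Theoretical peak in pixels per second, assuming one pixel per pipe per clock."""
    return pipes * core_mhz * 1e6


def implied_tmus_per_pipe(texel_rate: float, pixel_rate: float) -> float:
    """Texture units per pipe implied by a texel rate and a pixel rate."""
    return texel_rate / pixel_rate


# Rumoured figures: 8 pipelines at 350 MHz, 8.4 billion texels per second.
pixels_per_second = pixel_fillrate(8, 350)
print(pixels_per_second / 1e9, "Gpixels/s")
print(implied_tmus_per_pipe(8.4e9, pixels_per_second), "TMUs per pipe")
```

Under those assumptions 8 pipes at 350 MHz gives the 2.8 figure, and 8.4 divided by 2.8 is where the guess of 3 TMUs comes from, which doesn't match the "2 or 4" in the same list.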
that's all folks,
Marco
Doomtrooper
27-Mar-2002, 20:33
Nao, what's the date and location of Comdex in Canada this year? Toronto??
Metro Toronto Convention Centre
July 10-12, 2002
Why on earth do you need 256 MB of memory? Aren't memory prices rising? I realize that some consumers will go for a card with more memory just on principle (e.g. a 64 MB TNT2 instead of a 32 MB GeForce2, at one time similar in price). Come on though -- we've only recently seen the advent of playable framerates at resolutions/FSAA settings that challenge 64 MB. Shouldn't they go for 128 MB of RAM and pass on the savings?
These specifications are just rumors, nothing more nothing less.
As for 256 MB, they are talking about the max supported amount.
The original Radeon had support for 128 MBs, but did we ever see that card?
It also had support for MAXX technology, did we ever see that?
(It also had a primitive Pixel Shader that was never used ;) )
The only "real" thing we know about R300 is that it will support up to 256 MB configurations. It clearly says so in later drivers.
Doomtrooper
27-Mar-2002, 20:57
Metro Toronto Convention Centre
July 10-12, 2002
Cool I will be there !! :P
Joe DeFuria
27-Mar-2002, 21:15
RE: 256 MB....
Also, assuming all the rumors are true, and I'm reading the rumors correctly, (Supports up to 256 MB Ram, and also supports up to 256 Bit wide DDR external bus), it may be the case that in order to build a card with 256 bit wide bus, it will be required to populate it with 256 MB of ram.
I would imagine that it would be somewhat "easier" to distribute the board traces for a 256 bit board over more banks of ram. So it may be the case that in essence, what you are "buying" with that extra ram is the bandwidth.
Dave Baumann
27-Mar-2002, 21:33
Alternatively they may just be looking to replace some of the current FireGL line-up with workstation variants based on R300.
Livecoma
27-Mar-2002, 22:33
That sounds really really expensive.
Typedef Enum
27-Mar-2002, 23:03
These are the same specs as those posted a few days ago by some Russian ATI website (I think it was a Russian fansite)...
The link was posted @ Rage3D 2-3 days ago...virtually all the numbers are same (if not all).
LittlePenny
27-Mar-2002, 23:30
2 TruForm units? I didn't know TruForm was a unit, I just thought it was an algorithm to add more vertices. Is there really a separate part of the die dedicated to TruForm? Could they just be talking about having two Vertex Buffers?
My take? those specs are complete guesswork.
First, they say 800 MHz DDR, 12.6 gigs/sec. But then they say 256 bit bus... for one, that's not too likely in a consumer level part (many, many more pins on the package)... but if it was indeed a 256 bit part, it'd offer 25.2 gigs/sec theoretical max bandwidth. This spec doesn't even agree with itself....
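To put numbers on that, here is a rough Python sketch of the standard peak bandwidth formula (bus width in bytes times transfers per second). It assumes the rumoured "800 MHz DDR" is the effective data rate and uses decimal gigabytes; the function name is hypothetical:

```python
def peak_bandwidth_gb_s(bus_bits: int, effective_mhz: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9


for bus_bits in (128, 256):
    print(bus_bits, "bit:", peak_bandwidth_gb_s(bus_bits, 800), "GB/s")
```

On those assumptions a 128 bit bus lands close to the quoted 12.6 gigs/sec, and a 256 bit bus gives double that at the same memory clock, so the bandwidth number and the 256 bit claim can't both describe the same board.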
2 "Truform Units"? What Littlepenny said. Actually, it's more likely they just made up something they thought sounded cool, but really made no sense.
8 pipelines? maybe, but not very likely. People made up the same rumors about the GF4.... but unless they've come across a LOT of extra bandwidth, there's no sense in it.
The rest of the stuff (DX9, HydraVision, process sizes) we already knew.
So it's a little bit of publicly available info combined with some stuff that was probably just made up.
If any of you need directions to the Metro Toronto Convention Centre, I have a few easy steps.
1) Open your eyes
2) Find the 181 story tall CN Tower.
3) Head towards it.
4) Just east of it is the MTCC
And it's as simple as that LOL ;)
Gotta love Toronto. Hard to get lost. That tower works like a compass
Anyhoo... Does anybody know if this is open to the public? Can I just pay and get in?
Doomtrooper
28-Mar-2002, 01:38
http://www.key3media.com/comdex/canada2001/registration.html
Sabastian
28-Mar-2002, 02:30
http://www.key3media.com/comdex/canada2001/registration.html
Hmm, I posted this link before but no one took any notice :-? Anyhow, since ATi is no longer announcing a product before they launch it, does this mean that they are possibly releasing the R300 in July?
"K.Y. said that in the future ATI will "have products ready for retail" at the time they announce a chip.
In short, ATI won't announce the chip first."
http://www.theinquirer.net/21030210.htm
The part about 2 TruForm units is a good laugh. :)
2 TruForm units? I didn't know TruForm was a unit, I just thought it was an algorithm to add more vertices. Is there really a separate part of the die dedicated to TruForm? Could they just be talking about having two Vertex Buffers?
Probably just meaning double the performance within the tesselation engine.
From an Italian site:
http://www.3ditalia.com
Marco
Sabastian
01-Apr-2002, 22:33
This looks fairly legit. I like. 8)
http://www.3ditalia.com/img_news/3/9113/img2.jpg
http://www.3ditalia.com/img_news/3/9113/img1.jpg
http://www.3ditalia.com/show.php3?img=/img_news/3/9113/img1.jpg
Doomtrooper
01-Apr-2002, 23:21
Funny how the PCB Layout is Exactly the Same as an 8500
:wink: :wink:
http://images.anandtech.com/reviews/video/ati/radeon8500/radeon8500.jpg
I see many differences in those PCBs.
Sabastian
02-Apr-2002, 00:11
I see many differences in those PCBs.
Yeah but if you actually take a closer look at the chip you can see that it really appears to be a Radeon 7500. April fools?.......... :wink:
Tatchan
02-Apr-2002, 16:22
I'm sure the 8500XT is using BGA memory like that found on the Ti4400/Ti4600, so I bet RV250 and R300 will also use this kind of memory.
EDIT: BTW, here's the "news" about R300:
http://www.theinquirer.net/02040201.htm
Re: The Inquirer news item :
They say the R300 has 8 pixel pipes with 4 texture units each. Then they state the chip has 2 pixel shaders and 2 vertex shaders.
I'm sorry but hasn't the Inquirer just stated that 8 = 2??
LeStoffer
03-Apr-2002, 08:52
They say the R300 has 8 pixel pipes with 4 texture units each. Then they state the chip has 2 pixel shaders and 2 vertex shaders.
I'm sorry but hasn't the Inquirer just stated that 8 = 2??
Yeah, I was thinking the same thing. :roll:
(Unless, of course, they meant that each of the 8 pipes will have two pixel shaders each! :o :lol: )
What use would the other pixel shader be for then?
The previously last post (by Humus) seems to be on a second page which cannot be reached in the normal way (you can see his post from the forum overview, if you select view oldest first and during replying though).
It could make sense.
Probably no hw can afford a pixel shader per pixel pipe.
Resources have to be shared somewhere if you don't want a huge chip.
And anyway, if R300 doesn't bring a dramatic new bandwidth saving scheme or exotic memory types, 2 pixel shaders working at the same time may be enough in mid-to-worst cases, with a lot of fragments per pixel.
ciao,
Marco
LeStoffer
03-Apr-2002, 19:42
Huh?
Okay, I'm just an idiot on this matter (but trying not to be ignorant) but I really thought that there was a pixel shader for each of the pixel pipes on the current designs (that is: 4 pipes = 4 pixel shaders). :oops:
Now you guys are saying that each pipe has to send a request to the sole Pixel Shader to get some of those "fancy pixel-thingys" done? Interesting approach IMO.
4 pipes implies 4 pixel shading units. Otherwise executing pixel shaders would be significantly slower than using standard multitexturing, which isn't the case.
LeStoffer
03-Apr-2002, 20:03
4 pipes implies 4 pixel shading units. Otherwise executing pixel shaders would be significantly slower than using standard multitexturing, which isn't the case.
Thanks Humus. I thought that I had lost my marbles there!
Anyway, two pixel shaders per pipe (2x8 pipes = 16!) sounds nuts. You could increase a pixel shader's computing power in other ways than just doubling the number of them. It doesn't sound right IMVHO.
easyride
03-Apr-2002, 22:40
What about 8 pipes and 350 MHz clock at 0.15 micron? That really sounds impossible. Considering PS and VS 2.0, I'm thinking 110-120 million transistors for a chip like that...
4 pipes implies 4 pixel shading units. Otherwise executing pixel shaders would be significantly slower than using standard multitexturing, which isn't the case.
This is true if the pixel shader pipelines have the same resources (texture samplers, texture fetchers, texture iterators...) as the 'fixed' multitexturing pipe. Obviously this is the case with current architectures.
It might not be the same case with the next hw iteration: fewer pixel shader pipes, but with a lot of resources (we know pixel shaders 2.0 are very demanding...). I don't know how big the transistor budget is in this case..so..we just have to wait and see :)
ciao
Marco
It would make sense for future hardware to have fewer pixel shading units than pipelines, especially now that we will get floating point fragment data. Bandwidth continues to be the main performance constraint, so adding more full-blown pipes doesn't make much sense. But there are still situations where the bandwidth need is low and where adding number crunching capabilities will speed things up. Adding lightweight pipes which can only do a subset of the most common low bandwidth stuff, like depth/stencil, could be a very cheap solution and beneficial for future games like Doom3.
100% agreed.
Do you believe R300 will have 8 full blown pixel shaders? I don't.
At least it seems a stupid thing to do if they don't have big bandwidth and/or efficiency improvements.
A full 2.0 pixel shader pipeline will eat a considerable amount of transistors...I don't see that multiplied by 8 :)
Anyway..we'll see soon..
ciao,
Marco
Dave Baumann
04-Apr-2002, 14:24
At least it seems a stupid thing to do if they don't have big bandwidth and/or efficiency improvements.
Why do you think extra bandwidth will be required? Pixel shaders are generally computationally expensive, AFAIK, rather than bandwidth expensive - bandwidth usage may decrease with increased Pixel Shader use because more is spent processing on chip.
Why do you think extra bandwidth will be required? Pixel shaders are generally computationally expensive, AFAIK, rather than bandwidth expensive - bandwidth usage may decrease with increased Pixel Shader use because more is spent processing on chip.
PS are computationally expensive but can also be very bandwidth demanding. In the 2.0 revision there are 16 texture sampling registers and one can do as many as 32 address operations.
One can think that bandwidth requirements grow as much as the number of calculations, and so we can stay with the current bandwidth as long as memory can keep up with calculations..but this may not be the case, imho, in many situations.
Also keep in mind that we're going to see an increased use of dependent texture reads. Those can't be reordered and are going to potentially stall the pipeline. Moreover, DTRs have the nasty side effect of destroying memory access locality, thus making texture caches less effective and so requiring extra bandwidth for texture sampling.
Obviously those are just speculations of mine, I'm not a 3d architect or engineer (now someone is thinking: yeah, we are well aware of that :oops: )
ciao,
Marco
Dave Baumann
04-Apr-2002, 15:19
PS are computationally expensive but can also be very bandwidth demanding. In the 2.0 revision there are 16 texture sampling registers and one can do as many as 32 address operations.
That also depends on the texture sampling capabilities of the pipeline – if you suddenly up the card to have 16 texture units per pipe (not exactly likely with 8 pipes!) then you are going to have a much bigger bandwidth drain; but if you keep with a similar number of texture units per pipe as today then you are only increasing clock cycles.
That also depends on the texture sampling capabilities of the pipeline – if you suddenly up the card to have 16 texture units per pipe (not exactly likely with 8 pipes!) then you are going to have a much bigger bandwidth drain; but if you keep with a similar number of texture units per pipe as today then you are only increasing clock cycles.
Obviously I don't want the card to have 16 TUs per pixel pipe :)
The NG graphics hardware can just keep the current number of TUs per pipe, that's not my point. I was just saying that it doesn't make sense (to me) to increase the current number of TUs even if we are going to have more pixel pipes. There is also another problem: with an increased number of textures to fetch per pixel the hw has to open more and more memory pages than before, so efficiency goes down...
Maybe this approach is gonna hurt theoretical multitexturing fillrate, but I can't see it hurting real-world performance with complex scenes, and it frees some transistors to devote to other features, as long as core clock can keep up with memory clock, and that should be the case.
I see the memory as the bottleneck, not the pixel shader calculations with future shaders. GPU core clocks are growing fast....
ciao,
Marco
Hmm, this thread seems to have gotten a bit confused...
Just because a pixel shader can issue 32 addressing instructions doesn't mean that they all need to be issued in the same clock cycle. If the HW supports 2 texture addressing units per pixel pipe then that's the maximum number of textures that can be fetched per clock (per pipe), so the maximum BW required is the same irrespective of the number of addressing operations supported, i.e. 2 TMUs will take 16 clocks to fetch 32 textures.
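That point is just a division, but spelling it out may help. A small Python sketch, assuming each TMU completes one texture fetch per clock and nothing else stalls the pipe (the function names are illustrative only):

```python
import math


def clocks_to_fetch(addressing_ops: int, tmus_per_pipe: int) -> int:
    """Clocks one pipe needs to issue all of a shader's texture fetches."""
    return math.ceil(addressing_ops / tmus_per_pipe)


def peak_fetches_per_clock(addressing_ops: int, tmus_per_pipe: int) -> int:
    """Peak fetch rate per pipe: capped by the TMU count, not the shader length."""
    return min(addressing_ops, tmus_per_pipe)


print(clocks_to_fetch(32, 2))         # clocks for 32 fetches on 2 TMUs
print(peak_fetches_per_clock(32, 2))  # peak rate with a long shader
print(peak_fetches_per_clock(4, 2))   # same peak rate with a short shader
```

A longer shader takes more clocks per pixel, but the per-clock peak stays pinned at the number of TMUs.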
With regard to the number of pixel shaders it is very unlikely that this would differ from the number of physical pipelines. What may vary is the instruction execution rate per clock per pipeline. I'd be surprised if anyone would ever do less than 1/clock/pipe for PS2.0 as the operations are all quite basic (unlike ps1.x). In general you'd probably want to match the blending instruction rate to the addressing instruction rate to minimise stalls on the addressing side. That said, a computationally complicated pixel shader program (many more blending than addressing ops) will almost invariably stall the pipelines (unless you execute a ridiculous number of blend ops/pipe/clock), so required BW actually goes down in this case.
More interesting will be the memory configuration used to feed 8 pipelines; a 256 bit wide external mem bus has some nasty problems with things like pin count.
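A toy Python model of that last point, under heavy assumptions: addressing and blending overlap perfectly, each unit retires a fixed number of ops per clock, and the defaults of 2 TMUs and 2 blend ops per clock are made-up illustration values, not anything known about real hardware:

```python
import math


def clocks_per_pixel(address_ops: int, blend_ops: int,
                     tmus: int = 2, blend_rate: int = 2) -> int:
    """Clocks per pixel when the slower of the two units sets the pace."""
    return max(math.ceil(address_ops / tmus),
               math.ceil(blend_ops / blend_rate),
               1)


def avg_fetches_per_clock(address_ops: int, blend_ops: int,
                          tmus: int = 2, blend_rate: int = 2) -> float:
    """Average texture fetches issued per clock, a rough proxy for texture BW demand."""
    clocks = clocks_per_pixel(address_ops, blend_ops, tmus, blend_rate)
    return address_ops / clocks


print(avg_fetches_per_clock(4, 4))   # balanced shader: TMUs kept busy
print(avg_fetches_per_clock(4, 16))  # blend-heavy shader: TMUs mostly idle
```

With the same 4 fetches, the blend-heavy shader spreads them over more clocks, so the average fetch rate (and with it the texture bandwidth demand) drops.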
John.
BW required is the same irrespective of the number of addressing operations supported, i.e. 2 TMUs will take 16 clocks to fetch 32 textures.
Umh? It seems to me no one has said that required BW per clock is going to increase..
With regard to the number of pixel shaders it is very unlikely that this would differ from the number of physical pipelines.
why?
That said, a computationally complicated pixel shader program (many more blending than addressing ops) will almost invariably stall the pipelines (unless you execute a ridiculous number of blend ops/pipe/clock), so required BW actually goes down in this case.
Agreed. But that is a particular case, what about the other way around? :)
More interesting will be the memory configuration used to feed 8 pipelines; a 256 bit wide external mem bus has some nasty problems with things like pin count.
That's why we're all excited to see what ATI has developed this time :)
ciao,
Marco
Umh? It seems to me no one has said that required BW per clock is going to increase..
Yep, just me flash reading the thread.
Why wouldn't I expect the number of pixel shaders to differ from the number of pixel pipelines? Well, I'm not sure what the point would be of having 8 pipelines if it wasn't possible to run at full rate even with the simplest of shader programs. Architecturally, funneling 8 pipes into a lesser number of processing units would be a nightmare, particularly if you wanted some kind of fast bypass mode for non shader based blending (which is also strange given that you'd just be bypassing into yet another blending unit).
John
Well I'm not sure what the point would be of having 8 pipelines if it wasn't possible to run at full rate even with the simplest of shader programs.
Yeah..we started this discussion from this point :)
The questions are:
1) Will next generation IMRs have enough bandwidth to feed 8 pipes? (might the marketing men's minds be leading the engineers' hands?)
2) Is there enough die area to devote to full 8 pixel shader pipelines (PS 2.0)?
Architecturally, funneling 8 pipes into a lesser number of processing units would be a nightmare, particularly if you wanted some kind of fast bypass mode for non shader based blending (which is also strange given that you'd just be bypassing into yet another blending unit).
Ok, I see that.
Do you think (it seems you're a 3D hw architect, are you? :)) it is possible to have a certain amount of hw resources like iterators, texture samplers, etc., and let the driver analyze a pixel shader at shader compile time to come up with the best resource allocation? (given your previous answer I'd say no, but I keep asking..)
I mean..do you believe it is feasible to have a set of highly flexible pixel pipelines, flexible even in their quantity/capabilities..?
Or would it be useless and/or too complex to do in your opinion?
ciao,
Marco
LeStoffer
06-Apr-2002, 16:18
Regarding 8 pipes and bandwidth: If the extra pipes were used to do FSAA (multisample that is!) it shouldn't demand more bandwidth since they can just fetch from what is already in the texture cache AFAIK.
Otherwise 8 pipes sounds nuts for three reasons that I can see:
1) In a DX9 design you would need a larger on-chip cache if you're going to take full advantage of 8 texture addresses per pass (in 8 different pipes)
2) As the number of polygons increases there would be more and more situations where a polygon is so small that it won't "cover" 8 pixels, thus leaving from 1 to 7 pipes idle.
3) There won't be enough silicon for this craziness.
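Point 2 can be put into a simplified Python model. The big assumptions: the chip works on one polygon at a time, any covered pixel can go to any pipe, and each pipe handles one pixel per clock. Real rasterizers are more constrained than this, and the function name is just for illustration:

```python
import math


def pipe_utilization(covered_pixels: int, pipes: int = 8) -> float:
    """Fraction of pipe-clocks doing useful work on a single polygon."""
    if covered_pixels <= 0:
        raise ValueError("polygon must cover at least one pixel")
    clocks = math.ceil(covered_pixels / pipes)
    return covered_pixels / (clocks * pipes)


for pixels in (1, 3, 8, 9):
    print(pixels, "pixels:",
          pipe_utilization(pixels, pipes=8), "on 8 pipes,",
          pipe_utilization(pixels, pipes=4), "on 4 pipes")
```

In this model a 3 pixel polygon keeps 3 of 8 pipes busy versus 3 of 4 on a narrower chip, which is the idle-pipe worry in numbers; hardware that can work on more than one primitive at once would soften it.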
Since DX9 is going to be more assembly language-like you could instead of the 8 pipes give more power to the 4 pixel shader processing units (a la making a more strong FP-unit in a CPU).
Regarding 8 pipes and bandwidth: If the extra pipes were used to do FSAA (multisample that is!) it shouldn't demand more bandwidth since they can just fetch from what is already in the texture cache AFAIK.
From this point of view n20+ has 16 pixel pipes then :)
2) As the number of polygons increases there would be more and more situations where a polygon is so small that it won't "cover" 8 pixels, thus leaving from 1 to 7 pipes idle.
I heard rumours about R300 being able to rasterize more than one non-overlapping primitive at the same time.
3) There won't be enough silicon for this craziness.
Maybe..or maybe not :)
ciao,
Marco
Could each pipeline have its own cache working independent from the other? Allowing for a dramatic increase in rendering ability but little increase in bandwidth from onboard memory. I really don't know since I am not a hardware engineer.
Foodman
09-Apr-2002, 05:27
I like noko's idea, but I also have no clue how feasible it is.
Sharkfood
09-Apr-2002, 06:20
I'm more interested in AA and anisotropy rumors.
When are the rumors concerning gaussian distribution sampled 16xRGSS @ 60 fps and 128x anisotropy going to start circulating?
I really don't know since I am not a hardware engineer.
Neither am I, but according to nvidia's patent about a texture cache architecture, they have a single cache that can serve four requests per clock (I'm ultra-over-simplifying here..), one per pixel pipeline I believe. So they have the hit rate of a single cache coupled with the efficiency of multiple texture caches (with a proper cache size adjustment..)
ciao,
Marco
I'm quite confused how the term "pixel shader" is used in this thread. I'd prefer the term "PS capable pipeline".
As i see it, there are TMUs that get texture coordinates, fetch texels and output filtered (color) values. And there are combiners that do the arithmetics stuff on these values. TMUs and combiners are arranged in a pipeline, and if this pipeline is able to perform a certain set of operations, it can be called PS x.x capable.
It might make sense to additionally have several non-PS pipelines for things like stencil ops, where "simple" fillrate is necessary. As long as they don't need much die space.
As chips become more and more programmable, i could imagine a chip with several "pixel ALUs" that do the calculations (each working on a different primitive) and several TMUs that could arbitrarily be assigned to the ALUs.
Yeah, I find it interesting how the word "shader" is used these days. No distinction is being made between the program and the hardware by most people, which is kinda odd. I've tried to be a little more clear lately by saying "pixel shading unit" when talking about the hardware and "pixel shader" when talking about the software. Interestingly, the word pixel shader really isn't appropriate for the software part either, should really be called "fragment program" or possibly "fragment combiner". :)
Hm, what's up with this forum? Page 3 of 2? Strange...
Kind of what I was trying to get across in my posts.
I think the problem is that the article that started this thread wasn't technically "useful".
John.