AMD: R7xx Speculation

Status
Not open for further replies.
Sure, but why does it look like a perfect 32 TMU result and not something like 34 or 37 TMUs?
BW and interpolation performance should be high enough.



Z-Fill   NV     AMD
4xAA     100%   100%
8xAA     20%    50%
;)

Where are your numbers from? Only asking because I'd like to know why that is the case.
 
It really depends on how you define competition. I meant in absolute performance terms, i.e. R300 was the fastest GPU out there, bar none. The 4800 series certainly isn't that, so if you want the fastest, the 4800 loses to GT200.
Oh, I thought you were talking about the 9800 GTX, since you mentioned it the sentence before. Anyway, I'm not making the R300 comparison. I think it's more like the 6600GT.

EDIT: Given how well the 9800 GX2 does against the 280, I don't see why the 4870x2 (or whatever it's called) won't clobber it. Basically NVidia's only chance for leadership is a 260 GX2. I don't even know if they can pull that off, let alone the $800-$1000 street price.

Price, of course, is a completely different matter, but depending on how pricing ends up working out, in many people's eyes the 4870's primary competition could be the 260. AMD would prefer us to think it's competing with the 9800GTX and beating it at a higher price, but it's possible NV could say exactly the same thing for GT260 vs 4870.
Actually, I think AMD is aiming the 4850 at the 9800 GTX, and the 4870 will be aimed at the 260. The 4870 will be aimed at the huge market between the 9800GTX and the GTX 280, i.e. the $250-$600 single GPU market.

What can NVidia do? The 9800 GTX+ is too slow, and the 260 is too expensive. I don't think the 640MB GTS ever got much below $300, and that huge chip is far smaller than the 260. Can they really make money on a $300 448MB GTX 260, and would people even buy it?

If the 4870 really is clocked 36% higher with 80% more BW, it's unlikely for it not to be right up there with the 260 if not faster.
 
HW.fr ran a Z-fill test in its 2900XT review, and I do not see any big improvement on G200; the cache hierarchy still does not seem to be made for 8xMSAA.

Hope someone does some fill rate tests between the 8800GTX/U and the new 260/280GTXs. I'd like to see those numbers.
 
What can NVidia do? The 9800 GTX+ is too slow, and the 260 is too expensive. I don't think the 640MB GTS ever got much below $300, and that huge chip is far smaller than the 260. Can they really make money on a $300 448MB GTX 260, and would people even buy it?

If the 4870 really is clocked 36% higher with 80% more BW, it's unlikely for it not to be right up there with the 260 if not faster.
ATI would need a 1GB 4870 to compete with the GTX 260 (896MB is a good amount more than 512MB). It would have to cost more than $300, at least $350-375, which puts it pretty close to the presumably still faster GTX 260, especially if that card's price gets reduced closer to $400. I will wait and see how this plays out before I buy anything; if Crossfire problems are truly solved this gen, then I might get 2x 4870 and later sell them for an R700 when it finally arrives.
 
One question: is overclocking the 4850 still on the NDA'd list? I have not seen any sites overclock it yet. It will be interesting to see the performance gains as the card reaches 700 MHz.


One other thing regarding all the tech info: as I was led to believe from the embargo-lift memo that was posted, only info printed on the boxes could be published. That may be why we still do not know about the ring bus, et al.
 
ATI would need a 1GB 4870 to compete with the GTX 260 (896MB is a good amount more than 512MB). It would have to cost more than $300, at least $350-375, which puts it pretty close to the presumably still faster GTX 260, especially if that card's price gets reduced closer to $400. I will wait and see how this plays out before I buy anything; if Crossfire problems are truly solved this gen, then I might get 2x 4870 and later sell them for an R700 when it finally arrives.

IIRC someone posted on this or some other forum that some shop had a 1GB HD4870 on (pre-)sale for 259€.
 
I think this is important for R700... :devilish:


[Attached image: clamshellrb6.jpg]
 
There's no doubt that Nvidia can still point to a card above ATI's range and say "we're still faster", but at a price that is simply not viable when you can spend half as much on a 48xx and get nearly the same performance.
It's possible that once R700 comes out Nvidia may not even be able to do that. I guess they can still point to GTX280 SLI as the ultimate in performance, but that will command a truly stratospheric price. If R700 actually comes out cheaper than a single GTX280 but outperforms it, ATI will be in really quite a comfortable position. I suspect the number of potential customers who are concerned about micro-stuttering, etc. is a pretty small percentage: most of them will just look at the frames-per-second benchmarks and go with the higher fps at a lower price. I expect to see a GTX280 price crash once R700 ships (in the same way that R580 prices were forced down by dual-chip Nvidia cards).
 
Code:
ATI Radeon HD 4800 series :

956 million transistors on 55nm fabrication process
PCI Express 2.0 x16 bus interface
256-bit GDDR3/GDDR5 memory interface
Microsoft® DirectX® 10.1 support
Shader Model 4.1
32-bit floating point texture filtering
Indexed cube map arrays
Independent blend modes per render target
Pixel coverage sample masking
Read/write multi-sample surfaces with shaders
Gather4 texture fetching
Unified Superscalar Shader Architecture
800 stream processing units
Dynamic load balancing and resource allocation for vertex, geometry, and pixel shaders
Common instruction set and texture unit access supported for all types of shaders
Dedicated branch execution units and texture address processors
128-bit floating point precision for all operations
Command processor for reduced CPU overhead
Shader instruction and constant caches
Up to 160 texture fetches per clock cycle
Up to 128 textures per pixel
Fully associative multi-level texture cache design
DXTC and 3Dc+ texture compression
High resolution texture support (up to 8192 x 8192)
Fully associative texture Z/stencil cache designs
Double-sided hierarchical Z/stencil buffer
Early Z test, Re-Z, Z Range optimization, and Fast Z Clear
Lossless Z & stencil compression (up to 128:1)
Lossless color compression (up to 8:1)
8 render targets (MRTs) with anti-aliasing support
Physics processing support
Dynamic Geometry Acceleration
High performance vertex cache
Programmable tessellation unit
Accelerated geometry shader path for geometry amplification
Memory read/write cache for improved stream output performance
Anti-aliasing features
Multi-sample anti-aliasing (2, 4 or 8 samples per pixel)
Up to 24x Custom Filter Anti-Aliasing (CFAA) for improved quality
Adaptive super-sampling and multi-sampling
Gamma correct
Super AA (ATI CrossFireX™ configurations only)
All anti-aliasing features compatible with HDR rendering
Texture filtering features
2x/4x/8x/16x high quality adaptive anisotropic filtering modes (up to 128 taps per pixel)
128-bit floating point HDR texture filtering
sRGB filtering (gamma/degamma)
Percentage Closer Filtering (PCF)
Depth & stencil texture (DST) format support
Shared exponent HDR (RGBE 9:9:9:5) texture format support
OpenGL 2.0 support
ATI Avivo™ HD Video and Display Platform
2nd generation Unified Video Decoder (UVD 2)
Enabling hardware decode acceleration of H.264, VC-1 and MPEG-2
Dual stream playback (or Picture-in-picture)
Hardware MPEG-1, and DivX video decode acceleration
Motion compensation and IDCT
ATI Avivo Video Post Processor
New enhanced DVD upconversion to HD
New automatic and dynamic contrast adjustment
Color space conversion
Chroma subsampling format conversion
Horizontal and vertical scaling
Gamma correction
Advanced vector adaptive per-pixel de-interlacing
De-blocking and noise reduction filtering
Detail enhancement
Two independent display controllers
Drive two displays simultaneously with independent resolutions, refresh rates, color controls and video overlays for each display
Full 30-bit display processing
Programmable piecewise linear gamma correction, color correction, and color space conversion
Spatial/temporal dithering provides 30-bit color quality on 24-bit and 18-bit displays
High quality pre- and post-scaling engines, with underscan support for all display outputs
Content-adaptive de-flicker filtering for interlaced displays
Fast, glitch-free mode switching
Hardware cursor
Two integrated DVI display outputs
Primary supports 18-, 24-, and 30-bit digital displays at all resolutions up to 1920x1200 (single-link DVI) or 2560x1600 (dual-link DVI) *3
Secondary supports 18-, 24-, and 30-bit digital displays at all resolutions up to 1920x1200 (single-link DVI only) *3
Each includes a dual-link HDCP encoder with on-chip key storage for high resolution playback of protected content *4
Two integrated 400MHz 30-bit RAMDACs
Each supports analog displays connected by VGA at all resolutions up to 2048x1536 *3
DisplayPort™ output support
Supports 24- and 30-bit displays at all resolutions up to 2560x1600 *3
HDMI output support
Supports all display resolutions up to 1920x1080 *3
Integrated HD audio controller with up to 2 channel 48 kHz stereo or multi-channel (7.1) AC3 enabling a plug-and-play cable-less audio solution
Integrated AMD Xilleon™ HDTV encoder
Provides high quality analog TV output (component/S-video/composite)
Supports SDTV and HDTV resolutions
Underscan and overscan compensation
MPEG-2, MPEG-4, DivX, WMV9, VC-1, and H.264/AVC encoding and transcoding
Seamless integration of pixel shaders with video in real time
VGA mode support on all display outputs
ATI PowerPlay™
Advanced power management technology for optimal performance and power savings
Performance-on-Demand
Constantly monitors GPU activity, dynamically adjusting clocks and voltage based on user scenario
Clock and memory speed throttling
Voltage switching
Dynamic clock gating
Central thermal management – on-chip sensor monitors GPU temperature and triggers thermal actions as required
ATI CrossFireX™ Multi-GPU Technology
Scale up rendering performance and image quality with two GPUs
Integrated compositing engine
High performance dual channel bridge interconnect *5
 
Sure, but why does it look like a perfect 32 TMU result and not something like 34 or 37 TMUs?
BW and interpolation performance should be high enough.
TU efficiency might be down. It certainly will be in certain situations, since if some arrays are not using their texture units these will now sit idle, unlike the old scheme where all units could be used for a single array if necessary. But this shouldn't affect those simple fillrate tests.
Instead, I think it's possible that the quad-TUs can't quite deliver 4 filtered texels per clock. Since they basically need to loop for four clocks to deliver the 16 filtered texel results to the array, I'd consider it possible they "waste" an initial cycle somewhere and thus would need 5 clocks instead of 4 for this. Peak bilinear rate would then be only 4/5 of the theoretical figure. That's very speculative though, but IMHO the different TU arrangement seems to be the biggest architectural change compared to RV670...
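
As a rough sanity check on the 5-clocks-instead-of-4 idea, here is a minimal Python sketch. It assumes RV770 has 40 texture units, a figure not stated in this thread (though it is consistent with the "160 texture fetches per clock" in the spec list at 4 fetches per unit). The function name effective_tmus is made up for illustration. Under those assumptions, a 4/5 duty cycle lands exactly on the 32 TMU figure being asked about.

```python
from fractions import Fraction

# Assumption: 40 physical texture units on RV770 (not stated in the thread).
PHYSICAL_TMUS = 40


def effective_tmus(physical_tmus: int, useful_clocks: int, total_clocks: int) -> Fraction:
    """Texture units' worth of bilinear throughput when only
    useful_clocks out of every total_clocks deliver filtered texels."""
    if total_clocks <= 0 or not 0 <= useful_clocks <= total_clocks:
        raise ValueError("need total_clocks > 0 and 0 <= useful_clocks <= total_clocks")
    return physical_tmus * Fraction(useful_clocks, total_clocks)


# Ideal case: all 4 clocks of the loop deliver filtered texels.
ideal = effective_tmus(PHYSICAL_TMUS, 4, 4)
# Speculated case: one wasted cycle, so 4 useful clocks out of every 5.
speculated = effective_tmus(PHYSICAL_TMUS, 4, 5)

print(f"ideal: {ideal} TMUs, with one wasted clock: {speculated} TMUs")
```

This only shows that the arithmetic is consistent with a 32 TMU reading; it says nothing about whether the hardware actually behaves this way.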
 