Unless it's in Mass Effect :smile:
Why the hell is the 9800 GTX so far down the list?
From the Wikipedia link you provided:
"NUMA attempts to address this problem by providing separate memory for each processor"
Yeah, NUMA stands for non-uniform memory access, which of course would mean a greatly extended ring bus for the MCs; but if the 4870x2 is indeed one memory space for two GPUs, wouldn't that contradict the sentence I quoted above?
I don't mean to "disagree", I don't know about this enough to disagree with anybody I'm just trying to understand. :smile:
The context is "Multi-processor systems make the problem considerably worse. Now a system can starve several processors at the same time, notably because only one processor can access memory at a time".From the Wikipedia link you provided:
"NUMA attempts to address this problem by providing separate memory for each processor"
Yeah, NUMA stands for non-uniform memory access which of course would mean a greatly extended ring bus for the MC's; but if the 4870x2 is indeed one memory space for two GPU's wouldn't that contradict with the sentence I quoted above?
I don't mean to "disagree", I don't know about this enough to disagree with anybody I'm just trying to understand. :smile:
The context is "Multi-processor systems make the problem considerably worse. Now a system can starve several processors at the same time, notably because only one processor can access memory at a time".
There are 3 primary kinds of memory usage in a GPU: vertex data, textures and render targets (though D3D10 blurs the boundaries, making them all mutable). Anyway, dedicated parts of each GPU make their own accesses to these kinds of data, independently of other clients. Within a single GPU you have a merry dance of conflicting requests against memory. It's up to the memory system to regiment these requests to make efficient use of memory bandwidth/latency - particularly as the burst size of GPU memory is comparatively large (which is how GPU memory can muster such high bandwidth).
One part of the solution is to utilise multiple memory channels, e.g. 4 channels that are each 64 bits wide. If you provide each channel with a set of queues and fuzzy logic, you can construct something that's load-balanced with high throughput for all clients (processors).
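To make the channel idea concrete, here's a toy sketch in C (the burst size and the modulo mapping are purely illustrative assumptions, not how any real memory controller maps addresses) of spreading consecutive bursts across 4 independent channels:
[code]
#include <stdint.h>
#include <stdio.h>

#define NUM_CHANNELS 4
#define BURST_BYTES  64   /* illustrative burst size, not a real GDDR figure */

/* Interleave bursts across channels so a client streaming linearly
   keeps all four channels busy instead of hammering just one. */
static unsigned channel_for(uint64_t addr)
{
    return (unsigned)((addr / BURST_BYTES) % NUM_CHANNELS);
}

int main(void)
{
    for (uint64_t addr = 0; addr < 8 * BURST_BYTES; addr += BURST_BYTES)
        printf("burst at 0x%04llx -> channel %u\n",
               (unsigned long long)addr, channel_for(addr));
    return 0;
}
[/code]
The per-channel queues then reorder whatever is pending on their own channel, which is roughly where the fuzzy logic earns its keep.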
The ring bus is a scalable architecture to connect clients to MCs. With the memory system distributed across the nodes of the ring bus there's no bottleneck in a single unit that determines how clients access memory.
So, in theory, multiple GPUs in NUMA can be made to cooperate in their use of available memory channels, no matter where the memory chips are attached. This cooperation is enforced by the distributed memory system, in theory - using the same techniques as deployed within a single GPU.
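As a rough mental model of that (my own simplification, nothing taken from the actual patent): every client and every MC sits on a ring stop, some partitioning of the address space decides which stop owns a given request, and the request travels the shorter way around the ring to get there, regardless of which GPU the memory chips hang off:
[code]
#include <stdint.h>
#include <stdio.h>

#define RING_STOPS 8   /* clients + memory controllers across both GPUs */

/* Stand-in for however the real memory system partitions addresses. */
static int owner_stop(uint64_t addr)
{
    return (int)((addr >> 8) % RING_STOPS);
}

/* Shorter of the two directions around the ring. */
static int hops(int from, int to)
{
    int cw  = (to - from + RING_STOPS) % RING_STOPS;
    int ccw = RING_STOPS - cw;
    return cw < ccw ? cw : ccw;
}

int main(void)
{
    uint64_t addr = 0x12345600;
    int client    = 2;
    int owner     = owner_stop(addr);
    printf("client at stop %d reaches 0x%llx (owned by stop %d) in %d hops\n",
           client, (unsigned long long)addr, owner, hops(client, owner));
    return 0;
}
[/code]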
You can peruse the patent for the fully distributed ring bus based memory controller here:
http://forum.beyond3d.com/showthread.php?p=1165766#post1165766
Jawed
Correct me if I'm wrong, but currently games are rendered like an MJPEG-encoded movie, instead of an MPEG4-encoded one.
And Lux_, uhhhh, I have no idea why you think it's even theoretically possible NOT to render this frame from scratch? (at least for the main framebuffer)
It's pretty much impossible to reuse 3D data from frame to frame. Lighting may change, effects may change, position will most likely change, whether due to shifts in camera position or shifts in object position.
Two benchmarks were used. First of all, a two-minute 720p HD video clip is transcoded into an iPod-compatible format. The conventional method, i.e. encoding via iTunes and a 20 US-dollar MPEG-2 codec, takes several hours on a 3.0 GHz quad core.
With the help of an unidentified GPU of the new GT200 generation (presumably a GTX 280) via CUDA, the same job takes, strictly speaking, only seconds. The video is encoded at 5x real-time speed; in other words, a video clip running at 30 frames per second is processed at 150 frames per second. This also makes it possible to convert several videos at once within a few minutes. The CPU is barely loaded while this happens; no more than one core is used for the processing.
The second, and at least as impressive, test deals with encoding a 1080i HD MPEG-2 clip in Adobe Premiere Pro. The clip was recorded at a 25 Mbit bitrate and 30 fps and is to be exported to a format called hi264 for further editing. While an ordinary Core 2 Duo E6400 encodes the video at 2-6 fps, i.e. at about 1/6 of real time, the GT200 manages 46 fps. So here too the GT200 again runs faster than real time.
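To put those figures in perspective, here's the back-of-the-envelope arithmetic using only the numbers quoted above (a 2-minute clip, 30 fps source, 150 fps on the GT200, 46 fps in the second test, and the midpoint of the E6400's 2-6 fps):
[code]
#include <stdio.h>

/* Back-of-the-envelope check of the numbers quoted above:
   encode_fps / source_fps = how many times faster than real time. */
static double realtime_factor(double encode_fps, double source_fps)
{
    return encode_fps / source_fps;
}

int main(void)
{
    double clip_s = 120.0;              /* the 2-minute 720p clip */
    double frames = clip_s * 30.0;      /* 3600 frames at 30 fps  */

    printf("GT200, test 1: %.0fx real time, clip done in %.0f s\n",
           realtime_factor(150.0, 30.0), frames / 150.0);   /* 5x, 24 s */
    printf("GT200, test 2: %.1fx real time at 46 fps\n",
           realtime_factor(46.0, 30.0));                     /* ~1.5x    */
    printf("E6400, test 2: %.2fx real time at ~5 fps\n",
           realtime_factor(5.0, 30.0));                      /* ~1/6     */
    return 0;
}
[/code]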
It's pretty much impossible to reuse 3D data from frame to frame.
I agree that, as of today, there are probably limitations in current APIs and hardware that make it not worth the effort - how to manage some kind of general data structure that keeps track of changes, how to sync it between GPU and CPU.
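Just to illustrate the kind of bookkeeping I mean (a totally hypothetical sketch, not any existing API - the struct and function names are made up): the application flags the objects that actually changed since the last frame, and only those would need their data pushed to the GPU again:
[code]
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-object change tracking; names are made up. */
typedef struct {
    uint32_t id;
    bool     dirty;   /* changed since the last submitted frame? */
    /* ... vertex data, transform, material, etc. ... */
} SceneObject;

static void mark_changed(SceneObject *obj) { obj->dirty = true; }

/* Re-send only dirty objects; everything else is notionally reused. */
static size_t submit_frame(SceneObject *objs, size_t count)
{
    size_t uploaded = 0;
    for (size_t i = 0; i < count; ++i) {
        if (objs[i].dirty) {
            /* upload_to_gpu(&objs[i]);  placeholder for the real transfer */
            objs[i].dirty = false;
            ++uploaded;
        }
    }
    return uploaded;
}

int main(void)
{
    SceneObject scene[3] = { {1, false}, {2, false}, {3, false} };
    mark_changed(&scene[1]);                /* only object 2 moved this frame */
    printf("re-uploaded %zu of 3 objects\n", submit_frame(scene, 3));
    return 0;
}
[/code]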
That German text looks like a summary of what the presenter says in the video itself. $20 for an MPEG2 codec or whatever, quad-core 3GHz Core 2 vs. GTX 280 = 20 mins to transcode a ~5min 720p NV clip vs. 5x-faster-than-realtime [150fps] via the CUDA-coded transcoder, etc. (I thought the presenter said the QC 3GHz was only loaded to 2 cores, though, which is odd for video processing--I'm probably wrong.) Someone present asked if porting this to GT200 (from G80, IIRC) was a big deal, and he said no. He ended by futzing with some HD home video (I'm guessing the 1080i h264 bit) of an NV guy's kids.
As for AMD, didn't they release some transcoder thingy for the X1000 series? I remember a beta was released that ran only on the CPU and was still pretty quick, but not much else.
Originally Posted by Kyle_Bennett
AMD showed 3800 series doing it in December of last year transcoding video about 10X faster than quad core...
IMHO, the "RV770 advantages vs G92" listed in the slide look like desperation.