Recent content by RacingPHT

  1. How do PowerVR SGX handle ddx/ddy?

    Thanks Rys, I'll be very interested to know what product you are talking about. Sounds like tessellation was handled by the shader unit. Sounds like you are not streaming out the mesh parameter buffers.
  2. How do PowerVR SGX handle ddx/ddy?

    Yeah. For small triangles, there should be some kind of fragment merging or single-pixel threads, otherwise it loses a lot of efficiency. I think for desktop, the better way would be quad fragment merging. Or maybe a fancier word, "coalescing". Yeah to me the only logical way is post...
  3. How do PowerVR SGX handle ddx/ddy?

    Maybe AMD/NVidia still save area because the costs of decode and fetch are amortized with more ALUs attached, and that's the point of SIMD. I'm interested in this because it seems to me that if you could let threads in a group go down different paths, it's not that difficult that they could be different...
  4. How do PowerVR SGX handle ddx/ddy?

    http://www.imgtec.com/powervr/insider/docs/PowerVR%20Series5%20Graphics.SGX%20architecture%20guide%20for%20developers.1.0.8.External.pdf 4.1.2. Thread scheduling Each USSE execution unit in a given SGX graphics core has its own thread scheduler. Each scheduler manages 16 threads, 4 of which...
  5. How do PowerVR SGX handle ddx/ddy?

    Didn't Imagination's documentation say their USSE units execute threads in a 4x4 way? 4 threads in a group is the basic unit of context switching, and there's a max of 4 thread groups running on a USSE processor. So first of all, I don't think they can run shaders with a granularity of 1. Second, I...
  6. How do PowerVR SGX handle ddx/ddy?

    I don't worry about early-Z GPUs. I know they are doing quads; they are just free to do whatever they want and just mask out unused fragments. It's the zero-overdraw statements that actually confused me, and many other things led me to think that PowerVR might not be a quad-pixel GPU. So you...
  7. How do PowerVR SGX handle ddx/ddy?

    That's a very interesting statement. You are actually suggesting there's an ability to switch between quad mode and pixel mode. IMO, OpenGL ES1 does not have dependent texture fetch or shaders. So, it sounds perfectly fine to me if they (or maybe you guys) run the pixel pipes not in a quad...
  8. How do PowerVR SGX handle ddx/ddy?

    Well IMO the documents said the USSE1/2 operates at a 4-thread granularity (pretty much like AMD's wavefronts, but much smaller). So it's very possible that the SGX operates on 4 threads (maybe fragments) in an "unrolled loop". That's pretty different from many desktop GPUs. For instance, some GPUs...
  9. How do PowerVR SGX handle ddx/ddy?

    I'm a bit confused as the PowerVRs are claiming zero overdraw. If this is true, the chip has to operate at the per-pixel level instead of the per-quad level that's been pretty much standard for years. And according to the configuration, some PowerVRs seem to use 2 pixel shaders/TMUs, which does not seem natural...
  10. 22 nm Larrabee

    I think it's not round-to-nearest-even. I'm not sure, but I have a little program that does 2 types of precision tests: fetched texel precision and texture coordinate precision. Here's my source code (6 KB), don't know if it helps: http://www.freefilehosting.net/gpuprecisiontest Texture...
  11. 22 nm Larrabee

    IMR can't copy because IMR just reads the raw untessellated mesh once, but for TBDR it's three times: once for the raw geometry, and 2 for the "compressed" write/read. Unless the compressed size is 0, it's higher bandwidth for TBDR. I can't see how it could be an advantage. What kind of lossless compression is...
  12. 22 nm Larrabee

    Hi Nick, I tested SwiftShader a bit and found that point-sampling tex2D() on a 256x1 ARGB8 texture with (1.f / 256, 0) doesn't return the 1st texel, but the 0th. There seems to be a slight sampling offset of -1.0/65536.
  13. 22 nm Larrabee

    Obviously you know software renderers much, much better, so I'm just curious. The FIFO thought crossed my mind because of this paper: http://graphics.stanford.edu/papers/gramps-tog/gramps-tog08.pdf Good to know. I'll play with SwiftShader for a while and maybe I'll tell you what...
  14. FULL-HD GPU raytracing demo<dx9>

    Not good, because the position itself is not smooth enough. Just imagine a car body getting fixed right after a big accident; that's what you'd expect to see in the demo when the normal is interpolated. Good argument. Now we have 2 reasons to do raytracing :)
  15. FULL-HD GPU raytracing demo<dx9>

    Hmm, but the flops number is far from 100x. Assuming a 3 GHz Core 2 Duo: 3 (GHz) * 2 (cores) * 4 (SSE width) * 2 (math issue rate) = 48 GFlops. Assuming a 1.5 GHz 96-SP GPU: 1.5 (GHz) * 96 (SPs) * 2 (MAD) = 288 GFlops. In reality, the NVidia part is capable of issuing more muls. I tested the co-issue; it shows...
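The quad-vs-pixel question running through the ddx/ddy posts above comes down to this: a GPU that shades in 2x2 fragment quads gets ddx/ddy almost for free as finite differences between neighboring fragments. A minimal sketch of that idea (my own illustration of coarse derivatives, not SGX's or any vendor's actual pipeline):

```python
# Coarse screen-space derivatives from a 2x2 fragment quad (illustrative
# model only). v[y][x] holds a shader value at each fragment of the quad.
def quad_derivatives(v):
    ddx = v[0][1] - v[0][0]  # difference along x within the quad
    ddy = v[1][0] - v[0][0]  # difference along y within the quad
    return ddx, ddy

# A value growing by 2 per pixel in x and 3 per pixel in y:
quad = [[10.0, 12.0], [13.0, 15.0]]
print(quad_derivatives(quad))  # -> (2.0, 3.0)
```

A truly per-pixel (non-quad) architecture has no such neighbors in flight, which is why the zero-overdraw claim above makes derivatives an interesting puzzle.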
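The geometry-bandwidth argument in the "22 nm Larrabee" posts above (read once for IMR; raw read plus compressed write and read for TBDR) can be put in rough numbers. A back-of-envelope sketch under assumed quantities: M is the raw mesh size in bytes and c the compression ratio of the binned parameter buffer (both hypothetical parameters, not figures from any datasheet):

```python
# Back-of-envelope geometry traffic per the argument above (assumed model).
# M = raw mesh bytes, c = compression ratio of the binned buffer
# (c = 1.0 means no compression at all).
def geometry_traffic(M, c):
    imr = M                    # IMR: read the raw mesh once
    tbdr = M + c * M + c * M   # TBDR: read raw + write binned + read binned
    return imr, tbdr

print(geometry_traffic(100, 1.0))  # -> (100, 300.0): 3x without compression
print(geometry_traffic(100, 0.5))  # -> (100, 200.0): still 2x at 50%
```

Only c = 0 would close the gap entirely, which is the point being made above.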
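The SwiftShader observation above is easy to model. A simplified nearest-neighbor sampler (ignoring D3D9 half-texel conventions, so this illustrates the effect rather than reproducing SwiftShader's exact math): u = 1/256 on a 256-wide texture lands exactly on the texel-1 boundary, so even a tiny negative coordinate bias flips the result to texel 0.

```python
import math

# Simplified 1-D point sampling: texel index = floor(u * width).
# The bias parameter models a small coordinate offset in the sampler.
def nearest_texel(u, width, bias=0.0):
    return math.floor((u + bias) * width)

print(nearest_texel(1.0 / 256, 256))                    # -> 1
print(nearest_texel(1.0 / 256, 256, bias=-1.0 / 65536)) # -> 0, as reported
```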
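The peak-FLOPS arithmetic in the raytracing post above can be checked directly (theoretical peaks, using only the clocks and widths quoted there):

```python
# Theoretical peak GFlops as quoted above; not measured throughput.
cpu_gflops = 3.0 * 2 * 4 * 2  # GHz * cores * SSE lanes * math issue rate
gpu_gflops = 1.5 * 96 * 2     # GHz * shader processors * 2 flops per MAD

print(cpu_gflops, gpu_gflops)             # 48.0 288.0
print(gpu_gflops / cpu_gflops)            # 6.0 -> far from 100x, as argued
```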