Showing results 1 to 25 of 500
Search: Posts Made By: Nick
Forum: 3D Architectures & Chips 10-May-2013, 00:58
Replies: 33
Views: 2,838
Posted By Nick

Designing for 4 GHz doesn't necessarily mean it has to run at 4 GHz. Haswell is a 4 GHz design but there will be chips based on it that are suitable for tablets.

I am proposing to unify mainstream...
Forum: 3D Architectures & Chips 08-May-2013, 16:33
Replies: 33
Views: 2,838
Posted By Nick

I disagree. By execution time, most code simply has no DLP that could be exploited by SIMD. In a non-MT design, the SIMD units will sit idle most of the time, no way around that. Widening the units...
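As a hedged illustration of the distinction drawn here, the first function below has data-level parallelism (DLP) because every element is computed independently, while the second is a recurrence in which each step needs the previous result. Python is used only for readability and vectorizes neither loop; both function names are hypothetical.

```python
def scale_add(a, x, y):
    # Every element is computed independently of the others (DLP), so a
    # compiler could map consecutive iterations onto SIMD lanes.
    return [a * xi + yi for xi, yi in zip(x, y)]


def recurrence(x0, a, b, steps):
    # Each step consumes the previous result, so a single chain offers
    # no DLP to spread across SIMD lanes.
    x = x0
    for _ in range(steps):
        x = a * x + b
    return x


print(scale_add(2, [1, 2, 3], [10, 20, 30]))  # [12, 24, 36]
print(recurrence(1, 2, 1, 3))                 # 15
```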
Forum: 3D Architectures & Chips 08-May-2013, 01:48
Replies: 33
Views: 2,838
Posted By Nick

While I'm sure that would further increase the utilization, the real question is: does it improve performance/Watt? Intel removed Hyper-Threading support from Silvermont, probably because it doesn't...
Forum: 3D Architectures & Chips 08-May-2013, 01:32
Replies: 33
Views: 2,838
Posted By Nick

AMD is trying to promote heterogeneous computing. They're not going to strengthen the homogeneous throughput computing resources even if it's a performance bottleneck. They really want developers to...
Forum: 3D Architectures & Chips 08-May-2013, 01:23
Replies: 33
Views: 2,838
Posted By Nick

I don't have strong opinions about these things yet. I first wanted to get a sense of whether the idea of two SIMD clusters at half frequency is even feasible at all. I do assume the single integer...
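The peak-throughput side of the two-clusters-at-half-frequency idea is simple arithmetic. A minimal sketch, assuming (hypothetically) 8 single-precision lanes per cluster, one fused multiply-add (2 FLOPs) per lane per cycle, and illustrative 4 GHz and 2 GHz clocks; it says nothing about power, area, or the integer side of the design.

```python
def peak_gflops(clusters: int, lanes: int, clock_ghz: float,
                flops_per_lane_per_cycle: int = 2) -> float:
    # Peak GFLOP/s = clusters x lanes x FLOPs per lane per cycle x clock (GHz).
    # The default of 2 assumes one fused multiply-add per lane per cycle.
    return clusters * lanes * flops_per_lane_per_cycle * clock_ghz


one_cluster_full_clock = peak_gflops(clusters=1, lanes=8, clock_ghz=4.0)
two_clusters_half_clock = peak_gflops(clusters=2, lanes=8, clock_ghz=2.0)
print(one_cluster_full_clock, two_clusters_half_clock)  # 64.0 64.0
```

Equal peak throughput is only the starting point; whether the half-clock configuration wins on performance/Watt is not captured by this model.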
Forum: 3D Architectures & Chips 07-May-2013, 19:37
Replies: 33
Views: 2,838
Posted By Nick

Intel claims (http://cache-www.intel.com/cd/00/00/51/61/516194_516194.pdf) that the front-end only consumes a few percent of total core power these days, with FP execution taking up to 75% in...
Forum: 3D Architectures & Chips 07-May-2013, 18:15
Replies: 33
Views: 2,838
Posted By Nick

Quote: "Here is your first fault. You are afraid of multi-threading. Multi-threading is a very good thing in throughput computing. It's very cheap and increases utilization well."
I am not...
Forum: 3D Architectures & Chips 07-May-2013, 15:40
Replies: 33
Views: 2,838
Posted By Nick

This is the unification of a CPU and integrated GPU. The silicon that used to be spent on the GPU is distributed between the modules. So there's no extra cost. Also, the utilization of the SIMD units...
Forum: 3D Architectures & Chips 07-May-2013, 15:22
Replies: 33
Views: 2,838
Posted By Nick

Quote: "That's not all too confidence-inspiring. Programming for GPUs is notoriously unforgiving."
This is the unification of a CPU and GPU. So having the same logical SIMD width as a GPU is not a bad...
Forum: 3D Architectures & Chips 06-May-2013, 20:02
Replies: 33
Views: 2,838
Posted By Nick

It's the same width as NVIDIA's warp size.
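NVIDIA's warp size is 32 threads. As a hedged back-of-the-envelope check (the helper name is hypothetical), the number of 32-bit lanes in a SIMD register is its width divided by 32, so a 1024-bit register, the AVX-1024 width mentioned in these posts, would hold 32 single-precision lanes, versus 8 for a 256-bit AVX register.

```python
NVIDIA_WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def simd_lanes(register_bits: int, element_bits: int = 32) -> int:
    # Number of elements of the given size that fit in one SIMD register.
    return register_bits // element_bits


print(simd_lanes(256))                       # 8  (256-bit AVX, single precision)
print(simd_lanes(1024))                      # 32
print(simd_lanes(1024) == NVIDIA_WARP_SIZE)  # True
```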
Forum: 3D Architectures & Chips 06-May-2013, 16:09
Replies: 33
Views: 2,838
Posted By Nick
Bulldozers are faster in reverse

It appears that the biggest concern about unifying the CPU and (integrated) GPU is that sequential scalar workloads demand designing for ~4 GHz operation while having wide SIMD units for graphics and...
Forum: 3D Architectures & Chips 18-Mar-2013, 17:20
Replies: 144
Views: 8,139
Posted By Nick

Thanks for those details. So the solution is some more convergence. However, note that only some of the issues would be addressed by having nearby latency-optimized scalar cores handle task...
Forum: 3D Architectures & Chips 18-Mar-2013, 04:49
Replies: 144
Views: 8,139
Posted By Nick

If 20 GPU SIMD units are executing the same program, then you effectively have one very wide SIMD unit. But you don't need to; you can also run different programs per unit. It's flexible. A traditional CPU...
Forum: 3D Architectures & Chips 18-Mar-2013, 04:33
Replies: 144
Views: 8,139
Posted By Nick

As I've discussed before (http://beyond3d.com/showpost.php?p=1718735&postcount=34), CPUs can achieve high throughput with fewer threads and registers by having lower latencies and by hiding them with...
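The link between latency and the number of threads (and therefore registers) needed can be sketched with Little's law: to keep a pipelined unit busy, latency × issue rate operations must be in flight, supplied either by more threads or by more independent instructions per thread. A minimal, simplified model using the 5-cycle and 18-cycle ALU latencies quoted in these posts; the issue rate of 1 per cycle and the 32 registers per thread are hypothetical.

```python
import math


def threads_to_hide_latency(latency_cycles: int, issue_per_cycle: int = 1,
                            ilp_per_thread: int = 1) -> int:
    # Little's law: operations in flight = latency x issue rate.
    # Each thread contributes ilp_per_thread independent operations.
    in_flight = latency_cycles * issue_per_cycle
    return math.ceil(in_flight / ilp_per_thread)


def registers_needed(threads: int, regs_per_thread: int) -> int:
    # Each resident thread needs its own copy of its registers.
    return threads * regs_per_thread


for latency in (5, 18):
    t = threads_to_hide_latency(latency)
    print(latency, t, registers_needed(t, regs_per_thread=32))
# 5 5 160
# 18 18 576
```

Lower latency or more independent instructions per thread both shrink the required thread count; for example, `threads_to_hide_latency(18, ilp_per_thread=2)` returns 9.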
Forum: 3D Architectures & Chips 17-Mar-2013, 21:37
Replies: 144
Views: 8,139
Posted By Nick

No need to call this a rant as far as I'm concerned. I fully agree it should be replaced with something better. I'll look around for potential options. Thank you.
Forum: 3D Architectures & Chips 17-Mar-2013, 21:26
Replies: 144
Views: 8,139
Posted By Nick

That's only because Intel already mentioned AVX-1024 (http://software.intel.com/sites/default/files/m/d/4/1/d/8/Intel_AVX_New_Frontiers_in_Performance_Improvements_and_Energy_Efficiency_WP.pdf) in...
Forum: 3D Architectures & Chips 17-Mar-2013, 14:20
Replies: 144
Views: 8,139
Posted By Nick

The benefits would be comparable to the unification of vertex and pixel processing: performance, cost, and new applications. For a long time the GPU's unification didn't make sense from a performance...
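The utilization argument behind unified vertex and pixel processing can be illustrated with an idealized load-balancing model. A minimal sketch with hypothetical unit counts and workloads (arbitrary work units): with dedicated pools the work takes as long as the busier pool needs, while a unified pool spreads all work over every unit.

```python
def split_time(vertex_work: float, pixel_work: float,
               vertex_units: int, pixel_units: int) -> float:
    # Dedicated units: the slower pool determines when the work is done,
    # while the other pool sits partly idle.
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def unified_time(vertex_work: float, pixel_work: float, total_units: int) -> float:
    # Unified units: idealized perfect load balancing across all units.
    return (vertex_work + pixel_work) / total_units


print(split_time(80, 120, vertex_units=8, pixel_units=24))  # 10.0
print(unified_time(80, 120, total_units=32))                # 6.25
```

Real hardware adds scheduling overhead, so the unified figure is a best case.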
Forum: 3D Architectures & Chips 17-Mar-2013, 13:14
Replies: 144
Views: 8,139
Posted By Nick

But I don't have to formulate such a corner case to see OpenCL applications run faster (http://images.anandtech.com/graphs/graph5835/46687.png) on a CPU than on a heterogeneous architecture. In other...
Forum: 3D Architectures & Chips 17-Mar-2013, 05:01
Replies: 144
Views: 8,139
Posted By Nick

Quote: "What bandwidth deficit?"
If you are going to persist in being obtuse *again*, then we can't really have a conversation. So you can either answer the question you damn well know I was asking...
Forum: 3D Architectures & Chips 17-Mar-2013, 04:36
Replies: 144
Views: 8,139
Posted By Nick

Haswell's FMA latency is 5 cycles. So how can you say the GPU's ALU latency isn't worse when Fermi takes 18 cycles? And that's without taking clocks into account. Also, even if the effective...
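Taking clocks into account means converting cycles to time: latency in ns = cycles / clock in GHz. A sketch using the 5-cycle and 18-cycle figures from this post, with hypothetical clocks of 3.5 GHz for the CPU and 1.4 GHz for the GPU shader domain (substitute the actual clocks of the parts being compared).

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    # One cycle at f GHz lasts 1/f ns.
    return cycles / clock_ghz


cpu_fma_ns = cycles_to_ns(5, 3.5)   # assumed 3.5 GHz CPU clock
gpu_alu_ns = cycles_to_ns(18, 1.4)  # assumed 1.4 GHz GPU shader clock
print(f"{cpu_fma_ns:.2f} ns vs {gpu_alu_ns:.2f} ns")  # 1.43 ns vs 12.86 ns
```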
Forum: 3D Architectures & Chips 16-Mar-2013, 18:47
Replies: 144
Views: 8,139
Posted By Nick

Indeed we should keep our eyes open for useful instructions of this kind. All I'm saying is that for a unified architecture it probably doesn't make sense to go beyond what the latest GPUs do.
...
Forum: 3D Architectures & Chips 16-Mar-2013, 14:22
Replies: 144
Views: 8,139
Posted By Nick

Those dependencies are all checked for.

If there are not too many of them, and the GPU runs out of parallelism within a frame, it will render things from the next frame(s). If there are too many...
Forum: 3D Architectures & Chips 16-Mar-2013, 08:11
Replies: 144
Views: 8,139
Posted By Nick

Quote: "You can't because you don't know what future frames are going to be."
Sure you do. Let's say the application makes a draw call for a red triangle, a present call, then a draw call for a green...
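Starting work from the next frame can be sketched as a hazard check over the command stream, using the red/green triangle example. A minimal sketch, assuming double buffering (the two frames render to different back buffers) and hypothetical resource names; a real driver tracks far more state than this.

```python
from dataclasses import dataclass


@dataclass
class Command:
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def can_start_early(cmd: Command, pending: list) -> bool:
    # Safe only if cmd has no read-after-write, write-after-read, or
    # write-after-write hazard with any command still in flight.
    for p in pending:
        if cmd.reads & p.writes or cmd.writes & p.reads or cmd.writes & p.writes:
            return False
    return True


red = Command("draw red", writes=frozenset({"back_buffer_0"}))
present = Command("present", reads=frozenset({"back_buffer_0"}))
green = Command("draw green", writes=frozenset({"back_buffer_1"}))
single_buffered_green = Command("draw green", writes=frozenset({"back_buffer_0"}))

print(can_start_early(green, [red, present]))                  # True
print(can_start_early(single_buffered_green, [red, present]))  # False
```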
Forum: 3D Architectures & Chips 16-Mar-2013, 07:54
Replies: 144
Views: 8,139
Posted By Nick

I'm not sure I get this. Just a simple example.
Latency-oriented: compute a certain algorithm on a single data element, or just a few, as fast as possible
Throughput-oriented: compute a certain...
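The two goals in this example can be contrasted with a simplified pipeline model (an assumption, not a description of any particular chip): the first result appears after the full latency, and subsequent results follow at the issue interval. For one element only the latency matters; for many elements the issue interval dominates.

```python
def completion_time(n_items: int, latency: int, issue_interval: int) -> int:
    # Simplified pipeline: first result after `latency` cycles,
    # then one more result every `issue_interval` cycles.
    if n_items <= 0:
        return 0
    return latency + (n_items - 1) * issue_interval


print(completion_time(1, latency=10, issue_interval=1))     # 10
print(completion_time(1000, latency=10, issue_interval=1))  # 1009
```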
Forum: 3D Architectures & Chips 15-Mar-2013, 19:38
Replies: 144
Views: 8,139
Posted By Nick

I think you're mixing something up here. Decoupling the TMUs from the ALUs makes it more flexible and also makes it easier to hide the latency of memory accesses behind the arithmetic instructions of other...
All times are GMT +1.