Haswell vs Kaveri

I think we're probably missing the obvious: 40 EUs doesn't mean they increased other units by the same amount - I'd certainly agree they'd likely be very bandwidth limited if they did. However ALUs are the least bandwidth-hungry part of modern GPUs (doesn't mean they can't increase it somewhat, but nowhere near as much) so if you've got die area to spare it's nearly always a good way to increase peak performance for ALU-heavy workloads.

Depends on the status of their interposer efforts. If those come online in time, then 40 EUs could look VERY good.
 
I think we're probably missing the obvious: 40 EUs doesn't mean they increased other units by the same amount - I'd certainly agree they'd likely be very bandwidth limited if they did. However ALUs are the least bandwidth-hungry part of modern GPUs (doesn't mean they can't increase it somewhat, but nowhere near as much) so if you've got die area to spare it's nearly always a good way to increase peak performance for ALU-heavy workloads.

That would also end up benefitting GPU compute more, correct? If they were worried about falling too far behind in potential future compute workloads, that would be a good area to increase drastically, I'd imagine.

It'll be interesting to see whether Intel intends to just keep graphics performance "good enough" or whether they'll actually attempt to surpass AMD in graphics when it comes to gaming.

Compute on the other hand has the potential to affect far more of the PC performance ecosystem (assuming GPU compute eventually takes off in the consumer space) than just graphics alone.

Regards,
SB
 
I think we're probably missing the obvious: 40 EUs doesn't mean they increased other units by the same amount - I'd certainly agree they'd likely be very bandwidth limited if they did. However ALUs are the least bandwidth-hungry part of modern GPUs (doesn't mean they can't increase it somewhat, but nowhere near as much) so if you've got die area to spare it's nearly always a good way to increase peak performance for ALU-heavy workloads.
I quite disagree. SnB-EX has 8 cores, which require more bandwidth, and I don't think it will be a problem for Haswell to have a better ring bus than Ivy Bridge's and SnB's. If your concern is memory bandwidth rather than the ring, I think DDR3 will be running at higher frequencies by then.
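Rough back-of-envelope in Python, with numbers that are purely assumptions for illustration rather than confirmed Haswell specs (16 FLOPs per EU per clock, ~1.2 GHz GPU clock, dual-channel DDR3):

Code:
# Back-of-envelope: peak ALU throughput of a 40 EU part vs. DDR3 bandwidth.
# All figures are assumptions for illustration, not confirmed Haswell specs.

eus = 40                # rumoured EU count
flops_per_eu_clk = 16   # assumed: 2 x 4-wide FMA issue per EU per clock
gpu_clock_ghz = 1.2     # assumed GPU clock

peak_gflops = eus * flops_per_eu_clk * gpu_clock_ghz
print(f"Peak ALU throughput: {peak_gflops:.0f} GFLOPS")

# Dual-channel DDR3 bandwidth: MT/s * 8 bytes * 2 channels
for mts in (1600, 2133):
    bw_gbs = mts * 8 * 2 / 1000
    print(f"DDR3-{mts}: {bw_gbs:.1f} GB/s -> ~{peak_gflops / bw_gbs:.0f} FLOPs per byte to stay ALU-bound")

Even with faster DDR3 the ratio stays high, which is why the extra EUs would mostly pay off in ALU-heavy workloads rather than bandwidth-bound ones.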
 
Could Near Threshold Voltage be used in Haswell?

Anandtech said:
Intel is also sharing details of a 22nm NTV SIMD engine for use in processor graphics. Given Intel's new focus on improving processor graphics performance, the fact that we're seeing more Intel driven research around GPU technologies isn't surprising.

Also, there is a slide about a 162 GFLOPS/W 32nm FPU!? :oops: edit: at 6-bit accuracy...
 
The description of the research indicates NTV isn't ready to be productized, which means Haswell silicon wouldn't have it, since it has already been taped out.

The dramatic power savings shown require a significant sacrifice in circuit performance, so the likely first places we'd see it applied would be hardware that can be used at low speeds. The high-performance latency-optimized cores don't look like they can take advantage of it as readily, because they have to operate in the turbo region.
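As a toy illustration of that trade-off (the constants are made up, assuming the textbook dynamic-power relation P ~ V^2 * f and frequency scaling roughly with the overdrive voltage V - Vth; real NTV silicon has leakage and variation effects on top of this):

Code:
# Toy model of why near-threshold operation wins on efficiency but loses on speed.
# Assumptions (illustrative only): dynamic power ~ V^2 * f, and achievable
# frequency scales roughly with the overdrive voltage (V - Vth).

VTH = 0.3  # assumed threshold voltage, volts

def relative_freq(v):
    """Achievable clock relative to nominal 1.0 V, crude linear overdrive model."""
    return (v - VTH) / (1.0 - VTH)

def relative_power(v):
    """Dynamic power relative to nominal: scales with V^2 * f."""
    return v ** 2 * relative_freq(v)

for v in (1.0, 0.7, 0.4):  # nominal, low-voltage, near-threshold
    f = relative_freq(v)
    p = relative_power(v)
    print(f"V={v:.1f}V  freq={f:.2f}x  power={p:.3f}x  perf/W={f / p:.1f}x")

The shape is the point: several times better perf/W near threshold, but at a small fraction of the clock, which is why it suits hardware that can run slowly rather than latency-optimized cores.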
 
What if you wanted to push the low voltage states of the CPU to NTV levels? Could a single physical design feasibly span that wide voltage range at advanced nodes?
 
I'd be a lot more excited about the NTV SIMD variable floating point precision if it went 128bit->64bit->32bit...
 
What if you wanted to push the low voltage states of the CPU to NTV levels? Could a single physical design feasibly span that wide voltage range at advanced nodes?

I'm not sure that can be done easily within the same units.

The description of NTV design indicates that the transistors, architecture, and circuits are physically different in order to compensate for challenges unique to operating at that low a voltage.

Transistors doped to work at near threshold aren't going to behave the same when jacked up to turbo levels, and the circuits tuned to handle problems at NTV may not be needed, or could pose an impediment to reaching normal-voltage clocks.

That doesn't rule out having NTV on the same chip, and possibly in specific parts of a single core, but I don't know how well areas of silicon that are tuned to operate at NTV (mW at MHz) will scale up to turbo (tens of W at GHz).
Possibly the chip could "switch gears" and shift work to the NTV areas, at some latency cost.
 
To me, NTV looks like Intel's approach to tackling Exascale.

A tiny bit of Google suggests that your expectation is mirrored by many over the last several months. However, I do not agree -- this technology tackles the opposite end of the spectrum in my opinion.

Exascale deals in epic throughput and maximum capacity; NTV deals with how to absolutely minimize idle power draw. I can certainly agree they would be useful in the same space (a 2 kW machine that could idle at 10 W), but the reality is that an "exascale" computer wouldn't specifically care about (and thus may not be specifically engineered to deliver) low-power idle characteristics.
 
If you care about idle, then power gating effectively solves it for any core as far as exascale is concerned.

Exascale computing is about reaching new levels of throughput, coupled with a brutal power constraint given the peak numbers desired.

The demonstrated NTV pentium was not idling. It wasn't performing monstrous levels of computation, but it was active.
Intel's other papers on the topic include ALUs and specialized hardware units.
 
But you don't reach exascale at idle, do you? At least not when the "exa" is about throughput rather than core count.

@Albuquerque
:) Nice to see that the google/internetz line-o-truths seems to converge with mine for once.

From the looks of it, you get the most benefit from NTV technology with not-all-that-powerful cores, probably not even keeping up with what we're seeing in mobile devices today. That makes it basically useless for the desktop unless there's a major paradigm shift on the horizon.

For embarrassingly parallel stuff, though, you could get a healthy net win out of distributing the work even further - probably even with the much greater data movement compared to fewer but more powerful cores.
 
The demonstrated NTV pentium was not idling. It wasn't performing monstrous levels of computation, but it was active.
Intel's other papers on the topic include ALUs and specialized hardware units.
I get what Exascale is, but the examples of NTV were for low power modes. When the device needed more performance, the voltage (and thus, power draw) scaled appropriately. That's why I do not see a direct link between NTV and any sort of "Exascale" delivery.

Does that make NTV any less interesting or useful? Absolutely not. :) I look forward to having some epic version of SpeedStep that can manage multiplier, bus speeds, unit / core gating, and the entire active power curve in one big cohesive glob.
 
The performance/watt curve is what Exascale would find interesting.
The Pentium they presented was a proof of concept that they could make a fully functional chip run at those voltages.

Pushing an NTV chip to turbo speeds is likely a net loss. The circuits aren't tuned for it, and will probably consume more power than one that targets those typical speeds.
 
I guess I still don't see it. The radically altered Pentium they showed, when running at "NTV levels", was ridiculously low performance. I guess you might argue that a metric ton of these processors, all simultaneously running in a Beowulf cluster (LOL), might be able to pump out a few TF, but the supporting power for connecting that many chips would be insane.
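Just to put rough numbers on that, here's a crude scaling exercise (Python). The per-chip figures are my own placeholders for "a few MHz at a few mW near threshold", not the published demo values, so treat them accordingly:

Code:
# Crude scaling: how many NTV-Pentium-class chips for 1 TFLOPS?
# The figures below are placeholder assumptions, not the published demo numbers.

ntv_clock_hz = 3e6      # assumed near-threshold clock
flops_per_clock = 1     # assumed: roughly one FP op per cycle for a Pentium-class core
ntv_power_w = 0.002     # assumed per-core power near threshold

chip_flops = ntv_clock_hz * flops_per_clock
chips_for_1tf = 1e12 / chip_flops
core_power_w = chips_for_1tf * ntv_power_w

print(f"Per chip: {chip_flops / 1e6:.0f} MFLOPS at {ntv_power_w * 1000:.0f} mW")
print(f"Chips needed for 1 TFLOPS: {chips_for_1tf:,.0f}")
print(f"Core power alone: {core_power_w:.0f} W (before memory, interconnect, packaging)")

The cores look fine on flops/watt in isolation, but hundreds of thousands of packages plus the interconnect between them is exactly where the power and cost would blow up.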

I guess the conceptual problem I'm having is this: NTV provides far lower-powered x86 processing. I agree that it indeed significantly increases flops/watt per chip, but only when operated at very, VERY low power (and thus, in absolute terms, low performance) modes. Hell, the Atom does better than any of the Core architectures for flops/watt, but I don't (yet) see anyone making supercomputers out of them. If we wanted a race to ultimate flops/watt, we'd be doing custom SoC stuff.

Thoughts?
 