|
|
#1001 |
|
Senior Member
|
ECC does not affect compute performance on Tahiti.
__________________
I speak only for myself. |
|
|
|
|
|
#1002 |
|
Senior Member
|
Not even the memory performance?
|
|
|
|
|
|
#1003 | |
|
Senior Member
Join Date: May 2008
Posts: 80
|
Quote:
@rpg.314 ECC should just increase the latency slightly afaik, but that's what GPUs are supposed to hide, so it could be slower, but it shouldn't be visible in normal cases. But AMD also just released a 1 TFlop+ GPU: http://www.anandtech.com/show/6025/r...-up-to-gtx-680 I still think the Phi is nothing to be disappointed about; I'd love to have some x86 CPUs with that power and a nice instruction set. |
|
|
|
|
|
|
#1004 | |
|
Member
Join Date: Feb 2010
Posts: 173
|
Quote:
|
|
|
|
|
|
|
#1005 | |
|
Senior Member
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 1,017
|
Quote:
Memory performance drops, peak throughput does not. Edit: There was already a new page.
__________________
x: RCP_sat R2.x, R1.y y: RCP_sat ____, R1.y z: RCP_sat ____, R1.y |
|
|
|
|
|
|
#1006 |
|
Senior Member
Join Date: Sep 2010
Posts: 1,055
|
Guys, one question if you don't mind.
So, what is the purpose of this thingie? Only supercomputers, right? We don't expect Intel to compete with AMD and NV regarding drivers, DirectX support, etc. My point is that they will never offer a gaming card based on this. |
|
|
|
|
|
#1007 | ||
|
Senior Member
Join Date: May 2008
Posts: 80
|
Quote:
I wouldn't be surprised if they ripped all the texture units out of the MIC; not even OpenCL might work properly (I mean, technically yes, but texture sampling, usually an optimization, would slow it down, as it would be emulated). If history repeats, it will end like the Itanium. Quote:
Like I said on the previous page, I hope Skylake will have all the LRB juice in it. 512-bit SIMD on 4 consumer cores at ~4GHz might end up with 1 TFlop SP. |
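For reference, the ~1 TFlop SP figure can be reproduced with a back-of-the-envelope peak calculation. This is only a sketch: `peak_gflops` is a hypothetical helper, and the two 512-bit FMA units per core are an assumption the post does not state.

```python
def peak_gflops(cores, simd_bits, ghz, element_bits=32, fma_units=2):
    """Theoretical peak GFLOP/s: cores x lanes x 2 flops per FMA x FMA units x clock."""
    lanes = simd_bits // element_bits          # 512 / 32 = 16 SP lanes
    flops_per_cycle = lanes * 2 * fma_units    # an FMA counts as 2 flops
    return cores * flops_per_cycle * ghz

# Assumed configuration: 4 cores, 512-bit SIMD, 4 GHz, 2 FMA units per core.
skylake_guess = peak_gflops(cores=4, simd_bits=512, ghz=4.0)
print(skylake_guess)  # 1024.0
```

With a single FMA unit per core the same setup would peak at half that, so the estimate depends heavily on the issue width assumed.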
||
|
|
|
|
|
#1008 |
|
Senior Member
|
Putting wide SIMD in a 3GHz, quad issue OoO core will defeat the entire purpose of SIMD. You want a simple in order core to go with the vector units.
|
|
|
|
|
|
#1009 |
|
Senior Member
Join Date: May 2008
Posts: 80
|
4x float or 8x float SIMD is OK, but a "wide" 16x float SIMD defeats its purpose? I can't really come up with any reason why you might think that; can you elaborate?
|
|
|
|
|
|
#1010 | |
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
Quote:
But I was blatantly wrong to think that Knights Corner's performance was for SP. Intel previously announced Larrabee to reach 1 TFLOP as well. That was for SP, hence the confusion. |
|
|
|
|
|
|
#1011 |
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
As far as I know the "purpose of SIMD" and out-of-order execution are completely orthogonal. We have GPUs with single-issue SIMD, dual-issue SIMD, VLIW SIMD, and we have CPUs with in-order or out-of-order SIMD. And the choices seem unrelated to the width of the vectors. It would actually make sense for wider vectors to be paired with more dynamic scheduling since it would increase efficiency at a lower relative cost.
|
|
|
|
|
|
#1012 | |
|
Regular
|
Quote:
With a heterogeneous solution, say Ivy Bridge and MIC, you can have a mix of SIMD code and scalar code without half your hardware idling ...
__________________
Cinematic is the new streamlined. |
|
|
|
|
|
|
#1013 | |
|
Senior Member
Join Date: May 2008
Posts: 80
|
Quote:
It's still an x86/CISC instruction set; your SIMD code does not work on registers only, you address operands straight from memory. I think there is way more for the OoO units to work on, especially with a lot of cores that share/thrash the same memory controller. |
|
|
|
|
|
|
#1014 | |
|
Senior Member
Join Date: May 2008
Posts: 80
|
(I've split my reply into two, for better quoting, and this one could be kind of off-topic?)
Quote:
Sadly I cannot dig out any MIC benchmark, so allow me to project this onto a heterogeneous/homogeneous OpenCL benchmark (Sandra 2012): http://www.tomshardware.com/reviews/...0k,3181-6.html
There you see:
1. ("homogeneous") Sandy Bridge has 75|165 MPix/s of compute power.
2. (heterogeneous) running on the Ivy Bridge GPU gives 10|251 MPix/s of compute power, with the GPU taking 75% of the die space of the CPU (according to http://www.chip-architect.com/news/2...es_Sandys.html ).
If you used the same space to add 3 cores (75%, although poorly balanced), you'd end up with an estimated 131|288 MPix/s.
While I agree that this is not a fair comparison, as the GPU also uses some minor space for fixed-function HW, I'd argue at the same time that it's not nicely vectorized and optimized code for CPU SIMD. It's also a comparison of pure compute power that does not reflect what you might gain with smarter algorithms that would be a win for the CPU but bite the GPU.
Below that you also see LuxMark, which is more of a real-world test. The CPU versions seem to be faster. If OpenCL ran on GPU and CPU at the same time, it would give the best results in that case, but there is no advantage; you could rather build a homogeneous system running natively optimized binaries, and it would probably be even more of a win.
Yes, I know about the 7970 (or Kepler). I also use heterogeneous systems rather than homogeneous ones, but simply 'cause I can get a 7950, overclock it to 1.2GHz and pay ~300 euro, while I would get just a 6-core 3930K for ~500 euro. Buying a homogeneous system with enough power is irrational from a price point of view. If I had the free choice between a 7970 and 250W of Haswell cores (250W/40W*4Cores->25Cores -> ~1.5DP/3.0SP TFlop/s and 350GB/s mem), I would choose the second one. Sadly the Xeon Phi also seems to be out of the question because of the probably high price tag. |
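The two estimates in this post can be checked with a few lines of arithmetic. This is a rough sketch, not a benchmark: linear scaling with core count is assumed, the 3.75GHz clock is the figure the next reply reads into the numbers, and 32 SP / 16 DP flops per cycle per core assumes two 256-bit FMA units per Haswell core.

```python
# Sandra 2012 OpenCL scores quoted in the post (MPix/s, two sub-tests).
sandy_bridge_cpu = (75, 165)

# Assumption: 3 extra cores on a 4-core die scale throughput linearly (+75%).
extra_cores, base_cores = 3, 4
scale = 1 + extra_cores / base_cores

seven_core_estimate = tuple(int(score * scale) for score in sandy_bridge_cpu)
print(seven_core_estimate)  # (131, 288)

# 250 W budget at 40 W per 4 cores, as written in the post.
haswell_cores = int(250 / 40 * 4)

# Assumptions: 3.75 GHz clock, 32 SP / 16 DP flops per cycle per core.
clock_ghz = 3.75
sp_tflops = haswell_cores * 32 * clock_ghz / 1000
dp_tflops = haswell_cores * 16 * clock_ghz / 1000
print(haswell_cores, sp_tflops, dp_tflops)  # 25 3.0 1.5
```

Both results match the figures in the post, which suggests these are the assumptions behind them, but the post itself does not spell them out.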
|
|
|
|
|
|
#1015 |
|
Senior Member
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,636
|
Wow, you think a Haswell core will use 10W at 3.75GHz? I really doubt that. A 3.7GHz base frequency Ivy Bridge Xeon is 87W TDP, and that is of course with the GPU fused off. I expect the number to be closer to 20W per core than 10W.
You might think that because the i7-3770K (3.5GHz base speed) is rated for 77W TDP, the cores only use 40W because the IGP can use such a big chunk of that. In practice the IGP probably can't use anywhere close to its full thermal budget when all four cores are running full-tilt. Sure, the other integrated stuff uses some power, but at least some of that has to scale with core count (L3, not just capacity but complexity, memory channels to get your huge bandwidth, etc). Of course it's kind of moot since I doubt Haswell would scale to 25 cores anyway. And part of that big price tag is justified because these chips would be huge. Tahiti and Kepler are around 2x larger than IB; can you imagine how big your hypothetical 25-core Haswell would be? I doubt it could even be manufactured. |
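To see how much the per-core power assumption matters, here is a small sketch comparing 10W and 20W per core in the same 250W budget. `cores_in_budget` and `peak_sp_tflops` are hypothetical helpers; uncore power (L3, memory controllers) is ignored, and the 3.75GHz clock and 32 SP flops per cycle per core are assumptions.

```python
def cores_in_budget(budget_w, watts_per_core):
    """Whole cores that fit in a power budget, ignoring uncore power."""
    return int(budget_w // watts_per_core)

def peak_sp_tflops(cores, ghz=3.75, flops_per_cycle=32):
    """Theoretical SP peak in TFlop/s under the assumed clock and issue width."""
    return cores * flops_per_cycle * ghz / 1000

for watts in (10, 20):
    n = cores_in_budget(250, watts)
    print(watts, n, peak_sp_tflops(n))
# 10 25 3.0
# 20 12 1.44
```

At 20W per core the hypothetical chip fits roughly half the cores and half the peak, before counting any power for the uncore.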
|
|
|
|
|
#1016 | |
|
Junior Member
Join Date: Feb 2012
Posts: 57
|
Quote:
- there's really not a big enough market for these things for the amount of effort Intel is putting in
- the potential profits are tiny compared to Intel's CPU profits
- whereas NV is piggy-backing it on the GPU designs, and for NV this area could represent a reasonable profit opportunity
- but for Intel, really, Haswell or an extended Haswell could do just as well
So, the whole purpose is to stop NV getting a foothold at the top end, which would allow them more profits with which to hire more, better engineers... |
|
|
|
|
|
|
#1017 |
|
Senior Member
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,636
|
I don't think it can be denied that Intel originally wanted Larrabee to be a real high end graphics card, and who knows what they had their sights set on with this.. perhaps even consoles.
Designing it specifically for the HPC market may not be a sensible investment, but salvaging what they can from an already established design surely is. Even if there's additional design cost in migrating it to 22nm and scaling it up a bit. |
|
|
|
|
|
#1018 | |
|
PM
Join Date: Dec 2002
Posts: 1,381
|
Quote:
__________________
// |
|
|
|
|
|
|
#1019 | |
|
Senior Member
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,636
|
Quote:
The processor rapso described would also need much more space for the very wide memory controllers. |
|
|
|
|
|
|
#1020 | ||
|
PM
Join Date: Dec 2002
Posts: 1,381
|
Quote:
Quote:
__________________
// |
||
|
|
|
|
|
#1021 | |
|
Senior Member
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,636
|
Quote:
Can you break that down for me? |
|
|
|
|
|
|
#1022 |
|
PM
Join Date: Dec 2002
Posts: 1,381
|
Believe what you want.
__________________
// Last edited by ninelven; 27-Jun-2012 at 05:31. Reason: Just don't care... |
|
|
|
|
|
#1023 | |
|
Senior Member
Join Date: Jun 2003
Posts: 2,571
|
Quote:
so what you are saying is that you want 10 cores that have no viable way of talking to the outside world...
__________________
Aaron Spink speaking for myself inc. |
|
|
|
|
|
|
#1024 |
|
Senior Member
Join Date: Jun 2003
Posts: 2,571
|
It isn't a matter of believing what someone wants. You cannot throw out the area for the L3, as that area actually covers the interconnect functionality between the cores and from the cores to the rest of the system.
__________________
Aaron Spink speaking for myself inc. |
|
|
|
|
|
#1025 |
|
Senior Member
Join Date: Dec 2004
Location: Toulouse
Posts: 4,223
|
Even the Celeron includes L3, 2MB for two cores (tip: the current Celeron is incredibly fast, about on par with a Core 2 Duo E8500).
An updated Bulldozer will work without L3, as in Trinity, but it's a fat design anyway and you end up with a big L2, which *Bridge and similar don't have. These days an Ivy Bridge core has half the L2 of an Atom core. |
|
|
|
|
|