Intel i7 performance surprise.

adrianxw
Joined: 21 Feb 05
Posts: 242
Credit: 322654862
RAC: 0
Topic 194628

Example

I was looking at some of my results and was surprised by the example I quote above. The faster i7 took more than twice as long as my Core 2 Quad. Now, the i7's hyperthreading certainly accounts for some of that, but that result was a real surprise.

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692131122
RAC: 68790

Intel i7 performance surprise.

Quote:

Example

I was looking at some of my results and was surprised by the example I quote above. The faster i7 took more than twice as long as my Core 2 Quad. Now, the i7's hyperthreading certainly accounts for some of that, but that result was a real surprise.

Hmmm...maybe this particular i7 has slow memory or something like that, because there are other i7s here that are much faster, e.g. http://einsteinathome.org/host/1104966/tasks. It's best to compare runtimes for the ABP1 jobs (those that get 250 credits), because their runtime does not depend on the individual workunit. The two i7s have rather different runtimes for those units (ca. 30k sec vs. 20k sec). This can hardly be explained by overclocking alone, can it?

Bikeman

rroonnaalldd
Joined: 12 Dec 05
Posts: 116
Credit: 537221
RAC: 0

RE: Hmmm...maybe this

Message 95531 in response to message 95530

Quote:

Hmmm...maybe this particular i7 has slow memory or something like that, because there are other i7s here that are much faster, e.g. http://einsteinathome.org/host/1104966/tasks. It's best to compare runtimes for the ABP1 jobs (those that get 250 credits), because their runtime does not depend on the individual workunit. The two i7s have rather different runtimes for those units (ca. 30k sec vs. 20k sec). This can hardly be explained by overclocking alone, can it?

Bikeman


The culprit here is probably Vista. That piece of software is known for causing performance hits, among many other things.

Another point: in single-core performance, a Core i7 beats a C2Q by only about 5-10% on average at the same clock frequency! But the throughput of all 8 logical cores is much higher than that of the 4 cores of a C2Q.
A Core i7 with HT activated should need about 1.5x the time per unit that the same host would need without HT, but with HT you are crunching 8 units in parallel, so summed over all units you complete more of them in a given timeframe.
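
To put rough numbers on that (a minimal back-of-the-envelope sketch only; the 20,000 s baseline runtime and the 1.5x HT slowdown are assumed figures taken from this thread, not measurements):

    # Throughput with and without Hyper-Threading on a 4-core / 8-thread i7,
    # using the rough numbers assumed in the post above.
    base_runtime = 20_000    # seconds per task, one task per core, HT off (assumed)
    cores = 4                # physical cores
    threads = 8              # logical cores with HT on
    ht_slowdown = 1.5        # assumed per-task slowdown with HT on

    tasks_per_day_ht_off = cores * 86_400 / base_runtime
    tasks_per_day_ht_on = threads * 86_400 / (base_runtime * ht_slowdown)

    print(f"HT off: {tasks_per_day_ht_off:.1f} tasks/day")   # ~17.3
    print(f"HT on:  {tasks_per_day_ht_on:.1f} tasks/day")    # ~23.0

So even though every single unit takes longer, the host as a whole finishes roughly a third more units per day under these assumptions.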

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

RE: Example I was looking

Quote:

Example

I was looking at some of my results and was surprised by the example I quote above. The faster i7 took more than twice as long as my Core 2 Quad. Now, the i7's hyperthreading certainly accounts for some of that, but that result was a real surprise.


Another contributing factor is what the task was "paired" with at run time... there is a fairly old Trac ticket out there asking that UCB put some effort into making the resource scheduler a little smarter when it schedules tasks, so that the most efficient pairings are used. As an example, running PrimeGrid tasks alongside EaH tasks so that the FP-heavy EaH tasks would not block the PG tasks as much, and vice versa...
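
To illustrate the kind of pairing such a smarter scheduler could do (purely a hypothetical sketch of the idea; the task names, the fp/int classification and the greedy pairing rule are my own illustration, not anything the BOINC scheduler actually implements):

    # Hypothetical greedy pairing: put an FP-heavy task next to an integer-heavy
    # one, so the two don't fight over the same execution units on a shared core.
    from collections import deque

    tasks = [
        ("EaH_GW_1", "fp"),    # floating-point heavy (illustrative)
        ("EaH_GW_2", "fp"),
        ("PG_LLR_1", "int"),   # integer heavy (illustrative)
        ("PG_LLR_2", "int"),
    ]

    fp_queue = deque(name for name, kind in tasks if kind == "fp")
    int_queue = deque(name for name, kind in tasks if kind == "int")

    pairs = []
    while fp_queue and int_queue:
        pairs.append((fp_queue.popleft(), int_queue.popleft()))
    # Whatever is left over just gets paired with its own kind.
    leftovers = list(fp_queue) + list(int_queue)
    pairs.extend(zip(leftovers[::2], leftovers[1::2]))

    for a, b in pairs:
        print(f"schedule together: {a} + {b}")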

Sadly, UCB has taken the stance that this is not important, and in fact has been drifting towards making the "mix" less and less "interesting", which means that you are going to see more and more contention and less and less efficiency as versions increase.

This is especially sad as it looks like HT is going to be standard now across almost all of the Intel product line from the i5 to the coming i9 ...

ML1
Joined: 20 Feb 05
Posts: 347
Credit: 86320566
RAC: 341

RE: ... Sadly, UCB has

Message 95533 in response to message 95532

Quote:

... Sadly, UCB has taken the stance that this is not important, and in fact has been drifting towards making the "mix" less and less "interesting", which means that you are going to see more and more contention and less and less efficiency as versions increase.

This is especially sad as it looks like HT is going to be standard now across almost all of the Intel product line from the i5 to the coming i9 ...


That indeed might sometimes be bad for HT CPUs, but it should give a performance boost on true multi-core CPUs by reducing memory bandwidth contention.

Out of curiosity, any comment as to why UCB has gone that route?

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

RE: That indeed might

Message 95534 in response to message 95533

Quote:

That indeed might sometimes be bad for HT CPUs, but it should give a performance boost on true multi-core CPUs by reducing memory bandwidth contention.

Out of curiosity, any comment as to why UCB has gone that route?


In that this is a separate and equally compelling issue, it too is not addressed. Or to put it another way: whether it is pairing up memory-intensive tasks, memory-bandwidth-bound tasks, or the previously mentioned FP vs. INT tasks, you would think that some attention would be paid to these issues.

As to the reason?

My simplest guess is that, based on the evidence, few of the developers (if any), or of the people the developers listen to, are running quad-core or better systems. I saw behaviors years ago with my first quad (a dual Xeon with HT) that to some extent still happen today. I avoid them mostly by setting the TSI to 12 hours (720 minutes), but other instabilities still exist.

Again, partly it is also the "if I squint hard enough I cannot see it, so it does not exist..." attitude, because seeing the issues would mean having to address them. So the mental model in the design is still a single processor working on a single stream of tasks. The limitations of this model are visible on a 4-core machine, but you have to look closely and watch the behavior over time. These artifacts are far more noticeable on 8 or more processing elements.

Sadly, the voices of some are far louder than others, and years pass...

The instance here is the chaotic behavior that Richard Haslegrove noted, and that I had seen before that... part of the problem is that internal routines (Schedule and Enforce) are run far more often than they need to be, with the main justification for this practice being a real-time project that has been defunct for a long time... most interesting to me about this justification is that BOINC was purpose-built to be a batch-oriented system, making it completely unsuitable for real-time processing...

Oh, and the second justification was that repetitively running the routines will not result in chaotic behavior... because the rules are the same, don't you see... sadly, this ignores decades of research in fractals and chaos, where it has been demonstrated over and over that even simple systems can easily act in chaotic ways...

Anyway, a failure of imagination ...

DJStarfox
Joined: 25 Mar 07
Posts: 10
Credit: 2484242
RAC: 120

RE: Hmmm...maybe this

Message 95535 in response to message 95530

Quote:
Hmmm...maybe this particular i7 has slow memory or something like that, because there are other i7s here that are much faster, e.g. http://einsteinathome.org/host/1104966/tasks. It's best to compare runtimes for the ABP1 jobs (those that get 250 credits), because their runtime does not depend on the individual workunit. The two i7s have rather different runtimes for those units (ca. 30k sec vs. 20k sec). This can hardly be explained by overclocking alone, can it?

Not sure what you people are talking about. My i7 can crunch S6 WUs at about 13k sec each. BTW, I'm running on Linux x86_64.

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

RE: RE: Hmmm...maybe this

Message 95536 in response to message 95535

Quote:
Quote:
Hmmm...maybe this particular i7 has slow memory or something like that, because there are other i7s here that are much faster, e.g. http://einsteinathome.org/host/1104966/tasks. It's best to compare runtimes for the ABP1 jobs (those that get 250 credits), because their runtime does not depend on the individual workunit. The two i7s have rather different runtimes for those units (ca. 30k sec vs. 20k sec). This can hardly be explained by overclocking alone, can it?

Not sure what you people are talking about. My i7 can crunch S6 WUs at about 13k sec each. BTW, I'm running on Linux x86_64.

Your computers are hidden, so nobody can verify what you are writing about. ;-)
And there is an influence of about 5% depending on what mixture of E@H-apps is running on an i7.

DJStarfox
Joined: 25 Mar 07
Posts: 10
Credit: 2484242
RAC: 120

RE: Your computers are

Message 95537 in response to message 95536

Quote:
Your computers are hidden, so nobody can verify what you are writing about. ;-)
And there is an influence of about 5% depending on what mixture of E@H-apps is running on an i7.

Here's an example task that looks typical:
http://einsteinathome.org/task/147343691

I run 4 other projects, so the mix is quite random. RAM is DDR3 1600MHz @ CAS 7. No OC either.

How much does the hyperthreading affect RAC?

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

RE: RE: Your computers

Message 95538 in response to message 95537

Quote:
Quote:
Your computers are hidden, so nobody can verify what you are writing about. ;-)
And there is an influence of about 5% depending on what mixture of E@H-apps is running on an i7.

Here's an example task that looks typical:
http://einsteinathome.org/task/147343691

I run 4 other projects, so the mix is quite random. RAM is DDR3 1600MHz @ CAS 7. No OC either.

How much does the hyperthreading affect RAC?


Past experiments show anywhere from as little as 10% to as much as 40% improvement in throughput over the same CPU with HT off. You will see longer individual run times for the tasks running on the system.
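
The two figures are tied together: with twice as many threads as physical cores, the per-task runtime scales as 2 / (1 + gain), assuming all threads stay busy on uniform tasks (a simplified model; the gains below are just the range quoted above):

    # Relation between HT throughput gain and per-task slowdown (4 cores -> 8 threads).
    for gain in (0.10, 0.25, 0.40):
        slowdown = 2 / (1 + gain)
        print(f"{gain:.0%} more throughput -> each task takes ~{slowdown:.2f}x as long")
    # 10% -> ~1.82x, 25% -> ~1.60x, 40% -> ~1.43x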

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

RE: RE: Your computers

Message 95539 in response to message 95537

Quote:
Quote:
Your computers are hidden, so nobody can verify what you are writing about. ;-)
And there is an influence of about 5% depending on what mixture of E@H-apps is running on an i7.

Here's an example task that looks typical:
http://einsteinathome.org/task/147343691


OK, I see it's an i7 920. My root server is an i7 920 running 64-bit Linux, too.

You can NOT compare GW (S5) tasks with each other, because some tasks take up to 50% longer for the same credits. ABP1 tasks have pretty constant runtimes, so you can do comparisons with them. If my i7 runs a mix of GW and ABP1 tasks, the ABP1 tasks generate about 43.x credits/h. When 8 ABP1 tasks are running, credits/h go down to about 42, so no big difference. AFAIK, GW tasks make intensive use of SSE2, while ABP1 tasks do a bit of SSE and some FPU work. This means they can run together pretty well. The arithmetic units in an HT CPU are not doubled, but the register set is. If one task is using a specific arithmetic unit, another task can't use it at the same time, but this will probably only matter with extreme applications (if ever).
I see no alternative to keeping HT enabled. I can't change it on my server anyway.

Quote:

I run 4 other projects, so the mix is quite random. RAM is DDR3 1600MHz @ CAS 7. No OC either.

How much does the hyperthreading affect RAC?

HT will probably always raise your RAC.

cu,
Michael
