UBUNTU 64 10.10+ GeForce GTX260 - all WU failing

RAMen
Joined: 18 Jan 09
Posts: 10
Credit: 13945382
RAC: 0
Topic 196058

All GPU WUs on a new install are failing; details are below.
I believe the setup meets the necessary requirements.

BOINC client version 6.10.58 for x86_64-pc-linux-gnu
Libraries: libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.18
Data directory:/var/lib/boinc-client (service install)
Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz [Family 6 Model 23 Stepping 10]
Processor: 3.00 MB cache
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse
OS: Linux: 2.6.35-30-generic
NVIDIA GPU 0: GeForce GTX 260 (driver 285.05.09, CUDA version 4010, compute capability 1.3, 895MB, 586 GFLOPS peak)

APPLICATION: 1.00 (BRP3cuda32nv270)
MACHINE ID: 4219851

ERROR:
WUs fail within seconds of startup

Stderr output

6.10.58

process exited with code 22 (0x16, -234)

execv: No such file or directory

A similar setup on a Q9400 UBUNTU 10.04 / 9800gtx+ works seamlessly
ID: 4213006
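
One possible reading of the "execv: No such file or directory" message, offered as an assumption rather than a diagnosis: on 64-bit Linux this error can appear when a 32-bit executable is launched and the 32-bit dynamic loader (/lib/ld-linux.so.2) is not installed, even though the binary itself is on disk. The sketch below reports whether a file is a 32-bit or 64-bit ELF executable and whether that loader is present. APP is a placeholder for the science app binary under the BOINC projects directory; it is not a path taken from this thread.

```shell
#!/bin/sh
# Sketch only: APP is a placeholder for the science app binary in the BOINC
# project directory; set it in the environment before running.
APP="${APP:-}"

# Print 32 or 64 according to byte 5 (EI_CLASS) of the ELF header,
# or "unknown" if the file does not start with the ELF magic number.
elf_class() {
    if [ "$(head -c 4 "$1")" != "$(printf '\177ELF')" ]; then
        echo unknown
        return
    fi
    case "$(od -An -tu1 -j4 -N1 "$1" | tr -d ' ')" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}

if [ -n "$APP" ] && [ -e "$APP" ]; then
    echo "$APP is a $(elf_class "$APP")-bit executable"
    # A 32-bit dynamically linked binary needs the 32-bit loader to start.
    if [ "$(elf_class "$APP")" = 32 ] && [ ! -e /lib/ld-linux.so.2 ]; then
        echo "32-bit loader /lib/ld-linux.so.2 is missing"
    fi
fi
```

If the binary turns out to be 32-bit and the loader is missing, installing the 32-bit compatibility libraries would be the first thing to try. If the file is simply absent, the cause is something else.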

Dogstar
Joined: 13 Dec 05
Posts: 9
Credit: 265763868
RAC: 613024

Obviously your system has all the right stuff and more to run any project on BOINC. I had a problem with invalid results and work unit errors about a year ago, and someone suggested I might have defective memory. I thought, well, the computer runs fine, applications and computer games run fine, how could I have defective memory? So under Administrative Tools I ran Windows Memory Diagnostic. Right away it said I had defective memory. I was still sceptical, so I ran it a few times and always got the same result. Then I downloaded the USB install version of Memtest86; you press F8 and boot it off a USB thumb drive. I ran it and it came up with thousands of errors.

I had 8GB of OCZ memory at the time (2x4). So I removed a pair of modules and ran WMD and Memtest86: no errors. I removed that pair, set them aside and installed one stick of the remaining pair. It failed both tests. I removed that and installed the remaining stick. It passed. Now I knew which of my 4 memory modules was bad. Fortunately the memory was under warranty and OCZ replaced it; in fact they insisted on also replacing the stick that was paired with it by sequential serial number.

Your problem sounds very much like mine. Also make sure everything is firmly socketed. Good luck.

Oh, and it solved my work unit problem!

Jonny Smith
Joined: 17 May 05
Posts: 4
Credit: 22835462
RAC: 0

I have a system with the following:

Sun 01 Jan 2012 14:46:17 GMT		Starting BOINC client version 6.10.59 for x86_64-pc-linux-gnu
Sun 01 Jan 2012 14:46:17 GMT		Config: GUI RPC allowed from:
Sun 01 Jan 2012 14:46:17 GMT		log flags: file_xfer, sched_ops, task
Sun 01 Jan 2012 14:46:17 GMT		Libraries: libcurl/7.21.3 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.18
Sun 01 Jan 2012 14:46:17 GMT		Data directory: /data3/jonny/phaal-kmint-boinc
Sun 01 Jan 2012 14:46:17 GMT		Processor: 4 AuthenticAMD AMD Phenom(tm) 9650 Quad-Core Processor [Family 16 Model 2 Stepping 3]
Sun 01 Jan 2012 14:46:17 GMT		Processor: 512.00 KB cache
Sun 01 Jan 2012 14:46:17 GMT		Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt
Sun 01 Jan 2012 14:46:17 GMT		OS: Linux: 2.6.38-13-generic
Sun 01 Jan 2012 14:46:17 GMT		Memory: 3.62 GB physical, 3.81 GB virtual
Sun 01 Jan 2012 14:46:17 GMT		Disk: 779.01 GB total, 476.64 GB free
Sun 01 Jan 2012 14:46:17 GMT		Local time is UTC +0 hours
Sun 01 Jan 2012 14:46:18 GMT		NVIDIA GPU 0: GeForce GTX 560 Ti (driver version unknown, CUDA version 4010, compute capability 2.1, 2048MB, 901 GFLOPS peak)
Sun 01 Jan 2012 14:46:18 GMT		NVIDIA GPU 1 (not used): GeForce 8300 (driver version unknown, CUDA version 4010, compute capability 1.1, 254MB, 24 GFLOPS peak)

This is not working on kmint (which is very, very similar to Ubuntu). The same thing on Kubuntu 10.04 is working.

I have recently updated and downgraded the nvidia kernel modules (270 series, 285 series, and currently failing with the 290 series), and messed about with my CUDA toolkit installation, though I think Einstein is providing its own CUDA toolkit library files (and is running 32-bit libraries on the 64-bit system).

CPU work units on the same host are passing, so I don't suspect memory.

My symptom is lots of messages like

01-Jan-2012 14:58:39 [Einstein@Home] Output file p2030.20100919.G44.81+00.19.S.b1s0g0.00000_3400_0_4 for task p2030.20100919.G44.81+00.19.S.b1s0g0.00000_3400_0 absent
01-Jan-2012 14:58:39 [Einstein@Home] Output file p2030.20100919.G44.81+00.19.S.b1s0g0.00000_3400_0_5 for task p2030.20100919.G44.81+00.19.S.b1s0g0.00000_3400_0 absent
01-Jan-2012 14:58:39 [Einstein@Home] Output file p2030.20100919.G44.81+00.19.S.b1s0g0.00000_3400_0_6 for task p2030.20100919.G44.81+00.19.S.b1s0g0.00000_3400_0 absent
01-Jan-2012 14:58:39 [Einstein@Home] Output file p2030.20100919.G44.81+00.19.S.b1s0g0.00000_3400_0_7 for task p2030.20100919.G44.81+00.19.S.b1s0g0.00000_3400_0 absent

I'm suspicious that there is some sort of cuda toolkit/driver mismatch that is breaking it on my system, but I don't really have enough info to properly diagnose it. I was hoping there might be more info on here. I'm a little troubled that it doesn't pick up my CUDA driver version.

Jonny Smith
Joined: 17 May 05
Posts: 4
Credit: 22835462
RAC: 0

Incidentally, the same box running Kubuntu 10.04 is working fine. There, the BOINC startup messages read:

Sun 01 Jan 2012 15:44:39 GMT		Starting BOINC client version 6.10.17 for x86_64-pc-linux-gnu
Sun 01 Jan 2012 15:44:39 GMT		log flags: file_xfer, sched_ops, task
Sun 01 Jan 2012 15:44:39 GMT		Libraries: libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
Sun 01 Jan 2012 15:44:39 GMT		Data directory: /var/lib/boinc-client
Sun 01 Jan 2012 15:44:39 GMT		Processor: 4 AuthenticAMD AMD Phenom(tm) 9650 Quad-Core Processor [Family 16 Model 2 Stepping 3]
Sun 01 Jan 2012 15:44:39 GMT		Processor: 512.00 KB cache
Sun 01 Jan 2012 15:44:39 GMT		Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf
Sun 01 Jan 2012 15:44:39 GMT		OS: Linux: 2.6.32-37-generic
Sun 01 Jan 2012 15:44:39 GMT		Memory: 3.62 GB physical, 3.81 GB virtual
Sun 01 Jan 2012 15:44:39 GMT		Disk: 30.52 GB total, 590.27 MB free
Sun 01 Jan 2012 15:44:39 GMT		Local time is UTC +0 hours
Sun 01 Jan 2012 15:44:39 GMT		NVIDIA GPU 0: GeForce GTX 560 Ti (driver version unknown, CUDA version 4000, compute capability 2.1, 2047MB, 225 GFLOPS peak)
Sun 01 Jan 2012 15:44:39 GMT		NVIDIA GPU 1 (not used): GeForce 8300 (driver version unknown, CUDA version 4000, compute capability 1.1, 254MB, 24 GFLOPS peak)

(Note the slightly different CUDA version!)
A quick look in /var/log/kern.log confirms which driver is loaded:
NVRM: loading NVIDIA UNIX x86_64 Kernel Module 280.13 Wed Jul 27 16:53:56 PDT 2011

so this is OK.
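
Since BOINC reports "driver version unknown" on both installs, the loaded driver version can be read directly from the kernel module instead. This is a minimal sketch, assuming the NVIDIA module's version line has the "Kernel Module <version>" shape shown in the kern.log excerpt above; nvrm_version is a helper name made up for this example.

```shell
#!/bin/sh
# nvrm_version: hypothetical helper that pulls the driver version out of an
# NVRM line such as the one quoted from kern.log above.
nvrm_version() {
    sed -n 's/.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Version of the module that is loaded right now, if the driver exposes it.
if [ -r /proc/driver/nvidia/version ]; then
    nvrm_version < /proc/driver/nvidia/version
fi

# Version recorded the last time the module was loaded, per the kernel log.
if [ -r /var/log/kern.log ]; then
    grep 'NVRM: loading' /var/log/kern.log | tail -n 1 | nvrm_version
fi
```

Running this on both the working and the failing install gives a like-for-like comparison of driver versions, independent of what the BOINC client detects.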

Jonny Smith
Joined: 17 May 05
Posts: 4
Credit: 22835462
RAC: 0

As a further follow-up to this, I think CUDA on the machine that was failing to run work units is fixed, insofar as PrimeGrid is happily running CUDA work units on it now. Unfortunately I have messages from E@H telling me

Tue 03 Jan 2012 11:46:35 GMT	Einstein@Home	Sending scheduler request: To fetch work.
Tue 03 Jan 2012 11:46:35 GMT	Einstein@Home	Requesting new tasks for GPU
Tue 03 Jan 2012 11:46:38 GMT	Einstein@Home	Scheduler request completed: got 0 new tasks
Tue 03 Jan 2012 11:46:38 GMT	Einstein@Home	Message from server: No work sent
Tue 03 Jan 2012 11:46:38 GMT	Einstein@Home	Message from server: (reached daily quota of 4 tasks)
Tue 03 Jan 2012 11:46:38 GMT	Einstein@Home	Message from server: Project has no jobs available

I'm rather hoping this is related to the maintenance that was reported earlier, and will clear up, but I have been receiving this message for about 45 hours since the most recent time boinc downloaded E@H GPU workunits.

Jonny Smith
Joined: 17 May 05
Posts: 4
Credit: 22835462
RAC: 0

OK, I should probably leave this thread to stew for a little while. Suffice to say, although the primegrid work units are running fine on the Mint-KDE system, the E@H units are still failing. It is true that the drivers are now a little more recent on this system, but I'm happy with the more recent drivers, so I guess while I'm booted into KDE, I'll have to make do without E@H.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6550
Credit: 289165317
RAC: 93045

Yes, that 'driver version unknown' is odd. I'll ask ....

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 950
Credit: 25167626
RAC: 1

Quote:
Suffice to say, although the primegrid work units are running fine on the Mint-KDE system, the E@H units are still failing.

What's the ID of the host affected? You can see it in your E@H account details on our website.

Also, does PrimeGrid provide a native 64 bit CUDA app? We provide 32 bit apps only. Do you have all 32 bit compatibility libs installed? Do you see any issue with our CPU apps on that box?

Cheers,
Oliver

 

Einstein@Home Project

inomyabcs
Joined: 11 Feb 05
Posts: 6
Credit: 16488057
RAC: 0

Your last workunit had this error.

6.10.59

process exited with code 127 (0x7f, -129)

../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4_1.00_i686-pc-linux-gnu__BRP3cuda32nv270: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory

It looks like Oliver is on the right track in asking about your libraries. libstdc++.so.6 ships in a standard Ubuntu package, but the E@H app is 32-bit and you are running 64-bit Ubuntu, so the loader needs the 32-bit version of the library and cannot find it. You can probably fix this by running sudo apt-get install ia32-libs.
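
To see every shared library the app cannot resolve, not just the first one the loader complains about, ldd can be run against the binary. The sketch below filters ldd output for "not found" entries. The binary name is copied from the error above; BOINC_DIR is an assumption (it defaults to /var/lib/boinc-client, the service-install location mentioned earlier in the thread, and should be set to your own data directory), and missing_libs is a helper name made up for this example.

```shell
#!/bin/sh
# Assumption: BOINC_DIR is the client's data directory; adjust as needed.
BOINC_DIR="${BOINC_DIR:-/var/lib/boinc-client}"
APP="$BOINC_DIR/projects/einstein.phys.uwm.edu/einsteinbinary_BRP4_1.00_i686-pc-linux-gnu__BRP3cuda32nv270"

# missing_libs: hypothetical helper. ldd prints "libfoo.so.N => not found"
# for each library it cannot resolve; print just the library names.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

if [ -x "$APP" ]; then
    ldd "$APP" | missing_libs
fi
```

An empty result after installing ia32-libs would suggest that the 32-bit dependencies are now satisfied.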
