issue crunching GPU WUs with ati graphics

astromatto
Joined: 26 May 07
Posts: 5
Credit: 276585685
RAC: 189531
Topic 228237

Hi,

I am trying to understand what is causing my issue, but I can't get a good grasp on the root cause: my WUs error out almost immediately, every time.
The only interesting info I get is the following error log:

[13:01:23][57640][INFO ] Application startup - thank you for supporting Einstein@Home!
[13:01:23][57640][INFO ] Starting data processing...
[13:01:23][57640][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[13:01:23][57640][INFO ] Using OpenCL device "gfx1031" by: Advanced Micro Devices, Inc.
[13:01:23][57640][ERROR] Couldn't create OpenCL command queue (error: -6)!
[13:01:23][57640][INFO ] OpenCL shutdown complete!
[13:01:23][57640][ERROR] Demodulation failed (error: 2013)!
[13:01:23][57640][WARN ] Sorry, at the moment your system doesn't have enough free CPU/GPU memory to run this task!

I am running the following hw/sw:

ArchLinux with kernel 5.19.11
amdgpu drivers with rocm-opencl-runtime
boinc 7.20.2
AMD Radeon 6700XT

Anybody got a similar issue?
 

Wedge009
Joined: 5 Mar 05
Posts: 117
Credit: 16005012337
RAC: 7337944

If at all possible, can you see if you still get the error with BOINC before v7.18? I seem to have the same error as you're reporting, when running recent BOINC versions.

https://boinc.berkeley.edu/forum_thread.php?id=14786

Soli Deo Gloria

mikey
Joined: 22 Jan 05
Posts: 12044
Credit: 1834323219
RAC: 37243

Wedge009 wrote:

If at all possible, can you see if you still get the error with BOINC before v7.18? I seem to have the same error as you're reporting, when running recent BOINC versions.

https://boinc.berkeley.edu/forum_thread.php?id=14786 

My problem with that PC is that it is actually NOT an AMD CPU, so no AMD drivers were ever loaded on it. It runs MilkyWay tasks just fine, and MilkyWay sees it as an NVIDIA Quadro K600 (1023MB) driver: 340.10 OpenCL: 1.1, which is what it really is. That power supply is maxed out, so it's a GPU without a power plug, and I should not have put it on Einstein to begin with.

Wedge009
Joined: 5 Mar 05
Posts: 117
Credit: 16005012337
RAC: 7337944

I'm sorry, I don't understand your response. Did you intend to reply to this discussion? I thought this was about AMD, not Nvidia.

Soli Deo Gloria

Wedge009
Joined: 5 Mar 05
Posts: 117
Credit: 16005012337
RAC: 7337944

I did some digging: initialize_ocl() seems to be a function in the Einstein code, not in BOINC. For whatever reason, though, newer BOINC versions cause a problem in it. According to the source code for BRP (which may well be out of date), error code 2013 is defined in demod_binary.h as RADPUL_OCL_MEM_ALLOC_DEVICE. It's one of the error codes raised in response to clCreateCommandQueue(), which is an OpenCL function. Error code -6 corresponds to CL_OUT_OF_HOST_MEMORY. It seems to be a common error code returned for a variety of reasons, so I suspect it's not really out of memory, just some weird interaction between potentially old Einstein code and new BOINC.
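
For anyone reading a similar log, here is a minimal Python sketch of the lookup described above. The code table is a subset of the standard values from the Khronos CL/cl.h header. The function name decode_opencl_error and the regular expression are assumptions made for illustration and are not part of the Einstein or BOINC code.

```python
import re

# Subset of OpenCL runtime error codes as defined in the Khronos CL/cl.h header.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -33: "CL_INVALID_DEVICE",
    -34: "CL_INVALID_CONTEXT",
    -35: "CL_INVALID_QUEUE_PROPERTIES",
}

# Matches the "(error: N)" suffix seen in the application log (assumed format).
ERROR_PATTERN = re.compile(r"\(error: (-?\d+)\)")


def decode_opencl_error(log_line):
    """Return the OpenCL constant name for the code in a log line, or None."""
    match = ERROR_PATTERN.search(log_line)
    if match is None:
        return None
    # Codes outside the table (e.g. application-specific ones) yield None.
    return OPENCL_ERRORS.get(int(match.group(1)))


line = "[13:01:23][57640][ERROR] Couldn't create OpenCL command queue (error: -6)!"
print(decode_opencl_error(line))
```

Application-specific codes such as 2013 are deliberately not in the table, so the function returns None for the "Demodulation failed" line.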

Soli Deo Gloria

astromatto
Joined: 26 May 07
Posts: 5
Credit: 276585685
RAC: 189531

I solved my issue by chance while trying to run BOINC in another folder without systemd. This is NOT related to Einstein in any way, but to the way ArchLinux packages BOINC: the systemd unit uses the following value: ProtectSystem=strict

This seems to create some very weird permission issues that I don't fully understand, but basically the user boinc can't fully use OpenCL. Commenting out that line makes everything work!
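
To check whether a packaged unit applies this kind of sandboxing, a short Python sketch like the one below can list the relevant directives in a unit file's text. The function name find_sandbox_directives, the selection of directives, and the sample unit are all assumptions for illustration. The sample is not the actual ArchLinux unit file.

```python
# Directives picked for illustration; they are documented in systemd.exec(5)
# and systemd.resource-control(5). The selection itself is an assumption.
SANDBOX_DIRECTIVES = (
    "ProtectSystem",
    "ProtectHome",
    "PrivateDevices",
    "PrivateTmp",
    "ReadWritePaths",
    "DeviceAllow",
)


def find_sandbox_directives(unit_text):
    """Return {directive: value} for active sandboxing lines in a unit file."""
    found = {}
    for raw_line in unit_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments ("#" or ";") and section headers like [Service].
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in SANDBOX_DIRECTIVES:
            # Simplification: the last assignment wins.
            found[key] = value.strip()
    return found


# Hypothetical unit text, not copied from any real package.
sample_unit = """\
[Unit]
Description=hypothetical BOINC client unit

[Service]
User=boinc
ProtectSystem=strict
#PrivateDevices=true
"""

print(find_sandbox_directives(sample_unit))
```

On a real system, the text could come from the output of systemctl cat for the BOINC service. The exact unit name depends on the distribution's package.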

Wedge009
Joined: 5 Mar 05
Posts: 117
Credit: 16005012337
RAC: 7337944

Well, I'm glad you seem to have your situation resolved, although I'm puzzled why none of your hosts are registered as having been active with Einstein within the last 30 days. I do wonder what systemd might have to do with BOINC and ROCm, however...

Soli Deo Gloria

astromatto
Joined: 26 May 07
Posts: 5
Credit: 276585685
RAC: 189531

I am still finishing some pool work for Gridcoin, that's why. Once that is done I'll move back to my own account :)

Wedge009
Joined: 5 Mar 05
Posts: 117
Credit: 16005012337
RAC: 7337944

For the benefit of anyone finding this thread later, I've mentioned your work-around in https://einsteinathome.org/content/brp7-opencl-ati-wont-run-because-not-enough-mem

Soli Deo Gloria
