I just aborted a WU

Neal Watkins
Joined: 19 Feb 05
Posts: 18
Credit: 1273657
RAC: 82
Topic 197562

It seemed to hang with about 11 hours remaining. OK - I've seen that before so I let it run for a long time. Still there. I did a snooze to let the cpu rest for a while and un-snoozed boinc. The time remaining jumped to 15 hours. It went down slowly (as before) so I snoozed again as a test. Yep - the time remaining jumped to 15 hours so I aborted the WU. Is this normal?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5851
Credit: 110575270041
RAC: 32869936

Quote:
It seemed to hang with about 11 hours remaining. OK - I've seen that before so I let it run for a long time. Still there.


The task was from the Gamma Ray Pulsar (FGRP3) run. These tasks can be slow to finish, particularly on old hardware like yours. Even though you say "a long time", it probably wasn't quite long enough. Just let them run; they will finish.

Quote:
I did a snooze to let the cpu rest for a while and un-snoozed boinc. The time remaining jumped to 15 hours. It went down slowly (as before) so I snoozed again as a test. Yep - the time remaining jumped to 15 hours so I aborted the WU. Is this normal?


Yes, if the BOINC setting 'Leave apps in memory when suspended' isn't set to yes. Without that setting, any time you suspend for whatever reason, the task is unloaded from memory and the crunching time accumulated since the last checkpoint is lost. When you resume, the task is reloaded from a saved checkpoint on disk rather than from a saved image in memory, so all the progress since that checkpoint has to be performed again. It's not all that significant timewise on a modern machine but can represent quite a bit of time on an older host, particularly near the end of the task.

You don't actually mention % progress. Elapsed time and % progress are better numbers to quote than remaining time, which can be a wildly inaccurate estimate, particularly on modern BOINC versions.
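To make the checkpoint point concrete, here is a rough sketch of how much crunching gets redone after a suspend. It is only an illustration, not BOINC code: the function names, the fixed checkpoint interval and the numbers are all made up for the example, and real checkpoints are not evenly spaced.

```python
# Illustrative model only; names and numbers are assumptions, not BOINC internals.

def resume_point(elapsed_min, checkpoint_interval_min, leave_in_memory):
    """Compute time (minutes) the task continues from after a suspend/resume."""
    if leave_in_memory:
        # The task stays in memory, so nothing is lost.
        return elapsed_min
    # The task is unloaded and restarts from the last checkpoint on disk.
    return (elapsed_min // checkpoint_interval_min) * checkpoint_interval_min


def minutes_lost(elapsed_min, checkpoint_interval_min, leave_in_memory):
    """Crunching time (minutes) that has to be repeated after resuming."""
    return elapsed_min - resume_point(
        elapsed_min, checkpoint_interval_min, leave_in_memory
    )


if __name__ == "__main__":
    # Hypothetical host: 155 minutes in, a checkpoint every 40 minutes.
    print(minutes_lost(155, 40, leave_in_memory=False))  # 35
    print(minutes_lost(155, 40, leave_in_memory=True))   # 0
```

With the setting off, each snooze in this example throws away the 35 minutes done since the last checkpoint. With it on, nothing is lost.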

So, unfortunately, you wasted 2 days of crunching when the task was probably fairly close to completion.

Cheers,
Gary.

Neal Watkins
Joined: 19 Feb 05
Posts: 18
Credit: 1273657
RAC: 82

By a "long time" I meant overnight.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5851
Credit: 110575270041
RAC: 32869936

FGRP3 tasks (which that one was) run in two separate 'phases'. The analysis phase does create reasonably regular checkpoints. On older hardware, they will still be quite far apart time-wise, maybe as much as 20-40 mins or more.

When the analysis phase finishes, there is a 'post-processing' stage where no further checkpoints are created. The time taken for this stage cannot be predicted in advance - I think it depends on what 'candidates' may have been found in the analysis phase. It can take quite a while even on a fast processor, and it is variable and unpredictable. It just may be that the one you aborted had a particularly long-running post-processing stage.
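As a rough illustration of why a suspend during post-processing hurts so much, here is a small sketch of the two-phase behaviour. It is a toy model under stated assumptions, not the FGRP3 application: checkpoints are assumed to land at a fixed interval during analysis only, and every name and number is hypothetical.

```python
# Toy model of a two-phase task; all names and numbers are hypothetical.

def last_checkpoint(elapsed_min, analysis_min, interval_min):
    """Compute time (minutes) covered by the most recent checkpoint.

    Checkpoints are only written during the analysis phase, so time spent
    in post-processing never advances the checkpoint.
    """
    checkpointed_span = min(elapsed_min, analysis_min)
    return (checkpointed_span // interval_min) * interval_min


def redo_after_suspend(elapsed_min, analysis_min, interval_min):
    """Minutes to repeat if the task is unloaded from memory at elapsed_min."""
    return elapsed_min - last_checkpoint(elapsed_min, analysis_min, interval_min)


if __name__ == "__main__":
    # Assumed: 600 min of analysis, a checkpoint every 30 min.
    print(redo_after_suspend(100, 600, 30))  # 10  (suspended during analysis)
    print(redo_after_suspend(780, 600, 30))  # 180 (3 hours into post-processing)
```

In this model, the further into post-processing the task gets, the more there is to repeat after each suspend, since the restart point stays stuck at the end of the analysis phase.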

If your machine normally finishes tasks and they validate, there's no solid reason to think that there was some problem with your machine or this particular task. I would regard a 'long time' to be more in the order of 24-36 hours rather than just overnight. I don't recall ever seeing one of these tasks failing to finish if it had progressed to the stage where yours seemed to be - the post-processing stage. You don't say what the % progress was showing at the time. If it was somewhere around 90% or so, it was probably in that final stage. If you get any more like that one, just let it run and see what happens.

Cheers,
Gary.
