Increased WU size -:(

Carlos_Pfitzner
Joined: 27 May 06
Posts: 27
Credit: 36999
RAC: 0
Topic 191477

Quote:
Bernd said: To make up for the faster Apps we increased the size of the workunits. The "long" ones will be roughly five times as long as the "long" ones from S4, the "short" ones will be roughly twice as long as their S4 counterparts.

I ask you to reconsider and make the WUs smaller again!

Problems of large WUs

1) Download time

On dialup this takes a long time to download.
*Dialup links are not stable; they time out and drop the connection often.

At each connection hiccup a download retry is made ... and this restarts the download from the beginning.

So, to download a 16 MB WU, I end up with 64 MB
of network traffic (or more) because of the retries.

2) Crunch time.

Even using Akos' optimized app, it takes
much more than 1 or 2 hours elapsed to crunch a single WU.

Well, at least on my Athlon XP+ @ 1600 MHz.

The longer a WU takes to crunch, the greater the probability
that a power loss occurs while crunching it
and ruins *all* the work already done,
because of a corrupted file system -or- a damaged HD.

The disk space needed to keep 2 WUs is big!
*I cannot download only 1 WU at a time, because of the "time to download",
so I need to have 1 WU crunching and another ready to crunch,
to avoid CPU temperature changes
that can crack silicon and render the whole PC inoperable.

I suppose I am not the only one connected via dialup
and crunching with a CPU of less than 3000 MHz clock speed.

IMHO it is a large % of crunchers ... more than half!

This way, I will be forced to quit!

*However, you could also make the WU size user-selectable
in the Einstein preferences,

to satisfy everyone's wishes.

Thanks


Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Increased WU size -:(

No question that S5 makes EAH a harder project for us crunchers running "old timers"; however: (comments inline)

Quote:


I ask you to reconsider and make the WUs smaller again!

Problems of large WUs

1) Download time

On dialup this takes a long time to download.

Agreed, the large "raw" data file is not a joy to have to DL for dialup hosts, but keep in mind that you generally run more than one result off any given data pack so it's not like every DL needs to get a new one.

Quote:


2) Crunch time.

Even using Akos' optimized app, it takes
much more than 1 or 2 hours elapsed to crunch a single WU.

I agree the length of the crunch is generally far greater than before, but that is unavoidable due to the higher sensitivity of the S5 run, and other backend factors not strictly related to sheer crunching concerns.

However, your concerns over loss of computation time are not totally valid. A power failure or other system crash generally only results in losing the calculation since the last checkpoint, not the whole thing.

Your point about a complete hard disk failure is well taken from a data loss and work loss POV, but that's true *every* time you run your machine. In my case for example, I don't relish the thought of one of my K6-2/500's losing a drive 175 hours into a 225 hour S5 whopper, but that's the risk you take when you *choose* to participate. Fortunately, HD failures like that are infrequent enough, even when running 24/7/365, that I'll take that chance. The other data I have onboard is much more important to me; that's why I do backups, so my risk of loss is relatively low, including for EAH.

Quote:

The disk space needed to keep 2 WUs is big!
*I cannot download only 1 WU at a time, because of the "time to download",
so I need to have 1 WU crunching and another ready to crunch,
to avoid CPU temperature changes
that can crack silicon and render the whole PC inoperable.

As I said before, the only time you carry more than one data file is when there are no more results to crunch from the existing one, and the project requests that BOINC delete the old one when it's through with it. This was true for S4 as well. FWIW, I haven't noticed a really significant increase in disk space used since S5 began.

Your point about reducing thermal cycling for computer components to enhance life expectancy is valid, that's one of the reasons I run DC as well. However manufacturers test their components over the full range from cold start to full load, and simulate that over their expected "normal" useful life (3 to 5 years).

Quote:


I suppose I am not the only one connected via dialup
and crunching with a CPU of less than 3000 MHz clock speed.

IMHO it is a large % of crunchers ... more than half!

This way, I will be forced to quit!

*However, you could also make the WU size user-selectable
in the Einstein preferences,

to satisfy everyone's wishes.

Thanks

Your other comments have merit, but the project team has to consider their requirements and resources as well as those of the volunteers.

In my case, EAH is now a "tight" deadline project for my slugs, but that isn't going to stop me from letting them plow through as many results as they can.

Regarding your suggestion about user selection of WU length, the problem with that is it leads to "cherry picking" by the more credit oriented participants, and that leads to a whole set of problems in and of itself. One thing is clear, and that is there is a fair number of them running EAH, based on some of the posts I've read lately, particularly with the request to cease running Akos' prototype optimized apps.

Alinator

baracutio
Joined: 18 Jan 05
Posts: 2
Credit: 598761
RAC: 63

RE: On Dialup this takes a

Quote:

On dialup this takes a long time to download.
*Dialup links are not stable; they time out and drop the connection often.

At each connection hiccup a download retry is made ... and this restarts the download from the beginning.

So, to download a 16 MB WU, I end up with 64 MB
of network traffic (or more) because of the retries.

hm, i think boinc supports suspend/resume downloads, right? i have 56k dial up too and i'm downloading einstein-wus in 2 parts (maybe 8mb today and the rest tomorrow).

mfg bara

Fuzzy Duck
Joined: 3 Dec 05
Posts: 37
Credit: 936924
RAC: 0

Carlos, I really don't

Carlos,

I really don't understand any of your points.

I have previously d/l'ed using dial-up, including these large 16MB files. When the connection is lost, it does not have to start from the beginning.

What has the size of the WU got to do with losing power? If you corrupt the file system and/or HD, the completed WU will also be lost (most likely).

I assume your point about storage of just 2 WU's and cracking of silicon is surely just a joke.

FD.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

RE: RE: On Dialup this

Message 40858 in response to message 40856

Quote:
Quote:

On dialup this takes a long time to download.
*Dialup links are not stable; they time out and drop the connection often.

At each connection hiccup a download retry is made ... and this restarts the download from the beginning.

So, to download a 16 MB WU, I end up with 64 MB
of network traffic (or more) because of the retries.

hm, i think boinc supports suspend/resume downloads, right? i have 56k dial up too and i'm downloading einstein-wus in 2 parts (maybe 8mb today and the rest tomorrow).

mfg bara

I think that's the case, and they implemented that some time last year when they were having trouble with some of the mirror sites, but I can't swear to it.

Alinator

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5851
Credit: 110769191695
RAC: 32952138

Hi, You seem to

Hi,

You seem to misunderstand the way the project works so perhaps I can explain a few things to help clear up the misconceptions you seem to have.

Quote:


On dialup this takes a long time to download.
*Dialup links are not stable; they time out and drop the connection often.

At each connection hiccup a download retry is made ... and this restarts the download from the beginning.

So, to download a 16 MB WU, I end up with 64 MB
of network traffic (or more) because of the retries.

It is certainly true that, on a dialup link, it could take quite a while to download the 16MB large files. Under S4 the equivalent files were only about 4.5MB to about 7.5MB if I remember correctly. I didn't take that much notice of the precise size.

At a rough guess an S5 data file might last 5 times longer than an S4 one - the 5 times ratio mentioned by Bernd. So in the time you get to work on one 16MB large file for S5 you might have had to download 5 x 6MB S4 files. I strongly suspect that you will actually be downloading less in total under S5 than you were for S4.
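
That comparison can be sketched in a few lines of Python. This is only back-of-the-envelope arithmetic: the 6 MB figure is the middle of the 4.5MB to 7.5MB range remembered above, and the 5x ratio is the rough guess from this post, so both are assumptions rather than measurements.

```python
# Rough comparison of data downloaded under S4 and S5 for the same stretch of crunching.
# All figures are the approximate ones quoted in this thread, not measurements.
S5_FILE_MB = 16.0          # one S5 large data file
S4_FILE_MB = 6.0           # assumed middle of the remembered 4.5-7.5 MB range
S4_FILES_PER_S5_FILE = 5   # assumes one S5 file lasts about 5x as long as one S4 file


def s4_download_mb(file_mb: float = S4_FILE_MB, files: int = S4_FILES_PER_S5_FILE) -> float:
    """Total MB of S4 data files fetched in the time one S5 file would last."""
    return file_mb * files


if __name__ == "__main__":
    print(f"S4: {s4_download_mb():.1f} MB vs S5: {S5_FILE_MB:.1f} MB")
```

Even at the low end of the S4 range (4.5 MB per file), five files come to more than one 16 MB S5 file, provided the 5x assumption holds.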

As far as interrupted downloads are concerned, I was under the impression that BOINC could restart an interrupted download from the point of interruption without having to start at zero. I don't have dialup so I don't know for sure. However, on a broadband connection, I'm sure I've seen downloads restart at the point of interruption when a lost network connection is restored.
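
To show why resume versus restart matters so much for the 64 MB figure quoted above, here is a minimal Python sketch. It is a toy model, not BOINC's actual transfer code: the function name and the per-attempt numbers are hypothetical, chosen only so that the restart case reproduces the 64 MB example.

```python
from typing import Sequence


def traffic_mb(file_mb: float, failed_attempts_mb: Sequence[float], resume: bool) -> float:
    """Total MB transferred to end up with one complete file.

    failed_attempts_mb lists how many MB each dropped attempt had fetched
    before the line went down (hypothetical values, not measurements).
    With resume=True the dropped attempts are assumed to add up to less
    than the file size, since their data is kept.
    """
    if resume:
        # Data from dropped attempts is kept, so only the file itself crosses the wire.
        return file_mb
    # Restart from zero: every dropped attempt is wasted, then one full pass succeeds.
    return sum(failed_attempts_mb) + file_mb


if __name__ == "__main__":
    # Four drops at 12 MB each, restarting from zero every time.
    print(traffic_mb(16, [12, 12, 12, 12], resume=False))
    # Three drops of 4 MB each, resuming where the transfer left off.
    print(traffic_mb(16, [4, 4, 4], resume=True))
```

If downloads really do resume from the point of interruption, the total traffic stays at the file size no matter how often the line drops.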

Quote:

Even using Akos' optimized app, it takes
much more than 1 or 2 hours elapsed to crunch a single WU.

Well, at least on my Athlon XP+ @ 1600 MHz.

I would guess that an AMD XP1600+ could easily take 18-20 hours to crunch a long result. Let's call it 20 hours. Let's say it gave 176 credits. Imagine you could slice that long result into 10 short ones that each took 2 hours to crunch and returned 17.6 credits. How could this be a better outcome? At the end of a 20 hour period you have contributed exactly the same amount of science and received exactly the same number of credits. What benefit is there in calling it 10 results rather than 1 result? In a very short time they've all been removed from the online database anyway. Any thought that you are somehow contributing more to the project is completely an illusion.
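
The same point as a quick Python check, using only the illustrative numbers from the paragraph above (20 hours and 176 credits are guesses, and the 10-way split is imaginary):

```python
def credits_per_hour(results: int, hours_each: float, credits_each: float) -> float:
    """Credit rate for a batch of identical results crunched back to back."""
    total_hours = results * hours_each
    total_credits = results * credits_each
    return total_credits / total_hours


if __name__ == "__main__":
    one_long = credits_per_hour(results=1, hours_each=20.0, credits_each=176.0)
    ten_short = credits_per_hour(results=10, hours_each=2.0, credits_each=17.6)
    print(f"one long result:   {one_long:.2f} credits/hour")
    print(f"ten short results: {ten_short:.2f} credits/hour")
```

Both cases cover the same 20 hours for the same total credit, so the rate per hour comes out the same.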

Quote:
The longer a WU takes to crunch, the greater the probability
that a power loss occurs while crunching it
and ruins *all* the work already done,
because of a corrupted file system -or- a damaged HD.

A power loss simply does not ruin the work already done. As a result is being crunched, checkpoint files are written regularly. If a power loss occurs and then BOINC is restarted, crunching will resume from the saved checkpoint and virtually nothing is lost.
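
A minimal Python sketch of that idea, assuming checkpoints are written at a fixed interval. The interval is a made-up number for illustration; the thread does not say how often the app actually checkpoints.

```python
from typing import Optional


def minutes_lost(elapsed_min: int, checkpoint_every_min: Optional[int]) -> int:
    """Crunch time thrown away if power fails after elapsed_min minutes.

    checkpoint_every_min is a hypothetical fixed checkpoint interval;
    None models an app that never checkpoints.
    """
    if checkpoint_every_min is None:
        # No checkpoint to fall back on, so everything so far is redone.
        return elapsed_min
    # Only the work since the most recent checkpoint is redone.
    return elapsed_min % checkpoint_every_min


if __name__ == "__main__":
    print(minutes_lost(1047, 10))    # with checkpoints every 10 minutes
    print(minutes_lost(1047, None))  # without any checkpoints
```

With checkpoints the loss is bounded by the checkpoint interval, however long the result itself is.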

Yes, a power loss could cause disk corruption or physical damage. But can you explain how long results compared to short results would make the risk of disk corruption any different?

Quote:
The disk space needed to keep 2 WUs is big!
*I cannot download only 1 WU at a time, because of the "time to download",
so I need to have 1 WU crunching and another ready to crunch,

Do you realise that once you have downloaded a single large data file (16 MB) you can get literally 100's of results from that one download? Let's say you set a 5 day cache and that causes a download of 6 fresh results. Very little extra data is downloaded - you would hardly notice it, even on dialup. What is downloaded is a set of very short "instructions" on how to "slice off" those 6 extra results from your large data file. The physical data for those new results is already on your computer. So there is virtually no time loss in getting new results when needed.

Only when that large data file is completely "used up" will you need to download a fresh large data file. At that point the original one is deleted, freeing up the space it occupied. And yes it might take a while to get that new file. However if you have a cache of say 1 day you will have 24 hours warning of the need to download a new large data file.

Quote:
to avoid CPU temperature changes
that can crack silicon and render the whole PC inoperable.

Sure, there is a temperature change when a CPU goes from 100% load to idle or vice versa. I've never heard of that causing silicon to crack :). If it could then you had better not fire up your computer in the morning or shut it down at night because there is a bigger temperature change at those times :).

Quote:

I suppose I am not the only one connected via dialup
and crunching with a CPU of less than 3000 MHz clock speed.

IMHO it is a large % of crunchers ... more than half!

This way, I will be forced to quit!

I don't understand how being on dialup with a less than 3.0GHz CPU would cause you to quit?? When you were crunching S4 at an accelerated rate, weren't you putting your dialup link under much greater strain, because you would have been downloading 4.5MB to 7.5MB data files at a much higher rate than you are now??

I have a lot of experience with slower machines, seeing as I own quite a few in the range of 400 to 1000MHz, mainly PIIIs. They are all doing just fine on the S5 data. Sure a few of them take up to 60 hours to crunch a long S5 result and get their 176 credits. I've just looked at a Dell PIII-450 that finished its last S4 result on June 17 and returned its first S5 on June 19. It has been crunching for exactly three months without a single break and has a total credit of 12,100 and a RAC of 134. What's wrong with that? It has not crashed once and has had no downtime in the whole three months. Because the S5 optimisations are not as good as what Akos achieved with S4, its RAC will fall a bit in the future, but so what!! It's still doing the science!

Cheers,
Gary.

[B@H] Ray
Joined: 4 Jun 05
Posts: 621
Credit: 49583
RAC: 0

Large units yes, but when it

Large units yes, but once the file is downloaded you get several WUs to run off the same file, and those are small downloads.

All work with Akos' optimized apps has ended for now; everyone has to switch back to the stock app. There are several other threads on that.

1 to 2 hours crunch time is not all that bad, when you get the larger ones they should take ~20 hours. Even with download problems that is a long time between downloads.

Just try and hang in there, you may get a better connection next time.


Try the Pizza@Home project, good crunching.

Scott Brown
Joined: 9 Feb 05
Posts: 38
Credit: 215235
RAC: 0

I have one machine on dial-up


I have one machine on dial-up and think that those responding seem to be missing a couple of points for such set-ups...

1) While all are correct regarding S5 downloads having a lot of 'bang-for-the-buck' in number of workunits to crunch, this is irrelevant if one cannot keep the line tied up for the needed time to complete the download. Though it is true that the download will restart at the point where the disconnection happened, this still may result in unnecessary idle times for some machines depending on set-up, number of projects, etc. (i.e., a machine may be connected at such inconsistent frequency that all work is completed before the new download is finished--especially since upload files are only about 120-150k or so). Simply put, the longer workunits may make EAH a very inefficient project for some dial-up users.

2) There seems to be some confusion regarding the nature of 'risk' (a statistical term) as it pertains to long vs. short workunits for dial-up users. It is true that the risk of failure applied at any single point in time is probably identical across both types of workunits. However, the cumulative risk (which is the much more relevant issue regarding failure issues--see numerous publications on hazard modeling, failure-time models, etc.) is obviously greater for the longer workunits, in general, and may be more so for dial-up users. Given the infrequent connections for some dial-up users and that such users are not generally connected to LAN backup procedures (usually resorting to some form of manual backup), the risk of loss is quite likely greater for all workunits, and may increase at a faster rate than for non-dial-up users as workunit length increases (e.g., an exponential vs. a linear increase in risk for the two different groups of users).
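
The difference between point-in-time risk and cumulative risk can be sketched with the simplest failure-time model, a constant failure rate. This is an assumed textbook model, not something measured on any host, and the rate used below is a made-up number for illustration only.

```python
import math


def p_failure(rate_per_hour: float, hours: float) -> float:
    """Probability of at least one failure within `hours`, assuming a constant failure rate."""
    return 1.0 - math.exp(-rate_per_hour * hours)


if __name__ == "__main__":
    rate = 0.001  # hypothetical failures per hour, for illustration only
    for hours in (2, 20, 60):
        print(f"{hours:>3} h workunit: {p_failure(rate, hours):.4f}")
```

The rate at any instant is the same for short and long workunits, but the chance of a failure striking somewhere inside a single workunit grows with its length.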

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Agreed, but that's why you

Agreed, but that's why you need the human, "Central Scrutinizer" to weigh all the factors involved in deciding how best to allocate your resources.

There is no "one size fits all" solution.

Alinator

Joachim Schmidt
Joined: 19 Feb 05
Posts: 35
Credit: 391050
RAC: 0

Solution for problem 1

Solution for problem 1, Scott:
Just set your cache high enough ...

Solution for problem 2:
There is no solution because there is no problem :)
If your disk fails, you have much bigger problems than a lost workunit.
For example your really important data, buying and installing your new disk, etc.

greets

Dronak
Joined: 21 Mar 05
Posts: 28
Credit: 10402879
RAC: 0

RE: RE: Bernd said: To

Quote:
Quote:
Bernd said: To make up for the faster Apps we increased the size of the workunits. The "long" ones will be roughly five times as long as the "long" ones from S4, the "short" ones will be roughly twice as long as their S4 counterparts.

I ask you to reconsider and make the WUs smaller again!
2) Crunch time.

Even using Akos' optimized app, it takes
much more than 1 or 2 hours elapsed to crunch a single WU.

First, the optimized applications aren't allowed now. You should stop using them if you haven't already.

I guess because I'm on DSL and have a good size hard drive, I don't see any real problems with download time or storage of work units. However, the crunch time is definitely a factor. I've gone from crunching 1 work unit in about 1 hour with the old optimized applications (maybe it was 2 hours with the standard application, I don't remember) to crunching 1 work unit in about 17 hours with the standard application. This is kind of ridiculous. I can't even return one unit a day now since I run SETI as well and it always has work available. OK, maybe the S5 application is faster than the S4 application, but when the 5x longer work units take around 10x longer to crunch, it seems out of proportion. I'd definitely like to see shorter work units, but somehow I doubt that's going to happen. :(
