Message boards : Number crunching : Discussion on increasing the default run time
Author | Message |
---|---|
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
We are planning to increase the default run time from 3 hours to 6 hours and the minimum from 1 to 3 hours to reduce the load on our servers. There will be a transition period where your client will adjust to the new run time which will affect the number of tasks that are queued on your client. I've created this thread for a discussion on what would be the best way to transition to an increased run time. This obviously will only affect people with default run times (people who have not bothered to set this preference) or people who have set their run time to be less than 3 hours. (edit: not 6, whoops!) |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
For people that pull a week of work at a time, due to infrequent internet connections, increasing the runtime from 3 to 6 hours would mean they get twice as much work as they can crunch. Would it be possible to increase the default by something like 5 minutes a day? That would be so gradual that after a week you would be at 3:35, compared to the 3hrs previously (i.e. only a max of 18% variance). It would take you 6 weeks to get all the way up to 6hrs, but the work flow should be pretty steady for the client. It shouldn't noticeably overload or underload the client with work. [edit] I guess for all the same reasons, a gradual change to the min. runtime would be required too. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
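The arithmetic of a gradual ramp is easy to check. The Python sketch below is a minimal illustration, assuming the 3 h start, 6 h target, and 5-minute daily step from the post above; the constants and function names are hypothetical, and this is not how the Rosetta@home server actually stores or updates the default.

```python
# Hypothetical sketch of a gradual default-runtime ramp (not Rosetta@home code).
START_MIN = 180   # old default runtime: 3 hours, in minutes
TARGET_MIN = 360  # proposed default runtime: 6 hours, in minutes
STEP_MIN = 5      # proposed increase per day, in minutes


def runtime_on_day(day: int) -> int:
    """Default runtime in minutes after `day` days of ramping, capped at the target."""
    return min(START_MIN + STEP_MIN * day, TARGET_MIN)


def days_to_target() -> int:
    """Daily steps needed to go from START_MIN to TARGET_MIN (ceiling division)."""
    return -(-(TARGET_MIN - START_MIN) // STEP_MIN)


def relative_increase(day: int) -> float:
    """How much longer a task runs on `day` than under the old default, as a fraction."""
    return runtime_on_day(day) / START_MIN - 1


def hhmm(minutes: int) -> str:
    """Format a number of minutes as H:MM."""
    return f"{minutes // 60}:{minutes % 60:02d}"


print(hhmm(runtime_on_day(7)))         # 3:35 after one week
print(days_to_target())                # 36 daily steps to reach 6:00
print(round(relative_increase(7), 3))  # 0.194, versus 1.0 for an instant jump
```

Under the assumption that a cache is sized using the runtime in force when it was filled, a week-long cache filled on day 0 would run at most about 19% longer than planned, instead of twice as long with an instant switch. The same helper could drive a ramp of the minimum runtime.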
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Anyone that wants to avoid such problems could always change their runtime from the default at a time of their choosing, either before or during such a transition. |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 167 |
...assuming they knew about the proposed change. How many of the crunchers actively read the forums? I suspect a very small number. How about a "Rosetta News Letter" mass mailing? If it was a problem, why didn't the project ask "the regulars" to change their default run time ages back? That might have bought some time or even alleviated the issue. I've just changed all mine. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
|
Gavin Shaw Joined: 1 Feb 07 Posts: 10 Credit: 506,456 RAC: 0 |
David E K wrote: "This obviously will only affect people with default run times or people who have set their run time to be less than 6 hours." Perhaps I'm just thick or slow (it is the weekend where I am), but how does changing the min time to 3hr and the default to 6hr affect me when I have my run time set to 4hr? It is still greater than the min time so nothing should change right? Never surrender and never give up. In the darkest hour there is always hope. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Gavin Shaw wrote: "It is still greater than the min time so nothing should change right?" Right. You are not impacted by the proposed change to default run time, because you are not using the default. And you are not impacted by the proposed change to minimum runtime, because you are over the proposed new minimum runtime. Rosetta Moderator: Mod.Sense |
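Mod.Sense's answer can be written as a small rule. This Python sketch assumes that a preference below the minimum is raised to the minimum and that an unset preference (`None` here) falls back to the default; the function names are illustrative, not Rosetta@home's actual code.

```python
from typing import Optional

# Values from the proposal, in hours.
OLD_DEFAULT, OLD_MINIMUM = 3, 1
NEW_DEFAULT, NEW_MINIMUM = 6, 3


def effective_runtime(pref: Optional[float], default: float, minimum: float) -> float:
    """Target runtime: the user's preference if set, else the default, never below the minimum."""
    chosen = default if pref is None else pref
    return max(chosen, minimum)


def is_affected(pref: Optional[float]) -> bool:
    """True if the proposed change alters this user's effective runtime."""
    return (effective_runtime(pref, OLD_DEFAULT, OLD_MINIMUM)
            != effective_runtime(pref, NEW_DEFAULT, NEW_MINIMUM))


print(is_affected(4))     # False: a 4-hour preference is above both minimums
print(is_affected(None))  # True: default users go from 3 h to 6 h
print(is_affected(2))     # True: 2 h would be raised to the new 3 h minimum
```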
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
There might be a downside to increasing the default run time: if a task takes abnormally long for any reason, it relies on the watchdog thread to stop it once it exceeds 3 times the preferred time (see below for an example). So if Rosetta gets stuck in an infinite loop or something, the amount of time wasted will be equal to 3 times the preferred time: clearly, shorter preferred times are preferable in such a case.
Task ID: 206764478
Name: 1hzh_2cxh_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_289_0
Workunit: 188615593
<core_client_version>6.2.18</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 14400
**********************************************************************
Rosetta is going too long. Watchdog is ending the run!
CPU time: 48690.3 seconds. Greater than 3X preferred time: 14400 seconds
**********************************************************************
called boinc_finish
</stderr_txt>
]]> |
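The watchdog rule described in this post, and the numbers in the quoted log, can be checked with a short Python sketch. The 3x multiplier comes from the log message; the function names are hypothetical, and the real watchdog thread inside the Rosetta application is not reproduced here.

```python
WATCHDOG_MULTIPLIER = 3  # from the log: "Greater than 3X preferred time"


def watchdog_should_end(cpu_seconds: float, pref_seconds: float,
                        multiplier: float = WATCHDOG_MULTIPLIER) -> bool:
    """True once CPU time exceeds `multiplier` times the preferred runtime."""
    return cpu_seconds > multiplier * pref_seconds


def worst_case_hours(pref_hours: float,
                     multiplier: float = WATCHDOG_MULTIPLIER) -> float:
    """Approximate CPU hours a stuck task can use before the watchdog ends it."""
    return multiplier * pref_hours


# Values from the quoted log: preference 14400 s (4 h), CPU time 48690.3 s.
print(watchdog_should_end(48690.3, 14400))       # True: 48690.3 s > 43200 s
print(worst_case_hours(3), worst_case_hours(6))  # 9 18
```

So, if the multiplier stays at 3x, doubling the default from 3 h to 6 h roughly doubles the worst-case loss per stuck task, from about 9 h to about 18 h.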
Gavin Shaw Joined: 1 Feb 07 Posts: 10 Credit: 506,456 RAC: 0 |
|
AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
svincent wrote: "There might be a downside to increasing the default run time: if a task takes abnormally long for any reason it relies on the watchdog thread to stop it if it exceeds 3 times the preferred time (see below for an example). So if rosetta gets stuck in an infinite loop or something the amount of time wasted will be equal to 3 times the preferred time: clearly shorter preferred times are preferable in such a case." That's a good point. Perhaps the Watchdog should be more aggressive about aborting stuck workunits. Maybe it could abort the WU after 2x, or even 1.5x the specified crunching time. The old 3x with 3 hours is 9 hours, and 1.5x with the new 6 hours would still be 9 hours. |
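AMD_is_logical's suggestion amounts to choosing the multiplier that keeps the absolute cutoff fixed. A minimal Python sketch, assuming the cutoff is simply runtime times multiplier, as the watchdog log earlier in the thread suggests; the names are illustrative:

```python
OLD_RUNTIME_H, OLD_MULTIPLIER = 3, 3  # current default and watchdog multiplier
NEW_RUNTIME_H = 6                     # proposed default


def matching_multiplier(old_runtime: float, old_multiplier: float,
                        new_runtime: float) -> float:
    """Multiplier that gives the new runtime the same absolute watchdog cutoff."""
    return old_runtime * old_multiplier / new_runtime


for mult in (3, 2, 1.5):
    print(f"{mult:g}x of {NEW_RUNTIME_H} h -> cutoff {NEW_RUNTIME_H * mult:g} h")

print(matching_multiplier(OLD_RUNTIME_H, OLD_MULTIPLIER, NEW_RUNTIME_H))  # 1.5
```

A lower multiplier presumably trades faster recovery from stuck tasks against a higher risk of ending slow but healthy tasks early.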
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Yes, if the default runtimes were changed, the watchdog could be revised as well. The watchdog used to wait for 4x the preferred runtime. |
Sid Celery Joined: 11 Feb 08 Posts: 2116 Credit: 41,115,753 RAC: 17,487 |
I understand completely the motivation behind increasing the default run time and if I only received Rosetta Beta 5.98 WUs I'm sure I'd hold to that default successfully. But as I report here (and previously) I get Mini Rosetta WUs constantly crashing out with "Can't acquire lockfile - exiting" error messages - maybe 60% failure rate with a 3-hour runtime, reducing to 40% failure rate with a 2-hour run time. I've seen this reported by several other people running a 64-bit OS - not just on Vista or with an AMD machine. That said, I don't know how widespread it is. Perhaps you can analyse results at your end. As stated in the post linked above, I get no errors at all with Rosetta Beta, so I'm inclined to think it's not some aberration with my machine. I'd really like to see some feedback on this issue and some assurance it's being investigated in some way. I'd ask that a minimum run time of 2 hours is allowed (I can just about handle that) or some mechanism that allows me to reject all Mini Rosetta WUs. If not, I'm prepared to abort all Mini Rosetta WUs before they run. It's really a waste of time me receiving them if 60% are going to crash out on me anyway. I've commented on this before here, here, here and first of all and more extensively here - see follow-up messages in that thread. No such issues arose for me with my old AMD single core XPSP2 machine - only when I got this new AMD quad-core Vista64 machine. Any advice appreciated. It's a very big Rosetta issue for me, so while I'm sure you'll save a whole load of bandwidth if you go ahead with the proposed changes I just hope some allowance can be made for people in my situation. |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 167 |
Sid Celery wrote: "Can't acquire lockfile - exiting" That's familiar. Go to "Your Account", then "Computing Preferences", and check that "Use at most", at the bottom of the first block, is set to 100%. That lock file error is common at some projects on systems where this is not set to 100%. |
Sid Celery Joined: 11 Feb 08 Posts: 2116 Credit: 41,115,753 RAC: 17,487 |
adrianxw wrote: "Can't acquire lockfile - exiting" Thanks for the comment - very promising. I'm showing (sorry for the layout): Processor usage / Default / Home. Specifically, which 'use at most' are you referring to? The one under processor usage? My default computer location is set to 'Home', if that makes a difference. |
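For readers unfamiliar with the error message, a lockfile is a file that one process locks so that a second copy of the same program knows to back off. The Python sketch below illustrates the general idea with the POSIX `flock` call (Unix only). The file name is hypothetical, and this is not a claim about how BOINC or Rosetta@home actually implement their lockfile, or about why the processor-usage setting would trigger the error.

```python
import fcntl
import os
import tempfile


def try_acquire(path: str):
    """Try to take an exclusive, non-blocking lock on `path`.

    Returns the open file object on success (keep it open to hold the lock),
    or None if the lock is already held through another open file.
    """
    f = open(path, "a")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


lock_path = os.path.join(tempfile.mkdtemp(), "example_lockfile")  # hypothetical name
first = try_acquire(lock_path)   # the first instance gets the lock
second = try_acquire(lock_path)  # a second instance is refused
if second is None:
    print("Can't acquire lockfile - exiting")
```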
FalconFly Joined: 11 Jan 08 Posts: 23 Credit: 2,163,056 RAC: 0 |
I don't mind a 6h default runtime, as that's what I'm using right now anyway. I also wouldn't mind setting it higher, but: is it still correct that the Rosetta Client can enter a deadlock and will not abort the WorkUnit until 2x (or even 4x?) the scheduled runtime has elapsed? At least that's what I remember from reading the Q&A a long time ago. I don't mind an occasional Computing Error or stalled WorkUnit, but I would mind wasting 24h (or even more) of runtime. If that's all history and no longer valid, I'd happily switch to a 24h runtime. Just thought I'd ask, as I'm about to set Rosetta to full throttle in my network. -- edit -- I'm also seeing h001b_BOINC_ABRELAX_RANGE_yebf failing with Compute Errors (on different systems, including other hosts in the quorum)... Losing 2-5h of work is one thing; losing 12-23h would be more disappointing. Right now (pending any "max time exceeded" related problems), that would be my only concern about increasing runtime significantly beyond what I have right now. (It would be cool if correct/complete predictions made by a failed WorkUnit before the error occurred could be credited and counted; that way a model-induced compute error wouldn't really matter anymore, regardless of runtime.) |
ejuel Joined: 8 Feb 07 Posts: 78 Credit: 4,447,069 RAC: 0 |
David E K wrote: "We are planning to increase the default run time from 3 hours to 6 hours and the minimum from 1 to 3 hours to reduce the load on our servers." Can you please explain this in layman's terms? Are you stating that you are making changes on the server or on our clients? If on our clients, please explain what you mean. For example, are you making the Work Units twice as big/complex, which means my machine will take twice as long to crunch each WU? If you are talking about the server, are you stating that our client must wait at least 6 hours before connecting again for reporting or new WUs? Again, your statement is very open-ended and can mean a number of things. Thanks. -Eric |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
ejuel, DK is talking about the Rosetta-specific preference for how long each task runs on your client machine. If you express no preference, the default is currently for tasks to run for 3 hours. But the drop-down list lets you choose a preference from 1 hour through 24 hours for each task. This is just a preference. It's not a hard limit and, as is often discussed on the message boards, there are cases where tasks run well past the runtime preference. By increasing the minimum from 1hr to 3hrs, and the default from 3hrs to 6hrs, more tasks will execute more predictably and consistently within the established preference. The net result is that your client (if running with default settings) runs through 4 tasks per day per core, rather than 8. It still does 24hrs of useful work to help the science of Rosetta@home; it just runs more models against each task before reporting the results back. So it is a change to the default value of your runtime preference, which is defined on the server side and affects every task run under the profile the setting pertains to. |
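The 8-versus-4 figure is just 24 hours divided by the runtime. A minimal Python sketch, assuming a core that crunches Rosetta@home around the clock and tasks that end at their preferred runtime (which, as noted above, is not guaranteed); the function names are illustrative:

```python
HOURS_PER_DAY = 24


def tasks_per_core_per_day(runtime_hours: float) -> float:
    """Tasks one core finishes per day if each task stops at its runtime preference."""
    return HOURS_PER_DAY / runtime_hours


def tasks_per_host_per_day(cores: int, runtime_hours: float) -> float:
    """Tasks a whole host returns per day under the same assumption."""
    return cores * tasks_per_core_per_day(runtime_hours)


print(tasks_per_core_per_day(3), tasks_per_core_per_day(6))        # 8.0 4.0
print(tasks_per_host_per_day(4, 3), tasks_per_host_per_day(4, 6))  # 32.0 16.0
```

CPU time donated stays at 24 hours per core per day either way; what halves is the number of tasks the servers must hand out and collect, which is presumably where the load reduction DK mentions comes from.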
ejuel Joined: 8 Feb 07 Posts: 78 Credit: 4,447,069 RAC: 0 |
Mod.Sense wrote: "ejuel, DK is talking about the Rosetta specific preference for how long each task runs on your client machine. If you express no preference, the default is for tasks to run for 3 hours presently. But the drop down list lets you chose from a preference of 1 hour, through 24 hours for each task." Thanks... but a few follow-up questions: 1) Why are all my not-yet-processed WUs now predicting 9 hours 41 minutes to process rather than 6 hours? 6 vs 9:41 is a big difference. 2) What will happen to the 15+ WUs I have that are not completed yet but are due within 48 hours? Mathematically there is no way I can crunch through 15+ WUs in 48 hours if each WU takes 9:41 to finish. 3) I assume RAC will not change, since RAC is not counting the quantity of WUs but rather the work/time ratio done on those WUs. Any other pitfalls we should consider? -Eric |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
1) BOINC "learns" how long it takes your machine to complete tasks for each project. One or more of your very recent tasks took closer to 10hrs to complete, and so BOINC estimates that future tasks may take about as long (not a valid assumption). 2) Existing WUs in your cache *ARE* affected by runtime changes. That is one of many reasons to discuss and consider the topic carefully before making such a change in the project. So if the change were made today and you had all that work due in 2 days, your machine would miss some deadlines and those tasks would not receive credit. Then things would return to normal (or you would have to manually abort a few of them until your machine adjusts to the new runtime). 3) Correct. RAC will not be directly impacted by the change. |
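Whether a full cache can meet its deadlines depends on how many cores work through it, which ejuel's post does not say. This Python sketch assumes every task takes the same time, tasks run back to back with no other projects sharing the CPU, and the client's 9 h 41 min estimate is accurate; all names are hypothetical.

```python
import math


def tasks_finishable(hours_left: float, task_hours: float, cores: int = 1) -> int:
    """Tasks that can complete before the deadline, running back to back on each core."""
    return math.floor(hours_left / task_hours) * cores


def tasks_missed(queued: int, hours_left: float, task_hours: float, cores: int = 1) -> int:
    """Queued tasks that would miss the deadline under the same assumptions."""
    return max(0, queued - tasks_finishable(hours_left, task_hours, cores))


estimate_h = 9 + 41 / 60  # 9 h 41 min, the estimate reported in the thread
print(tasks_missed(15, 48, estimate_h, cores=1))  # 11 on a single core
print(tasks_missed(15, 48, estimate_h, cores=4))  # 0 on a quad-core
```

The same arithmetic shows why a runtime increase applied to an already-full cache can push some tasks past their deadlines until the client adjusts its estimates.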
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Is there a reason why the watchdog couldn't work at the level of each individual model rather than the task as a whole? That way, you'd avoid the potential extra time wastage that might happen with longer run times if a model goes haywire. |
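svincent's idea can be compared with the task-level watchdog in a small simulation. This Python sketch is hypothetical throughout: the per-model limit, the model timings, and the assumption that a task stops starting new models once its preferred runtime is used up are all illustrative, and it does not describe how Rosetta's watchdog actually works.

```python
import math
from typing import List, Tuple


def task_level(model_seconds: List[float], pref: float,
               multiplier: float = 3) -> Tuple[int, float]:
    """Only the whole task is watched; a stuck model runs until multiplier x pref.

    Returns (models completed, CPU seconds used).
    """
    used, done = 0.0, 0
    for secs in model_seconds:
        if used >= pref:
            break                                # preferred runtime reached
        if used + secs > multiplier * pref:
            return done, multiplier * pref       # watchdog ends the whole task
        used += secs
        done += 1
    return done, used


def per_model(model_seconds: List[float], pref: float,
              model_limit: float) -> Tuple[int, float]:
    """Each model is watched; a model over `model_limit` is dropped and the task moves on."""
    used, done = 0.0, 0
    for secs in model_seconds:
        if used >= pref:
            break
        if secs > model_limit:
            used += model_limit                  # time lost to the stuck model
            continue
        used += secs
        done += 1
    return done, used


# 20-minute models, with the third one stuck forever; 6 h preference.
models = [1200.0, 1200.0, math.inf] + [1200.0] * 20
print(task_level(models, pref=21600))                   # (2, 64800)
print(per_model(models, pref=21600, model_limit=3600))  # (15, 21600.0)
```

In this toy run the task-level watchdog burns 18 CPU hours for two models, while the per-model check loses one hour and still completes 15 models within the 6-hour preference. A real per-model limit would have to allow for model lengths that vary a lot between targets.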
©2024 University of Washington
https://www.bakerlab.org