Discussion on increasing the default run time

Matthew Lei Joined: 5 Jun 06 Posts: 4 Credit: 258,058 RAC: 0	Message 58316 - Posted: 1 Jan 2009, 4:00:14 UTC - in response to Message 57828. Quote: "We are still waiting to test out some bug fixes." Does that mean you guys are going ahead with the change?

P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0	Message 58437 - Posted: 3 Jan 2009, 22:19:02 UTC Hi. The sooner they bring this in the better, I say: fewer people hammering the servers. Pete.

6dj72cn8 Joined: 18 Apr 06 Posts: 5 Credit: 207,684 RAC: 0	Message 58778 - Posted: 13 Jan 2009, 4:04:51 UTC My preference is for a minimum of two hours.

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 58783 - Posted: 13 Jan 2009, 14:37:47 UTC Harry, could you please talk a bit about WHY that is your preference? What is it about how you use your machine that makes this better for you? Rosetta Moderator: Mod.Sense

6dj72cn8 Joined: 18 Apr 06 Posts: 5 Credit: 207,684 RAC: 0	Message 58793 - Posted: 14 Jan 2009, 0:50:15 UTC After some thought I am unable to justify my preference in a fashion likely to be helpful or meaningful to the project. I therefore withdraw my previous comment and ask you to ignore it.

LizzieBarry Joined: 25 Feb 08 Posts: 76 Credit: 201,862 RAC: 0	Message 58794 - Posted: 14 Jan 2009, 1:58:54 UTC It may be that some users choose a short run-time to match the time they typically spend on a computer before shutting down, so that a WU can complete without keeping the computer on longer than they wish. While checkpointing has been infrequent, this may be a way of preventing the same WU from restarting over and over. But if checkpointing is made more frequent in the new Mini Rosetta version being tested soon, that part of the problem may disappear. Going back through this thread, the idea of increasing the minimum from 1 to 2 hours and the default from 3 to 4 hours as a first interim change makes sense. It is not as drastic as doubling everything at once, and it can be assessed for unexpected results before increasing the default from 4 to 5 hours, maybe a month later. Then again to 6 hours for a while, before changing the minimum from 2 to 3 hours. Every step would still help reduce the server load, just by a smaller amount than one big change.
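The staged plan above can be written down as a small table. This is an illustrative sketch only: it assumes scheduler load is proportional to the number of tasks a host finishes (so to 1/default runtime) and that hosts simply run the default; `STAGES` and `relative_load` are hypothetical names, not project code.

```python
from fractions import Fraction

# (stage name, minimum hours, default hours), following the phased plan in the post.
STAGES = [
    ("current", 1, 3),
    ("interim 1", 2, 4),
    ("interim 2", 2, 5),  # maybe a month later
    ("interim 3", 2, 6),
    ("final", 3, 6),
]

def relative_load(default_hours, baseline_hours=3):
    """Tasks per host-day relative to the 3-hour baseline.

    Assumption for illustration: one scheduler contact per finished task,
    and every host runs the default runtime.
    """
    return Fraction(baseline_hours, default_hours)

for name, minimum, default in STAGES:
    load = relative_load(default)
    print(f"{name:10s} min={minimum}h default={default}h relative load={float(load):.2f}")
```

Each stage lowers (or keeps) the estimated load, which is the "every step still helps" point in the post.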

Virtual Boss* Joined: 10 May 08 Posts: 35 Credit: 713,981 RAC: 0	Message 58799 - Posted: 14 Jan 2009, 11:51:58 UTC Last modified: 14 Jan 2009, 12:00:11 UTC For those who may be interested in the effect of runtime changes, below is a list showing credit vs. download/upload traffic for the 3 weeks before and after my runtime change.
Week             Credit  Down MB  Up MB
26Oct08-01Nov08    2079    150.1    6.0
02Nov08-08Nov08    2519    235.9    8.7
09Nov08-15Nov08    3222    211.2   29.1
16Nov08-22Nov08    3894    118.2    6.4
23Nov08-29Nov08    4348    120.0   12.3
30Nov08-06Dec08    2839    117.3    4.9
After this thread started I changed my runtime preferences. Before 15 Nov all my hosts used the default of 3 hrs. On 15 Nov I changed runtimes as follows:
10 hrs - 1 host (~80% of RAC)
6 hrs - 2 hosts (~11% of RAC)
4 hrs - 3 hosts (~9% of RAC)
The list shows the obvious drop in internet traffic and increased credit output due to changing the runtime (which was the only change made). [EDIT] The increase in credit shown here is more likely due to variation in project crunching ratio, but overall it shows a ~10-15% increase since the change.[/EDIT] Bruce
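The before/after totals behind Bruce's table can be reproduced with a few lines of Python. This is only arithmetic on the figures posted above; the split into three weeks before and three weeks after follows the post, and the names are illustrative.

```python
# Weekly figures from the table: (week, credit, download MB, upload MB).
WEEKS = [
    ("26Oct08-01Nov08", 2079, 150.1, 6.0),
    ("02Nov08-08Nov08", 2519, 235.9, 8.7),
    ("09Nov08-15Nov08", 3222, 211.2, 29.1),
    ("16Nov08-22Nov08", 3894, 118.2, 6.4),
    ("23Nov08-29Nov08", 4348, 120.0, 12.3),
    ("30Nov08-06Dec08", 2839, 117.3, 4.9),
]

def totals(rows):
    """Sum credit, download MB and upload MB over a list of weekly rows."""
    credit = sum(r[1] for r in rows)
    down = sum(r[2] for r in rows)
    up = sum(r[3] for r in rows)
    return credit, down, up

# Runtimes were changed on 15 Nov: first three weeks before, last three after.
pre, post = totals(WEEKS[:3]), totals(WEEKS[3:])
down_change = post[1] / pre[1] - 1

print(f"download: {pre[1]:.1f} MB -> {post[1]:.1f} MB ({down_change:+.0%})")
print(f"upload:   {pre[2]:.1f} MB -> {post[2]:.1f} MB")
print(f"credit per MB downloaded: {pre[0] / pre[1]:.2f} -> {post[0] / post[1]:.2f}")
```

As the [EDIT] note says, the credit side is noisy over three weeks; the download drop is the more robust signal.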

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 58802 - Posted: 14 Jan 2009, 13:30:17 UTC Yes, Virtual, I wouldn't "sell it" as a RAC improvement; it should be basically unmeasurable. And, as you can imagine, if a new protein came in for study during your test, you would have had an extra couple of 2-3 MB files to download. It really varies. So it isn't even really intended as much of a bandwidth saving. It really boils down to the number of hits on the scheduler; beyond that, the specific file transfers are not the main focus of changing these values. You can also reduce file-transfer bandwidth (and scheduler hits) by keeping more days of work on your machine. If you set it to connect about every 3 days, for example, rather than the 0.1-day default, that can make a nice reduction in hits on the servers. Not that setting 0.1 days means you actually always make 240 hits per day, but the higher value will reduce the number of requests.
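To see why the default runtime matters for scheduler hits, here is a rough upper-bound sketch. It assumes, purely for illustration, an always-on host where every finished task triggers its own scheduler contact; the real BOINC client can report several finished tasks and request work in a single contact, so actual numbers are lower. The function name and the 4-core example are hypothetical.

```python
def scheduler_contacts_per_day(cores, runtime_hours, hours_on_per_day=24.0):
    """Upper-bound estimate of scheduler contacts per day for one host.

    Illustrative assumption: one contact per finished task, one task per core.
    """
    tasks_per_day = cores * hours_on_per_day / runtime_hours
    return tasks_per_day

# Minimum, current default and proposed default runtimes discussed in the thread.
for runtime in (1, 3, 6):
    contacts = scheduler_contacts_per_day(cores=4, runtime_hours=runtime)
    print(f"{runtime}h tasks on a 4-core host: up to {contacts:.0f} contacts/day")
```

Under this model, doubling the default from 3 to 6 hours halves the upper bound, independent of how large each downloaded file is.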

Virtual Boss* Joined: 10 May 08 Posts: 35 Credit: 713,981 RAC: 0	Message 58805 - Posted: 14 Jan 2009, 14:41:46 UTC - in response to Message 58802. Hi Mod.Sense, I agree there is a large number of variables, but they do tend to average out over the longer term. Below are the figures for the 2 months before and after, which still show a significant decrease in traffic, even though during the later period there have been more new proteins, several series which repeatedly 'crashed out' on my hosts, and problems with the credit generated, all of which would have reduced the traffic savings I have seen. These figures indicate a roughly 35% increase in credit per MB downloaded.
Date range       Credit  Down MB  Credit/MB
16Sep08-14Nov08   25373   1196.2      21.21
15Nov08-13Jan09   30202   1057.2      28.57
Simple maths will tell you that for a particular protein, if you double your runtime you will roughly double the number of models completed, thereby roughly doubling your credit per MB downloaded. If you still crunch for the same number of hours per day, this means your traffic is roughly halved.
I believe that in the longer term my stats will approach that figure. I was also wondering where the total credit increase came from, and suspect it may partly be due to less CPU time 'wasted' on (1) network traffic and (2) loading and initialising the work unit before it can start crunching any useful data. I guess more time will give more accurate findings. And yes, my overall server hits have reduced considerably (maybe by 30-40%, at a guesstimate). Bruce
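The ratio column and the "double the runtime, halve the traffic" argument can be checked with a short sketch. The ratios use only the figures in the two-month table; `download_mb_per_day` and its `mb_per_task` parameter are hypothetical, and the model assumes a single protein whose per-task download is the same whatever the runtime.

```python
# Two-month totals from the table: (credit, download MB).
PERIODS = {
    "16Sep08-14Nov08": (25373, 1196.2),  # 3-hour default runtime
    "15Nov08-13Jan09": (30202, 1057.2),  # longer runtimes
}

ratios = {name: credit / down_mb for name, (credit, down_mb) in PERIODS.items()}
before = ratios["16Sep08-14Nov08"]
after = ratios["15Nov08-13Jan09"]
print(f"credit per MB: {before:.2f} -> {after:.2f} ({after / before - 1:+.0%})")

def download_mb_per_day(hours_per_day, runtime_hours, mb_per_task):
    """Idealised model: a fixed download per task, so daily traffic scales
    with the number of tasks started per day (hours crunched / runtime)."""
    return hours_per_day / runtime_hours * mb_per_task

# Same crunching hours, doubled runtime: traffic halves.
print(download_mb_per_day(24, 3, mb_per_task=2.0), download_mb_per_day(24, 6, mb_per_task=2.0))
```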

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 58807 - Posted: 14 Jan 2009, 16:05:30 UTC I was referring to RAC. You are using a credit-per-MB-of-bandwidth measure. So yes, I'd expect that the more factors you combine to reduce the MB downloaded, the better your credit per MB will be. But it will not make a material difference to credit per day of crunching, which is what RAC amounts to. I'm curious too: how did you measure your bandwidth? Are you using a proxy server that recorded it? I'm not questioning your figures, just looking for more ways to measure it myself :)

Virtual Boss* Joined: 10 May 08 Posts: 35 Credit: 713,981 RAC: 0	Message 58847 - Posted: 16 Jan 2009, 13:18:22 UTC - in response to Message 58807. I am using a commercial program called BWMeter, primarily to control bandwidth allocations for each host on my network so that no host 'hogs' the internet connection. It also has quite good statistics, among many other features.

]{LiK`RangerS` Joined: 27 Oct 08 Posts: 39 Credit: 6,552,652 RAC: 0	Message 59477 - Posted: 9 Feb 2009, 4:32:30 UTC - in response to Message 58847. I'm going quad :D

Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 59982 - Posted: 5 Mar 2009, 2:00:19 UTC So, what was the decision on increasing the minimum and default runtimes? Did you decide to upgrade the DB server instead? Add this signature to your email: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS or Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/

mike46360 Joined: 21 May 07 Posts: 10 Credit: 18,011 RAC: 0	Message 61792 - Posted: 16 Jun 2009, 18:09:27 UTC Quote: "We are planning to increase the default run time from 3 hours to 6 hours and the minimum from 1 to 3 hours to reduce the load on our servers." I increased the run time from 3 hours to 6 hours last night. Does this help the folding at all, or is it just to ease the pain on the servers?

ByRad Joined: 12 Apr 08 Posts: 8 Credit: 15,904,173 RAC: 0	Message 61794 - Posted: 16 Jun 2009, 20:45:48 UTC But there will also be a problem... Just to try it, I changed my runtime from the default (3 hours) to the maximum value of 24 hours for a couple of days. The effect was that only 4 of 14 tasks finished without errors (I tried 2 days on WinXP x86 and 2 days on Win7 64-bit, so it doesn't depend on the version of Rosetta; I mean x64 / x86, not v1.74). During that period I was running my PC all the time (24 hours a day), restarting it once or twice a day. So increasing the runtime will also reduce the number of work units finishing properly. Because of this, I think it would be a really nice idea if the result of every finished WU (valid or erroneous) were sent to the server, because an error can occur in the first model but also after 100 models have finished properly. And if some information about the error were also sent, it would give the developers debugging information (without hugely increasing the traffic).

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 61832 - Posted: 18 Jun 2009, 13:18:33 UTC Mike, it just keeps your machine busy with less overhead on the project servers. ByRad, the Rosetta application does send partial successes. If you complete 50 models and then number 51 fails, the task is reported back and should show as a success. It also sends some information back to help diagnose what caused the problem with the 51st model. So the system may not always work perfectly, but the suggestions you have made are already in the code.
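Why partial-success reporting matters more as runtimes grow can be illustrated with a toy model. It assumes each model crashes independently with a fixed probability and that a crash ends the task; neither assumption comes from the project, and `expected_models` is a hypothetical helper.

```python
def expected_models(planned, fail_prob):
    """Expected models credited for one task under two reporting rules.

    Toy assumptions: each model crashes independently with probability
    fail_prob, and a crash ends the task.
    Returns (all_or_nothing, partial_success).
    """
    survive = 1.0 - fail_prob
    # All-or-nothing: the task counts only if every planned model finishes.
    all_or_nothing = planned * survive ** planned
    # Partial success: model k counts if models 1..k all finished,
    # so the expectation is the sum of survive**k for k = 1..planned.
    partial = sum(survive ** k for k in range(1, planned + 1))
    return all_or_nothing, partial

# Longer runtimes mean more planned models per task.
for planned in (10, 50, 100):
    aon, part = expected_models(planned, fail_prob=0.01)
    print(f"{planned:3d} planned: all-or-nothing {aon:5.1f}, partial {part:5.1f}")
```

In this model the gap between the two rules widens as the number of models per task grows, which is why reporting the 50 good models before a failure on model 51 matters for long runtimes.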

P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0	Message 62671 - Posted: 31 Jul 2009, 6:27:03 UTC Hi. I see that nothing has been done about this yet. It might help with the kind of server problems you're having at the moment: put the default to at least four hours, IMHO.

Warped Joined: 15 Jan 06 Posts: 48 Credit: 1,788,185 RAC: 0	Message 63031 - Posted: 24 Aug 2009, 15:32:11 UTC I live in a bandwidth-impoverished part of the world, with high prices and low speed. Consequently, I have selected a 16-hour run time. However, I find this thread, as well as the others discussing long-running models, to be of little interest when my work units run for only about 4 hours. Is the preferred run time really applied? *Warped*

dcdc Joined: 3 Nov 05 Posts: 1834 Credit: 124,950,919 RAC: 22,405	Message 63032 - Posted: 24 Aug 2009, 15:55:08 UTC Last modified: 24 Aug 2009, 15:56:55 UTC I'd happily change my run-time prefs so that computers that are on a lot have a long run-time and the others a short one, but I find this really difficult as they're tied to the BOINC work/home/school settings (which I think are poor, but not the project's fault ;) ). I also use BAM, but that doesn't allow changes to the run-time, so I'm left with the default. Being able to select run-time preferences per machine would be useful, but probably only for a minority, I guess... (Just noticed the project haven't posted on this for a while!)

robertmiles Joined: 16 Jun 08 Posts: 1261 Credit: 14,421,737 RAC: 0	Message 63042 - Posted: 25 Aug 2009, 23:34:48 UTC - in response to Message 63031. I have noticed that on my faster machine, the limit of 99 decoys is usually reached before the 12-hour runtime I've requested. You might want to check the report on the Rosetta@home website of how well each workunit succeeded, to see whether your workunits also often stop at the 99-decoy limit rather than near the requested run time.
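The effect robertmiles describes amounts to a task stopping at whichever limit is hit first. The sketch assumes a fixed time per decoy and that no new decoy is started once it would overshoot the target runtime; both are simplifications, and the 144-second decoy time is a made-up figure chosen only to show how a 16-hour preference can end up as a roughly 4-hour task. `effective_runtime` is a hypothetical name.

```python
def effective_runtime(target_hours, seconds_per_decoy, decoy_limit=99):
    """Return (decoys completed, hours run) for one task.

    Simplified model: every decoy takes the same time, and the task stops at
    the decoy limit or when the next decoy would exceed the target runtime.
    """
    target_seconds = int(target_hours * 3600)
    decoys = min(decoy_limit, target_seconds // seconds_per_decoy)
    return decoys, decoys * seconds_per_decoy / 3600

# Hypothetical fast host: 144 s per decoy.
for target in (3, 12, 16):
    decoys, hours = effective_runtime(target, seconds_per_decoy=144)
    print(f"target {target:2d}h -> {decoys} decoys, about {hours:.2f}h")
```

With these made-up numbers, any target above about 4 hours hits the 99-decoy cap first, so raising the preference further changes nothing.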