Message boards : Number crunching : Many crashes.
Author | Message |
---|---|
Jean-David Beyer Joined: 2 Nov 05 Posts: 187 Credit: 6,354,325 RAC: 5,819 |
I recently got 13 tasks and 12 of them failed. One completed successfully. The machine runs other BOINC projects successfully. A typical failure looks like this:

Task: 1451454977
Name: rb_11_21_153050_149232_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_07_08_2728289_69_0
Workunit: 1295214447
Created: 22 Nov 2021, 7:04:32 UTC
Sent: 22 Nov 2021, 7:07:03 UTC
Report deadline: 25 Nov 2021, 7:07:03 UTC
Received: 22 Nov 2021, 16:39:33 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 1 (0x00000001) Unknown error code
Computer ID: 5958977
Run time: 14 min 38 sec
CPU time: 14 min 19 sec
Validate state: Invalid
Credit: 0.00
Device peak FLOPS: 3.86 GFLOPS
Application version: Rosetta v4.20 windows_x86_64
Peak working set size: 373.02 MB
Peak swap size: 352.90 MB
Peak disk usage: 0.33 MB

Stderr output:
<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message> Incorrect function. (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_11_21_153050_149232_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 4 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_11_21_153050_149232_ab_t000__robetta.zip -frag3 rb_11_21_153050_149232_ab_t000__robetta.200.3mers.index.gz -fragA rb_11_21_153050_149232_ab_t000__robetta.200.8mers.index.gz -fragB rb_11_21_153050_149232_ab_t000__robetta.200.7mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1169817
Using database: database_357d5d93529_n_methylminirosetta_database
[ ERROR ]: Caught exception: File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306 chi angle must be between -180 and 180: -nan(ind)

Did I get a bad batch, or is something else going on? |
.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
I had some rb_11 tasks get funky, mostly overrunning their time; others took a walk on the wild side. It'll pass. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1671 Credit: 17,534,162 RAC: 22,270 |
"Did I get a bad batch, or is something else going on?" Bad batch. If you click on the link for the Work Unit, you can see that the other systems that tried to process those Tasks also errored out. Grant Darwin NT |
Jean-David Beyer Joined: 2 Nov 05 Posts: 187 Credit: 6,354,325 RAC: 5,819 |
"Did I get a bad batch, or is something else going on?" I agree that it was a bad batch. I have since had 5 work units complete successfully and no more failures. However, another user completed one of my failed work units successfully. |
Jean-David Beyer Joined: 2 Nov 05 Posts: 187 Credit: 6,354,325 RAC: 5,819 |
I notice all my units run on my Linux machine end up valid. And about half my units on my Windows machine are now coming up valid. FWIW. |
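
The reasoning in this thread (check how the other hosts fared on the same Work Unit, then compare platforms) can be sketched roughly in Python. This is only an illustration under stated assumptions: the function name `classify_workunit`, the `(platform, succeeded)` tuples, and the labels it returns are all hypothetical, and nothing here reads data from BOINC or the project server. You would have to fill in the outcomes by hand from the Work Unit page.

```python
def classify_workunit(outcomes):
    """Guess why a work unit failed from the results of every host that tried it.

    `outcomes` is a hypothetical list of (platform, succeeded) tuples,
    e.g. [("windows", False), ("linux", True)], entered by hand.
    """
    if not outcomes:
        return "no results yet"

    successes = [platform for platform, ok in outcomes if ok]
    failures = [platform for platform, ok in outcomes if not ok]

    if not failures:
        return "healthy"
    if not successes:
        # Every host errored out: the "bad batch" case described above.
        return "likely bad work unit"
    if set(successes).isdisjoint(failures):
        # Failures and successes never share a platform.
        return "possibly platform-specific"
    # The same platform both failed and succeeded somewhere.
    return "mixed: inspect individual hosts"


if __name__ == "__main__":
    print(classify_workunit([("windows", False), ("windows", False)]))
    print(classify_workunit([("windows", False), ("linux", True)]))
```

A result of "possibly platform-specific" would fit the observation that Linux results come back valid while Windows results are mixed, but with only a handful of hosts per Work Unit it is a hint, not a diagnosis.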
©2024 University of Washington
https://www.bakerlab.org