Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Sid Celery

Joined: 11 Feb 08
Posts: 2382
Credit: 45,224,309
RAC: 23,807
Message 112763 - Posted: 5 Jun 2025, 2:04:29 UTC - in response to Message 112760.  

But there are now six Rosetta tasks stuck in Ready to Report. Two of them have been there since Sunday and four all day Monday.

Requesting updates provokes the message that "Communication is deferred" for two hours or more. Their deadline is the next day.

This is all very odd. It seems like your PC can't connect to the bakerlab servers again.
I presume you've tried a reboot over the last couple of days? It might be worth a try.
Someone who knows more about servers than me (almost everyone) might be able to suggest something. I'm useless at this kind of thing tbh. Sorry.

Six Rosetta tasks still stuck in Ready to Report, four days now... Update always results in "communication deferred"

I've rebooted several times. Other projects are running fine.

I'm thinking of re-setting the project. Maybe that will jar the system into action.

I'm not inclined to think a project reset is going to make the bakerlab server any more reachable.
Can you confirm the few lines in your event log that result in communication being deferred? I'm assuming the server isn't reachable, but just to be sure.
And re-check that your hosts file (saved without any file extension; make doubly sure) is still as it should be.
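To pull those lines out of a long event log quickly, here is a minimal Python sketch. It assumes (this is not stated in the thread) that the BOINC client keeps its event log in stdoutdae.txt in the BOINC data directory, and that each line carries the project name in square brackets, e.g. "[Rosetta@home]". The keyword list is likewise only a guess at the relevant wording.

```python
import sys
from pathlib import Path

# Words that (by assumption) mark communication trouble in BOINC log lines.
KEYWORDS = ("failed", "defer", "couldn't connect", "backoff")


def comm_problems(lines, project="Rosetta@home"):
    """Return the lines tagged [project] that mention a likely network problem."""
    tag = f"[{project}]"
    hits = []
    for line in lines:
        lowered = line.lower()
        if tag in line and any(word in lowered for word in KEYWORDS):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python scan_log.py /path/to/stdoutdae.txt
    log = Path(sys.argv[1])
    with log.open(encoding="utf-8", errors="replace") as f:
        for hit in comm_problems(f):
            print(hit)
```

Pasting the matching lines into a reply is usually enough to tell a DNS or connection failure apart from a server-side refusal.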

I just don't get how you were able to contact the server to grab and return a few dozen tasks, only for it to become unreachable without anything changing in between.
Anyone else with any ideas, pipe up.
ID: 112763
Bill Swisher

Joined: 10 Jun 13
Posts: 67
Credit: 55,689,068
RAC: 133,640
Message 112764 - Posted: 5 Jun 2025, 2:30:09 UTC - in response to Message 112763.  

Anyone else with any ideas, pipe up.

ping -c5 bakerlab.org
ID: 112764
Bill Swisher

Joined: 10 Jun 13
Posts: 67
Credit: 55,689,068
RAC: 133,640
Message 112765 - Posted: 5 Jun 2025, 5:05:25 UTC - in response to Message 112764.  

The "-c5" depends on the flavor of your OS.
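For reference, "-c 5" (send five echo requests) is the Linux/macOS spelling; the Windows ping uses "-n 5" for the same thing. A small Python sketch that picks the right flag for the current OS (the helper name is illustrative only):

```python
import platform
import subprocess
import sys


def ping_command(host, count=5, system=None):
    """Build a ping argument list: Windows counts with -n, Linux/macOS with -c."""
    system = system or platform.system()
    flag = "-n" if system == "Windows" else "-c"
    return ["ping", flag, str(count), host]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python ping_host.py bakerlab.org
    result = subprocess.run(ping_command(sys.argv[1]))
    print("ping exit status:", result.returncode)
```

Some servers drop ICMP echo requests, so a failed ping hints at, but does not prove, a connectivity problem.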
ID: 112765
William Albert

Joined: 22 Mar 20
Posts: 28
Credit: 2,134,858
RAC: 4,236
Message 112766 - Posted: 5 Jun 2025, 18:31:00 UTC - in response to Message 112762.  

Your AMD Ryzen 7 5800X machine has so many error'd/invalid WUs that I would strongly suspect failing hardware.

It is. It's a repeated disk failure that's kind of described here
I don't trust myself fixing this, so I'm waiting for my hardware guy to get back.
He's out of the country atm - and my fingers are permanently crossed that it doesn't become irretrievable before he returns.


If you have hardware that you know to be unstable, I would suggest not using it for crunching WUs.

Not only is it a waste of electricity when WUs error out or are invalid, but since Rosetta@Home doesn't validate results with wingmen, the WUs that don't fail outright can still contain bogus results.
ID: 112766
Stevie G

Joined: 15 Dec 18
Posts: 129
Credit: 1,028,210
RAC: 477
Message 112767 - Posted: 5 Jun 2025, 20:15:12 UTC - in response to Message 112759.  
Last modified: 5 Jun 2025, 20:25:58 UTC

Two more Validate errors tonight, meaning 2x12hr tasks not being awarded credit.
Another unheard appeal for the daily job that cleans this up to be reinstated.

Probably caused by some disk errors I'm getting locally, but annoying nonetheless :(

More disk errors, 5 more Validation errors (likely more to come). All lost credits again.
I'm going to have to do something about this...

More did come - 8 in all. A temporary fix is in, but the errors will keep coming back until I can clone onto a new drive :(

Another 8 validation errors and 4 compute errors on top <sigh>

Up to 22 now. So demoralising...


If you knowledgeable people are having so much trouble with Rosetta, I don't feel so stupid when I can't get it to work. That's why I said this project was too unreliable and quirky for a person with my limited abilities.

One would think the project administrators and researchers might have a concern about the problems their crunchers encounter with their project. No? They might get better results and happier volunteers if these issues were resolved.

But nobody cares if you are demoralized?

S. Gaber
Oldsmar, FL
ID: 112767
Sid Celery

Joined: 11 Feb 08
Posts: 2382
Credit: 45,224,309
RAC: 23,807
Message 112768 - Posted: 6 Jun 2025, 3:00:50 UTC - in response to Message 112766.  

Your AMD Ryzen 7 5800X machine has so many error'd/invalid WUs that I would strongly suspect failing hardware.

It is. It's a repeated disk failure that's kind of described here
I don't trust myself fixing this, so I'm waiting for my hardware guy to get back.
He's out of the country atm - and my fingers are permanently crossed that it doesn't become irretrievable before he returns.

If you have hardware that you know to be unstable, I would suggest not using it for crunching WUs.

Not only is it a waste of electricity when WUs error out or are invalid, but since Rosetta@Home doesn't validate results with wingmen, the WUs that don't fail outright can still contain bogus results.

I know you're right
I've finally got round to whatsapp'ing my guy to see when he's free
It's only a 1TB HDD, so it shouldn't take too long once he can get to it
ID: 112768
Sid Celery

Joined: 11 Feb 08
Posts: 2382
Credit: 45,224,309
RAC: 23,807
Message 112769 - Posted: 6 Jun 2025, 3:15:42 UTC - in response to Message 112767.  

Two more Validate errors tonight, meaning 2x12hr tasks not being awarded credit.
Another unheard appeal for the daily job that cleans this up to be reinstated.

Probably caused by some disk errors I'm getting locally, but annoying nonetheless :(

More disk errors, 5 more Validation errors (likely more to come). All lost credits again.
I'm going to have to do something about this...

More did come - 8 in all. A temporary fix is in, but the errors will keep coming back until I can clone onto a new drive :(

Another 8 validation errors and 4 compute errors on top <sigh>

Up to 22 now. So demoralising...

If you knowledgeable people are having so much trouble with Rosetta, I don't feel so stupid when I can't get it to work. That's why I said this project was too unreliable and quirky for a person with my limited abilities.

One would think the project administrators and researchers might have a concern about the problems their crunchers encounter with their project. No? They might get better results and happier volunteers if these issues were resolved.

To be fair, the cause of the problem is that this hard drive has been transferred across 2 or even 3 machines, is probably a dozen years old by now and, having been thrashed by Rosetta 24/7/365, is finally coming to the end of its life.
It's entirely my own laziness that's causing my issues to persist for so long.
And my appeal for the project to reinstate their cleanup job is just me asking them to immunise me from the consequences of my own inaction.
There are plenty of things I'm dumb as a rock about, and the benefit of these forums is that there's seemingly always someone who knows their way around every issue and is generous enough to offer the benefit of their expertise, however baffling things appear.
ID: 112769
Sid Celery

Joined: 11 Feb 08
Posts: 2382
Credit: 45,224,309
RAC: 23,807
Message 112770 - Posted: 6 Jun 2025, 22:08:31 UTC - in response to Message 112768.  

Your AMD Ryzen 7 5800X machine has so many error'd/invalid WUs that I would strongly suspect failing hardware.

It is. It's a repeated disk failure that's kind of described here
I don't trust myself fixing this, so I'm waiting for my hardware guy to get back.
He's out of the country atm - and my fingers are permanently crossed that it doesn't become irretrievable before he returns.

If you have hardware that you know to be unstable, I would suggest not using it for crunching WUs.

Not only is it a waste of electricity when WUs error out or are invalid, but since Rosetta@Home doesn't validate results with wingmen, the WUs that don't fail outright can still contain bogus results.

I know you're right
I've finally got round to whatsapp'ing my guy to see when he's free
It's only a 1TB HDD, so it shouldn't take too long once he can get to it

Booked in for Monday.
I'll be setting No New Tasks and running my cache down to nothing in advance.
ID: 112770
Tom M

Joined: 20 Jun 17
Posts: 158
Credit: 31,977,565
RAC: 105,172
Message 112774 - Posted: 8 Jun 2025, 13:15:37 UTC - in response to Message 112768.  
Last modified: 8 Jun 2025, 13:29:38 UTC


I know you're right
I've finally got round to whatsapp'ing my guy to see when he's free
It's only a 1TB HDD, so it shouldn't take too long once he can get to it


Sid,
I think you are running Windows machines. So...

Have you forced a "Chkdsk"? It takes quite a while during system boot, but it locks out all the bad blocks on your HDD. If the HDD is not picking up new bad blocks frequently, this could restore it to usability.

I am assuming you are not hearing a new "howl" or a chirping noise?

Respectfully,

===edit===
To force CHKDSK to scan a drive on startup in Windows, you can use the fsutil dirty set command in the command prompt as administrator. This command sets the "dirty bit" on a volume, indicating that a disk check is needed when the computer restarts, and then it will trigger CHKDSK automatically.
===

Detailed Steps:

1. Open Command Prompt as Administrator:
Search for "Command Prompt" in the Start Menu.
Right-click on it and select "Run as administrator".

2. Determine the Drive Letter:

If you don't know the drive letter of the partition you want to check, you can use "diskpart" to identify it.
Open Command Prompt as administrator and type "diskpart".

Type "list vol" to see a list of volumes with their letters.

3. Set the Dirty Bit:

Use the following command, replacing "X:" with the actual drive letter of the volume you want to check:

fsutil dirty set X:

For example, if you want to check drive C, you would use fsutil dirty set C:.

4. Schedule the Check:
The fsutil dirty set command schedules the volume to be checked on the next restart.
5. Restart Your Computer:
CHKDSK then runs automatically during startup.
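Before restarting, the flag can be confirmed with fsutil dirty query X:, the query counterpart of the set command above. The Python wrapper below is a sketch: it assumes the output reads "Volume - X: is Dirty" or "Volume - X: is NOT Dirty" (worth checking against your own Windows version), and like the set command it needs an administrator prompt.

```python
import subprocess
import sys


def is_dirty(fsutil_output):
    """Interpret 'fsutil dirty query' output (wording assumed, see above)."""
    text = fsutil_output.lower()
    if "not dirty" in text:
        return False
    if "dirty" in text:
        return True
    raise ValueError(f"unrecognised fsutil output: {fsutil_output!r}")


def query_dirty(volume):
    """Run 'fsutil dirty query <volume>'; Windows only, elevated prompt."""
    out = subprocess.run(
        ["fsutil", "dirty", "query", volume],
        capture_output=True, text=True, check=True,
    ).stdout
    return is_dirty(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    vol = sys.argv[1]  # e.g. "X:"
    state = "should" if query_dirty(vol) else "will NOT"
    print(f"CHKDSK {state} run on {vol} at the next restart")
```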
Proud member of the O.F.A. (Old Farts Association)
ID: 112774
Sid Celery

Joined: 11 Feb 08
Posts: 2382
Credit: 45,224,309
RAC: 23,807
Message 112777 - Posted: 9 Jun 2025, 17:17:10 UTC - in response to Message 112774.  

I know you're right
I've finally got round to whatsapp'ing my guy to see when he's free
It's only a 1TB HDD, so it shouldn't take too long once he can get to it

Sid,
I think you are running Windows machines. So...

Have you forced a "Chkdsk"? It takes quite a while during system boot, but it locks out all the bad blocks on your HDD. If the HDD is not picking up new bad blocks frequently, this could restore it to usability.

Thanks for your suggestions, Tom.
I did try a CHKDSK early on, but the problem seems to go far beyond FAT corruption, into corruption of the volume tables, with the drive reporting as RAW format rather than NTFS.
This page describes similar problems and potential solutions, but I don't trust myself to run all the options successfully, and I'd risk losing info that goes back to 2003, so I'm reluctant to attempt it myself.
How to fix RAW hard drive to NTFS in Windows 11/10
Fortunately, it's happening on my data drive and not my boot drive, so I think it'll be saveable in the right hands (definitely not mine)

In any case, I delivered the PC to my hardware guy earlier this afternoon with a new hard drive of identical size to clone onto, an additional 16GB RAM stick to fit, and a request to thoroughly clean out the (quite appalling) dust bunnies from all the nooks and crannies and internal fans.
I have not been treating it well while running it hard 24/7/365 for the last 4-5yrs...

Hoping for it back by Wednesday pm. I'm struggling away on my laptop in the meantime.
I hate all laptops but it serves my purpose for a short while. I'll live.
ID: 112777
Bill Swisher

Joined: 10 Jun 13
Posts: 67
Credit: 55,689,068
RAC: 133,640
Message 112778 - Posted: 9 Jun 2025, 18:58:21 UTC - in response to Message 112777.  

I hate all laptops but it serves my purpose for a short while. I'll live.

Amen. I put a small AMD 5700G machine together and gleefully took the one laptop I owned and tossed it into the dumpster. While traveling I use a tiny Samsung pad with a Bluetooth mouse & keyboard to handle my emails via a browser (total weight about 750g). I bought it to use as a replacement for a Kindle that went TU.
ID: 112778
wolfman1360

Joined: 18 Feb 17
Posts: 73
Credit: 18,567,011
RAC: 8,698
Message 112779 - Posted: 10 Jun 2025, 1:59:03 UTC

Is anyone else having difficulty adding new machines?
Doesn't look like the feeder is running according to the event log?
The threads I did add appear to no longer require VirtualBox, which goes to show just how out of the loop I am.
On the subject of laptops and portables: the AMD Ryzen in my recently purchased ROG Ally is doing more crunching at 30 watts than my 7-year-old Ryzen 1800X at 120-plus. When I see things like that I feel like I should just retire these poor old Xeons... sometimes. They make good space heaters in the winter at least.
ID: 112779
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1854
Credit: 18,534,891
RAC: 0
Message 112780 - Posted: 10 Jun 2025, 6:55:58 UTC - in response to Message 112779.  

Doesn't look like the feeder is running according to the event log?
If you are using IPv6 you need to disable it, or edit your Hosts file as per the instructions several posts back in this thread (as well as in several other threads).
Then hope for work to actually become available (there have been issues with the feeder for a couple of months now; there may be millions of Tasks queued up, but the Ready to Send buffer is often 0 for extended periods of time).
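To see whether IPv6 is even in play, one quick check is whether boinc.bakerlab.org (the host in the project URL) resolves to IPv6 addresses as well as IPv4. Below is a minimal Python sketch using the standard socket module. The suggested hosts line simply pins the name to the first IPv4 address found, which is an assumption on our part, so follow the hosts-file instructions posted earlier in the thread where they differ. The hosts file lives at C:\Windows\System32\drivers\etc\hosts on Windows and /etc/hosts on Linux.

```python
import socket
import sys


def split_by_family(addrinfo):
    """Split getaddrinfo() results into sorted, de-duplicated IPv4 and IPv6 lists."""
    v4 = sorted({info[4][0] for info in addrinfo if info[0] == socket.AF_INET})
    v6 = sorted({info[4][0] for info in addrinfo if info[0] == socket.AF_INET6})
    return v4, v6


def hosts_line(address, hostname):
    """Format one hosts-file entry: address, whitespace, hostname."""
    return f"{address}\t{hostname}"


if __name__ == "__main__" and len(sys.argv) > 1:
    host = sys.argv[1]  # e.g. boinc.bakerlab.org
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    v4, v6 = split_by_family(infos)
    print("IPv4:", v4 or "none")
    print("IPv6:", v6 or "none")
    if v4:
        # Candidate entry pinning the name to IPv4 (an assumption, see above).
        print("hosts entry:", hosts_line(v4[0], host))
```

If the IPv6 list comes back empty, the IPv6 theory is less likely to explain a deferral.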
Grant
Darwin NT
ID: 112780
Tom M

Joined: 20 Jun 17
Posts: 158
Credit: 31,977,565
RAC: 105,172
Message 112781 - Posted: 10 Jun 2025, 12:32:18 UTC - in response to Message 112780.  

Doesn't look like the feeder is running according to the event log?
If you are using IPv6 you need to disable it, or edit your Hosts file as per the instructions several posts back in this thread (as well as in several other threads).
Then hope for work to actually become available (there have been issues with the feeder for a couple of months now; there may be millions of Tasks queued up, but the Ready to Send buffer is often 0 for extended periods of time).


Wolfman,

Try running this from a terminal window in the directory where boinccmd is located on your new Linux systems.

watch -n 300 ./boinccmd --project https://boinc.bakerlab.org/rosetta/ update
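The watch utility is a Linux tool. As a cross-platform sketch, here is the same loop in Python. It assumes the path to boinccmd (boinccmd.exe on Windows) is given as the first argument, and it only repeats the same "--project URL update" request as the command above, so it cannot make an unreachable server answer.

```python
import subprocess
import sys
import time

ROSETTA_URL = "https://boinc.bakerlab.org/rosetta/"


def update_command(boinccmd="./boinccmd", url=ROSETTA_URL):
    """Argument list for 'boinccmd --project URL update'."""
    return [boinccmd, "--project", url, "update"]


def poll(boinccmd, interval=300):
    """Ask for a project update every `interval` seconds (Ctrl+C to stop)."""
    while True:
        result = subprocess.run(update_command(boinccmd), capture_output=True, text=True)
        output = (result.stdout + result.stderr).strip()
        print(time.strftime("%H:%M:%S"), "exit", result.returncode, output)
        time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python poll_update.py ./boinccmd   (or the path to boinccmd.exe)
    poll(sys.argv[1])
```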
Proud member of the O.F.A. (Old Farts Association)
ID: 112781
Sid Celery

Joined: 11 Feb 08
Posts: 2382
Credit: 45,224,309
RAC: 23,807
Message 112782 - Posted: 11 Jun 2025, 0:02:04 UTC - in response to Message 112778.  

I hate all laptops but it serves my purpose for a short while. I'll live.

Amen. I put a small AMD 5700G machine together and gleefully took the one laptop I owned and tossed it into the dumpster. While traveling I use a tiny Samsung pad with a Bluetooth mouse & keyboard to handle my emails via a browser (total weight about 750g). I bought it to use as a replacement for a Kindle that went TU.

My laptop is a 5700U 8C/16T, which I've limited to 8 of its 16 threads, partly to keep the heat down and partly because I've only got 16GB RAM.
It's running fine, except it doesn't live the charmed life of my PC and I've only just grabbed enough tasks to keep all those 8 threads busy.
The PC always found a way to magic a 25-30 task buffer for its 16 threads to work on.

The disk transfer isn't going so well.
Cloning resulted in crashes, so we're reverting to file transfers, which tbh I could do safely, but I'm letting him persist.
File transferring is resulting in crashes too, but only in specific areas, so I'll only lose a very limited amount of data.
Hopefully back tomorrow, either early or late. I'll find out in the morning.
ID: 112782
Tom M

Joined: 20 Jun 17
Posts: 158
Credit: 31,977,565
RAC: 105,172
Message 112783 - Posted: 11 Jun 2025, 1:48:22 UTC - in response to Message 112782.  

I guess an offline backup should be in your future too.
Proud member of the O.F.A. (Old Farts Association)
ID: 112783
Tom M

Joined: 20 Jun 17
Posts: 158
Credit: 31,977,565
RAC: 105,172
Message 112784 - Posted: 11 Jun 2025, 1:50:29 UTC

I had my first validate error in quite a while.

<sniff>

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1440968092
Proud member of the O.F.A. (Old Farts Association)
ID: 112784
Jon C Melusky

Joined: 29 Nov 05
Posts: 15
Credit: 252,822
RAC: 164
Message 112785 - Posted: 11 Jun 2025, 16:46:39 UTC

Is it normal for Rosetta to fill up 3.41 GB of space on the hard drive?
ID: 112785
kotenok2000

Joined: 22 Feb 11
Posts: 286
Credit: 536,594
RAC: 273
Message 112786 - Posted: 11 Jun 2025, 16:56:08 UTC

Yes.
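To check the figure yourself, here is a minimal Python sketch that totals the file sizes under a directory. The path is an assumption: BOINC normally keeps each project's files in a projects subfolder of its data directory, named after the project URL (for Rosetta, presumably boinc.bakerlab.org_rosetta). Pointing it at the whole data directory also counts the slots folder, where running tasks keep their working files.

```python
import os
import sys


def tree_size(root):
    """Total size in bytes of the regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical path, e.g. <BOINC data dir>/projects/boinc.bakerlab.org_rosetta
    size = tree_size(sys.argv[1])
    print(f"{size / 1024**3:.2f} GiB under {sys.argv[1]}")
```

It reports binary gigabytes (GiB), the unit Windows Explorer labels "GB".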
ID: 112786
Sid Celery

Joined: 11 Feb 08
Posts: 2382
Credit: 45,224,309
RAC: 23,807
Message 112787 - Posted: 11 Jun 2025, 21:49:34 UTC - in response to Message 112782.  

Hopefully back tomorrow, either early or late. I'll find out in the morning.

Back this morning. Needed to do a bit of tidying up to get everything in the right place (don't ask)
I thought I'd try to fix another little issue before getting back under way and managed to lock myself out of all the most critical parts of my computer (something to do with permissions)
Finally managed to fix it this evening by progressively undoing everything I did.

First tasks came down and everything crashed out within seconds - WCG tasks running fine though.
My latest guess is that Norton is denying access to some directories, so I've just whitelisted the BOINC program folder and the data folder.
Awaiting another download to see if it worked.

If anyone wants to make any other suggestions, these are the results I've had up to now.

On the plus side, the new drive is performing very well. My settings, less so.

On the other minus side, the new stick of RAM I wanted fitting didn't match the timings of my existing stick.
I'll be buying a matched pair next week to sort it out properly. The price of trying to be cheap in the past.

regards
Dumbo
ID: 112787



©2025 University of Washington
https://www.bakerlab.org