Problems and Technical Issues with Rosetta@home

Sid Celery

Joined: 11 Feb 08
Posts: 2380
Credit: 45,183,670
RAC: 23,362
Message 112788 - Posted: 11 Jun 2025, 23:18:42 UTC - in response to Message 112787.  

First tasks came down and everything crashed out within seconds - WCG tasks running fine though.
My latest guess is that it's something to do with Norton denying access to some directories, so I've just whitelisted the Boinc program folder and the data folder.
Awaiting another download to see if it worked.

Yup, that made no difference whatsoever... <sigh>
Tom M

Joined: 20 Jun 17
Posts: 157
Credit: 31,813,796
RAC: 102,358
Message 112789 - Posted: 12 Jun 2025, 1:28:12 UTC - in response to Message 112788.  

If I understand correctly, you have Windows on a separate HDD, and you just replaced the data HDD where Boinc lives?

Have you checked the permissions on your Rosetta exe? Or better yet just "reset" the project and it should download everything clean and start running again.

I hope.
Proud member of the O.F.A. (Old Farts Association)
Tom M

Joined: 20 Jun 17
Posts: 157
Credit: 31,813,796
RAC: 102,358
Message 112790 - Posted: 12 Jun 2025, 1:31:47 UTC

There are several systems that normally have a higher RAC than mine, yet many of them don't have a full enough cache to run all of their available threads.

Since we have published both Linux and Windows polling scripts, shouldn't they be able to pull down enough work to keep up?

Does anyone reading here run those systems? Can you tell us what is going on?

Respectfully,
Proud member of the O.F.A. (Old Farts Association)
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1853
Credit: 18,534,891
RAC: 0
Message 112791 - Posted: 12 Jun 2025, 5:02:11 UTC - in response to Message 112789.  

Or better yet just "reset" the project and it should download everything clean and start running again.
Yep.
Reset the Project, and let it re-download all new files (given your disk issues, the existing files could very well be corrupted).
Grant
Darwin NT
Stevie G

Joined: 15 Dec 18
Posts: 129
Credit: 1,028,210
RAC: 581
Message 112792 - Posted: 12 Jun 2025, 6:45:33 UTC - in response to Message 112763.  
Last modified: 12 Jun 2025, 6:47:07 UTC

But there are now six Rosetta tasks stuck in Ready to Report. Two of them have been there since Sunday and four all day Monday.

Requesting updates provokes the message that "Communication is deferred" for two hours or more. Their deadline is the next day.

This is all very odd. It seems like your PC can't connect to the bakerlab servers again.
I presume you've tried a reboot over the last couple of days? It might be worth a try.
Someone who knows more about servers than me (almost everyone) might be able to suggest something. I'm useless at this kind of thing tbh. Sorry.

Six Rosetta tasks still stuck in Ready to Report, four days now... Update always results in "communication deferred"

I've rebooted several times. Other projects are running fine.

I'm thinking of re-setting the project. Maybe that will jar the system into action.

I'm not inclined to think a project reset is going to make the bakerlab server any more reachable.
Can you confirm the few lines in your event log that result in communication being deferred? I'm assuming the server isn't reachable, but just to be sure.
And re-check your hosts file (without any extension - make doubly sure) is still as it should be.

I just don't get how you were able to contact the server to grab and return a few dozen tasks, then it becomes unreachable without something changing in between.
Anyone else with any ideas, pipe up.


I reset the project, rebooted, hit update several times per day and still get no tasks.

You remember the six completed tasks I had that were ready to send for several days? My account now reports them all "Timed out-- no response."

So I don't understand what is wrong.

S. Gaber
Sid Celery

Joined: 11 Feb 08
Posts: 2380
Credit: 45,183,670
RAC: 23,362
Message 112793 - Posted: 12 Jun 2025, 10:43:48 UTC - in response to Message 112789.  


With Grant confirming your suggestion I've done this straight away.
Before sending the PC away I'd run all tasks down to nothing and set No New Tasks on all projects. I didn't want anything to start working again before I was ready.
The old data HDD was drive E and came back as drive D, so I had to change that back to E in disk management
And it came back with User directories set to C so I re-set symbolic links of docs, downloads, music, pictures, videos etc back to E as well

I was very lucky my repairer sent me pictures of my directories E:/ E:/Users & E:/Users/<Name> to help ensure I wasn't forgetting anything
I didn't ask for those, but he was double-checking what to file-copy after he discovered cloning was failing due to repeated crashes (the reason I needed the new drive) and it turned out to be a Godsend - very fortunate.
Yes, I have had further problems with permissions, which I <think> I've now resolved, but I may've just resolved the most obvious ones and there are others still lurking in the background.
I've no obvious way of knowing without a dialog box popping up - I'm not technical enough to know how to find out.

Moving on, resetting Rosetta was producing no reaction in my Event log for several minutes, so I took the opportunity to review the detailed Security history in Norton.
There are lots of blocked transactions, the source of which is almost completely opaque.
Specifically: "Rule IGMP Public Blocked IGMP(2) traffic with (192.168.0.1)"
Over the years, Norton has hidden more and more under the bonnet, to the point where finding out what it's doing and why is increasingly hidden away.
I discovered this Rule IGMP is one of its default Traffic rules (and isn't 192.168.0.1 my own router?). I took the view it wasn't wise to change the rule in any way.
Trawling through other detailed settings I discovered Boinc blocked in its Sandbox section. Why or how, I don't know. I changed that to allow it.
I also ensured I'd properly whitelisted the whole C Boinc directory and E Boinc data directory.

Going back to the Event log, after 30 mins of nothing, a new Master file download succeeded. I suspect that followed from my removal of Boinc from Norton's Sandbox block list.
After 90 more minutes of attempts to download Rosetta tasks, finally I got some.
And they're running without crashing out. Success!

I should point out, throughout this period, I've been receiving and successfully running WCG tasks to completion, so my PC hasn't been idle.
Why, I don't know.
Whatever problems I've since discovered and resolved in Norton should've affected WCG tasks just as much as Rosetta. But clearly they didn't.
It's a mystery I'm not going to get bogged down in. Rosetta and WCG are both running successfully and I can depart for work for 3 days without having to worry about it.
I've now set WCG back to NNT to get Rosetta back in full flow.

Thanks for letting me bounce ideas off you guys. It genuinely did help. I'd got myself bogged down without the suggestion of a new route around the problem.
Tom M

Joined: 20 Jun 17
Posts: 157
Credit: 31,813,796
RAC: 102,358
Message 112794 - Posted: 12 Jun 2025, 12:46:56 UTC - in response to Message 112792.  



So I don't understand what is wrong.

S. Gaber



Me neither.

Does your hosts file look something like this?

127.0.0.1 localhost
127.0.1.1 Lynnes-Monolith
128.95.160.156 boinc-files.bakerlab.org
128.95.160.156 bwsrv1.bakerlab.org


# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Proud member of the O.F.A. (Old Farts Association)
Stevie G

Joined: 15 Dec 18
Posts: 129
Credit: 1,028,210
RAC: 581
Message 112795 - Posted: 12 Jun 2025, 16:58:51 UTC - in response to Message 112794.  



I copied the host file you suggested and reset the project again.

Here's my event log:
6/12/2025 12:55:10 PM | Rosetta@home | Resetting project
6/12/2025 12:55:15 PM | Rosetta@home | Master file download succeeded
6/12/2025 12:55:20 PM | Rosetta@home | Sending scheduler request: To fetch work.
6/12/2025 12:55:20 PM | Rosetta@home | Requesting new tasks for CPU and AMD/ATI GPU
6/12/2025 12:55:21 PM | Rosetta@home | Scheduler request completed: got 0 new tasks
6/12/2025 12:55:21 PM | Rosetta@home | Server error: feeder not running
6/12/2025 12:55:21 PM | Rosetta@home | Project requested delay of 3600 seconds
Profile robertmiles

Joined: 16 Jun 08
Posts: 1242
Credit: 14,421,737
RAC: 1
Message 112796 - Posted: 12 Jun 2025, 20:47:30 UTC - in response to Message 112795.  

This line usually indicates that the feeder on the project's server (the server-side process that hands work to the scheduler) is not running, so all you can do is wait for it to start running again:

6/12/2025 12:55:21 PM | Rosetta@home | Server error: feeder not running
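
For anyone who wants to check a saved Event Log for this, here is a minimal sketch. It assumes the log has been copied out as plain text in the three-column "time | project | message" layout shown in this thread. The function names are made up for illustration and are not part of BOINC.

```python
import re

# Splits one Event Log line into its three "|"-separated columns.
# The layout is assumed from the log lines quoted in this thread.
LINE_RE = re.compile(r"^(?P<ts>[^|]+?)\s*\|\s*(?P<project>[^|]*?)\s*\|\s*(?P<msg>.*)$")


def parse_line(line):
    """Return (timestamp, project, message) as strings, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("ts"), match.group("project"), match.group("msg")


def summarize(lines):
    """Collect server errors, the last requested delay, and the last task count."""
    summary = {"server_errors": [], "delay_seconds": None, "tasks_received": None}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        message = parsed[2]
        if message.startswith("Server error:"):
            summary["server_errors"].append(message[len("Server error:"):].strip())
        delay = re.search(r"Project requested delay of (\d+) seconds", message)
        if delay:
            summary["delay_seconds"] = int(delay.group(1))
        tasks = re.search(r"got (\d+) new tasks", message)
        if tasks:
            summary["tasks_received"] = int(tasks.group(1))
    return summary
```

Feeding it the seven log lines posted in Message 112795 should report one server error, zero tasks received, and a requested delay of 3600 seconds.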
Tom M

Joined: 20 Jun 17
Posts: 157
Credit: 31,813,796
RAC: 102,358
Message 112797 - Posted: 13 Jun 2025, 1:52:53 UTC - in response to Message 112795.  
Last modified: 13 Jun 2025, 1:57:00 UTC





Now start this script from a command line window:

Windows script to keep running updates on Rosetta@home.
From https://boinc.bakerlab.org/rosetta/show_user.php?userid=412375 aka: kotenok2000

rem Change to the Boinc program folder
cd /d "C:\Program Files\BOINC"
:loop
rem Ask the client to contact the Rosetta@home scheduler
boinccmd.exe --project https://boinc.bakerlab.org/rosetta/ update

rem Wait 600 seconds (10 minutes), then ask again
TIMEOUT /T 600
goto loop



I have had trouble with backslashes going missing when trying to post this. There should be a backslash between the C: and the "Program Files", and another between "Program Files" and BOINC.

And if your Boinc lives someplace else, you need to change the drive letter and path to suit.

The reason you run this script is to more reliably get downloads from Rosetta.
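
For machines where a batch file isn't convenient, the same loop can be sketched in Python. This is only a rough equivalent of the script above, not a tested replacement. It assumes boinccmd is on the PATH or that you pass its full location, and the function names are made up for illustration.

```python
import subprocess
import time

PROJECT_URL = "https://boinc.bakerlab.org/rosetta/"


def build_update_command(boinccmd="boinccmd", project_url=PROJECT_URL):
    """Build the same command line the batch script runs."""
    return [boinccmd, "--project", project_url, "update"]


def poll(command, interval_seconds=600, max_rounds=None,
         run=subprocess.run, sleep=time.sleep):
    """Run the update command repeatedly, waiting between rounds.

    max_rounds=None loops until interrupted, like the batch script.
    run and sleep can be swapped out so the loop can be tried without
    touching a real BOINC client.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        run(command, check=False)  # a failed update should not stop the loop
        rounds += 1
        if max_rounds is None or rounds < max_rounds:
            sleep(interval_seconds)
    return rounds
```

To start it on Windows you would call something like poll(build_update_command(r"C:\Program Files\BOINC\boinccmd.exe")) and leave it running, stopping it with Ctrl+C.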
Proud member of the O.F.A. (Old Farts Association)
Sid Celery

Joined: 11 Feb 08
Posts: 2380
Credit: 45,183,670
RAC: 23,362
Message 112798 - Posted: 13 Jun 2025, 2:02:07 UTC - in response to Message 112792.  

But there are now six Rosetta tasks stuck in Ready to Report. Two of them have been there since Sunday and four all day Monday.

This is all very odd. It seems like your PC can't connect to the bakerlab servers again.

Six Rosetta tasks still stuck in Ready to Report, four days now... Update always results in "communication deferred"
I'm thinking of re-setting the project. Maybe that will jar the system into action.

I'm not inclined to think a project reset is going to make the bakerlab server any more reachable.
Can you confirm the few lines in your event log that result in communication being deferred? I'm assuming the server isn't reachable, but just to be sure.
And re-check your hosts file (without any extension - make doubly sure) is still as it should be.

I just don't get how you were able to contact the server to grab and return a few dozen tasks, then it becomes unreachable without something changing in between.
Anyone else with any ideas, pipe up.

I reset the project, rebooted, hit update several times per day and still get no tasks.

You remember the six completed tasks I had that were ready to send for several days? My account now reports them all "Timed out-- no response."

So I don't understand what is wrong.

I hear everything you're saying, but C:/Windows/System32/drivers/etc/hosts is not being read
That's 'hosts' with no extension - not .bak .old .txt .doc or anything else, just hosts
And not in any other directory - specifically the folder written above

For whatever reason that none of us seems to understand, Rosetta won't magically come back just by waiting
Something changed somewhere. You <must> have had it right to get tasks to come down, then it <must> have changed to stop connecting
And whatever the file is that you're editing now simply cannot be the one in that very specific folder

I know I'm writing this from a distance like I know better than you, but if you've repeatedly put the lines you've been given in the right file in the right place you simply wouldn't keep on receiving the message "Server error: feeder not running".
You might get other messages saying all sorts of things, but not that.
That's a line that says I'm not looking at the hosts file you keep editing.
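
If it helps to rule the file in or out, here is a small sketch for checking it. It assumes the two bakerlab entries from the hosts file posted earlier in this thread are the ones wanted. The function names are made up for illustration, and it only reads files, it changes nothing.

```python
from pathlib import Path

# Entries taken from the hosts file quoted earlier in this thread.
EXPECTED = {
    "boinc-files.bakerlab.org": "128.95.160.156",
    "bwsrv1.bakerlab.org": "128.95.160.156",
}


def parse_hosts(text):
    """Map each host name in hosts-file text to the first IP listed for it."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        ip, names = parts[0], parts[1:]
        for name in names:
            mapping.setdefault(name.lower(), ip)
    return mapping


def missing_entries(text, expected=EXPECTED):
    """Return the expected host names that are absent or point at another IP."""
    mapping = parse_hosts(text)
    return sorted(name for name, ip in expected.items() if mapping.get(name) != ip)


def lookalike_files(folder):
    """List files such as hosts.txt or hosts.bak that sit next to the real hosts file."""
    return sorted(p.name for p in Path(folder).glob("hosts.*"))
```

On Windows, you would read the text from the hosts file in the folder named above and pass that folder to lookalike_files. An empty list from missing_entries means both entries are present in the file that was actually read.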



©2025 University of Washington
https://www.bakerlab.org