Beyond newbie Q&A

Message boards : Number crunching : Beyond newbie Q&A



Bryn Mawr

Joined: 26 Dec 18
Posts: 411
Credit: 12,359,416
RAC: 3,742
Message 110260 - Posted: 16 Dec 2024, 11:07:07 UTC - in response to Message 110256.  

The number of structures or decoys or whatever done within that one task isn't relevant.


If it's not relevant to the final outcome, shouldn't we contribute the fewest possible structures?


It is not relevant to the counting of tasks, it is relevant to the science and to the researcher.

The researcher will set the default task time to generate roughly the optimum number of decoys (some 8-hour tasks, some 3-hour), so go with the flow.
Sid Celery

Joined: 11 Feb 08
Posts: 2184
Credit: 41,726,991
RAC: 6,784
Message 110261 - Posted: 16 Dec 2024, 11:36:51 UTC - in response to Message 110256.  

The number of structures or decoys or whatever done within that one task isn't relevant.

If it's not relevant to the final outcome, shouldn't we contribute the fewest possible structures?

I skim some conversations and at some later point read them properly and wonder how we got here.
This one is a good example.
I'm not sure you've been well-served by the replies you've had up to now, so it's no wonder you're not happy with them and still have questions.
Forget what you've been told up to now.

At one time we had a Forum Moderator on here named Mod.Sense who wasn't part of the project but had a really good understanding of how this place works and helped us a lot until he departed 4yrs ago.
I miss his contributions.

Looking him up, conveniently, his last post answered this very question here, from which:
Note that not all tasks can be completed in two hours. With such a short runtime preference, you are more likely to see tasks running longer than the preference. When you look at credits, you really must consider the amount of actual CPU time, not the number of work units, and not just the runtime preference.

There are no "missing results". So, set your preferences in a way that works for you and your machine.

If you use Dr. Baker's analogy of exploring a planet's surface for the highest or lowest elevation on the planet, then each model is one of the explorers. They start their exploration from a random point on the planet. When a work unit has enough time to begin another model, that next model will be started at another random point on the planet, with no regard to the first model or what it found. If you drop 10,000 explorers on the planet, your success in finding the true highest or lowest elevation would essentially be proportional to the surface area of the planet. If 10,000 explorers are adequate for Mars, you might need 100,000 for Saturn. So, when they feel they have a Saturn-sized protein for study, they might create more work units. But, as you point out, they have no way to predict exactly how many models will result. If they approach the end of work units coming back in and still only have 80,000 results, then they create more work units to obtain the 100,000 results desired.

Having said that, once they see the results, they can sometimes give hints to future explorers, or essentially drop more of them near the Himalayas. So they might create a secondary batch of work units, which are designed to concentrate the focus based on what was learned in the first round.
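Mod.Sense's explorer analogy is essentially random-restart sampling. If it helps, here's a toy sketch of the idea (entirely my own illustration - the landscape, the search range and the explorer count are invented, and this is nothing like Rosetta's actual code):

```python
import random

def explore(landscape, num_explorers, seed=None):
    """Drop explorers at independent random points and keep the best
    (lowest) value found -- a toy random-restart search."""
    rng = random.Random(seed)
    best_point, best_value = None, float("inf")
    for _ in range(num_explorers):
        # Each explorer starts at a random point, with no regard
        # to what earlier explorers found.
        point = rng.uniform(-10.0, 10.0)
        value = landscape(point)
        if value < best_value:
            best_point, best_value = point, value
    return best_point, best_value

# A toy 'planet' whose true lowest elevation sits at x = 3.
landscape = lambda x: (x - 3.0) ** 2

point, value = explore(landscape, num_explorers=10_000, seed=42)
```

The point is the proportionality: with 10,000 samples over this range, some explorer almost certainly lands very near the true minimum, but halve the explorers (or double the 'surface area') and the odds fall accordingly - which is why a bigger planet needs more work units.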

At the risk of completely misunderstanding this and ruining it by explaining further...
They have a protein to solve and they break it up into 10s or 100s of thousands of jobs that appear as jobs queued on the front page (obviously this may be the grand total of several researchers' jobs).
For each one we download we individually set a runtime, with a usual default of 8hrs but we can adapt that for ourselves.
Each task, I think, is seeded with a random starting point and runs to completion - that completed run is what's called a 'decoy' - and if there's sufficient runtime left, another random starting point is seeded into the same task to see if there's a different outcome in that overall space. I think the random starting point is why no one decoy from one person is comparable to any other decoy from the same host or any other host. The more goes we have at this, the more representative the sampling of that space will be.
And then, when the results get returned, and all the decoys from all the jobs are stitched back together, they get a fuller representation of the results they want to see. And either that confirms what they expected or it doesn't and they can fine-tune their next batch of work in their next iteration in the more promising areas.
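For what it's worth, the "run decoys until the runtime preference is used up" behaviour I'm describing could be sketched like this (a guess at the shape of the logic, not the project's real scheduler code):

```python
import random
import time

def run_task(target_runtime_s, run_one_decoy):
    """Toy sketch (my assumption, not Rosetta's actual code): always
    finish at least one decoy, then keep starting new ones from fresh
    random seeds until the runtime preference is used up."""
    decoys = []
    start = time.monotonic()
    while True:
        seed = random.getrandbits(32)   # independent random starting point
        decoys.append(run_one_decoy(seed))
        # A decoy in progress always runs to completion, which is why
        # tasks can overrun a short runtime preference.
        if time.monotonic() - start >= target_runtime_s:
            break
    return decoys

# Even with a zero budget, one decoy still gets completed:
decoys = run_task(0.0, lambda seed: seed % 100)
```

Because a decoy in progress runs to completion before the budget is checked, a task can overrun a short runtime preference - exactly what Mod.Sense warned about with 2hr settings.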

Does this help any, or have I made matters worse?
Anyway, it seems to me that default 8hr runtimes provide sufficient sampling of what they're researching to be confident of what's coming back to them.
Personally I run for 12hrs to keep my PCs more occupied with Rosetta tasks, which is fine, but I don't think there's any benefit in running any shorter than the default 8hrs and it may even be undersampling for the project's purposes.

At various times for a variety of reasons I've pushed for the removal of the shortest user-defined runtimes as it wastes tasks that others could run more fully and may degrade the quality of the information returned as well as meaning more unnecessary server hits.

8hr default runtimes should always be adhered to unless the user has a specific individual problem they're trying to solve for a short period before reverting to 8hrs. So, definitely not 2, 3, 4 or 6hrs.
Longer runtimes are fine as long as deadlines are hit; they help extend task availability for other users while also reducing server hits.
Dayle

Joined: 6 Jan 14
Posts: 17
Credit: 909,023
RAC: 1,014
Message 110270 - Posted: 16 Dec 2024, 21:14:35 UTC - in response to Message 110261.  
Last modified: 16 Dec 2024, 21:15:25 UTC

Thank you! So much more helpful - that was 90% of what I was looking for from the start.

What's missing now is confirmation that your assumptions are correct, i.e. is eight hours scientifically sufficient, or is it the most they felt comfortable asking of any one volunteer by default?

If they're looking for a certain number of models per simulation, they really should have a system in place to prevent excess computations, especially on newer devices with high single-core speed.

If they want a broad survey in the beginning and something specific later, the requested quantity of work units per model might shift. Whatever they want, they know it internally. How difficult would it be for the team to post the goal number on the news tab with each release of work units?
Sid Celery

Joined: 11 Feb 08
Posts: 2184
Credit: 41,726,991
RAC: 6,784
Message 110271 - Posted: 17 Dec 2024, 0:10:36 UTC - in response to Message 110270.  

Thank you! So much more helpful - that was 90% of what I was looking for from the start.

Good. I think I got a bit muddled part-way through, but you saw through my bad writing.
And re-reading previous replies I think I disparaged Bryn and Grant's replies too much as they weren't wrong, just not sufficiently clear.

What's missing now is confirmation that your assumptions are correct, i.e. is eight hours scientifically sufficient, or is it the most they felt comfortable asking of any one volunteer by default?

If they're looking for a certain number of models per simulation, they really should have a system in place to prevent excess computations, especially on newer devices with high single-core speed.

Mod.Sense referred to Dr Baker's analogy in the quote above and David Baker's just won the Nobel Prize.
If his understanding isn't correct, I don't think there's anyone on the planet better placed to take a view, so I'm pretty sure you can take that to the bank.

Wrt default runtimes, there's a thread that started many years ago talking about extending runtimes from the original 3hrs to 4hrs, then to 6hrs - none of which came to a definitive conclusion - before the project unilaterally changed the default runtime to 8hrs. There was a brief moment (less than 24hrs) more recently when they increased the runtime to 16hrs but they withdrew that very quickly, so 8hrs remains the default.
My guess (only that) is that they make an assumption about the speed of hosts' machines to estimate the number of decoys they'll get back from any batch they issue, and over time (decades) that's what led to the runtime increasing from 3 to 4 to 6 to the current 8hrs.
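To make that guess concrete, a purely hypothetical back-of-envelope calculation (every figure here is invented for illustration; the project has never published a decoys-per-batch target):

```python
def expected_decoys(num_tasks, runtime_hours, minutes_per_decoy):
    """Back-of-envelope only -- batch size, runtime and decoy duration
    are all my assumptions, not anything published by the project."""
    return int(num_tasks * runtime_hours * 60 / minutes_per_decoy)

# e.g. 100,000 tasks at the 8hr default, guessing ~30 min per decoy:
batch_total = expected_decoys(100_000, 8, 30)   # 1,600,000 decoys
```

On those made-up numbers, halving the runtime roughly halves the decoys per batch, which would fit with the default only ever having moved upwards.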
Tbh this is an internal evaluation within the project and its researchers and nothing to do with the likes of us.
On the, now, very rare occasions I've communicated with the project team, it's been made <very> clear to me that the tail here does not wag the dog.

When user settings exceed the default runtime, the project effectively gets more returns than the minimum it requires, which increases the resolution of the results from a specific task.
In the days when there were plentiful supplies of tasks - like 20 million issued at a time rather than 800k or 1.5m at best nowadays - I always used the default runtime to return adequate results from each task and get to the next one as soon as possible.
More recently, when tasks have been few and far between, I've slightly increased my runtimes to 12hrs to increase the number of decoys returned, but only as long as I return them within deadlines.
Meeting deadlines is always essential. I'm not aware there's such a thing as 'excess computations' when we're only ever sampling within the space of a task, as long as we meet task deadlines.

If they want a broad survey in the beginning and something specific later, the requested quantity of work units per model might shift. Whatever they want, they know it internally. How difficult would it be for the team to post the goal number on the news tab with each release of work units?

Unlike other projects, the labs that submit tasks through Rosetta do look at what comes back and will account for that in their next batch of work.
In the early days of Covid, when they repurposed the project to address the challenge of that time, a researcher did post here to remark on the speed, volume and quality of the data being returned and how they adapted the next batch to reflect their learnings. That's part of the reason why I'll never leave here as long as there's some demand for our contribution.
As far as the project giving us some statistical info on each batch goes, I'm genuinely not seeing that as anything they'd spend a single second of their day on, given all the other things they could do with their time.
People in the forums may not like that, but I'm just being realistic. Not going to happen.
They don't award Nobel Prizes for responsiveness to random Forum requests - let's be real about this.

And I recall at one time being told that the number of people involved in task creation and issue was perhaps surprisingly small. So while it would be good to have all the niceties covered, it's just not going to happen even at the most basic level, and I'd advise everyone to recalibrate their expectations.
Like, just consider that it takes days to notice a server has fallen over. Tbh it's probably researchers asking where their results are that gets it rebooted, which is probably why it takes 2 days out of a 3-day deadline for anything to be noticed.
I mean, I'd like to think someone reads my email bumps, but they probably don't. I'm probably on a block-list, more likely!




©2025 University of Washington
https://www.bakerlab.org