Other projects.

Author	Message
Grant (SSSF) Joined: 28 Mar 20 Posts: 1938 Credit: 18,534,891 RAC: 0
Message 113579 - Posted: 24 May 2026, 5:30:23 UTC - in response to Message 113578.
So...ahhh...what's happening in the world of Rosetta? The main page Server Status hasn't updated since the 17th.
Grant
Darwin NT

Sid Celery Joined: 11 Feb 08 Posts: 2587 Credit: 47,220,881 RAC: 13
Message 113580 - Posted: 26 May 2026, 8:39:38 UTC - in response to Message 113579.
"So...ahhh...what's happening in the world of Rosetta? The main page Server Status hasn't updated since the 17th."
I didn't realise, mainly through not looking. But going by what's reported at Boincstats, it's still within 9 tasks of the level it's really at (0).

Sid Celery Joined: 11 Feb 08 Posts: 2587 Credit: 47,220,881 RAC: 13
Message 113594 - Posted: 16 Jun 2026, 8:32:37 UTC
I found an update in the only WCG forum I visit:

Jun 15, 2026

Here is what I've been doing, and an official post on the lab website "Operational Status" tab should follow soon:

1. Docker/Podman/VirtualBox workunits are now circulating in small batches in the beta30 project. Some are running on the cancer datasets for Lung and Ovarian, using the highest-scoring large signatures to seed the search space for the optimizer in a promising direction. With this capability now working, I am also going to distribute our Mojo/Modular build for beta30 so we can see if this angle to get general GPU support is going to work (https://mojolang.org/docs/requirements/#gpu-compatibility).

2. The results API switch is currently running in "shadow mode" in production, which just means that for every request going to the legacy database, another is sent to the Citus coordinator so I can assess parity, get some performance numbers, size the connection pools and limits, and see if WebSphere is going to crash for some reason or other, which it already did when I first rolled this out. I am going to make this switch soon. The hope is that the massive archive of results, which right now will simply time out the API if a result set is large enough, will be viewable even for the largest result sets, and that an accurate picture of workunits IN_PROGRESS and otherwise will henceforth be visible on the website and reflect what is currently running on volunteer devices. I want to move the stats rollup, the stats dump, and Results API to Citus too, but these are all scary and invasive overhauls, so we'll see if just the Results page can be fixed this way first before tackling the chronic under-reporting of the stats rollup and stats dump by switching out the database.

3. The bulk of the workunits incorrectly credited at the "floor" of 14.0 have been recalculated. There seem to be some stragglers, and I will do another pass to look for them. Also, the aborts of `_0` and `_1` workunits and many cached resends should start to dissipate this week, and my hope is that a more typical distribution and ratio should fall into place for MCM1 as that happens. I pushed a feeder build with a guard against assigning already canonicalized results, as it seems resends got ahead of the `_0` and `_1` populations? I'm still looking into what I did that caused this. My expectation is that I managed to create this problem somewhere in the multiple repair passes on results, or when I created additional resends manually by bumping up target_nresults, or when I added a service that sits behind Apache, consults the corresponding batch plan protobuf "schema", and just templates out workunits based on the mcm1_create_work code (added to relieve memory pressure on the nodes and to handle 404s from the transitioner bug, server crashes, and botched repair operations).

4. MCM1 results are not going into a black hole. We run a batch conversion to the Parquet file format for each 24h window (find uploads in the in-memory cache with mtime > tail of previous "assimilated" listing), and those Parquet files go into the "Ducklake" (https://ducklake.select/) data lake. Using DuckDB, I can query every single signature uploaded since the move to Nibi with standard SQL. The conversion code is based on the new mcm1_validator_assimilator code that actually unrolls the signatures and compares their contents. I used this data store to grab some of the best 100 gene signatures to test MDMG, the new LibTorch MAM1 application, on Ovarian in the staging environment, and it worked and found some decent shorter signatures right away. I also used it to "fall through" to searching for missing results that aren't in the local in-memory cache on a node anymore, and to restore correct credit after introducing a disastrous credit amplification bug. We intend to merge the historical results from Jurisica lab servers into this dataset at Nibi once the S3 object storage feature is made available to us, and expand in multiple directions from there. By using the top historical signatures to seed the search space of MDMG batches, and given that the articles we publish already establish promising results identified by our lab scientists in the MCM1 data as is, we expect to get our best results to date.
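
For anyone wondering what the "shadow mode" in item 2 might look like in practice, here is a minimal Python sketch. It is not the WCG code: `shadow_call`, `ShadowReport`, and the two backend callables are hypothetical names made up for illustration, and the real setup (WebSphere in front of a legacy database and a Citus coordinator) is certainly more involved. The idea is only that the legacy answer is always what the user gets, while the new backend is called on the side and compared for parity and timing.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class ShadowReport:
    """What was learned from one mirrored request (hypothetical structure)."""
    matched: bool                 # did the shadow backend return the same answer?
    primary_seconds: float        # latency of the legacy backend
    shadow_seconds: float         # latency of the new backend
    shadow_error: Optional[str] = None


def shadow_call(
    primary: Callable[[Any], Any],
    shadow: Callable[[Any], Any],
    request: Any,
) -> Tuple[Any, ShadowReport]:
    """Serve `request` from `primary` and mirror it to `shadow` for comparison.

    The caller always receives the primary result; a failure in the shadow
    backend is recorded in the report and never propagates.
    """
    start = time.perf_counter()
    primary_result = primary(request)
    primary_seconds = time.perf_counter() - start

    start = time.perf_counter()
    shadow_result = None
    error = None
    try:
        shadow_result = shadow(request)
    except Exception as exc:  # the shadow side must not break production
        error = repr(exc)
    shadow_seconds = time.perf_counter() - start

    matched = error is None and shadow_result == primary_result
    report = ShadowReport(matched, primary_seconds, shadow_seconds, error)
    return primary_result, report
```

Collecting the reports over a day of traffic would give the parity rate and the latency numbers the post mentions, which is presumably what informs the connection pool sizing before the actual switch.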
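
The 24h batch selection in item 4 ("find uploads in the in-memory cache with mtime > tail of previous 'assimilated' listing") can be sketched roughly as below. This is only a guess at the shape of it: `uploads_since`, the directory layout, and the idea of keeping the last mtime as a watermark are assumptions for illustration, and the Parquet conversion step itself is left out.

```python
import os
from pathlib import Path
from typing import List, Tuple


def uploads_since(cache_dir: str, last_assimilated_mtime: float) -> Tuple[List[Path], float]:
    """Return files in `cache_dir` modified after the previous batch's tail.

    `last_assimilated_mtime` is the newest mtime seen in the previous
    listing (the "tail"). The second return value is the new tail to
    store for the next run; it is unchanged if nothing new was found.
    """
    selected = []
    with os.scandir(cache_dir) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            mtime = entry.stat().st_mtime
            # Strictly greater, so the file that defined the old tail
            # is not picked up a second time.
            if mtime > last_assimilated_mtime:
                selected.append((mtime, entry.path))

    selected.sort()  # oldest first, so the last element carries the new tail
    new_tail = selected[-1][0] if selected else last_assimilated_mtime
    return [Path(path) for _, path in selected], new_tail
```

One caveat with a strict `>` comparison is that two uploads sharing the exact same mtime as the tail could be split across runs, so a real implementation would likely also track file names or hashes.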