SAEBER: Sparse Autoencoders for Biological Entity Risk

What do protein model brains know about the virulence of pathogens?

We trained Sparse Autoencoders (SAEs) and logistic regression classifiers on activations (intermediate calculations) from RF3 and RFDiffusion3, leading open-source protein folding and design models. Then we asked them to separate viral hazards from SafeProtein against length-matched benign sequences from UniProt, and to tell us which SAE features correlate with viruses and toxins.
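As background, here is a minimal sketch of the kind of SAE involved, in plain NumPy. The page doesn't specify the actual architecture or hyperparameters, so `n_features`, `l1`, and the plain gradient-descent loop are illustrative assumptions, not the training setup used here:

```python
import numpy as np

def train_sae(X, n_features=64, l1=1e-3, lr=1e-2, steps=500, seed=0):
    """Train a one-layer ReLU sparse autoencoder on activations X (n, d).

    Loss: 0.5 * ||X_hat - X||^2 + l1 * |Z|, minimized by plain gradient
    descent. Returns learned weights plus the per-step reconstruction MSE.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, n_features))
    b_enc = np.zeros(n_features)
    W_dec = rng.normal(0.0, 0.1, (n_features, d))
    losses = []
    for _ in range(steps):
        Z = np.maximum(X @ W_enc + b_enc, 0.0)   # sparse codes (ReLU)
        X_hat = Z @ W_dec                        # reconstruction
        err = X_hat - X
        losses.append(float((err ** 2).mean()))
        active = (Z > 0).astype(X.dtype)         # ReLU derivative
        grad_Z = (err @ W_dec.T + l1) * active   # recon grad + L1 (Z >= 0)
        W_dec -= lr * Z.T @ err / n
        W_enc -= lr * X.T @ grad_Z / n
        b_enc -= lr * grad_Z.mean(axis=0)
    return W_enc, b_enc, W_dec, losses

def encode(X, W_enc, b_enc):
    """SAE feature activations for new inputs."""
    return np.maximum(X @ W_enc + b_enc, 0.0)
```

The `encode` output is what the per-feature analyses below operate on: each column is one SAE feature, and its activation pattern across designs is what gets scored against the hazard labels.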

We also controlled for sequence length and for paralog memorization, splitting on homologous clusters computed with MMseqs2.
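A hedged sketch of how such a split can be built: cluster the sequences with MMseqs2 (e.g. `mmseqs easy-cluster seqs.fasta clu tmp`, which writes a representative→member TSV), then keep every cluster entirely on one side of each fold. The file names and threshold are illustrative; only the group-aware split logic matters:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

def read_mmseqs_clusters(tsv_lines):
    """Map member id -> cluster representative from MMseqs2 cluster TSV lines."""
    member_to_rep = {}
    for line in tsv_lines:
        rep, member = line.rstrip("\n").split("\t")
        member_to_rep[member] = rep
    return member_to_rep

def homology_splits(cluster_ids, n_splits=5):
    """Yield (train_idx, test_idx) pairs where no cluster spans both sides."""
    groups = np.asarray(cluster_ids)
    dummy = np.zeros((len(groups), 1))  # GroupKFold only inspects `groups`
    yield from GroupKFold(n_splits=n_splits).split(dummy, groups=groups)
```

This is what distinguishes the "homology-clustered" AUROC numbers below from the random splits: homologs of a test sequence can never leak into training.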

Explore the data and features below!

Feature Visualizer

PyMOL renders of the top hazard- and benign-firing SAE features. Red sticks mark the residues with the highest activation per design. Click a feature on the left to load its renders.

Feature #60

2 renders · block12
hazard_A0A7H0DN78 · model 0
hazard_A0A7H0DN78 · model 1

Probe AUROC, random vs homology-clustered split

Five-fold stratified cross-validation, with `stop_overfit` hyperparameters held constant. Homology-clustered splits drop AUROC by roughly 0.10 on RFD3, and by almost nothing on RF3.
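A minimal sketch of the probe evaluation under a random stratified split (the actual `stop_overfit` regularization settings aren't given on this page, so `C=0.1` is an illustrative stand-in):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def probe_auroc(X, y, n_splits=5, C=0.1, seed=0):
    """5-fold stratified CV: returns (mean train AUROC, mean eval AUROC).

    The train minus eval difference is the "overfitting gap" plotted below.
    """
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    train_scores, eval_scores = [], []
    for tr, te in skf.split(X, y):
        clf = LogisticRegression(C=C, max_iter=1000).fit(X[tr], y[tr])
        train_scores.append(roc_auc_score(y[tr], clf.predict_proba(X[tr])[:, 1]))
        eval_scores.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
    return float(np.mean(train_scores)), float(np.mean(eval_scores))
```

Swapping `StratifiedKFold` for the homology-aware splitter gives the clustered column of the comparison.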

Raw activations

SAE-encoded

Overfitting gap

Train AUROC − eval AUROC (RFD3 cells)

SAE vs raw activations AUROC

Positive values mean the SAE generalizes better than raw activations.

Which SAE features fire on hazards?

Univariate AUROC for each SAE feature against the full label set, BH-FDR corrected. RFD3 block12 carries the cleanest hazard signal; feature #639 stands out at AUROC 0.81.
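A sketch of this univariate screen, assuming an activation matrix `Z` (designs × SAE features) and binary hazard labels. Per-feature AUROC falls out of the Mann–Whitney U statistic, and the Benjamini–Hochberg step-up correction is implemented inline:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def bh_fdr(p):
    """Benjamini-Hochberg step-up adjusted p-values (q-values)."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    q = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(q, 0.0, 1.0)
    return out

def feature_auroc_fdr(Z, y):
    """Per-feature AUROC (hazard vs benign) with BH-FDR-adjusted p-values."""
    Z, y = np.asarray(Z, dtype=float), np.asarray(y)
    pos, neg = Z[y == 1], Z[y == 0]
    n1, n0 = len(pos), len(neg)
    aurocs, pvals = [], []
    for j in range(Z.shape[1]):
        u, p = mannwhitneyu(pos[:, j], neg[:, j], alternative="two-sided")
        aurocs.append(u / (n1 * n0))  # AUROC equals the normalized U statistic
        pvals.append(p)
    return np.array(aurocs), bh_fdr(pvals)
```

Features surviving the FDR cut with AUROC well above 0.5 are the hazard-firing candidates shown in the visualizer above.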

RFD3 block6

RFD3 block8

RFD3 block12