Key Takeaways:
- KANs (Kolmogorov-Arnold Networks) improve predictive performance and interpretability for enzyme commission number prediction.
- KANs identify existing motif sites using a proposed interpretation strategy, providing trustworthiness in predictions.
- Pruning KANs can optimize architecture and improve predictive performance while reducing the number of parameters.
- The proposed KAN interpretation strategy outperforms the current state-of-the-art interpretation strategy, ECPICK.
- KANs can discover unknown motif sites within enzyme sequences, highlighting their potential for advancing enzyme function prediction.
Introduction to KANs and Enzyme Commission Numbers
The study explores the use of Kolmogorov-Arnold Networks (KANs) to improve predictive performance and interpretability for enzyme commission number prediction. Enzyme commission (EC) numbers are a classification system used to categorize enzymes based on the reactions they catalyze. The authors evaluated three state-of-the-art models, spanning convolutional neural network-based, attention-based, and large language model-based architectures, with and without KAN integration. As the authors noted, "KANs yield consistent micro-level gains and substantial macro-level gains across architectures."
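The core idea of a KAN is to replace fixed node activations with learnable univariate functions on each edge of the network. The following is a minimal NumPy sketch of that idea; it uses Gaussian radial basis functions as a stand-in for the B-spline parameterization common in KAN implementations, and all names and shapes are illustrative rather than taken from the paper.

```python
import numpy as np

def kan_layer_forward(x, coeffs, centers, width=1.0):
    """Forward pass of a toy Kolmogorov-Arnold layer.

    Each edge (i -> j) carries a learnable univariate function
    phi_ij(t) = sum_k coeffs[i, j, k] * rbf(t - centers[k]),
    a radial-basis simplification of the usual B-spline edges.
    Output j = sum_i phi_ij(x_i).

    x:       (n_in,)                input vector
    coeffs:  (n_in, n_out, n_basis) learnable coefficients
    centers: (n_basis,)             basis-function centers
    """
    # Evaluate every basis function at every input value: (n_in, n_basis)
    basis = np.exp(-((x[:, None] - centers[None, :]) / width) ** 2)
    # Each edge applies its own univariate function, then sum over inputs
    return np.einsum("ik,ijk->j", basis, coeffs)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                       # 4 input features
coeffs = 0.1 * rng.normal(size=(4, 3, 5))    # 4 -> 3 layer, 5 basis fns
centers = np.linspace(-2.0, 2.0, 5)
y = kan_layer_forward(x, coeffs, centers)
print(y.shape)  # (3,)
```

Because the nonlinearity lives on the edges, the learned univariate functions can later be inspected or pruned individually, which is what makes KANs attractive for interpretability.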
Dataset and Experimental Setup
The dataset consisted of protein sequences from Swiss-Prot and the Protein Data Bank (PDB): over 200,000 sequences in total, roughly two-thirds from Swiss-Prot and one-third from PDB. The authors removed redundancy by eliminating exact duplicates and required complete four-level EC annotations. The data were split into training, validation, and test sets, and the models were additionally evaluated on a dedicated low-similarity test set to avoid performance overestimation. As the authors stated, "We evaluated the models by considering only low-similarity test set, where proteins shared at most 50% sequence identity and at most 80% coverage with the training data."
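The preprocessing described above, exact-duplicate removal plus the requirement of complete four-level EC annotations, can be sketched as follows. The record format and function name are illustrative, not taken from the authors' code.

```python
def filter_dataset(records):
    """Deduplicate sequences and keep only complete four-level EC numbers.

    records: iterable of (sequence, ec_number) pairs. A complete
    annotation looks like "1.14.15.8" (four numeric levels); entries
    with '-' placeholders such as "1.1.-.-" are dropped, mirroring
    the filtering described in the text.
    """
    seen = set()
    kept = []
    for seq, ec in records:
        levels = ec.split(".")
        complete = len(levels) == 4 and all(l.isdigit() for l in levels)
        if complete and seq not in seen:   # exact-duplicate removal
            seen.add(seq)
            kept.append((seq, ec))
    return kept
```

Holding out a low-similarity test set on top of this prevents near-identical train/test pairs from inflating the reported scores.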
Predictive Performance of KAN-Integrated Models
The results showed that KAN integration improved predictive performance across all models and EC levels. DeepECtransformer, for example, showed relative gains in micro-averaged F1 scores of 15.4% (level 1), 15.7% (level 2), 14.6% (level 3), and 10.2% (level 4) over the MLP variant. The authors noted that "KANs consistently improved predictive performance across all EC levels." The macro-averaged F1 score was also enhanced by KAN integration, with DeepECtransformer improving by 13.3%, 19.2%, 34.2%, and 24.2% at levels 1-4.
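The distinction between the two averages matters here: micro-F1 pools true/false positives over all classes and is dominated by frequent EC classes, while macro-F1 averages per-class scores, so the larger macro gains indicate better handling of rare enzyme classes. A minimal pure-Python computation for single-label predictions, as a sketch of the metric rather than the authors' evaluation code:

```python
from collections import Counter

def micro_macro_f1(y_true, y_pred):
    """Micro- and macro-averaged F1 for single-label multi-class data."""
    classes = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # true class t was missed

    def f1(tpc, fpc, fnc):
        denom = 2 * tpc + fpc + fnc
        return 2 * tpc / denom if denom else 0.0

    # Micro: pool counts over all classes, then compute one F1
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then average (rare classes count equally)
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro
```

For example, with three correct predictions out of four concentrated in one class, micro-F1 equals accuracy while macro-F1 drops toward the weaker class's score.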
Interpretation Strategy for KANs
The authors proposed an interpretation strategy specifically designed for KANs, which quantifies the contribution of each amino acid in the input sequence to the model's predictions. The strategy was validated by comparing the identified amino acids with well-characterized motif sites in enzyme sequences. The authors found that the proposed KAN interpretation method successfully identified both the oxygen-binding motif site and heme-binding motif sites of the bacterial CYP106A2 family. As the authors stated, "The proposed KAN interpretation strategy highlights biologically relevant motifs and achieves higher overlap with annotated motif sites compared to ECPICK."
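The paper's strategy derives per-residue contributions from the learned KAN edge functions. As a generic stand-in for how such per-position attribution works, a finite-difference sensitivity map over a sequence encoding can be sketched as follows; this is not the authors' exact method, and `score_fn` is a hypothetical model scoring function.

```python
import numpy as np

def residue_attribution(score_fn, x, eps=1e-4):
    """Per-position attribution by central finite differences.

    score_fn maps a sequence encoding x of shape (L, A) -- L residues,
    A amino-acid channels -- to a scalar class score. The attribution
    of position i is the gradient norm of the score with respect to
    that position's encoding: positions the model is most sensitive to
    should line up with functional motif sites.
    """
    L, A = x.shape
    attr = np.zeros(L)
    for i in range(L):
        grad = np.zeros(A)
        for a in range(A):
            xp = x.copy(); xp[i, a] += eps
            xm = x.copy(); xm[i, a] -= eps
            grad[a] = (score_fn(xp) - score_fn(xm)) / (2 * eps)
        attr[i] = np.linalg.norm(grad)
    return attr
```

Validating such a map against annotated motif sites, as the authors do against CYP106A2 motifs, is what turns an attribution score into a trustworthiness check.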
Pruning for Architecture Optimization
The authors conducted an additional experiment to evaluate a pruning strategy for KANs. The results showed that pruning can optimize the architecture and improve predictive performance while reducing the number of parameters: the best pruned model achieved a macro-averaged F1 score of 82% with 12,020,528 parameters, whereas the unpruned model yielded 81.46% with 16,429,184 parameters. The authors noted that "pruning reduces the size of the network without retraining the model."
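A post-hoc, no-retraining pruning pass can be sketched with a simple global magnitude threshold over learnable coefficients. This is a generic illustration; the authors' KAN pruning may operate on whole edges or nodes via their activation scale rather than on individual coefficients as shown here.

```python
import numpy as np

def prune_by_magnitude(weights, keep_fraction=0.75):
    """Zero out the smallest-magnitude coefficients without retraining.

    weights: dict of name -> ndarray of learnable coefficients
    (for a KAN, e.g. the per-edge basis coefficients). A single global
    threshold keeps the largest `keep_fraction` of entries; everything
    below it is set to zero, shrinking the effective parameter count.
    """
    flat = np.concatenate([w.ravel() for w in weights.values()])
    k = int(len(flat) * keep_fraction)       # number of entries to keep
    threshold = np.sort(np.abs(flat))[-k] if k > 0 else np.inf
    pruned = {name: np.where(np.abs(w) >= threshold, w, 0.0)
              for name, w in weights.items()}
    removed = sum(int((w == 0.0).sum()) for w in pruned.values())
    return pruned, removed

# Keep the 2 largest of 4 coefficients
pruned, removed = prune_by_magnitude(
    {"edge": np.array([1.0, 0.1, -2.0, 0.05])}, keep_fraction=0.5)
```

Because KAN edge functions are individually interpretable, pruning doubles as architecture discovery: the surviving edges indicate which input interactions the model actually uses.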
Conclusion
In conclusion, the study demonstrates the effectiveness of KANs in improving predictive performance and interpretability for enzyme commission number prediction. The proposed KAN interpretation strategy provides trustworthiness in predictions and can discover unknown motif sites within enzyme sequences. Pruning KANs can optimize architecture and improve predictive performance while reducing the number of parameters. As the authors stated, "The proposed KAN interpretation strategy provides trustworthiness in prediction by identifying existing motif sites related to the enzyme function and could potentially discover unknown motif sites within enzyme sequences."
https://www.nature.com/articles/s44387-025-00059-x

