Vision Transformers for Analyzing High-Resolution Pathology Images

1 minute read

Published:

Master Thesis
Transformers are powerful models that can capture long-range dependencies between data using an attention mechanism. However, applying transformers to medical problems poses challenges due to the high-resolution and complexity of pathology images, as well as the scarcity and noise of labels. In this thesis, we have provided a comprehensive overview of the state-of-the- art transformers method for analyzing high-resolution pathology images, using the glioblastoma dataset IvyGAP and the renal cancer dataset as case studies. We have discussed how to pre- train transformers using self-supervised learning methods that can learn useful representations from unlabeled data. We have also explored how to find regions of interest within a whole slide image using different levels of supervision: self-supervised, weakly supervised and strongly supervised. We have demonstrated that transformers can provide a semantic understanding of the data and outperform convolutional-based models on downstream tasks such as classification and segmentation. Finally, we employ posterior networks to estimate the aleatoric and epistemic uncertainty of ViT predictions and evaluate their usefulness for clinical decision making. We propose a novel method to filter potentially mislabeled data in order to make more accurate and confident predictions. Our experiments show that our methods achieve state-of-the-art results on various glioblastoma tasks and provide meaningful insights into the behavior and limitations of ViTs for medical machine learning.

Github Thesis Presentation