Virchow2G-mini
Version: 1
Paige
Last updated May 2025
Virchow2G Mini is a distilled, lightweight vision transformer derived from Virchow2G, designed to deliver high-performance pathology insights with exceptional computational efficiency. Trained on 3.1 million whole slide histopathology images, it serves as a tile-level feature extractor (frozen or finetuned) suitable for a wide range of downstream computational pathology applications. It supports both hematoxylin and eosin (H&E) and immunohistochemistry (IHC) stained slides, enhancing its versatility across pathology tasks. Despite its compact size, Virchow2G Mini achieves performance comparable to larger models, making it ideal for high-throughput workflows and resource-constrained environments.

Virchow2G Mini is based on a ViT-S/14 architecture with 22 million parameters. It processes input images at a native size of 224×224 pixels with a 14×14 patch size. While it omits some of the advanced architectural components of its predecessor (such as SwiGLU activations, LayerScale stabilization, and register tokens) to maximize inference efficiency, it retains robust representational capabilities.

The model was distilled from Virchow2G using a modified DINOv2 self-supervised objective, replacing the original KoLeo regularizer with a kernel density estimator and adopting an extended-context translation augmentation strategy. Training sampled tiles across four magnifications (5x, 10x, 20x, and 40x) and used mixed precision (fp16) to optimize efficiency while preserving accuracy. Virchow2G Mini is designed to support flexible downstream fine-tuning or prompting for varied pathology tasks.
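
The ViT-S/14 geometry described above (224×224 input, 14×14 patches, 384-dimensional tokens for ViT-S) can be sketched in plain PyTorch. This is only an illustration of the patch-embedding shapes, not Paige's released model; the actual Virchow2G Mini weights must be obtained separately.

```python
import torch

# Illustrative only: a ViT-S/14 patch embedding layer, showing how a 224x224
# tile is split into 14x14 non-overlapping patches (16 x 16 = 256 tokens).
patch_embed = torch.nn.Conv2d(
    in_channels=3, out_channels=384,  # 384 = ViT-S embedding dimension
    kernel_size=14, stride=14,        # 14x14 patches, no overlap
)

tile = torch.randn(1, 3, 224, 224)          # one RGB tile at native input size
tokens = patch_embed(tile)                  # (1, 384, 16, 16)
tokens = tokens.flatten(2).transpose(1, 2)  # (1, 256, 384): 256 patch tokens
print(tokens.shape)
```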

Intended Use

Primary Use Cases

Virchow2/2G/2G-mini serve as powerful and flexible tile-level feature extractors for a wide range of computational pathology applications. Their ability to generate high-quality embeddings from diverse tissue types, magnifications, and staining protocols enables efficient development of AI models across diagnostic, prognostic, and research use cases.
  1. Cancer Detection and Subtyping
    The models can be used to power AI systems that detect cancer presence, classify tumor subtypes, and distinguish benign from malignant tissue across multiple organ types.
  2. Prognosis and Biomarker Discovery
    Virchow2 family models can support prognosis prediction tasks, such as survival outcome prediction or recurrence risk stratification, by extracting rich histomorphologic features correlated with patient outcomes.
  3. Cellularity and Tissue Composition Analysis
    The models can be adapted for cellularity quantification tasks, such as estimating tumor cellularity, tumor-infiltrating lymphocyte (TIL) density, or stromal composition.

Out-of-Scope Use Cases

  1. Whole Slide Image (WSI)-level Inference Without Aggregation
    Virchow2 models operate at the tile level (224×224 pixel crops). They do not directly perform WSI-level diagnosis or prediction unless downstream aggregation (e.g., pooling or attention-based aggregation) is applied.
  2. Cell Segmentation or Instance Detection
    These models are not designed for precise cell-level segmentation without further development.
  3. Real-time Clinical Decision-Making Without Validation
    Although pretrained on a large, diverse dataset, Virchow2 models are not intended for real-time clinical use without additional downstream fine-tuning, task-specific validation, regulatory review, and deployment controls.
  4. Unstained or Non-Pathology Images
    The models are trained specifically on stained histopathology images (H&E, IHC). They are not suitable for unstained images, other imaging modalities (e.g., MRI, CT), or non-biomedical domains without retraining.
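
As the first out-of-scope note says, slide-level predictions require aggregating tile embeddings. The sketch below shows two common aggregation schemes, mean pooling and a simple attention-style weighting; the layer names are illustrative, not part of Paige's API.

```python
import torch

# Stand-in embeddings: 500 tiles from one WSI, each a 384-dim vector.
tile_embeddings = torch.randn(500, 384)

# Mean pooling: unweighted average of tile embeddings -> slide embedding.
slide_mean = tile_embeddings.mean(dim=0)                # (384,)

# Attention-based pooling: learned per-tile scores, softmax over tiles,
# then a weighted sum of embeddings.
attn = torch.nn.Linear(384, 1)
weights = torch.softmax(attn(tile_embeddings), dim=0)   # (500, 1)
slide_attn = (weights * tile_embeddings).sum(dim=0)     # (384,)

print(slide_mean.shape, slide_attn.shape)
```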

Getting Started

See steps on how to use this model: https://github.com/Paige-AI/paige-ml-sdk/blob/main/examples/azure_ml_example.ipynb

Responsible AI Considerations

Virchow2, Virchow2G, and Virchow2G Mini were developed with several responsible AI considerations in mind. First, the pretraining dataset included 3.1 million whole slide images from over 225,000 patients, sourced from globally diverse institutions across all continents, representing a broad range of tissue types, staining protocols (H&E and IHC), disease states, and scanner variations. This dataset diversity was critical to minimize bias, improve generalizability, and ensure that the learned representations would be robust across different patient populations and clinical settings.

Additionally, no patient-identifiable information was included in the training data, and all data handling complied with institutional review board (IRB) standards and privacy regulations.

To further promote model stability and fairness, the training pipeline incorporated balanced sampling strategies across tissue types, diagnoses, and magnifications, preventing overfitting to overrepresented conditions. Domain-specific algorithmic adjustments, such as the use of extended-context translation augmentation and kernel density regularization, were introduced to better preserve critical histologic features and avoid artifacts that could otherwise introduce unintended model biases. Together, these design choices aimed to support ethical, equitable, and clinically robust AI development for digital pathology applications.

Training Data

The models were trained on a dataset of 3.1 million whole slide images (WSIs) sourced from over 225,000 patients across a mix of internal (Memorial Sloan Kettering Cancer Center) and global external institutions. The data covered a wide range of tissue types, disease states (benign, precursor, malignant), and staining protocols (both H&E and IHC). Training tiles were sampled at multiple magnifications (5x, 10x, 20x, and 40x) to support learning across resolutions.
Tile extraction involved sampling 392×392 pixel regions with high tissue content, from which 224×224 crops were taken during training. Balancing strategies were applied across tissue types, stain types, diagnoses, and magnifications to promote data diversity and avoid over-representation of common conditions.
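
The 392×392-to-224×224 sampling scheme described above can be sketched as a random crop. This is a minimal illustration of the geometry only; the tissue-content filtering used in the actual pipeline is not reproduced here.

```python
import numpy as np

# Stand-in for a 392x392 tissue region (random pixels instead of a real tile).
rng = np.random.default_rng(0)
region = rng.integers(0, 256, size=(392, 392, 3), dtype=np.uint8)

def random_crop(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Take a random size x size crop from an H x W x C image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

crop = random_crop(region)
print(crop.shape)  # (224, 224, 3)
```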

Model Specifications

Context Length: 2048
License: Custom
Training Data: Sept 2024
Last Updated: May 2025
Input Type: Image
Output Type: Text
Provider: Paige
Languages: 1 Language