Task 3 – Bladder Cancer Recurrence Prediction - CHIMERA

Task Objective¶

This task extends Task 2 by integrating histopathology and transcriptomics to predict recurrence in patients with high-risk non-muscle-invasive bladder cancer (HR-NMIBC). The aim is to model patient-level time to recurrence using both morphological and molecular data.

Figure: Schematic overview of the multimodal prediction pipeline. Histopathology, RNA-seq, and clinical data are encoded using pretrained networks and combined for prediction.

Note that the RNA-seq data is derived from a selected tumor region within the histopathology slide.

Evaluation Metric¶

Model performance is evaluated using the censored concordance index (C-index), which measures the proportion of comparable patient pairs whose outcomes the model orders correctly.
Two patients are considered comparable if:
- Both experienced the event (e.g., recurrence) at different times, or
- One experienced the event and the other was event-free, with an observed follow-up time longer than the first patient's event time
A pair is not comparable if both patients experienced the event at the same time.
A comparable pair is considered concordant if the patient with the higher predicted risk score has the shorter observed time to the event. In other words, the model correctly orders the two patients in terms of risk.
The C-index ranges from 0 to 1, where:
- 0.5 → random predictions
- 1.0 → perfect concordance
Values below 0.5 indicate an ordering that is worse than random.
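The pairwise rules above can be sketched in a few lines of Python. This is a minimal illustration, not the official evaluation code: the function name `censored_c_index`, the argument layout, and the choice to give half credit to comparable pairs with tied risk scores are all assumptions.

```python
from itertools import combinations


def censored_c_index(times, events, risks):
    """Fraction of comparable pairs that are ordered correctly.

    times  : observed time to event or end of follow-up, per patient
    events : 1/True if the event was observed, 0/False if censored
    risks  : predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            # Equal observed times are skipped: two events at the same
            # time are not comparable, and a censored patient needs a
            # strictly longer follow-up to form a comparable pair.
            continue
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            # The earlier patient was censored, so the true ordering
            # of the two outcomes is unknown.
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # assumption: tied scores get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, with `times = [5, 10, 15, 20]` and `events = [1, 0, 1, 0]`, four of the six pairs are comparable; the two pairs in which the censored patient at 10 months has the shorter time are skipped.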

The complete evaluation pipeline, including code for computing the censored concordance index, will be made publicly available to ensure transparency and reproducibility.

Data Details¶

Training Data¶

• 🧠 Histopathology: A single H&E-stained whole slide image (WSI) per patient, with a pixel spacing of 0.25 µm/pixel at its highest resolution level. Note that this WSI is one of the following: an adjacent section of the H&E slide used for bulk RNA-seq, the same H&E slide with a punched cavity in the tissue section, or an H&E slide of another tumor from the same patient.

• 🧠 Histopathology: Binary tissue mask outlining the tissue section

• 🧬 Transcriptomics: Bulk RNA-seq data extracted from selected tumor regions, normalized using DESeq2

• 📋 Clinical Data: Same variables as Task 2.

Feature	Type / Values	Description
age	Integer (years)	Age of the patient in years
sex	Male / Female	Biological sex of the patient
smoking	Yes / No	Smoking history
tumor	Primary / Recurrence	Indicates whether the tumor is primary or recurrent
stage	TaHG / T1HG / T2HG	Tumor stage: Ta (inner lining), T1 (connective tissue), T2 (muscle invasion); all high-grade
substage	T1m / T1e	T1m: ≤ 0.5 mm invasion; T1e: > 0.5 mm invasion
grade	G2 / G3	G2: moderately differentiated; G3: poorly differentiated
reTUR	Yes / No	Repeat transurethral resection (re-TUR) performed before BCG induction
LVI	Yes / No	Lymphovascular invasion observed on H&E slide
variant	UCC / UCC + Variant	Urothelial carcinoma alone or with variant histology
EORTC	High risk / Highest risk	European Organization for Research and Treatment of Cancer (EORTC) risk classification
no_instillations	Integer	Total number of BCG instillations. "-1" indicates missing data.
BRS	BRS1 / BRS2 / BRS3	Biomarker-derived BCG response subtype from RNA-seq
Reference Standard
progression	0 / 1	Progression to advanced disease (1-true/0-false)
time_to_HG_recur_or_FUend	Float (months)	Time to high-grade recurrence or end of follow-up in months
Additional information (not used in evaluation/test)
HG_recur_BCG_failure	0 / 1	BCG failure (1-true/0-false)
time_to_prog_or_FUend	Float (months)	Time to progression or end of follow-up in months
time_to_FUend	Float (months)	Time to end of follow-up in months
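
As a minimal sketch of reading one clinical file, the "-1" sentinel in no_instillations can be mapped to a missing value on load. This assumes each _CD.json file is a flat JSON object keyed by the feature names in the table above, which is not stated here; the helper names `clean_clinical` and `load_clinical` are hypothetical.

```python
import json
from pathlib import Path


def clean_clinical(record):
    """Return a copy of a clinical record with the -1 sentinel removed.

    Assumes a flat dict keyed by the feature names in the table.
    """
    cleaned = dict(record)
    if cleaned.get("no_instillations") == -1:
        # "-1" marks missing data, so it should not be used as a count
        cleaned["no_instillations"] = None
    return cleaned


def load_clinical(path):
    """Read one _CD.json file and apply clean_clinical to it."""
    return clean_clinical(json.loads(Path(path).read_text()))
```

Handling the sentinel at load time keeps a value of -1 from being treated as a real number of instillations during feature scaling or imputation.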

Data versions¶

v1¶

126 paired multimodal training cases (_HE.tif, _HE_mask.tif, _CD.json, and _RNA.json).
This version contains incorrect histopathology slides and/or tissue masks (_HE.tif, _HE_mask.tif):
- Corrupted: 3A_024
- Incorrect spacings: 3A_017, 3A_031, 3A_042, 3A_050, 3A_141, 3A_143, and 3A_157
- Out-of-focus scan, resulting in failed tissue segmentation (blank mask): 3A_025

v2¶

176 paired multimodal training cases (_HE.tif, _HE_mask.tif, _CD.json, and _RNA.json). Note that all clinical data files (_CD.json) have been updated to reflect the features used in validation and test.
The following slides have been fixed: 3A_024, 3A_017, 3A_031, 3A_042, 3A_050, 3A_141, 3A_143, and 3A_157. The 3A_025 slide, which is out of focus, could not be fixed.
Added extra materials to support model training:
- Feature embeddings (*.pt)
- Patch coordinates (*.npy)
Both were extracted from the _HE.tif and _HE_mask.tif of each case with slide2vec, using UNI at 0.25 µm/pixel (mpp) resolution with a 224×224 patch size.
Note: The 50 newly added RNA-seq files (_RNA.json) each begin with the prefix "3B", which indicates the source "Cohort B" as described in [1]. These Cohort B samples were sequenced using a different protocol from the Cohort A samples (prefix "3A"). Please note that no batch effect adjustment was performed between the two cohorts.
UPDATE on 2025-06-08: The recently released v2 dataset contained clinical data files (_CD.json) with a missing parameter ("progression"). The clinical data files have been corrected and re-uploaded to AWS, and can be downloaded via the AWS command line below.
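
Because the two cohorts were sequenced with different protocols and no batch adjustment was applied, it can help to keep track of which cohort each case belongs to. The sketch below groups case identifiers by their prefix; it assumes identifiers follow the "3A_017" pattern seen above, the function name `split_by_cohort` is hypothetical, and the "3B_200" identifier in the usage comment is made up for illustration.

```python
from collections import defaultdict


def split_by_cohort(patient_ids):
    """Group case identifiers by the prefix before the first underscore.

    "3A" marks Cohort A and "3B" marks Cohort B.
    """
    cohorts = defaultdict(list)
    for pid in patient_ids:
        prefix = pid.split("_", 1)[0]
        cohorts[prefix].append(pid)
    return dict(cohorts)


# Usage with illustrative identifiers:
# split_by_cohort(["3A_017", "3B_200", "3A_031"])
```

The grouping itself does not correct any batch effect; it only makes it possible to inspect or normalize the RNA-seq data per cohort, or to stratify cross-validation folds by cohort.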

Download Training Data¶

Instruction (latest version)¶

Install the AWS CLI: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
Bucket name: s3://chimera-challenge/v2/task3/
Command line: aws s3 sync --no-sign-request s3://chimera-challenge/v2/task3/ <destination_path>

Bucket structure (latest version)¶

v2/
  task3/
    data/
      task3_quality_control.csv
      {patient_id}/
        {patient_id}_CD.json
        {patient_id}_HE.tif
        {patient_id}_HE_mask.tif
        {patient_id}_RNA.json
    features/
      coordinates/
        {patient_id}_HE.npy
      features/
        {patient_id}_HE.pt
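
After downloading, the layout above can be indexed per case to check which files are present. The following is a minimal sketch using only the standard library. It assumes the sync destination directly contains the data/ and features/ folders and that file names are the case identifier followed by the suffixes listed under the data versions (for example 3A_017_HE.tif); the function name `index_cases` and the dictionary keys are hypothetical.

```python
from pathlib import Path


def index_cases(root):
    """Map each case identifier to its expected files under root.

    A value is None when the expected file does not exist, which makes
    incomplete cases easy to spot before training.
    """
    root = Path(root)
    cases = {}
    for case_dir in sorted(p for p in (root / "data").iterdir() if p.is_dir()):
        pid = case_dir.name
        expected = {
            "clinical": case_dir / f"{pid}_CD.json",
            "wsi": case_dir / f"{pid}_HE.tif",
            "mask": case_dir / f"{pid}_HE_mask.tif",
            "rna": case_dir / f"{pid}_RNA.json",
            "coordinates": root / "features" / "coordinates" / f"{pid}_HE.npy",
            "features": root / "features" / "features" / f"{pid}_HE.pt",
        }
        cases[pid] = {
            key: (path if path.exists() else None)
            for key, path in expected.items()
        }
    return cases
```

Calling `index_cases("<destination_path>")` on the download folder returns one entry per case directory; files such as task3_quality_control.csv that sit next to the case folders are ignored.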

Reference¶

[1] de Jong FC, Laajala TD, Hoedemaeker RF, Jordan KR, van der Made AC, Boevé ER, van der Schoot DK, Nieuwkamer B, Janssen EA, Mahmoudi T, Boormans JL. Non–muscle-invasive bladder cancer molecular subtypes predict differential response to intravesical Bacillus Calmette-Guérin. Science Translational Medicine. 2023 May 24;15(697):eabn4118.