Task 1: Prostate Cancer Biochemical Recurrence Prediction - CHIMERA

Task Objective 🔍

This task focuses on predicting biochemical recurrence (BCR) in prostate cancer patients from multimodal data: histopathology (H&E-stained whole-slide images), multiparametric MRI (mpMRI), and clinical information. Participants must develop models that predict recurrence at the patient level, using the complete set of slides and all multimodal data available for each patient, so that the task reflects the complexity of real-world prostatectomy specimens.

Patient Cohort 👨‍⚕️

All patients included in this study were diagnosed with prostate cancer and underwent radical prostatectomy. The following criteria were used for inclusion:

Multiparametric MRI (T2-weighted, ADC, and high b-value sequences), acquired after 2012, must be available
Surgery was performed before 2020
PSA levels at the time of surgery and at the time of recurrence must be available
Patients provided consent for the use of their data for scientific research

Data Modalities – Training Data 💾

1. Histopathology (WSIs)

Multiple WSIs per patient
Scanned using 3DHISTECH PANNORAMIC 1000 at 0.5 µm/pixel resolution
Foreground-background tissue masks are provided for preprocessing and efficient tissue region extraction
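The tissue masks make it possible to skip background before tiling a slide. The following is a minimal sketch, not the challenge's preprocessing. It assumes the mask has already been loaded into a 2-D NumPy array (nonzero = tissue) stored at an integer downsample factor relative to the 0.5 µm/pixel base level. The function name, the 512-pixel tile size, and the 50% tissue threshold are hypothetical choices.

```python
import numpy as np


def tissue_tile_coords(mask, downsample, tile_size=512, min_tissue=0.5):
    """Return base-level (x, y) top-left corners of tiles that are mostly tissue.

    Hypothetical helper. Assumptions: `mask` is 2-D with nonzero = tissue and is
    stored at 1/`downsample` of the base (0.5 um/pixel) resolution; `tile_size`
    is given in base-level pixels and must be a multiple of `downsample`.
    """
    if tile_size % downsample != 0:
        raise ValueError("tile_size must be a multiple of downsample")
    step = tile_size // downsample  # tile edge length in mask pixels
    tissue = np.asarray(mask) > 0
    rows, cols = tissue.shape[0] // step, tissue.shape[1] // step
    # Drop partial tiles at the border, then view the mask as a (rows, step, cols, step) grid.
    grid = tissue[: rows * step, : cols * step].reshape(rows, step, cols, step)
    fraction = grid.mean(axis=(1, 3))  # tissue fraction of each tile
    ys, xs = np.nonzero(fraction >= min_tissue)
    # Convert tile indices back to base-level pixel coordinates.
    return [(int(x) * tile_size, int(y) * tile_size) for y, x in zip(ys, xs)]
```

Because the returned coordinates are in base-level pixels, they could be passed directly as the location argument of `OpenSlide.read_region` if the slides can be opened with OpenSlide. If a mask is stored at full resolution, use `downsample=1`.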

2. Imaging (mpMRI)

Data were acquired using Siemens 3T MRI scanners and included three MRI sequences:

T2-weighted imaging
High b-value diffusion-weighted imaging (HBV)
Apparent diffusion coefficient (ADC) maps
Prostate segmentation masks are also provided.
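Because the three sequences and the prostate mask describe the same anatomy, one common way to use them together is to stack the sequences as channels and crop the stack to the gland. The following is a minimal sketch. It assumes the T2W, ADC, and HBV volumes and the mask are already loaded as NumPy arrays and resampled onto one shared voxel grid. The diffusion sequences are typically acquired at a different resolution than T2W, so that resampling step (for example, with SimpleITK) is assumed here and not shown. The function name and the 8-voxel margin are hypothetical.

```python
import numpy as np


def crop_to_prostate(t2w, adc, hbv, mask, margin=8):
    """Stack co-registered MRI sequences as channels and crop to the prostate mask.

    Hypothetical helper. Assumption: all inputs already share the same voxel grid.
    Returns an array of shape (3, D, H, W) in the order T2W, ADC, HBV.
    """
    volumes = [np.asarray(v, dtype=np.float32) for v in (t2w, adc, hbv)]
    mask = np.asarray(mask) > 0
    if any(v.shape != mask.shape for v in volumes):
        raise ValueError("all volumes must share the mask's voxel grid; resample first")
    if not mask.any():
        raise ValueError("empty prostate mask")
    coords = np.argwhere(mask)  # (N, 3) indices of voxels inside the prostate
    lo = np.maximum(coords.min(axis=0) - margin, 0)  # clip the margin at the volume edge
    hi = np.minimum(coords.max(axis=0) + margin + 1, mask.shape)
    region = tuple(slice(a, b) for a, b in zip(lo, hi))
    return np.stack([v[region] for v in volumes], axis=0)
```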

3. Clinical Data

Feature	Type	Description
Age at prostatectomy	Integer	Age at time of surgery
Primary Gleason grading	Ordinal (1–5)	Most prevalent histological pattern identified in the pathology slides.
Secondary Gleason grading	Ordinal (1–5)	Second most prevalent pattern observed
Tertiary Gleason grading	Ordinal (1–5) / NaN	A third, distinct histological pattern making up less than 5% of the tumor, potentially indicative of more aggressive behavior.
ISUP grading	Ordinal (1–5)	ISUP grade group derived from the primary and secondary Gleason patterns (EAU guidelines)
PSA level prior to surgery	Float (μg/L)	Prostate-specific antigen level measured before surgery. Values below 0.10 μg/L are recorded as "<0.10" and considered non-detectable.
PSA level at recurrence	Float (μg/L)	PSA level measured at the time of biochemical recurrence. Values below 0.10 μg/L are recorded as "<0.10" and considered non-detectable.
Biochemical Recurrence status	Binary (0 = no, 1 = yes)	Defined as any post-surgery PSA value ≥ 0.1 μg/L.
pT-staging	String	Pathological stage describing the tumor extent within and beyond the prostate, based on histopathological assessment. For the possible values, see section 4.1 of the European Association of Urology (EAU) guidelines.
Lymph node invasion	Binary and String (0/1/"x")	Indicates whether cancer is present in the resected lymph nodes ("x" = lymph nodes were not removed during surgery).
Capsular penetration	Binary and String (0/1/"unknown")	Indicates whether the tumor has penetrated the prostate capsule ("unknown" = not identified during pathological review).
Positive surgical margins	Integer (0/1/2)	Indicates whether cancer cells are present at the inked surgical margin (2 = pathologist could not assess).
Seminal vesicle invasion	Binary and String (0/1/"x")	Indicates whether the tumor has invaded the seminal vesicles ("x" = seminal vesicles were not removed during surgery).
Lymphovascular invasion	Binary (0/1)	Indicates presence of cancer cells within lymphatic or blood vessels.
Earlier therapy	String	Prior prostate cancer treatment
Reference Standard
Time to last follow-up / BCR	Float (months)	BCR cases: number of months between surgery and recurrence. Non-BCR cases: number of months between surgery and the most recent PSA measurement.
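Several clinical columns mix numeric codes with string placeholders, so they need explicit handling before modeling. The following is a minimal sketch. It assumes values arrive one at a time as Python ints, floats, or strings, since the file format and column keys are not specified here. The sketch does three things:

- Maps non-detectable PSA ("<0.10") to a configurable fill value.
- Turns "x", "unknown", and the surgical-margin code 2 into `None` (missing).
- Recomputes the ISUP grade group from the primary and secondary Gleason patterns, using the standard ISUP 2014 grade-group definition, as a consistency check against the provided column.

All function names and the fill value of 0.0 are hypothetical choices.

```python
def parse_psa(value, non_detectable=0.0):
    """Convert a PSA entry (number or string such as "<0.10") to a float in ug/L.

    Assumption: non-detectable values are replaced by `non_detectable`
    (a modeling choice, not part of the challenge specification).
    """
    if isinstance(value, str):
        v = value.strip()
        if v.startswith("<"):
            return non_detectable  # below the 0.10 ug/L detection limit
        return float(v)
    return float(value)


def encode_categorical(value, missing=("x", "unknown")):
    """Map 0/1 codes (int, float, or string) to int; return None for 'not assessed' codes.

    For positive surgical margins, call with missing=(2,) so that code 2
    ("pathologist could not assess") becomes None.
    """
    missing_codes = {str(m).strip().lower() for m in missing}
    v = str(value).strip().lower()
    if v in missing_codes:
        return None
    return int(float(v))


def isup_grade(primary, secondary):
    """ISUP grade group (1-5) from primary and secondary Gleason patterns (ISUP 2014)."""
    score = primary + secondary
    if score <= 6:
        return 1
    if score == 7:
        # 3+4 -> grade group 2, 4+3 -> grade group 3
        return 2 if primary < secondary else 3
    if score == 8:
        return 4
    return 5  # Gleason score 9-10
```

Mapping the "not assessed" codes to `None`, rather than to 0, keeps "not removed" or "could not assess" distinct from a negative finding. How those missing values are then imputed or encoded is left to the model.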

Evaluation Metric 📊

Model performance is evaluated using the censored concordance index (C-index). This metric measures the proportion of all comparable patient pairs where the model correctly predicts the ordering of outcomes.

Two patients are comparable if:

Both experienced the event (recurrence) at different times, or
One experienced the event and the other remained event-free, with a follow-up time longer than the first patient's time to event

A pair of patients is not comparable if both patients experienced the event at the same time.

A pair is considered concordant if the patient with the higher predicted risk score has the shorter observed time to recurrence. In other words, the model correctly orders the two patients in terms of risk.

A C-index of 0.5 corresponds to random predictions and 1.0 to perfect concordance. Values below 0.5 indicate an ordering worse than random.

The complete evaluation pipeline, including code for computing the censored concordance index, will be made publicly available to ensure transparency and reproducibility.
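As a reference for the definition above, the following is a minimal sketch of the censored (Harrell's) C-index in plain Python. It is not the official evaluation code, which remains authoritative. It assumes a higher score means higher risk. A pair counts as comparable only when the earlier time is an observed event and the other patient's time is strictly longer, which matches the rules listed above. Tied predictions are scored as 0.5, a common convention that the challenge text does not specify. The function name is hypothetical.

```python
def concordance_index(time, event, risk):
    """Censored C-index: fraction of comparable pairs ordered correctly by `risk`.

    time  -- months to BCR (event) or to last follow-up (censored)
    event -- 1 if BCR was observed, 0 if censored
    risk  -- predicted risk score; higher means earlier expected recurrence
    """
    if not (len(time) == len(event) == len(risk)):
        raise ValueError("time, event and risk must have the same length")
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # only a patient with an observed event can anchor a pair
        for j in range(len(time)):
            # j must outlast i's event; equal event times are not comparable.
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions: assumed convention
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Usage: four patients, the third censored at 15 months.
print(concordance_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.2, 0.3, 0.1]))
```

In this example, four of the five comparable pairs are ordered correctly. If a model outputs a predicted time to recurrence rather than a risk score, negate it before scoring. Library implementations such as scikit-survival's `concordance_index_censored` can be used to cross-check results, although their tie handling may differ slightly.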

Download Training Data

Install the AWS CLI: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
Bucket name: s3://chimera-challenge/task1/
Command line: aws s3 sync --no-sign-request s3://chimera-challenge/v2/task1/ <destination_path>

Public Data 🌐

We strongly recommend exploring the LEOPARD Challenge on the Grand Challenge platform, which focused on predicting time to biochemical recurrence from H&E-stained whole-slide images (WSIs). Its dataset can be valuable for both training and pre-training your algorithm.

For additional pre-training purposes, consider the following datasets:

PI-CAI: the Prostate Imaging: Cancer AI challenge for prostate cancer detection in MRI.

PANDA: Prostate cANcer graDe Assessment Challenge, focused on automated grading of prostate cancer in WSIs.

TCGA-PRAD: The Cancer Genome Atlas Prostate Adenocarcinoma Collection, which includes histopathology images and corresponding genomic data.