Quantitative Methods 2017
Quantitative Methods in Radiation Oncology: Models, Trials and Clinical Outcomes
2017 ESTRO SCHOOL LIVE COURSE
Hypothesis testing
Outline
• Introduction • Statistical inference
• Assumptions, potential, limitations
• Quantifying uncertainty • Parametric vs. non-parametric statistics • Breakdown of basic assumptions • Multiple testing and post-hoc hypothesis
Why statistics?
• Answer questions where measurements are subject to random variation:
• Is the improvement in local control a coincidence or a ‘real’ improvement?
• Is the decrease in toxicity a coincidence or a ‘real’ improvement?
• Is the improvement in plan conformity a coincidence or a ‘real’ improvement?
Statistical inference: basic assumption
• Improbable events do not occur
Improbable events do not occur
• Assume a null hypothesis
• Find probability of same or ‘more extreme’ result by chance • p-value
• If the probability is small, we conclude that the null hypothesis is wrong
Example: Contingency tables
Observed       Toxicity   No Toxicity   Total
Treatment A    30         70            100
Treatment B    40         60            100
Total          70         130           -
Assume null hypothesis: Equal toxicity
Expected       Toxicity   No Toxicity   Total
Treatment A                             100
Treatment B                             100
Total          70         130           -
Example: Contingency tables
Observed       Toxicity   No Toxicity   Total
Treatment A    30         70            100
Treatment B    40         60            100
Total          70         130           -
Expected       Toxicity   No Toxicity   Total
Treatment A    35         65            100
Treatment B    35         65            100
Total          70         130           -
χ² = 2.2, P = 0.14
http://www.medcalc.org/manual/chi-square-table.php
http://statpages.org/ctab2x2.html
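The contingency-table example can be reproduced in a few lines. The sketch below assumes a Pearson chi-squared test without continuity correction, which is what matches the numbers on the slide; the function name `chi2_2x2` is made up for illustration.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # expected count under the null hypothesis
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

chi2, p = chi2_2x2(30, 70, 40, 60)
print(f"chi2 = {chi2:.1f}, p = {p:.2f}")  # chi2 = 2.2, p = 0.14
```

The expected counts (35 and 65 per arm) come from the row and column totals alone, so the test only asks how far the observed cells sit from them.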
Limitations and caveats
• The opposite is not true: • Two treatment arms are not proven equal by showing that the p-value is high! • Statistics can only reject a null hypothesis – not prove it
Statistical power
• Number of patients needed to see effect
               Toxicity   No toxicity
New technique  5          15
Old technique  10         15
p=0.45

               Toxicity   No toxicity
New technique  20         60
Old technique  40         60
p=0.05

The p-value depends on effect size and on sample size.
Confidence intervals
• More patients increase reliability, decrease uncertainty
               Toxicity   No toxicity   Proportion
New technique  5          15            25% (9-49%)
Old technique  10         15            40% (21-61%)
p=0.45

               Toxicity   No toxicity   Proportion
New technique  20         60            25% (16-36%)
Old technique  40         60            40% (30-50%)
p=0.05
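The intervals in the tables above are consistent with exact (Clopper-Pearson) binomial confidence intervals, although the slides do not name the method. A minimal sketch under that assumption, using only the standard library; `exact_ci` and `binom_cdf` are illustrative names:

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k events in n patients, found by bisection."""
    def solve(f, target):
        # f must increase monotonically with p on [0, 1]
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= k | p) = alpha/2; upper limit: P(X <= k | p) = alpha/2.
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

for events, total in [(5, 20), (10, 25), (20, 80), (40, 100)]:
    lo, hi = exact_ci(events, total)
    print(f"{events}/{total}: {events / total:.0%} ({lo:.0%}-{hi:.0%})")
```

Quadrupling the number of patients at the same proportions roughly halves the width of the interval.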
Quantifying uncertainty: confidence intervals • A point estimate alone is meaningless! • We need to assess the uncertainty
• Example:
Nutting et al Lancet Oncol 2011
Non-inferiority studies
• It is impossible to show equivalence • But it is possible to reject a hypothesis of inferiority • By omitting (part of) the radiation, the local control does not decrease more than xx%
• Non-inferiority studies aim at a sufficiently narrow CI to rule out a clinically relevant detriment
A note of caution
[Figure: dose levels 50 Gy/25 fractions, 48 Gy/24 fractions, 46 Gy/23 fractions, 2 Gy/1 fraction and no radiotherapy, labelled "Non-inferiority studies" and "Standard superiority study"]
Parametric vs. non-parametric statistics
Parametric statistics
• Assume a parametric distribution of data • Make test based on this distribution
• Examples:
• Paired and unpaired t-test
• Chi-squared test for contingency tables
• Binomial tests
• Regression models
• Cox proportional hazards
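What is under the hood of your statistical software?
• Compare two normally distributed series with equal variance:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$
Test quantity. Note the common variance $s^2$.
$$\varepsilon = 2\left[1 - F_{t(df_1 + df_2)}(|t|)\right]$$
Two-sided test. $F$ is the cumulative distribution function of the t distribution with $df_1 + df_2$ degrees of freedom.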
What is under the hood of your statistical software?
• Compare two normally distributed series with equal variance:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$
$$\mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 \pm t_{0.975}(df_1 + df_2) \sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$
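To make the "under the hood" step concrete, here is a minimal standard-library sketch of the equal-variance two-sample t-test: pooled variance, test statistic, and a two-sided p-value from the cumulative t distribution. The t CDF is obtained by numerical integration (Simpson's rule), which is an implementation choice for this sketch rather than what any particular package does, and the two data series are invented.

```python
import math
from statistics import mean, variance

def t_cdf(t, df, steps=2000):
    """Cumulative distribution function of Student's t, by Simpson's rule."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def pooled_t_test(x1, x2):
    """Return (t, degrees of freedom, two-sided p) assuming a common variance."""
    n1, n2 = len(x1), len(x2)
    df = (n1 - 1) + (n2 - 1)
    s2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df  # pooled variance
    t = (mean(x1) - mean(x2)) / math.sqrt(s2 * (1 / n1 + 1 / n2))
    p = 2 * (1 - t_cdf(abs(t), df))
    return t, df, p

# Invented example data, for illustration only.
series_1 = [5.1, 4.9, 6.0, 5.5, 5.8]
series_2 = [4.2, 4.8, 4.1, 5.0, 4.4]
t, df, p = pooled_t_test(series_1, series_2)
print(f"t = {t:.2f}, df = {df}, two-sided p = {p:.4f}")
```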
Parametric statistics
• Necessary to validate assumption • Example: test for deviation from normality • Remember: p-value >0.05 not enough
[Figure: histogram of GTVrad with fitted normal curve (Mean = 58.1428, Std. Dev. = 43.04318, N = 85; y-axis: Frequency) and normal P-P plot of GTVrad (Observed Cum Prob vs. Expected Cum Prob)]
[Figure: histogram of lnGTVrad with fitted normal curve (Mean = 3.7738, Std. Dev. = .83951, N = 85; y-axis: Frequency) and normal P-P plot of lnGTVrad (Observed Cum Prob vs. Expected Cum Prob)]
Parametric statistics
• Decisions.. Decisions....
[Figure: two normal P-P plots of TotalSUVmax (Observed Cum Prob vs. Expected Cum Prob). Transforms: natural logarithm]
Non-parametric statistics
• No assumption about distribution
• Example: rank tests
- Rank all values
- Under the null hypothesis, group 1 beats group 2 as often as vice versa
- Compare with a coin flip: is the probability of the result less than 5%?
- The p-value is the probability of that or a ‘more extreme’ result
Non-parametric statistics
• Examples
• Wilcoxon rank sum (signed or unsigned) • Log-rank test for survival • Cox proportional hazards
Parametric or non-parametric?
Parametric
• Positive: provides description of distribution of data
• Negative: relies on assumption about distribution
Non-parametric
• Positive: no assumption on distribution
• Negative: no information on distribution
Output
• Parametric: P-value + distribution of data
• Non-parametric: P-value
Random sampling techniques
• Non-parametric methods to quantify uncertainty
• All patients together give the thick mean line
• Variability between patients is transferred to variability in the mean risk
• Random sampling with replacement gives a large number of means
• The 2.5%-97.5% range of the histogram of these means is interpreted as the 95% CI of the mean
Random sampling techniques
• Bootstrap • Jackknife
• How to implement?
• In standard statistical packages • Matlab functions (bootstrp, bootci)
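For readers without Matlab, the percentile bootstrap described above can be sketched with the Python standard library. The function name `bootstrap_ci_mean`, the number of resamples and the data are all assumptions made for illustration.

```python
import random
from statistics import mean

def bootstrap_ci_mean(values, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample the patients with replacement and record the mean each time.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = means[int(n_resamples * alpha / 2)]
    upper = means[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper

# Invented per-patient values, for illustration only.
risk = [0.12, 0.30, 0.08, 0.45, 0.22, 0.18, 0.51, 0.27, 0.15, 0.33, 0.09, 0.40]
lower, upper = bootstrap_ci_mean(risk)
print(f"mean = {mean(risk):.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The same loop works for any statistic: replace `mean` with a median, a fitted model parameter, or a predicted risk.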
Types of error in hypothesis testing
Two types of errors in statistical inference
• The null hypothesis is true, but you reject it
- Type I error
- False positive
• An improbable event has occurred
• The probability is controlled through the significance level
• The probability is α (usually set to 5%)
• The null hypothesis is false, but you fail to reject it
- Type II error
- False negative
• The result was not ‘extreme enough’
• Sample size not sufficient
• The probability is denoted β
• It is closely related to power, 1-β
Type I error and multiple testing
• The significance level is typically set to 5% (p=0.05)
• The risk of falsely rejecting the null hypothesis in one try is 5%
• The chance of accepting a true null hypothesis in both of two independent tries is 0.95^2 = 90.25%
- The risk of falsely rejecting a true hypothesis in at least one of two tries is 1-0.95^2 = 9.75%
• In 10 tries: 1-0.95^10 = 40%
• The confidence is eaten up by multiple testing
• The effective α increases rapidly as we perform multiple tests
• High probability of type I error
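The arithmetic behind these figures is the chance of at least one false positive among n independent tests. A short sketch (the function name is illustrative):

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one type I error in n independent tests."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 2, 10, 20):
    print(f"{n:2d} tests: {familywise_error(0.05, n):.1%}")
```

With 20 tests the risk of at least one spurious "significant" finding already exceeds 60%.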
Bonferroni correction
• Adjust your significance level to account for multiple testing:
• 1 test: p=0.05 is significant
• 2 tests: p=0.025 is significant
• N tests: p=0.05/N is significant
• Probability of a type I error maintained at 5%
BUT
• The probability of type II error increases!
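As a sketch of the Bonferroni rule, the snippet below applies the per-test threshold 0.05/N to a list of p-values and checks that the overall type I error stays at or below 5% for independent tests. The p-values are invented for illustration.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value as significant at the Bonferroni threshold alpha / N."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

def familywise_error_after_correction(alpha, n_tests):
    """Chance of at least one false positive when each test uses alpha / N."""
    return 1 - (1 - alpha / n_tests) ** n_tests

p_values = [0.001, 0.020, 0.040]  # hypothetical results of three tests
print(bonferroni_significant(p_values))  # [True, False, False]
print(f"{familywise_error_after_correction(0.05, 3):.4f}")
```

Two of the three results would have passed an uncorrected 0.05 threshold but no longer do, which is the loss of power (higher type II error) mentioned above.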
http://prefrontal.org/files/posters/Bennett-Salmon-2009.jpg
Breakdown of the basic assumption: improbable events sometimes do occur
Clausen et al, in preparation
Bias
Definition
• An estimate is called biased if it is systematically different from the population based parameter of interest, regardless of sample size
Accurate, but not precise (unbiased):
Last but not least: Just Think!
Look at your data!
Pattern of metastasis
Stage             Pelvic   Primary Para Aortic   Para Aortic Relapse   Total
1A                0        1                     1                     2
1B                5        5                     2                     12
2A                1        0                     0                     1
2B                12       10                    4                     26
3A                2        0                     0                     2
3B                11       16                    7                     34
4A                2        2                     0                     4
4B                0        2                     0                     2
No. of Patients   33       36                    14                    83
Data from Henrik Hansen, MD
p=0.00002?
p=0.80?
Take home
• The basic assumption behind statistical inference is that improbable events do not occur • This is violated in multiple testing and post-hoc hypothesis unless corrected for • Point estimates without confidence intervals are meaningless • Wrong use of statistics can change a conclusion • It is not just details
Power and sample size
Ivan Vogelius
Background
• Hypoxic cell sensitizer trials in HNSCC
Estimated statistical resolution of 20 randomized controlled trials of hypoxic cell sensitizer in HNSCC. A meta-analysis showed a statistically significant absolute improvement in local control of 8.3%
Bentzen R&O 32: 1 (1994)
Statistical power
• Example: Two sample t-test (remember?)
• Procedure for test
• Verify normal distribution
• Verify same variance in the two samples
• Calculate test statistic
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$
The terms of the test statistic:
• $\bar{x}_1 - \bar{x}_2$: difference in means (effect size)
• $s^2$: sample estimate of variance, $\sigma^2$
• $n_1$, $n_2$: number of subjects
• Denominator as a whole: the smaller, the better
Statistical power
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$
• The larger the effect size, the better
• The smaller the variance, the better
[Figures: pairs of distributions labelled "Easy" and "Hard"]
Different variance, same effect size:
• Easy: $\bar{x}_1 - \bar{x}_2 = 2$, $s = 0.75$
• Hard: $\bar{x}_1 - \bar{x}_2 = 2$, $s = 1.5$
Different effect size, same variance:
• Easy: $\bar{x}_1 - \bar{x}_2 = 7$, $s = 1.5$
• Hard: $\bar{x}_1 - \bar{x}_2 = 2$, $s = 1.5$
Summary:
• Easy: large effect size, small variance
• Hard: small effect size, large variance
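The "easy" and "hard" scenarios can be compared by plugging them into the two-sample t statistic. The sketch below uses the effect sizes and standard deviations from the slides; the group size of 10 per arm is an assumption, since the slides do not give one.

```python
import math

def t_statistic(diff, s, n1, n2):
    """Two-sample t statistic for a difference in means with common SD s."""
    return diff / math.sqrt(s**2 * (1 / n1 + 1 / n2))

n = 10  # assumed number of subjects per group (not given on the slides)
scenarios = {
    "diff=2, s=0.75 (easy)": (2, 0.75),
    "diff=2, s=1.5  (hard)": (2, 1.5),
    "diff=7, s=1.5  (easy)": (7, 1.5),
}
for label, (diff, s) in scenarios.items():
    print(f"{label}: t = {t_statistic(diff, s, n, n):.2f}")

# Quadrupling the sample size doubles t for the hard scenario.
print(f"hard, n=40 per group: t = {t_statistic(2, 1.5, 40, 40):.2f}")
```

Halving the standard deviation, doubling the effect size, or quadrupling the number of subjects each double the test statistic.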
Choice of endpoint affects power
Bentzen et al, Sem. Rad. Oncol. 2003
What do we need to estimate power before starting a trial?
• Effect size • Variance
Design study
Conduct trial
Effect size
Terminology
• Null hypothesis, H0
  • The hypothesis of no difference
• Alternative hypothesis, H1
  • The hypothesis that there is a difference
  • The expected difference is used to calculate β
• Significance level, α
  • The risk of rejecting the null hypothesis if it is true
  • Normally 5% (p=0.05 is significant)
• Statistical power, 1-β
  • The probability of rejecting H0 if H1 is true
  • Often 80%, recommended 90%
  • Depends on α and assumed effect size
Tools for estimating power
• Commercial tools
  • PASS (www.ncss.com)
  • Matlab (sampsizepwr in statistics toolbox)
• Free software
  • http://dceg.cancer.gov/tools/design/POWER
  • http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize
• Online tools
  • http://www.dssresearch.com/KnowledgeCenter/toolkitcalculators/statisticalpowercalculators.aspx
Comparing proportions
• The variance is given from binomial statistics
var = np (1 - p )
Example: Xerostomia with 3DCRT vs. IMRT
Nutting et al, Lancet Oncol. 2011
• Null hypothesis, H 0 • Alternative hypothesis, H 1
• Significance level, α • Statistical power, 1-β
We are planning a study of independent cases and controls with 1 control(s) per case. Prior data indicate that the failure rate among controls is 0.9 . If the true failure rate for experimental subjects is 0.6 , we will need to study 48 experimental subjects and 48 control subjects to be able to reject the null hypothesis that the failure rates for experimental and control subjects are equal with probability (power) 0.9. The Type I error probability associated with this test of this null hypothesis is 0.05. We will use a continuity-corrected chi-squared statistic or Fisher’s exact test to evaluate this null hypothesis.
Try p0=80%, p1=50%
Try p0=90%, p1=70%
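The sample size quoted above can be approximated by hand. The sketch below uses the standard normal-approximation formula for comparing two proportions and then applies a continuity correction; that this is the formula behind the quoted output is an assumption, so small differences from the software (rounding, exact method) are to be expected. The function name `n_per_group` is illustrative.

```python
import math
from statistics import NormalDist

def n_per_group(p0, p1, alpha=0.05, power=0.9):
    """Patients per arm to compare two proportions (two-sided test, 1:1 allocation).

    Returns (n without continuity correction, n with continuity correction).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    delta = abs(p0 - p1)
    n = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2 / delta**2
    n_cc = n / 4 * (1 + math.sqrt(1 + 4 / (n * delta))) ** 2
    return n, n_cc

for p0, p1 in [(0.9, 0.6), (0.8, 0.5), (0.9, 0.7)]:
    n, n_cc = n_per_group(p0, p1)
    print(f"p0={p0:.0%}, p1={p1:.0%}: n = {n:.1f}, with continuity correction = {n_cc:.1f}")
```

For p0 = 90% and p1 = 60% the corrected figure comes out close to the 48 subjects per arm quoted above. Shrinking the difference from 30 to 20 percentage points roughly doubles the requirement.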
New challenge: Design trial for non-inferiority of IMRT on local control
• Assumptions
  • Local control 3DCRT: 75%@3 years (m1=7.23)
  • Accrual time: 5 years
  • Additional FU: 2 years
It is HARD to demonstrate clinically relevant non-inferiority!
Common challenge:
• The number of patients is given
• Example: My PhD is based on 600 NSCLC patients
• How to write the statistical section for • Ethics/IRB • Funding • PhD enrollment
A good solution
Sample size requirements in Cox regression
Rule of thumb: At least 10 events per predictor in the multivariate model
Design process
Clinical question
Relevant Effect size
Feasible sample size
Not feasible sample size
Not feasible sample size
Option I • Convince yourself that the effect size is larger • Convince yourself that the variance is smaller • Use a one-sided test
Option II • Change the design to have larger expected effect • Implement methods to reduce variance
Reduction of variance - pairing
Mean heart dose, standard fractionation
• Impossible to see difference • Wilcoxon rank sum: p=0.17
Mean heart dose, hypofractionation
Data from
Reduction of variance - pairing
Focus on difference
• Clear difference • Wilcoxon signed rank: p<0.0001
Data from
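Why pairing helps can be shown with a small synthetic example. The slides use Wilcoxon rank-sum and signed-rank tests; the sketch below swaps in unpaired and paired t statistics because they need only a few lines, and the dose values are invented purely to mimic large between-patient variation with a small, consistent within-patient difference.

```python
import math
from statistics import mean, stdev, variance

def unpaired_t(x, y):
    """Two-sample t statistic with pooled variance (ignores the pairing)."""
    n1, n2 = len(x), len(y)
    s2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(s2 * (1 / n1 + 1 / n2))

def paired_t(x, y):
    """Paired t statistic: one-sample test on the within-patient differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Synthetic mean heart doses (Gy) for the same 8 patients under two plans.
plan_a = [1.2, 2.5, 3.1, 4.8, 6.0, 7.4, 9.1, 11.3]
plan_b = [1.0, 2.1, 2.8, 4.3, 5.6, 6.8, 8.7, 10.6]

print(f"unpaired t = {unpaired_t(plan_a, plan_b):.2f}")
print(f"paired t   = {paired_t(plan_a, plan_b):.2f}")
```

The unpaired comparison is swamped by the spread between patients, while the paired comparison only sees the spread of the differences.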
Example
• Heterogeneity leads to loss of power
Heterogeneity leads to loss of power
Trial design requires clinical input
The ‘right’ choice depends on the clinical consequence of a type I or type II error
Endpoints of treatment effect
ESTRO Course: Quantitative Research in Radiation Oncology Maastricht, 8 October 2017
Hans Langendijk Department of Radiation Oncology University Medical Center Groningen GRONINGEN The Netherlands
Introduction
Classification of endpoints
• Continuous endpoints:
  • E.g.: laboratory value (e.g. Hb)
• Categorical endpoints:
  • Binary endpoint: e.g. dead or alive, yes or no
  • Ordinal endpoint (logical order): e.g. toxicity grading
  • Nominal endpoint (no specific order): e.g. tumour site
• Survival endpoints:
  • Binary endpoint with time interval/censored data
Types of endpoints
• Endpoints related to treatment efficacy
  • E.g. locoregional tumour control
  • E.g. response
• Endpoints related to adverse effects
  • Acute and late toxicity
• Patient-reported outcome measures (PROM)
  • Symptoms and quality of life
• Endpoints related to disease status
  • Diagnostic procedures (e.g. metastases present or not)
Endpoints related to efficacy
Endpoints of treatment efficacy
Endpoint: Definition of event
Local control: No evidence of disease at the primary site (T-position)
Local failure rate: Recurrence in T-position
Locoregional control: No evidence of disease in T- and N-position
(Overall) survival: Death irrespective of cause
Cause-specific survival: Death of cancer
Disease-free survival: Any recurrence or death from any cause, whichever comes first
Local recurrence-free survival: Local recurrence or death from any cause, whichever comes first
Disease-free rate: Any recurrence
Local relapse free rate: Local recurrence
Bentzen et al. Radiotherapy Oncology 1998; 46: 5-18
Endpoints of treatment efficacy
[Table: for each endpoint (local control, regional control, locoregional control, (overall) survival, cause-specific survival, disease-free survival, disease-free rate, local relapse free rate), the events T-failure, N-failure, M-failure, death (tumour or other) and last follow up alive are marked E or C]
Bentzen et al. Radiotherapy Oncology 1998; 46: 5-18
Endpoints of treatment efficacy
Key points
• Consistent use of endpoints for treatment efficacy
• Main questions:
  • Which events are relevant for which endpoints?
  • When should patients be censored?
Endpoints related to toxicity
Toxicity endpoints in radiotherapy
[Figure: timeline with markers at 90 days and 5 years showing early effects, late effects, very late effects, consequential late effects and surrogate endpoints]
Toxicity grading systems
• RTOG/EORTC Acute Radiation Morbidity Scoring Criteria
• RTOG/EORTC Late Radiation Morbidity Scoring Criteria
• SOMA-LENT scoring system
• Common Terminology Criteria for Adverse Events (CTCAE v4.0)
CTCAEv4.0
• Descriptive terminology for adverse events of cancer treatment
• Features:
  • Adverse events independent of treatment modality
  • No difference between acute and late adverse event
  • Adverse effect may or may not be considered related to the medical treatment/procedure
    • E.g. cardiac events after left-sided breast cancer RT
    • E.g. secondary tumour after CRT Hodgkin lymphoma
CTCAEv4.0
Grading: Descriptions
Grade 1: Mild; asymptomatic or mild symptoms; clinical or diagnostic observations only; intervention not indicated
Grade 2: Moderate; minimal, local or noninvasive intervention indicated; limiting age-appropriate instrumental ADL
Grade 3: Severe or medically significant but not immediately life-threatening; hospitalization or prolongation of hospitalization indicated; disabling; limited self care ADL
Grade 4: Life-threatening consequences; urgent intervention indicated
Grade 5: Death related to adverse event
Instrumental ADL:
• Activities performed by a person who is living independently
• During the course of a normal day
• Examples: managing money, shopping, telephone use, travel in community, housekeeping, preparing meals, and taking medications correctly
• More complex and are learned during teens
Activities of Daily Living (self care ADL)
• Activities usually performed in the course of a normal day in a person's life
• Examples: eating, toileting, dressing, bathing, or brushing the teeth. Generally these activities are rather simple and learned during childhood.
Grading according to CTCAEv4.0
Simple and relatively straightforward endpoints
EXAMPLE: Anemia
Grading: Descriptions
Grade 1: Hemoglobin (Hgb) <LLN - 10.0 g/dL; <LLN - 6.2 mmol/L; <LLN - 100 g/L
Grade 2: Hgb <10.0 - 8.0 g/dL; <6.2 - 4.9 mmol/L; <100 - 80 g/L
Grade 3: Hgb <8.0 g/dL; <4.9 mmol/L; <80 g/L; transfusion indicated
Grade 4: Life-threatening consequences; urgent intervention indicated
Grade 5: Death

Grading according to CTCAEv4.0
Endpoint with potential interobserver variability
EXAMPLE: cheilitis
Grading: Descriptions
Grade 1: Asymptomatic; clinical or diagnostic observations only; intervention not indicated
Grade 2: Moderate symptoms; limiting instrumental ADL
Grade 3: Severe symptoms; limiting self care ADL; intervention indicated
Grade 4: -
Grade 5: -

RTOG/EORTC vs. LENT/SOMA
Van der Laan et al. Int J Radiat Oncol Biol Phys 2008;70:1138-1145

RTOG/EORTC vs. LENT/SOMA
Multivariate NTCP-models for Grade II or more rectal toxicity relative to the V70
Van der Laan et al. Int J Radiat Oncol Biol Phys 2008;70:1138-1145

Key points
• Definitions of toxicity grading in different toxicity grading systems do not correspond:
  • May lead to different frequency measures
  • May lead to different NTCP-models
  • May lead to different dose thresholds
  • May lead to different dose constraints

Specificity versus relevance
[Figure: analytic measures, objective signs, subjective symptoms and quality of life arranged between specificity and patient relevance]
Bentzen et al. Seminars in Radiation Oncology 2003; 13: 189-202

Specificity versus relevance
Salivary dysfunction and xerostomia
[Figure: parotid gland dose, submandibular gland dose, minor salivary gland dose and other prognostic factors; parotid flow and submandibular flow; xerostomia and sticky saliva; quality of life; axis: specificity]
Beetz et al. Radiotherapy Oncology 2012

Clinical relevance
Wilks' Lambda and p-value per independent variable in the one-factor model (1) and the multi-factor model (2)
Independent variables: RTOG late toxicity (RTOG xerostomia, RTOG mucosal, RTOG swallowing, RTOG subcutaneous, RTOG larynx, RTOG skin); other variables (sex, age, UICC stage, primary tumor site, treatment modality)
Wilks' Lambda: 0.897 0.923 0.798 0.922 0.934 0.956 0.973 0.950 0.956 0.915 0.965
p-value: p<0.001 p=0.019 p<0.001 p=0.018
Wilks' Lambda: 0.948 0.965 0.859 0.974 0.971 0.984 0.973 0.940 0.983 0.969 0.945
p-value: p=0.003 ns p<0.001 ns ns ns ns ns; Sex ns ns; Age p<0.001 p=0.001; UICC stage ns ns ns; Primary tumor site, Treatment modality p=0.022 ns p=0.002
Langendijk et al. J Clin Oncol 2008;26:3770-3776

Clinical relevance xerostomia
Quality of life scale     Grade 0   Grade 1   Grade 2   Grade 3-4   P-value
Physical functioning      81        82        75        71          P-0.001
Role functioning          74        74        67        67          P=0.044
Emotional functioning     84        80        74        69          P=0.001
Social functioning        88        85        79        64          P<0.001
Global quality of life    73        75        65        55          P<0.001
Fatigue                   25        28        36        42          P<0.001
Legend: little effect, moderate effect, strong effect
Langendijk et al. J Clin Oncol 2008;26:3770-3776

Clinical relevance dysphagia
[Table: the same quality of life scales (physical, role, emotional and social functioning, global quality of life, fatigue) by toxicity grade (Grade 0, Grade 1, Grade 2, Grade 3-4) with P-values; legend: little effect, moderate effect, strong effect]
Langendijk et al. J Clin Oncol 2008;26:3770-3776

Key points
• Choice of endpoints
  – Strongly depends on research question:
    • Biological modelling: more specific
    • Clinical relevance: less specific
• Relevance of endpoints not always acknowledged
  – E.g. rectal bleeding versus stool frequency
  – E.g. xerostomia versus dysphagia
• Even grade I toxicity may have impact on QoL

Composed endpoints
• Endpoints that include different toxicity states in one endpoint
• EXAMPLE: RTOG late small/large intestine
Grade 1: Mild diarrhea; mild cramping; bowel movement 5 times daily; slight rectal discharge or bleeding
Grade 2: Moderate diarrhea and colic; bowel movement >5 times daily; excessive rectal mucus or intermittent bleeding
Grade 3: Obstruction or bleeding requiring surgery
Grade 4: Necrosis/perforation; fistula
Grade 5: Death
• PROBLEM: different endpoints in corresponding grade may reflect different biological mechanisms and/or organs at risk

Composed endpoints
Peeters et al. Int J Radiat Oncol Biol Phys 2006;66:11-19

Composed endpoints
• Endpoints that include different toxicity states in one endpoint
• EXAMPLE: CTCAEv4.0 Salivary duct inflammation
Grade 1: Slightly thickened saliva; slightly altered taste (e.g., metallic)
Grade 2: Thick, ropy, sticky saliva; markedly altered taste; alteration in diet indicated; secretion-induced symptoms; limiting instrumental ADL
Grade 3: Acute salivary gland necrosis; severe secretion-induced symptoms (e.g., thick saliva/oral secretions or gagging); tube feeding or TPN indicated; limiting self care ADL; disabling
Grade 4: Life-threatening consequences; urgent intervention indicated

Composed endpoints
Patient-rated STICKY SALIVA vs. patient-rated ALTERED TASTE
               No       A bit    Quite a lot   Very much
No             30.8%    8.2%     2.4%          1.9%
A bit          13.3%    9.7%     5.8%          1.2%
Quite a lot    3.9%     5.3%     7.5%          1.5%
Very much      1.7%     2.4%     2.2%          2.2%
Overall agreement: 75.4% (2 categories)
Overall agreement: 50.2% (4 categories)
Source: Prospective Data Registration Program Head and Neck UMCG

Early side effects

Frequency measures
• Peak prevalence
  • The proportion of cases with an event in a given population at a specific time point
• Period prevalence
  • The proportion of cases with an event in a given population in a certain period of time
• Incidence
  • The proportion of NEW cases with an event in a given population in a certain period of time
• Cumulative incidence
  • = incidence with censored data

Acute toxicity scoring
• Incidence grade 4: YES
• Prevalence grade 4: YES (W7)

Acute toxicity scoring
• Incidence grade 4: YES
• Prevalence grade 4: YES (W4, W5, W6, W7, W8)

Acute toxicity scoring
• Incidence grade 4: YES
• Prevalence grade 4: YES (W4, W7, W8)

Acute toxicity scoring
• Incidence grade 4: NO
• Prevalence grade 4: YES (T0, W1, W2)

Key points
• Peak prevalence at different time points (e.g. weekly) during RT provides most accurate information on acute side effects
• Essential information may be lost by using incidence and/or period prevalence
• Assessment of baseline “toxicity”
• Modelling studies on risk factors in general:
  • Include baseline “toxicity” as potential risk factor, OR
  • Exclude patients with baseline “toxicity”

Late side effects

Cumulative incidence
• Each patient that ever had the relevant endpoint is considered an event
• Even if the event is not present anymore

Cumulative incidence/prevalence
Vergeer et al. Int J Radiat Oncol Biol Phys 2010;78:682-688

Patterns of toxicity
• Irreversible persistent
  • Remains at same level
• Irreversible progressive
  • Already detectable or clinically manifest with further progression into higher grades
• Transient persistent
  • Partial recovery after peak severity
• Complete recovery
• Intermittent

Surrogate endpoints

Toxicity endpoints in radiotherapy
[Figure: timeline with markers at 90 days and 5 years showing early effects, late effects, very late effects, consequential late effects and surrogate endpoints]

Surrogate endpoints
• Definition surrogate marker:
  • Measurement of physical sign
  • Substitute for a clinically meaningful endpoint
  • Predict the effect of therapy
• Example:
  • Tumour shrinkage as surrogate for survival
• Definition biomarker
  • Measurement that reflects the current activity of a disease process

Example
• Confluent mucositis
  • Head and neck radiotherapy
  • Good indicator for overall acute morbidity (biomarker)
  • Poor indicator for late effect (bad surrogate marker for late effects)
Example of confluent mucositis (Grade III acute mucosal reaction)

Example
[Figure: acute toxicity for CHART and Conventional, 0% to 100%, split into no mucositis, patchy mucositis and confluent mucositis]
Late toxicity, superficial and deep mucosal ulceration, observed/expected: CHART 0.64, Conventional 1.53
CHART results in significant increase in acute mucosal reactions but protects against late mucosal reactions: Acute toxicity is NO surrogate marker for late mucosal reactions
Dische et al. Radiotherapy Oncology 1997; 44: 123-136

Clinical trials in radiation oncology
ESTRO Course: Quantitative Research in Radiation Oncology Maastricht, 8 November 2017
Hans Langendijk Department of Radiation Oncology University Medical Center Groningen GRONINGEN The Netherlands

Study designs
All studies: Descriptive, Analytic
• Descriptive studies
  – What is happening in a population?
    • e.g. the prevalence or incidence of a group
  – Investigate feasibility of modelling studies
    • e.g. power analysis

Descriptive studies (registry)
[Figure: bar chart of lung cancer incidence in the Netherlands for 1990, 1995, 2000, 2005, 2010 and 2015; y-axis 0 to 10000; bar labels 9546, 8874, 7458, 6354, 6739, 6684]
Source: National Cancer Registry (The Netherlands)

Study designs
All studies: Descriptive (Survey (cross sectional), Qualitative), Analytic

Qualitative study
[Figure: events over time] Incidence rate: 4/16 (25%)

Cross sectional study
[Figure: events over time] Point prevalence: 3/12 (25%)

Cross sectional study
[Figure: events over time] Period prevalence: 4/12 (33%)

Study designs
All studies: Descriptive, Analytic (Observational analytic, Experimental: Randomized controlled trial)
• Analytic studies
  – Attempts to quantify the relationship between factors:
    • Intervention
    • Exposure
    • Outcome

Randomized controlled trial
[Figure: random allocation (R) to Standard or Experimental; outcomes (events) for both groups are measured]

Randomized controlled trial
• Advantages
  – Prevent bias between treatment arms
    • Similar settings in both arms!
– Prospective assessment of predictors and endpoints – Quality assurance – May allow for identifying predictive factors • Prognostic factor: – Factor that is associated with outcome • Predictive factor: – Factor that predicts whether certain treatment approach is beneficial Prognostic vs. predictive factor 13 The effect of age on the added value of concomitant chemoradiation in head and neck cancer EXAMPLE: predictive factor Pignol, et al. Radiother Oncol 2011 • Prognostic factor: – Factor that is associated with outcome • Predictive factor: – Factor that predicts whether certain treatment approach is beneficial • Obtained from RCT’s (hypothesis generating: power!) • Preferably obtained from meta-analysis – Factor that predicts whether certain prognostic factor is associated with endpoint • May negatively affect power and required number of patients in prognostic factor studies Prognostic vs. predictive factor EXAMPLE: predictive factor The effect of the mean parotid dose on RTOG xerostomia grade 2 or more (1 year) depends on RT technique (unilateral versus bilateral RT) 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Bilateral irradiation NTCP (%) Unilateral irradiation 0 10 20 30 40 50 60 70 Mean dose parotid glands Source: Prospective Data Registration program UMCG Randomized controlled trial • Advantages – Prevent bias between treatment arms • Similar settings in both arms ! 
– Prospective assessment of predictors and endpoints – Quality assurance – May allow for identifying predictive factors • Disadvantage – Generalisibility • Very strict entry criteria • Burdensome assessments Study designs All studies Descriptive Analytic • Modelling studies are typically observational analytic studies – Retrospective cohort study – Prospective cohort study – Cross sectional study – Case control study Observational analytic Experimental Randomized controlled trial Cohort study Cross sectional (analytic) Case control study Retrospective cohort study Past Present Exposed Incidence: 6/14 Association measure? = Odds ratio = (6/14) / (3/15) = 2.14 Unexposed Population without event Incidence: 3/15 Retrospective cohort study • Most common type: – Single center retrospective study • Typical design: – Identification from hospital records – Fulfill predefined eligibility criteria – Specific treatment period – Followed over time for a certain period of time • Advantages: – Simplicity and high feasibility – Relatively cheap (data available) 20 • Disadvantages: – Correct identification of patients (retrospective) • Incorrect recording • Missing data – Assessment of outcome: • Straightforward endpoints → e.g. survival – Missing data • Difficult retrospective endpoints: Retrospective cohort study – Functional status (e.g. performance status): “is OK” – Toxicity – Limited numbers (low power) Missing data and bias EXAMPLE Patient chart (retroperitoneal sarcoma): OK. No changes. 6 months earlier: OK. No changes. 
CTCAE v4.0: Diarrhea
Grade 1: Increase of <4 stools per day over baseline; mild increase in ostomy output compared to baseline
Grade 2: Increase of 4 - 6 stools per day over baseline; moderate increase in ostomy output compared to baseline
Grade 3: Increase of >=7 stools per day over baseline; incontinence; hospitalization indicated; severe increase in ostomy output compared to baseline; limiting self care ADL
Grade 4: Life-threatening consequences; urgent intervention indicated
Prospective cohort study
[Diagram: timeline from Present to Future; a population without the event is split into exposed and unexposed groups]
Exposed: incidence 6/14
Unexposed: incidence 3/15
Association measure? Relative risk (risk ratio) = (6/14) / (3/15) = 2.14
Prospective cohort study
• Advantages:
  – Better control of inclusion and exclusion criteria
  – Predefined and consistent definitions of candidate predictors
  – Predefined and consistent assessment of endpoints:
    • Fixed time points including baseline assessment
    • Additional guidelines/training for “difficult” assessments
    • Additional diagnostic procedures
  – Predefined guidelines for radiotherapy:
    • e.g. OAR delineation / fractionation / treatment planning
  – Permits quality assurance programs
Rapid Learning Health Care
[Diagram: cycle of four stages (data stage, knowledge stage, application stage, evaluation stage); labels on the diagram: prospective data registration, multivariable NTCP model, most relevant DVH parameters, IMRT dose optimisation, IMRT photons]
Based on: Lambin, et al. Acta Oncol 2015
Single RCT: EXAMPLE
• Within framework of RLHC system
• Double-blind RCT: standard IMRT versus stem cell sparing IMRT
Case control study
[Diagram: timeline from Past to Present; cases and non-cases from the population are each traced back to exposed and non-exposed]
• Compare patients who have the outcome of interest (cases) with patients who do not have the outcome (non-cases)
• Compare how frequently the exposure to a risk factor is present in each group
• Objective:
  – To retrospectively determine the exposure to the risk factor of interest in each of the two groups of individuals: cases and controls
Case control study: EXAMPLE
                              Major coronary events
Breast radiotherapy           Cases (n=963)   Controls (n=1205)   Total number
Left-sided breast cancer      543 (a)         601 (b)             1144
Right-sided breast cancer     420 (c)         604 (d)             1024
Total number                  963             1205                2168
• Association can be tested by the Odds Ratio (OR):
  OR = ad / bc = (543 × 604) / (601 × 420) = 1.30
• Rare disease assumption (<5%)
Darby, et al. New Engl J Med 2013: 987-998
Case control study
Advantages
• Good for studying rare conditions or diseases
• Less time needed to conduct the study because the condition or disease has already occurred
• Multiple risk factors can be studied
• Useful as initial studies to establish an association
• Can answer questions that could not be answered through other study designs
Case control study
Disadvantages
• Retrospective studies
  – Problems with data quality
  – Risk exposure assessment: risk of recall bias.
• Not good for evaluating diagnostic tests
  – Clear that the cases have the condition and the controls do not
• Difficult to find a suitable control group
• Incidence (absolute risk) cannot be obtained
The importance of quality assurance of radiation
RCT: Chemoradiation vs. chemoradiation + tirapazamine
Rischin, et al. J Clin Oncol 2010; 28: 2989-2995
Sources of non-compliance
• GTV not properly defined and therefore inadequate dose coverage
• Treatment planning itself was inappropriate and therefore inadequate dose to targets
• Inappropriate prescribed dose
• Protracted overall treatment time
Peters, et al. J Clin Oncol 2010; 28: 2996-3001
Risks of non-compliance
Enrolment bracket     Number of patients   Number with major adverse impact   Percent
1-4 (26 centers)      57                   17                                 29.8%
5-9 (22 centers)      130                  28                                 21.5%
10-19 (22 centers)    279                  33                                 11.8%
> 20 (11 centers)     352                  19                                 5.4%
Better few centres with many patients, than many centres with few patients
Peters, et al. J Clin Oncol 2010; 28: 2996-3001
Guidelines: Do they work?
Unilateral or bilateral elective irradiation
[Flowchart; the branch structure is not recoverable from the extracted text]
Decision nodes:
• Primary tumour site is floor of mouth, lateral tongue, retromolar trigonum, cheek or tonsil? (YES / NO)
• Tumour extension across the midline? (YES / NO)
• Tumour extension less than 1 cm from the midline? (YES / NO)
Neck status boxes: ipsilateral pN0 neck, ipsilateral pN+ neck, contralateral pN0 neck, contralateral pN+ neck, contralateral cN0 neck
Outcomes:
• Bilateral irradiation of the neck is mandatory
• Unilateral irradiation of the neck is mandatory
Guidelines: Do they work?
Unilateral or bilateral elective irradiation
[Table: elective neck levels by primary site and pN stage. Columns, for the ipsilateral neck and for the contralateral neck, if indicated (see figure x): Level Ia, Level Ib, Level II, Level III, Level IV, Level V, Level VI, RP. Only the remark codes in the cells survive; they are listed per row in their original order.]
Oral cavity carcinoma
  – pN0 and pN1: R1, R1
  – pN2a-pN2b and pN3: R1, R1
  – pN2c: R1, R5, R1, R5
Oropharyngeal carcinoma
  – pN0 and pN1: R2
  – pN2a-pN2b and pN3: no remark codes
  – pN2c: R5, R5, R5, R5
Hypopharyngeal carcinoma
  – pN0: R3, R2, R3, R2
  – pN1-pN2a-pN2b: R3, R3, R2
  – pN2c: R5, R5, R3, R5, R5
  – pN3: R3
Laryngeal carcinoma
  – pN0 and pN1: no remark codes
  – pN2a-pN2b: R4, R4, R4, R4
  – pN2c: R5, R5, R4, R5, R5, R4
  – pN3: R4, R4
R1: Include level Ia only in case of anterior tongue or anterior floor of mouth extension.
R2: Include retropharyngeal nodes for posterior pharyngeal wall tumour extension.
R3: Include level VI in case of extension to the apex of the piriform sinus or esophageal extension.
R4: Include level VI in case of trans- or subglottic extension.
R5: According to N-stage on each side of the neck.
RP: Retropharyngeal nodes.
NOTE: LEVELS ADJACENT TO POSITIVE LYMPH NODE AREAS SHOULD ALWAYS BE INCLUDED IN THE ELECTIVE CTV.
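Worked example: association measures
The association measures quoted in the cohort and case control examples earlier in this section can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names are hypothetical, the counts (6/14 versus 3/15 for the cohort diagrams; 543, 601, 420, 604 for the Darby, et al. table) are taken from the slides, and the 95% confidence interval uses the standard Woolf (log) approximation, which the slides do not show.

```python
import math


def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Ratio of two cumulative incidences (relative risk) in a cohort."""
    return (events_exposed / n_exposed) / (events_unexposed / n_unexposed)


def odds_ratio(a, b, c, d):
    """Odds ratio ad / bc for a 2x2 table laid out as:

                 outcome present   outcome absent
    exposed            a                 b
    unexposed          c                 d
    """
    return (a * d) / (b * c)


def woolf_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval for the odds ratio (Woolf method).

    Assumption: this interval is an illustration added here; it is not
    reported on the slides.
    """
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Cohort diagrams: 6 events among 14 exposed, 3 events among 15 unexposed.
cohort_rr = risk_ratio(6, 14, 3, 15)
# Same data as odds: 6 events vs 8 non-events, 3 events vs 12 non-events.
cohort_or = odds_ratio(6, 8, 3, 12)

# Case control example (Darby, et al.): left/right-sided breast cancer
# versus cases/controls for major coronary events.
darby_or = odds_ratio(543, 601, 420, 604)
darby_low, darby_high = woolf_ci(543, 601, 420, 604)

print(f"Cohort risk ratio:       {cohort_rr:.2f}")
print(f"Cohort odds ratio:       {cohort_or:.2f}")
print(f"Case control odds ratio: {darby_or:.2f}")
print(f"Approximate 95% CI:      {darby_low:.2f} to {darby_high:.2f}")
```

In the cohort example the event is common (6/14 and 3/15), so the odds ratio (3.0) is clearly larger than the ratio of incidences (2.14). The two only come close when the outcome is rare, which is the rare disease assumption listed with the case control example.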
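Worked example: risks of non-compliance
The percentages in the Risks of non-compliance table (Peters, et al.) follow directly from the counts per enrolment bracket. The sketch below recomputes them; the variable names are hypothetical and the counts are read from the slide as number of patients and number with major adverse impact per bracket.

```python
# (enrolment bracket, patients enrolled, patients with major adverse impact)
# Counts as read from the slide; the pairing of the two count columns is an
# assumption based on the percentages shown there.
brackets = [
    ("1-4 (26 centers)", 57, 17),
    ("5-9 (22 centers)", 130, 28),
    ("10-19 (22 centers)", 279, 33),
    ("> 20 (11 centers)", 352, 19),
]

# Percentage of patients with major adverse impact in each bracket.
percent_by_bracket = {
    label: round(100 * adverse / patients, 1)
    for label, patients, adverse in brackets
}

for label, percent in percent_by_bracket.items():
    print(f"{label:<20} {percent:5.1f}%")

# The proportion falls with every step up in enrolment bracket.
values = list(percent_by_bracket.values())
strictly_decreasing = all(x > y for x, y in zip(values, values[1:]))
print("Decreases with centre size:", strictly_decreasing)
```

The recomputed values match the slide, and the steady decrease across brackets is what the slide summarises as preferring few centres with many patients over many centres with few patients.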