Medical Statistics E-Textbook: Survival Analysis


Content

 Survival Analysis

 Survival analysis

 Kaplan-Meier survival estimates

 Survival plot

 Follow-up life table

 Abridged life table

 Log-Rank & Wilcoxon tests

 Wei-Lachin test

 Cox regression


Survival analysis

·Kaplan-Meier

·Follow-up life table

·Abridged life table

·Log-rank and Wilcoxon

·Wei-Lachin

·Cox regression

Menu location: Analysis_Survival.

This section provides methods for the description and comparison of survival experience in different groups.

Note that StatsDirect survival analysis functions do not use separate variables for different groups. The groups are indicated by a group identifier variable that contains group numbers or text strings, e.g. for 2 groups you might have a column of 1s and 2s. Each value in the group identifier column identifies its row with respect to the time, death and censorship data in adjacent columns.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved


Kaplan-Meier survival estimates

Menu location: Analysis_Survival_Kaplan-Meier.

This function estimates survival rates and hazard from data that may be incomplete.

The survival rate is expressed as the survivor function (S):

$$S(t) = \frac{\text{number of individuals surviving longer than } t}{\text{total number of individuals studied}}$$

- where t is a time period known as the survival time, time to failure or time to event (such as death); e.g. 5 years in the context of 5 year survival rates. Some texts present S as the estimated probability of surviving to time t for those alive just before t multiplied by the proportion of subjects surviving to t.

The product limit (PL) method of Kaplan and Meier (1958) is used to estimate S:

$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$

- where t_i is duration of study at point i, d_i is number of deaths at point i and n_i is number of individuals at risk just prior to t_i. S is based upon the probability that an individual survives at the end of a time interval, on the condition that the individual was present at the start of the time interval. S is the product (Π) of these conditional probabilities.

If a subject is last followed up at time t_i and then leaves the study for any reason (e.g. lost to follow up), t_i is counted as their censorship time.
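The product limit calculation can be sketched in a few lines of Python. This is a minimal illustration, not StatsDirect's implementation; the function name `kaplan_meier` and the coding of events as 1 and censored observations as 0 are assumptions chosen to match the worked example later in this section, from which the data (rats with Group Surv = 2) are taken.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate of the survivor function.

    times  : observed times (event or censoring)
    events : 1 = event/death, 0 = censored
    Returns one tuple (time, at_risk, deaths, censored, S) per distinct time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    censored = Counter(t for t, e in zip(times, events) if e == 0)
    at_risk = len(times)
    s = 1.0
    rows = []
    for t in sorted(set(times)):
        d, c = deaths[t], censored[t]
        s *= 1.0 - d / at_risk      # conditional probability of surviving this time
        rows.append((t, at_risk, d, c, s))
        at_risk -= d + c            # deaths and censored subjects leave the risk set
    return rows

# Rats with Group Surv = 2 in the worked example (204 and 344 are censored)
times = [142, 157, 163, 198, 204, 205, 232, 232, 232, 233, 233, 233, 233,
         239, 240, 261, 280, 280, 295, 295, 323, 344]
events = [0 if t in (204, 344) else 1 for t in times]

rows = kaplan_meier(times, events)
for row in rows:
    print("t=%d  at risk=%d  dead=%d  censored=%d  S=%.6f" % row)
```

The printed S column should agree with the S column of the first output table in the example. Note that a censored time leaves S unchanged and only reduces the number at risk for later times.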

Assumptions:

·Censored individuals have the same prospect of survival as those who continue to be followed. This cannot be tested for and can lead to a bias that artificially reduces S.

·Survival prospects are the same for early as for late recruits to the study (can be tested for).

·The event studied (e.g. death) happens at the specified time. Late recording of the event studied will cause artificial inflation of S.

The cumulative hazard function (H) is the accumulated risk of event (e.g. death) up to time t; it is estimated by the method of Peterson (1977) as:

$$\hat{H}(t) = -\ln \hat{S}(t)$$

S and H with their standard errors and confidence intervals can be saved to a workbook for further analysis (see below).

Median and mean survival time

The median survival time is calculated as the smallest survival time for which the survivor function is less than or equal to 0.5. Some data sets may not get this far, in which case their median survival time is not calculated. A confidence interval for the median survival time is constructed using a robust non-parametric method due to Brookmeyer and Crowley (1982). Another confidence interval for the median survival time is constructed using a large sample estimate of the density function of the survival estimate (Andersen, 1993). If there are many tied survival times then the Brookmeyer-Crowley limits should not be used.

Mean survival time is estimated as the area under the survival curve. The estimator is based upon the entire range of data. Note that some software uses only the data up to the last observed event; Hosmer and Lemeshow (1999) point out that this biases the estimate of the mean downwards, and they recommend that the entire range of data is used. A large sample method is used to estimate the variance of the mean survival time and thus to construct a confidence interval (Andersen, 1993).

Samples of survival times are frequently highly skewed; therefore, in survival analysis, the median is generally a better measure of central location than the mean.
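The two definitions above can be sketched as follows. This is an illustrative Python outline rather than StatsDirect's code: the helper names `km_steps`, `median_survival` and `mean_survival` are hypothetical, confidence intervals are not computed, and the data are the two groups of rats from the worked example later in this section.

```python
def km_steps(times, events):
    """Kaplan-Meier steps as [(time, S)] at each distinct observed time."""
    n, s, steps = len(times), 1.0, []
    for t in sorted(set(times)):
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        c = sum(1 for ti, e in zip(times, events) if ti == t and e == 0)
        s *= 1.0 - d / n
        steps.append((t, s))
        n -= d + c
    return steps

def median_survival(steps):
    """Smallest time at which S <= 0.5, or None if S never falls that far."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None

def mean_survival(steps):
    """Area under the step curve from time 0 to the last observed time
    (the entire range of data, including a final censored time)."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area

# Group Surv = 2 (204 and 344 censored)
times_a = [142, 157, 163, 198, 204, 205, 232, 232, 232, 233, 233, 233, 233,
           239, 240, 261, 280, 280, 295, 295, 323, 344]
events_a = [0 if t in (204, 344) else 1 for t in times_a]

# Group Surv = 1 (one of the two observations at 216, and 244, censored)
times_b = [143, 165, 188, 188, 190, 192, 206, 208, 212, 216, 216, 220, 227,
           230, 235, 244, 246, 265, 303]
events_b = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1]

steps_a = km_steps(times_a, events_a)
steps_b = km_steps(times_b, events_b)
print(median_survival(steps_a), round(mean_survival(steps_a), 6))
print(median_survival(steps_b), round(mean_survival(steps_b), 6))
```

For the first group the area is accumulated up to the final censored time (344), which corresponds to the "limit: 344 on 323" note in the example output.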

Plots

StatsDirect can calculate S and H for more than one group at a time and plot the survival and hazard curves for the different groups together. Four different plots are given and certain distributions are indicated if these plots form a straight line pattern (Lawless, 1982; Kalbfleisch and Prentice, 1980). The plots and their associated distributions are:

Plot | Distribution indicated by a straight line pattern
H vs. t | Exponential, through the origin with slope λ
ln(H) vs. ln(t) | Weibull, intercept beta and slope ln(λ)
z(S) vs. ln(t) | Log-normal
H/t vs. t | Linear hazard rate

- where t is time, ln is natural (base e) logarithm, z(p) is the p quantile from the standard normal distribution and λ (lambda) is the real probability of event/death at time t.

For survival plots that display confidence intervals, save the results of this function to a workbook and use the Survival function of the graphics menu.

Note that censored times are marked with a small vertical tick on the survival curve; you have the option to turn this off. If you want to use markers for observed event/death/failure times then please check the box when prompted.
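To make the four diagnostic plots concrete, the sketch below computes the coordinates of one point on each plot from a time t and a survival estimate S. It is an illustration only: the function name `plot_coordinates` is hypothetical, H is taken as minus the natural log of S, and the inputs must satisfy t > 0 and 0 < S < 1.

```python
import math
from statistics import NormalDist

def plot_coordinates(t, S):
    """Coordinates of one point on each of the four diagnostic plots."""
    H = -math.log(S)                 # cumulative hazard from the survivor function
    z = NormalDist().inv_cdf(S)      # z(S): standard normal quantile of S
    return {
        "H vs t": (t, H),
        "ln(H) vs ln(t)": (math.log(t), math.log(H)),
        "z(S) vs ln(t)": (math.log(t), z),
        "H/t vs t": (t, H / t),
    }

# One row of the worked example output: time 233, S = 0.433155
pts = plot_coordinates(233, 0.433155)
for name, (x, y) in pts.items():
    print("%-16s x=%.4f  y=%.4f" % (name, x, y))
```

Applying the function to every row of a Kaplan-Meier table gives the point sets whose straightness is judged in each plot.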

Technical validation

The variance of S is estimated using the method of Greenwood (1926):

$$\widehat{\operatorname{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{n_i (n_i - d_i)}$$

- The confidence interval for the survivor function is not calculated directly using Greenwood's variance estimate as this would give impossible results (< 0 or > 1) at extremes of S. The confidence interval for S uses an asymptotic maximum likelihood solution by log transformation as recommended by Kalbfleisch and Prentice (1980).

The cumulative hazard function is estimated as minus the natural logarithm of the product limit estimate of the survivor function as above (Peterson, 1977). Note that some statistical software calculates the simpler Nelson-Aalen estimate (Nelson, 1972; Aalen, 1978):

$$\tilde{H}(t) = \sum_{t_i \le t} \frac{d_i}{n_i}$$

A Nelson-Aalen hazard estimate will always be less than an equivalent Peterson estimate and there is no substantial case for using one in favour of the other.

The variance of H hat is estimated as:

$$\widehat{\operatorname{Var}}[\hat{H}(t)] = \sum_{t_i \le t} \frac{d_i}{n_i (n_i - d_i)}$$
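The standard errors and the two hazard estimates can be checked numerically with a short sketch. The code below is illustrative: `km_hazard_table` is a hypothetical helper, the Greenwood sum is taken as the running total of d/(n(n - d)), and the input is the first eight risk sets (number at risk, deaths) from the Group Surv = 2 output of the worked example.

```python
import math

def km_hazard_table(risk_sets):
    """risk_sets: (at_risk, deaths) pairs at successive distinct times,
    each with deaths < at_risk so that S stays above zero."""
    s, g, na, out = 1.0, 0.0, 0.0, []
    for n, d in risk_sets:
        s *= 1.0 - d / n             # product limit estimate
        g += d / (n * (n - d))       # running Greenwood sum
        na += d / n                  # Nelson-Aalen increment
        out.append({
            "S": s,
            "SE_S": s * math.sqrt(g),    # Greenwood standard error of S
            "H": -math.log(s),           # Peterson cumulative hazard
            "SE_H": math.sqrt(g),        # standard error of H
            "H_nelson_aalen": na,
        })
    return out

# Times 142, 157, 163, 198, 204 (censored only), 205, 232, 233
risk_sets = [(22, 1), (21, 1), (20, 1), (19, 1), (18, 0), (17, 1), (16, 3), (13, 4)]
table = km_hazard_table(risk_sets)
last = table[-1]
print({k: round(v, 6) for k, v in last.items()})
```

In this example the Nelson-Aalen value stays below the Peterson value at every time with at least one death, as stated above.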

Further analysis

S and H do not assume specific distributions for survival or hazard curves. If survival plots indicate specific distributions then more powerful estimates of S and H might be achieved by modelling. The commonest model is exponential but Weibull, log-normal, log-logistic and Gamma often appear. An expert statistician and specialist software (e.g. GLIM, MLP and some of the SAS modules) should be employed to pursue this sort of work. In most situations, however, you should consider improving the estimates of S and H by using Cox regression rather than parametric models.

If the hazard rate is constant over time then a plot of H vs. time will resemble a straight line through the origin with slope λ. If this is true then:

Probability of survival beyond t = exp(-λ * t)

- this eases the calculation of relative risk from the ratio of hazard functions at time t on two survival curves. When the hazard function depends on time then you can usually calculate relative risk after fitting Cox's proportional hazards model. This model assumes that for each group the hazard functions are proportional at each time; it does not assume any particular distribution function for the hazard function. Proportional hazards modelling can be very useful; however, most researchers should seek statistical guidance with this.

Example

Test workbook (Survival worksheet: Group Surv, Time Surv, Censor Surv).

In a hypothetical example, death from a cancer after exposure to a particular carcinogen was measured in two groups of rats. Group 1 had a different pre-treatment régime to group 2. The time from pre-treatment to death is recorded. If a rat was still living at the end of the experiment or it had died from a different cause then that time is considered "censored". A censored observation is given the value 0 in the death/censorship variable to indicate a "non-event".

Group 1: 143, 165, 188, 188, 190, 192, 206, 208, 212, 216, 220, 227, 230, 235, 246, 265, 303, 216*, 244*

Group 2: 142, 157, 163, 198, 205, 232, 232, 232, 233, 233, 233, 233, 239, 240, 261, 280, 280, 295, 295, 323, 204*, 344*

* = censored data

To analyse these data in StatsDirect you must first prepare them in three workbook columns appropriately labelled:

Group Surv | Time Surv | Censor Surv
2 | 142 | 1
1 | 143 | 1
2 | 157 | 1
2 | 163 | 1
1 | 165 | 1
1 | 188 | 1
1 | 188 | 1
1 | 190 | 1
1 | 192 | 1
2 | 198 | 1
2 | 204 | 0
2 | 205 | 1
1 | 206 | 1
1 | 208 | 1
1 | 212 | 1
1 | 216 | 0
1 | 216 | 1
1 | 220 | 1
1 | 227 | 1
1 | 230 | 1
2 | 232 | 1
2 | 232 | 1
2 | 232 | 1
2 | 233 | 1
2 | 233 | 1
2 | 233 | 1
2 | 233 | 1
1 | 235 | 1
2 | 239 | 1
2 | 240 | 1
1 | 244 | 0
1 | 246 | 1
2 | 261 | 1
1 | 265 | 1
2 | 280 | 1
2 | 280 | 1
2 | 295 | 1
2 | 295 | 1
1 | 303 | 1
2 | 323 | 1
2 | 344 | 0

Alternatively, open the test workbook using the file open function of the file menu. Then select Kaplan-Meier from the Survival Analysis section of the analysis menu. Select the column marked "Group Surv" when asked for the group identifier, select "Time Surv" when asked for times and "Censor Surv" when asked for deaths/events. Click on No when you are asked whether or not you want to save various statistics to the workbook. Click on Yes when you are prompted about plotting PL estimates.

For this example:

Kaplan-Meier survival estimates

Group: 1 (Group Surv = 2)

Time | At risk | Dead | Censored | S | SE(S) | H | SE(H)
142 | 22 | 1 | 0 | 0.954545 | 0.044409 | 0.04652 | 0.046524
157 | 21 | 1 | 0 | 0.909091 | 0.061291 | 0.09531 | 0.06742
163 | 20 | 1 | 0 | 0.863636 | 0.073165 | 0.146603 | 0.084717
198 | 19 | 1 | 0 | 0.818182 | 0.08223 | 0.200671 | 0.100504
204 | 18 | 0 | 1 | 0.818182 | 0.08223 | 0.200671 | 0.100504
205 | 17 | 1 | 0 | 0.770053 | 0.090387 | 0.261295 | 0.117378
232 | 16 | 3 | 0 | 0.625668 | 0.105069 | 0.468935 | 0.16793
233 | 13 | 4 | 0 | 0.433155 | 0.108192 | 0.836659 | 0.249777
239 | 9 | 1 | 0 | 0.385027 | 0.106338 | 0.954442 | 0.276184
240 | 8 | 1 | 0 | 0.336898 | 0.103365 | 1.087974 | 0.306814
261 | 7 | 1 | 0 | 0.28877 | 0.099172 | 1.242125 | 0.34343
280 | 6 | 2 | 0 | 0.192513 | 0.086369 | 1.64759 | 0.44864
295 | 4 | 2 | 0 | 0.096257 | 0.064663 | 2.340737 | 0.671772
323 | 2 | 1 | 0 | 0.048128 | 0.046941 | 3.033884 | 0.975335
344 | 1 | 0 | 1 | 0.048128 | 0.046941 | 3.033884 | 0.975335

Median survival time = 233

Andersen 95% CI for median survival time = 231.898503 to 234.101497

Brookmeyer-Crowley 95% CI for median survival time = 232 to 240

Mean survival time (95% CI)[limit: 344 on 323] = 241.283422 (219.591463 to 262.975382)

Group: 2 (Group Surv = 1)

Time | At risk | Dead | Censored | S | SE(S) | H | SE(H)
143 | 19 | 1 | 0 | 0.947368 | 0.051228 | 0.054067 | 0.054074
165 | 18 | 1 | 0 | 0.894737 | 0.070406 | 0.111226 | 0.078689
188 | 17 | 2 | 0 | 0.789474 | 0.093529 | 0.236389 | 0.11847
190 | 15 | 1 | 0 | 0.736842 | 0.101023 | 0.305382 | 0.137102
192 | 14 | 1 | 0 | 0.684211 | 0.106639 | 0.37949 | 0.155857
206 | 13 | 1 | 0 | 0.631579 | 0.110665 | 0.459532 | 0.175219
208 | 12 | 1 | 0 | 0.578947 | 0.113269 | 0.546544 | 0.195646
212 | 11 | 1 | 0 | 0.526316 | 0.114549 | 0.641854 | 0.217643
216 | 10 | 1 | 1 | 0.473684 | 0.114549 | 0.747214 | 0.241825
220 | 8 | 1 | 0 | 0.414474 | 0.114515 | 0.880746 | 0.276291
227 | 7 | 1 | 0 | 0.355263 | 0.112426 | 1.034896 | 0.316459
230 | 6 | 1 | 0 | 0.296053 | 0.108162 | 1.217218 | 0.365349
235 | 5 | 1 | 0 | 0.236842 | 0.10145 | 1.440362 | 0.428345
244 | 4 | 0 | 1 | 0.236842 | 0.10145 | 1.440362 | 0.428345
246 | 3 | 1 | 0 | 0.157895 | 0.093431 | 1.845827 | 0.591732
265 | 2 | 1 | 0 | 0.078947 | 0.072792 | 2.538974 | 0.922034
303 | 1 | 1 | 0 | 0 | * | infinity | *

Median survival time = 216

Andersen 95% CI for median survival time = 199.619628 to 232.380372

Brookmeyer-Crowley 95% CI for median survival time = 192 to 230

Mean survival time (95% CI) = 218.684211 (200.363485 to 237.004936)

Below is the classical "survival plot" showing how survival declines with time. The approximate linearity of the log hazard vs. log time plot below indicates a Weibull distribution of survival.

At this point you might want to run a formal hypothesis test to see if there is any statistical evidence for two or more survival curves being different. This can be achieved using sensitive parametric methods if you have fitted a particular distribution curve to your data. More often you would use the Log-rank and Wilcoxon tests which do not assume any particular distribution of the survivor function.


Survival plot

Menu location: Graphics_Survival.

This provides a step plot for displaying survival curves. It is intended for use with variables for Time on the x (horizontal) axis and S (the Kaplan-Meier product limit estimate of survival / survivor function) on the y (vertical) axis. You can display multiple series, each with a different marker style. Confidence intervals for S can be displayed.

Note that censored times are marked with a small vertical tick on the survival curve.

This is a good accompaniment to a presentation of survival analysis that compares survival (or time to event) data in different groups. See Kaplan-Meier for more information on generating S.


Follow-up life table

Menu location: Analysis_Survival_Follow-Up Life Table.

This function provides a follow-up life table that displays the survival experience of a cohort.

The table is constructed by the following definitions:

Interval: For a Berkson and Gage survival table this is the survival times in intervals. For an abridged life table this is ages in groups.

Deaths: Number of individuals who die in the interval. [dx]

Withdrawn: Number of individuals withdrawn or lost to follow up in the interval. [wx]

At risk: Number of individuals alive at the start of the interval. [nx]

Adj. at risk: Adjusted number at risk (half of withdrawals of current interval subtracted). [n'x]

P(death): Probability that an individual who survived the last interval will die in the current interval. [qx]

P(survival): Probability that an individual who survived the last interval will survive the current interval. [px]

% Survivors (lx): Probability of an individual surviving beyond the current interval. Proportion of survivors after the current interval. Life table survival rate.

Var(lx%): Estimated variance of lx.

*% CI for lx%: *% confidence interval for lx%.

- where lx is the product of all px before x.
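The definitions above translate directly into a short routine. The following Python sketch is illustrative rather than StatsDirect's own code; the function name `follow_up_life_table` and the dictionary keys are hypothetical, and the input figures are the deaths and withdrawals from the Armitage and Berry example used later in this section.

```python
def follow_up_life_table(deaths, withdrawn, start):
    """Build life-table rows from per-interval deaths [dx] and withdrawals [wx]."""
    rows, at_risk, lx = [], start, 1.0
    for d, w in zip(deaths, withdrawn):
        adj = at_risk - w / 2.0      # adjusted at risk: half of withdrawals removed
        q = d / adj                  # P(death) in the interval
        p = 1.0 - q                  # P(survival) through the interval
        rows.append({"at_risk": at_risk, "adj_at_risk": adj,
                     "q": q, "p": p, "lx": lx})
        lx *= p                      # lx for the next interval: product of all earlier px
        at_risk -= d + w
    return rows

deaths = [90, 76, 51, 25, 20, 7, 4, 1, 3, 2]
withdrawn = [0, 0, 0, 12, 5, 9, 9, 3, 5, 5]
rows = follow_up_life_table(deaths, withdrawn, start=374)
for i, r in enumerate(rows):
    print("%d to %d  n=%d  n'=%.1f  q=%.6f  p=%.6f  lx%%=%.6f"
          % (i, i + 1, r["at_risk"], r["adj_at_risk"], r["q"], r["p"], 100 * r["lx"]))
```

Here lx is reported at the start of each interval, so the first row shows 100% and the row for "5 to 6" gives the 5 year survival rate.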

Technical validation

The Berkson and Gage method is used to construct the basic table (Berkson and Gage, 1950; Armitage and Berry, 1994; Altman, 1991; Lawless, 1982; Kalbfleisch and Prentice, 1980; Le, 1997). The confidence interval for lx is not a simple application of the estimated variance for lx; instead it uses a maximum likelihood solution from an asymptotic distribution by the transformation of lx suggested by Kalbfleisch and Prentice (1980). This treatment of lx avoids impossible values (i.e. >1 or <0).
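One transformation of this kind is the log(-log) transform, sketched below. This is an assumption about the form of the calculation rather than a statement of StatsDirect's code: the function name `loglog_ci` is hypothetical, a fixed multiplier of 1.96 is used for 95% limits, and `greenwood_sum` is the running sum of d/(n'(n' - d)) over the intervals already passed. Under these assumptions the sketch agrees with the limits printed for the start of the second interval in the example below (71.27% to 79.95%).

```python
import math

def loglog_ci(s, greenwood_sum, z=1.96):
    """Confidence limits for a survival proportion s (0 < s < 1)
    using the log(-log s) transformation."""
    se = math.sqrt(greenwood_sum) / abs(math.log(s))   # SE on the log(-log) scale
    lower = s ** math.exp(z * se)
    upper = s ** math.exp(-z * se)
    return lower, upper

# After the first interval of the example: 90 deaths among 374 at risk
s1 = 284 / 374
lower, upper = loglog_ci(s1, 90 / (374 * 284))
print("lx%% = %.4f, 95%% CI %.4f to %.4f" % (100 * s1, 100 * lower, 100 * upper))
```

Because the limits are powers of s, they always stay between 0 and 1, which is the property the paragraph above describes.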

Example

From Armitage and Berry (1994, p. 473).

Test workbook (Survival worksheet: Year, Died, Withdrawn).

The following data represent the survival of 374 patients who had one type of surgery for a particular malignancy.

Years since operation | Died in this interval | Lost to follow up
1 | 90 | 0
2 | 76 | 0
3 | 51 | 0
4 | 25 | 12
5 | 20 | 5
6 | 7 | 9
7 | 4 | 9
8 | 1 | 3
9 | 3 | 5
10 | 2 | 5

To analyse these data in StatsDirect you must first prepare them in three workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select Simple Life Table from the survival analysis section of the analysis menu. Select the column marked "Year" when asked for the times, select "Died" when asked for deaths and "Withdrawn" when asked for withdrawals. Select 374 (total deaths and withdrawals) as the number alive at the start.

For this example:

Follow-up life table

Interval | Deaths | Withdrawn | At risk | Adj. at risk | P(death)
0 to 1 | 90 | 0 | 374 | 374 | 0.240642
1 to 2 | 76 | 0 | 284 | 284 | 0.267606
2 to 3 | 51 | 0 | 208 | 208 | 0.245192
3 to 4 | 25 | 12 | 157 | 151 | 0.165563
4 to 5 | 20 | 5 | 120 | 117.5 | 0.170213
5 to 6 | 7 | 9 | 95 | 90.5 | 0.077348
6 to 7 | 4 | 9 | 79 | 74.5 | 0.053691
7 to 8 | 1 | 3 | 66 | 64.5 | 0.015504
8 to 9 | 3 | 5 | 62 | 59.5 | 0.05042
9 to 10 | 2 | 5 | 54 | 51.5 | 0.038835
10 up | 21 | 26 | 47 | * | *

Interval | P(survival) | Survivors (lx%) | SD of lx% | 95% CI for lx%
0 to 1 | 0.759358 | 100 | * | * to *
1 to 2 | 0.732394 | 75.935829 | 10.57424 | 71.271289 to 79.951252
2 to 3 | 0.754808 | 55.614973 | 7.87331 | 50.428392 to 60.482341
3 to 4 | 0.834437 | 41.97861 | 7.003571 | 36.945565 to 46.922332
4 to 5 | 0.829787 | 35.028509 | 6.747202 | 30.200182 to 39.889161
5 to 6 | 0.922652 | 29.066209 | 6.651959 | 24.47156 to 33.805
6 to 7 | 0.946309 | 26.817994 | 6.659494 | 22.322081 to 31.504059
7 to 8 | 0.984496 | 25.378102 | 6.700832 | 20.935141 to 30.043836
8 to 9 | 0.94958 | 24.984643 | 6.720449 | 20.552912 to 29.648834
9 to 10 | 0.961165 | 23.724913 | 6.803396 | 19.323326 to 28.39237
10 up | * | 22.803557 | 6.886886 | 18.417247 to 27.483099

We conclude with 95% confidence that the true population survival rate 5 years after the surgical operation studied is between 24.5% and 33.8% for people diagnosed as having this cancer.


Abridged life table

Menu location: Analysis_Survival_Abridged life table

This function provides a current life table (actuarial table) that displays the survival experience of a given population in abridged form.

The table is constructed by the following definitions of Greenwood (1922) and Chiang (1984):

$$\hat{q}_i = \frac{n_i M_i}{1 + (1 - a_i)\, n_i M_i}, \qquad M_i = \frac{D_i}{P_i}$$

- where q_i hat is the probability that an individual will die in the ith interval, n_i is the length of the interval, M_i is the death rate in the interval (i.e. the number of individuals dying in the interval [D_i] divided by the mid-year population [P_i], which is the number of years lived in the interval by those alive at the start of the interval, i.e. it is the person-time denominator for the rate), and a_i is the fraction of the last age interval of life.

To explain a_i: when a person dies at a certain age they have lived only a fraction of the interval in which their age at death sits; the average of all of these fractions of the interval for all people dying in the interval is called the fraction of the last age interval of life, a_i. Infant deaths tend to occur early in the first year of life (which is the usual first age interval for abridged life tables). The a_i value for this interval is around 0.1 in developed countries and higher where infant mortality rates are higher. The values for young childhood intervals are around 0.4 and for adult intervals are around 0.5. The proper values for a_i can be calculated from the full death records. If the full records are not available then the WHO guidelines are to use the following a_i values for the first interval given the following infant mortality rates:

Infant mortality rate per 1000 | ai
< 20 | 0.09
20 - 40 | 0.15
40 - 60 | 0.23
> 60 | 0.30

The rest of the calculations proceed using the following formulae on a theoretical standard starting population of 100,000 (the radix value) living at the start. In other words, we are constructing an artificial cohort of 100,000 and overlaying current mortality experience on them in order to work out life expectancies.

$$d_i = l_i \hat{q}_i, \qquad l_{i+1} = l_i - d_i$$

$$L_i = n_i (l_i - d_i) + a_i n_i d_i \quad (i < w), \qquad L_w = \frac{l_w}{M_w}$$

$$T_i = \sum_{j=i}^{w} L_j, \qquad \hat{e}_i = \frac{T_i}{l_i}$$

- where w is the number of intervals, d_i is the number out of the artificial cohort dying in the ith interval, l_i is the number out of the artificial cohort alive at the start of the interval, L_i is the number of years lived in the interval by the artificial cohort, T_i is the total number of years lived by those individuals from the artificial cohort attaining the age that starts the interval, and e_i is the observed expectation of life at the age that starts the interval.
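The chain of calculations from death rates to expectation of life can be sketched as below. This is an illustrative outline, not StatsDirect's code: the function name `abridged_life_table` is hypothetical, the final open interval is closed by taking the years lived as l divided by the death rate, and the inputs are the California 1970 figures from the example later in this section.

```python
def abridged_life_table(lengths, population, deaths, fraction_a, radix=100000.0):
    """lengths and fraction_a have w-1 entries; population and deaths have w."""
    w = len(population)
    M = [d / p for d, p in zip(deaths, population)]            # death rates
    q = [n * m / (1 + (1 - a) * n * m)
         for n, m, a in zip(lengths, M, fraction_a)] + [1.0]   # last interval is open
    l = [radix]
    for qi in q[:-1]:
        l.append(l[-1] * (1 - qi))                             # alive at start of interval
    d = [li * qi for li, qi in zip(l, q)]                      # dying in interval
    L = [n * (li - di) + a * n * di
         for n, li, di, a in zip(lengths, l, d, fraction_a)]   # years lived in interval
    L.append(l[-1] / M[-1])                                    # open final interval
    T = [sum(L[i:]) for i in range(w)]                         # years lived beyond start
    e = [Ti / li for Ti, li in zip(T, l)]                      # expectation of life
    return q, l, e

lengths = [1, 4] + [5] * 16
population = [340483, 1302198, 1918117, 1963681, 1817379, 1740966, 1457614,
              1219389, 1149999, 1208550, 1245903, 1083852, 933244, 770770,
              620805, 484431, 342097, 210953, 142691]
deaths = [6234, 1049, 723, 735, 2054, 2702, 2071, 1964, 2588, 4114, 6722,
          8948, 11942, 14309, 17088, 19149, 21325, 20129, 22483]
fraction_a = [0.09, 0.41, 0.44, 0.54, 0.59, 0.49, 0.51, 0.52, 0.53, 0.54,
              0.53, 0.53, 0.52, 0.52, 0.51, 0.52, 0.51, 0.50]

q, l, e = abridged_life_table(lengths, population, deaths, fraction_a)
print("q0 = %.6f, l1 = %.0f, e0 = %.4f" % (q[0], l[1], e[0]))
```

The interval length list is one entry shorter than the population and deaths lists because the last interval has no fixed length.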

Note that the value for the last interval length is not important, since this is calculated as an open interval as above. When preparing your data you will therefore have one less row in the interval column than in the columns for mid-year population in the interval and the deaths in the interval. The conventional interval pattern is:

Interval length | Interval
1 | 0 to 1
4 | 1 to 4
5 | 5 to 9
5 | 10 to 14
5 | 15 to 19
5 | 20 to 24
5 | 25 to 29
5 | 30 to 34
5 | 35 to 39
5 | 40 to 44
5 | 45 to 49
5 | 50 to 54
5 | 55 to 59
5 | 60 to 64
5 | 65 to 69
5 | 70 to 74
5 | 75 to 79
5 | 80 to 84
(open) | 85 up

- which is extended to 90 nowadays.

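Standard errors and confidence intervals for q and e are calculated using the formulae given by Chiang (1984):

$$S^2_{\hat{q}_i} = \frac{\hat{q}_i^2 (1 - \hat{q}_i)}{D_i}$$

$$S^2_{\hat{e}_\alpha} = \sum_{i=\alpha}^{w-1} \hat{p}_{\alpha i}^2 \left[(1 - a_i)\, n_i + \hat{e}_{i+1}\right]^2 S^2_{\hat{q}_i}, \qquad \hat{p}_{\alpha i} = \frac{l_i}{l_\alpha}$$

- where S squared e hat alpha is the variance of the expectation of life at the age of the start of the interval alpha, S squared q hat i is the variance of the probability of death for the ith interval, and p hat alpha i is the proportion of those alive at the start of interval alpha who survive to the start of interval i.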
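As a numerical check on these variance formulae, the sketch below recomputes two standard errors from values printed in the example output later in this section. The function names `se_q` and `se_expectation` are hypothetical, and the inputs are rounded table values, so agreement is only to about the last printed digit.

```python
import math

def se_q(q, deaths):
    """Standard error of the probability of dying in one interval."""
    return math.sqrt(q * q * (1.0 - q) / deaths)

def se_expectation(l, e_next, lengths, fraction_a, se_q_values):
    """SE of expectation of life at the start of the first listed interval.

    Each list runs from that interval to the last closed interval (w-1):
    l = alive at start, e_next = expectation of life at the start of the
    following interval.
    """
    total = 0.0
    for li, en, n, a, sq in zip(l, e_next, lengths, fraction_a, se_q_values):
        p = li / l[0]                                  # survival to start of interval i
        total += (p * ((1.0 - a) * n + en) * sq) ** 2
    return math.sqrt(total)

# Interval 0 to 1: q = 0.018009 with 6234 deaths
se0 = se_q(0.018009, 6234)
ci0 = (0.018009 - 1.96 * se0, 0.018009 + 1.96 * se0)

# Expectation of life at age 75: intervals 75 to 79 and 80 to 84
se75 = se_expectation(l=[52486, 38295], e_next=[7.938844, 6.346617],
                      lengths=[5, 5], fraction_a=[0.51, 0.50],
                      se_q_values=[0.001582, 0.002129])
print("SE(q0) = %.6f, SE(e75) = %.6f" % (se0, se75))
```

The interval for q here uses a simple normal approximation with multiplier 1.96, which is an assumption that happens to match the printed limits for the first interval.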

If you want to test whether or not the probability of death in one age interval is statistically significantly different from another interval, or compare the probability of death in a given age interval from two different populations (e.g. male vs. female), then you can use the following formulae:

- where Z is a standard normal test statistic and SE is the standard error of the difference between the two (ith vs. jth) probabilities of death that you are comparing.

Comparison of two expectation of life statistics can be made in a similar way to the above, but the standard error for the difference between two e statistics is simply the square root of the sum of the squared standard errors of the e statistics being compared.

Adjusting life expectancy for a given utility

You can specify a weighting variable for utility to be applied to each interval. This is used, for example, in the calculation of health adjusted life expectancy (HALE) by assuming that there is more health utility (sometimes defined by absence of disability) in some periods of life than in others. Wolfson (1996) describes the principles of health adjusted life expectancy.

StatsDirect simply multiplies T_i (the total number of years lived by those individuals from the artificial cohort attaining the age that starts the interval) by the given ith utility weight, then divides as usual by l_i (the number out of the artificial cohort alive at the start of the interval) in order to compute adjusted life expectancy.

Data preparation

Prepare your data in four columns as follows:

1. Length of age interval (w-1 rows corresponding to w intervals as described above)

2. Mid-year population, or number of years lived in the interval by those alive at its start (w rows)

3. Deaths in interval (w rows)

4. Fraction (a) of last age interval of life (w-1 rows)

5. (Utility weight [optional], e.g. proportion of the interval of life spent without disability in a given population)

If the fraction 'a' is not provided then it is assumed to be 0.1 for the infant interval, 0.4 for the early childhood interval and 0.5 for all other intervals. You should endeavour to supply the best estimate of 'a' possible.

Example

From Chiang (1984, p. 141): the total population of California in 1970.

Test workbook (Survival worksheet: Interval, Population, Deaths, Fraction a).

Abridged life table

Interval | Population | Deaths | Death rate
0 to 1 | 340483 | 6234 | 0.018309
1 to 4 | 1302198 | 1049 | 0.000806
5 to 9 | 1918117 | 723 | 0.000377
10 to 14 | 1963681 | 735 | 0.000374
15 to 19 | 1817379 | 2054 | 0.00113
20 to 24 | 1740966 | 2702 | 0.001552
25 to 29 | 1457614 | 2071 | 0.001421
30 to 34 | 1219389 | 1964 | 0.001611
35 to 39 | 1149999 | 2588 | 0.00225
40 to 44 | 1208550 | 4114 | 0.003404
45 to 49 | 1245903 | 6722 | 0.005395
50 to 54 | 1083852 | 8948 | 0.008256
55 to 59 | 933244 | 11942 | 0.012796
60 to 64 | 770770 | 14309 | 0.018565
65 to 69 | 620805 | 17088 | 0.027526
70 to 74 | 484431 | 19149 | 0.039529
75 to 79 | 342097 | 21325 | 0.062336
80 to 84 | 210953 | 20129 | 0.095419
85 up | 142691 | 22483 | 0.157564

Interval | Probability of dying [qx] | SE of qx | 95% CI for qx
0 to 1 | 0.018009 | 0.000226 | 0.017566 to 0.018452
1 to 4 | 0.003216 | 0.000099 | 0.003022 to 0.00341
5 to 9 | 0.001883 | 0.00007 | 0.001746 to 0.00202
10 to 14 | 0.00187 | 0.000069 | 0.001735 to 0.002005
15 to 19 | 0.005638 | 0.000124 | 0.005395 to 0.005881
20 to 24 | 0.007729 | 0.000148 | 0.007439 to 0.00802
25 to 29 | 0.007079 | 0.000155 | 0.006776 to 0.007383
30 to 34 | 0.008022 | 0.00018 | 0.007669 to 0.008376
35 to 39 | 0.011193 | 0.000219 | 0.010764 to 0.011622
40 to 44 | 0.016888 | 0.000261 | 0.016376 to 0.0174
45 to 49 | 0.026639 | 0.000321 | 0.02601 to 0.027267
50 to 54 | 0.040493 | 0.000419 | 0.039671 to 0.041315
55 to 59 | 0.062075 | 0.00055 | 0.060997 to 0.063153
60 to 64 | 0.088863 | 0.000709 | 0.087474 to 0.090253
65 to 69 | 0.128933 | 0.000921 | 0.127129 to 0.130737
70 to 74 | 0.180519 | 0.001181 | 0.178204 to 0.182833
75 to 79 | 0.270386 | 0.001582 | 0.267286 to 0.273486
80 to 84 | 0.385206 | 0.002129 | 0.381034 to 0.389379
85 up | 1 | * | * to *

Interval | Living at start [lx] | Dying [dx] | Fraction of last interval of life [ax]
0 to 1 | 100000 | 1801 | 0.09
1 to 4 | 98199 | 316 | 0.41
5 to 9 | 97883 | 184 | 0.44
10 to 14 | 97699 | 183 | 0.54
15 to 19 | 97516 | 550 | 0.59
20 to 24 | 96966 | 749 | 0.49
25 to 29 | 96217 | 681 | 0.51
30 to 34 | 95536 | 766 | 0.52
35 to 39 | 94769 | 1061 | 0.53
40 to 44 | 93709 | 1583 | 0.54
45 to 49 | 92126 | 2454 | 0.53
50 to 54 | 89672 | 3631 | 0.53
55 to 59 | 86041 | 5341 | 0.52
60 to 64 | 80700 | 7171 | 0.52
65 to 69 | 73529 | 9480 | 0.51
70 to 74 | 64048 | 11562 | 0.52
75 to 79 | 52486 | 14192 | 0.51
80 to 84 | 38295 | 14751 | 0.5
85 up | 23543 | 23543 | *

Interval | Years in interval [Lx] | Years beyond start of interval [Tx]
0 to 1 | 98361 | 7195231
1 to 4 | 392051 | 7096870
5 to 9 | 488900 | 6704819
10 to 14 | 488075 | 6215919
15 to 19 | 486454 | 5727844
20 to 24 | 482921 | 5241390
25 to 29 | 479416 | 4758468
30 to 34 | 475840 | 4279052
35 to 39 | 471354 | 3803213
40 to 44 | 464903 | 3331858
45 to 49 | 454863 | 2866955
50 to 54 | 439827 | 2412091
55 to 59 | 417386 | 1972264
60 to 64 | 386289 | 1554878
65 to 69 | 344417 | 1168590
70 to 74 | 292493 | 824173
75 to 79 | 227663 | 531680
80 to 84 | 154596 | 304017
85 up | 149421 | 149421

Interval | Expectation of life [ex] | SE of ex | 95% CI for ex
0 to 1 | 71.952313 | 0.037362 | 71.879085 to 72.025541
1 to 4 | 72.270232 | 0.034115 | 72.203367 to 72.337097
5 to 9 | 68.498121 | 0.033492 | 68.432478 to 68.563764
10 to 14 | 63.623174 | 0.033231 | 63.558043 to 63.688305
15 to 19 | 58.737306 | 0.033025 | 58.672578 to 58.802034
20 to 24 | 54.053615 | 0.032466 | 53.989981 to 54.117248
25 to 29 | 49.45559 | 0.031785 | 49.393293 to 49.517888
30 to 34 | 44.790023 | 0.031151 | 44.728969 to 44.851077
35 to 39 | 40.131217 | 0.030436 | 40.071563 to 40.190871
40 to 44 | 35.555493 | 0.029616 | 35.497446 to 35.613539
45 to 49 | 31.119893 | 0.028788 | 31.06347 to 31.176317
50 to 54 | 26.899049 | 0.027963 | 26.844242 to 26.953856
55 to 59 | 22.922407 | 0.02697 | 22.869548 to 22.975266
60 to 64 | 19.267406 | 0.025794 | 19.216851 to 19.31796
65 to 69 | 15.892984 | 0.024469 | 15.845026 to 15.940942
70 to 74 | 12.867973 | 0.022957 | 12.822978 to 12.912969
75 to 79 | 10.129843 | 0.021419 | 10.087862 to 10.171824
80 to 84 | 7.938844 | 0.018833 | 7.901931 to 7.975756
85 up | 6.346617 | * | * to *

Median expectation of life (age at which half of original cohort survives) = 75.876035


Log-Rank & Wilcoxon

Menu location: Analysis_Survival_Log-Rank & Wilcoxon.

This function provides methods for comparing two or more survival curves where some of the observations may be censored and where the overall grouping may be stratified. The methods are nonparametric in that they do not make assumptions about the distributions of survival estimates.

In the absence of censorship (e.g. loss to follow up, alive at end of study) the methods presented here reduce to a Mann-Whitney (two sample Wilcoxon) test for two groups of survival times and a Kruskal-Wallis test for more than two groups of survival times. StatsDirect gives a comprehensive set of tests for the comparison of survival data that may be censored (Tarone and Ware, 1977; Kalbfleisch and Prentice, 1980; Cox and Oakes, 1984; Le, 1997).

The null hypothesis tested here is that the risk of death/event is the same in all groups.

Peto's log-rank test is generally the most appropriate method but the Prentice modified Wilcoxon test is more sensitive when the ratio of hazards is higher at early survival times than at late ones (Peto and Peto, 1972; Kalbfleisch and Prentice, 1980). The log-rank test is similar to the Mantel-Haenszel test and some authors refer to it as the Cox-Mantel test (Mantel and Haenszel, 1959; Cox, 1972).

Strata

An optional variable, strata, allows you to sub-classify the groups specified in the group identifier variable and to test the significance of this sub-classification (Armitage and Berry, 1994; Lawless, 1982; Kalbfleisch and Prentice, 1980).

Wilcoxon weights

StatsDirect gives you a choice of three different weighting methods for the generalised Wilcoxon test: Peto-Prentice, Gehan-Breslow and Tarone-Ware. The Peto-Prentice method is generally more robust than the others but the Gehan statistic is calculated routinely by many statistical software packages (Breslow, 1974; Tarone and Ware, 1977; Kalbfleisch and Prentice, 1980; Miller, 1981; Hosmer and Lemeshow, 1999). You should seek statistical guidance if you plan to use any weighting method other than Peto-Prentice.

Hazard-ratios

An approximate confidence interval for the log hazard-ratio is calculated using the following estimate of standard error (se):

$$se = \sqrt{\sum_{i=1}^{k} \frac{1}{e_i}}$$

- where e_i is the extent of exposure to risk of death (sometimes called expected deaths) for group i of k, summed over the distinct observed times j (Armitage and Berry, 1994).

An exact conditional maximum likelihood estimate of the hazard ratio is optionally given. The exact estimate and its confidence interval (Fisher or mid-P) should be routinely used in preference to the above approximation. The exponents of Cox regression parameters are also exact estimators of the hazard ratio, but please note that they are not exact if Breslow's method has been used to correct for ties in the regression. Please consult a statistician if you are considering using Cox regression.

Trend test

If you have more than two groups then StatsDirect will calculate a variant of the log-rank test for trend. If you choose not to enter group scores then they are allocated as 1, 2, 3 ... n in group order (Armitage and Berry, 1994; Lawless, 1982; Kalbfleisch and Prentice, 1980).

Technical validation

The generaltest statistic is calculated around a hypergeometricdistribution of the number of events at distinct event times:

- where theweight wj for the log-rank test is equal to 1,and wj for the generalisedWilcoxon test is ni(Gehan-Breslow method); for the Tarone-Waremethod wj is the square root of ni; and for the Peto-Prenticemethod wj is the Kaplan-Meier survivorfunction multiplied by (ni divided by ni +1). eijis the expectation of death in group i at the jth distinct observed time where djevents/deaths occurred. nijis the number at risk in group i just beforethe jth distinct observed time. The teststatistic for equality of survival across the k groups (populationssampled) is approximately chi-square distributed on k-1 degrees offreedom. The test statistic for monotone trend is approximately chi-squaredistributed on 1 degree of freedom. c is avector of scores that are either defined by the user or allocated as 1 to k.

Variance is estimated by the method that Peto (1977) refers to as "exact".

The stratified test statistic is expressed as (Kalbfleisch and Prentice, 1980):

- where the statistics defined above are calculated within strata then summed across strata prior to the generalised inverse and transpose matrix operations.

Example

From Armitage and Berry (1994, p. 479).

Test workbook (Survival worksheet: Stage Group, Time, Censor).

The following data represent the survival in days since entry to the trial of patients with diffuse histiocytic lymphoma. Two different groups of patients, those with stage III and those with stage IV disease, are compared.

Stage 3: 6, 19, 32, 42, 42, 43*, 94, 126*, 169*, 207, 211*, 227*, 253, 255*, 270*, 310*, 316*, 335*, 346*

Stage 4: 4, 6, 10, 11, 11, 11, 13, 17, 20, 20, 21, 22, 24, 24, 29, 30, 30, 31, 33, 34, 35, 39, 40, 41*, 43*, 45, 46, 50, 56, 61*, 61*, 63, 68, 82, 85, 88, 89, 90, 93, 104, 110, 134, 137, 160*, 169, 171, 173, 175, 184, 201, 222, 235*, 247*, 260*, 284*, 290*, 291*, 302*, 304*, 341*, 345*

* = censored data (patient still alive or died from an unrelated cause)

To analyse these data in StatsDirect you must first prepare them in three workbook columns as shown below:

Stage group  Time  Censor
1            6     1
1            19    1
1            32    1
1            42    1
1            42    1
1            43    0
1            94    1
1            126   0
1            169   0
1            207   1
1            211   0
1            227   0
1            253   1
1            255   0
1            270   0
1            310   0
1            316   0
1            335   0
1            346   0
2            4     1
2            6     1
2            10    1
2            11    1
2            11    1
2            11    1
2            13    1
2            17    1
2            20    1
2            20    1
2            21    1
2            22    1
2            24    1
2            24    1
2            29    1
2            30    1
2            30    1
2            31    1
2            33    1
2            34    1
2            35    1
2            39    1
2            40    1
2            41    0
2            43    0
2            45    1
2            46    1
2            50    1
2            56    1
2            61    0
2            61    0
2            63    1
2            68    1
2            82    1
2            85    1
2            88    1
2            89    1
2            90    1
2            93    1
2            104   1
2            110   1
2            134   1
2            137   1
2            160   0
2            169   1
2            171   1
2            173   1
2            175   1
2            184   1
2            201   1
2            222   1
2            235   0
2            247   0
2            260   0
2            284   0
2            290   0
2            291   0
2            302   0
2            304   0
2            341   0
2            345   0

Alternatively, open the test workbook using the file open function of the file menu. Then select Log-rank & Wilcoxon from the Survival Analysis section of the analysis menu. Select the column marked "Stage group" when asked for the group identifier, select "Time" when asked for times and "Censor" for censorship. Click on the cancel button when asked about strata.

For this example:

Logrank and Wilcoxon tests

Log Rank (Peto):

For group 1 (Stage group = 1)

Observed deaths = 8

Extent of exposure to risk of death = 16.687031

Relative rate = 0.479414

For group 2 (Stage group = 2)

Observed deaths = 46

Extent of exposure to risk of death = 37.312969

Relative rate = 1.232815

test statistics:

-8.687031, 8.687031

variance-covariance matrix:

0.088912    -11.24706
-11.24706   11.24706

Chi-square for equivalence of death rates = 6.70971 P = 0.0096

Hazard Ratio, (approximate 95% confidence interval)

Group 1 vs. Group 2 = 0.388878, (0.218343 to 0.692607)

Conditional maximum likelihood estimates:

Hazard Ratio = 0.381485

Exact Fisher 95% confidence interval = 0.154582 to 0.822411

Exact Fisher one sided P = 0.0051, two sided P = 0.0104

Exact mid-P 95% confidence interval = 0.167398 to 0.783785

Exact mid-P one sided P = 0.0034, two sided P = 0.0068

Generalised Wilcoxon (Peto-Prentice):

test statistics:

-5.19836, 5.19836

variance-covariance matrix:

0.201506    -4.962627
-4.962627   4.962627

Chi-square for equivalence of death rates = 5.44529 P = 0.0196

Both log-rank and Wilcoxon tests demonstrated a statistically significant difference in survival experience between stage 3 and stage 4 patients in this study.
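The log-rank figures in this example can be recomputed directly from the raw survival times. The following is a minimal sketch and not StatsDirect's own code. It assumes that a subject censored at a death time is still counted as at risk at that time, and that the variance is the hypergeometric one described under technical validation. The helper names (parse, logrank_two_groups) are illustrative.

```python
import math

# Survival times copied from the example; a trailing * marks a censored time
STAGE3 = "6 19 32 42 42 43* 94 126* 169* 207 211* 227* 253 255* 270* 310* 316* 335* 346*"
STAGE4 = ("4 6 10 11 11 11 13 17 20 20 21 22 24 24 29 30 30 31 33 34 35 39 40 41* 43* "
          "45 46 50 56 61* 61* 63 68 82 85 88 89 90 93 104 110 134 137 160* 169 171 173 "
          "175 184 201 222 235* 247* 260* 284* 290* 291* 302* 304* 341* 345*")


def parse(text, group):
    """Turn '6 19 43*' into (time, event, group) rows; event is 1 for a death."""
    return [(int(tok.rstrip("*")), 0 if tok.endswith("*") else 1, group)
            for tok in text.split()]


def logrank_two_groups(rows):
    """Observed deaths, expected deaths and the variance of O - E for group 1."""
    observed = {1: 0, 2: 0}
    expected = {1: 0.0, 2: 0.0}
    variance = 0.0
    for t in sorted({time for time, event, _ in rows if event == 1}):
        at_risk = [g for time, _, g in rows if time >= t]
        dead = [g for time, event, g in rows if time == t and event == 1]
        n, d, n1 = len(at_risk), len(dead), at_risk.count(1)
        for g in (1, 2):
            observed[g] += dead.count(g)
            expected[g] += d * at_risk.count(g) / n
        if n > 1:  # hypergeometric variance term at this death time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed, expected, variance


rows = parse(STAGE3, 1) + parse(STAGE4, 2)
observed, expected, variance = logrank_two_groups(rows)
chi2 = (observed[1] - expected[1]) ** 2 / variance
p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square on 1 df

print(observed, expected, variance, chi2, p_value)
```

The observed deaths (8 and 46), extents of exposure (16.687031 and 37.312969), variance (11.24706) and chi-square (6.70971) should match the output above to the printed precision.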

Stratified example

From Peto et al. (1977):

Group  Trial Time  Censorship  Strat
1      8           1           1
1      8           1           2
2      13          1           1
2      18          1           1
2      23          1           1
1      52          1           1
1      63          1           1
1      63          1           1
2      70          1           2
2      70          1           2
2      180         1           2
2      195         1           2
2      210         1           2
1      220         1           2
1      365         0           2
2      632         1           2
2      700         1           2
1      852         0           2
2      1296        1           2
1      1296        0           2
1      1328        0           2
1      1460        0           2
1      1976        0           2
2      1990        0           2
2      2240        0           2

Censorship 1 = death event

Censorship 0 = lost to follow-up

Stratum 1 = renal impairment

Stratum 2 = no renal impairment

The table above shows you how to prepare data for a stratified log-rank test in StatsDirect. This example is worked through in the second of two classic papers by Richard Peto and colleagues (Peto et al., 1977, 1976). Please note that StatsDirect uses the more accurate variance formulae mentioned in the statistical notes section at the end of Peto et al. (1977).
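To illustrate what "calculated within strata then summed across strata" means for two groups, here is a minimal sketch in Python using the 25 rows tabulated above. It is not StatsDirect's implementation: it assumes log-rank weights (wj = 1) and the hypergeometric variance, and the function and variable names are illustrative. No expected result is quoted because this page does not print the output for the stratified example.

```python
# (group, time, event, stratum) rows copied from the stratified example table
ROWS = [
    (1, 8, 1, 1), (1, 8, 1, 2), (2, 13, 1, 1), (2, 18, 1, 1), (2, 23, 1, 1),
    (1, 52, 1, 1), (1, 63, 1, 1), (1, 63, 1, 1), (2, 70, 1, 2), (2, 70, 1, 2),
    (2, 180, 1, 2), (2, 195, 1, 2), (2, 210, 1, 2), (1, 220, 1, 2), (1, 365, 0, 2),
    (2, 632, 1, 2), (2, 700, 1, 2), (1, 852, 0, 2), (2, 1296, 1, 2), (1, 1296, 0, 2),
    (1, 1328, 0, 2), (1, 1460, 0, 2), (1, 1976, 0, 2), (2, 1990, 0, 2), (2, 2240, 0, 2),
]


def stratum_terms(rows):
    """Return (O - E for group 1, hypergeometric variance) within one stratum."""
    u = 0.0
    v = 0.0
    for t in sorted({time for _, time, event, _ in rows if event == 1}):
        at_risk = [g for g, time, _, _ in rows if time >= t]
        dead = [g for g, time, event, _ in rows if time == t and event == 1]
        n, d, n1 = len(at_risk), len(dead), at_risk.count(1)
        u += dead.count(1) - d * n1 / n
        if n > 1:
            v += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return u, v


# Compute the statistic and its variance separately in each stratum, then sum
per_stratum = [stratum_terms([r for r in ROWS if r[3] == s]) for s in (1, 2)]
u_total = sum(u for u, _ in per_stratum)
v_total = sum(v for _, v in per_stratum)
chi2 = u_total ** 2 / v_total  # referred to chi-square on 1 degree of freedom

print(per_stratum, u_total, v_total, chi2)
```

Because patients are only ever compared with others in the same stratum, a difference in renal function between the groups cannot by itself produce a difference in the summed statistic.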


Wei-Lachin test

Menu location: Analysis_Survival_Wei-Lachin.

This function gives a two-sample distribution-free method for the comparison of two multivariate distributions of survival (time-to-event) data that may be censored (incomplete, e.g. alive at end of study or lost to follow up). Multivariate methods such as this should be used only with expert statistical guidance.

Wei and Lachin generalise the log-rank and Gehan generalised Wilcoxon tests (using a random censorship model) for multivariate survival data with two main groups (Makuch and Escobar, 1991; Wei and Lachin, 1984).

Data preparation

StatsDirect asks you for a group identifier; this could be a column of 1 and 2 representing the two groups. You then select k pairs of survival time (time-to-event) and censorship columns for k repeat times. Censored data are coded as 0 and uncensored data are coded as 1.

Repeat times may represent separate factors or the observation of the same factor repeated on k occasions. For example, time to develop symptoms could be analysed for k different symptoms in a group of patients treated with drug x and compared with a group of patients not treated with drug x.

Missing data can be coded either by entering a missing data symbol * as the time, or by setting censored equal to 0 and time less than the minimum uncensored time in your data set.

For further details please refer to Makuch and Escobar (1991) and Wei and Lachin (1984).

Technical Validation

Wei and Lachin's multivariate tests are calculated for the case of two multivariate distributions, and the intermediate univariate statistics are given. The algorithm used for the method is that given by Makuch and Escobar (1991).

The general univariate statistic for comparing the time to event (of component type k out of m multivariate components) of the two groups is calculated as:

- where n1 is the number of event times per component in group 1; n2 is the number of event times per component in group 2; n is the total number of event times per component; rik is the number at risk at time t(i) in the kth component; D is equal to 0 if an observation is censored or 1 otherwise; eik is the expected proportion of events in group i for the kth component; and wj is equal to 1 for the log-rank method or (r1k+r2k)/n for the Gehan-Breslow generalised Wilcoxon method.

The univariate statistic for the kth component of the multivariate survival data is calculated as:

- where ŝkk is the kth diagonal element of the estimated variance-covariance matrix that is calculated as described by Makuch and Escobar (1991).

An omnibus test that the two multivariate distributions are equal is calculated as:

- where T' is the transpose of the vector of univariate test statistics and S⁻¹ is the generalised inverse of the estimated variance-covariance matrix.

A stochastic ordering test statistic is calculated as:

Note that the P value given with the stochastic ordering (linear combination) statistic is two sided; some authors prefer one sided inference (Davis, 1994). If you make a one sided inference then you are considering only ascending or only descending ordering, and you are assuming that observing an order in the opposite direction to that expected would be unimportant to your conclusions.

The test statistics are all asymptotically normally distributed.

Example

From Makuch and Escobar (1991).

Test workbook (Survival worksheet: Treatment Gp, Time m1, Censor m1, Time m2, Censor m2, Time m3, Censor m3, Time m4, Censor m4).

The following data represent the times in days it took in vitro cultures of lymphocytes to reach a level of p24 antigen expression. The cultures were taken from patients infected with HIV-1 who had advanced AIDS or AIDS related complex. The idea was that patients whose cultures took a short time to express p24 antigen had a greater load of HIV-1. The two groups represented patients on two different treatments. The culture was run for 30 days and specimens which remained negative or which became contaminated were called censored (=0). The tests were run over four 30 day periods.

Treatment Gp  time m1  censor m1  time m2  censor m2  time m3  censor m3  time m4  censor m4
1             8        1          0        0          25       0          21       1
1             6        1          4        1          5        1          5        1
1             6        1          5        1          28       0          18       1
1             14       0          35       0          23       1          19       0
1             7        1          0        0          13       1          0        0
1             5        1          4        1          27       1          8        1
1             5        1          21       0          6        1          14       1
1             6        1          10       1          14       1          18       1
1             7        1          4        1          15       1          8        1
1             6        1          5        1          5        1          5        1
1             4        1          5        1          6        1          3        1
1             5        1          4        1          7        1          5        1
1             21       0          5        1          0        0          6        1
1             13       1          27       0          21       0          8        1
1             4        1          27       0          7        1          6        1
1             6        1          3        1          7        1          8        1
1             6        1          0        0          5        1          5        1
1             6        1          0        0          4        1          6        1
1             7        1          9        1          6        1          7        1
1             8        1          15       1          8        1          0        0
1             18       0          27       0          18       0          9        1
1             16       1          14       1          14       1          6        1
1             15       1          9        1          12       1          12       1
2             4        1          5        1          4        1          3        1
2             8        1          22       1          25       0          0        0
2             6        1          6        1          8        1          5        1
2             7        1          10       1          10       1          18       1
2             5        1          14       1          17       0          6        1
2             3        1          5        1          8        1          6        1
2             6        1          11       1          6        1          13       1
2             6        1          0        0          15       1          7        1
2             6        1          12       1          19       1          8        1
2             6        1          25       0          0        0          22       0
2             4        1          7        1          5        1          7        1
2             5        1          7        1          4        1          6        1
2             3        1          9        1          7        1          6        1
2             9        1          17       1          0        0          21       0
2             6        1          4        1          8        1          14       1
2             5        1          5        1          7        1          16       0
2             12       1          18       0          14       1          0        0
2             9        1          11       1          15       1          18       0
2             6        1          5        1          9        1          0        0
2             18       0          8        1          10       1          13       1
2             4        1          4        1          5        1          10       1
2             3        1          10       1          0        0          21       0
2             8        1          7        1          10       1          12       1
2             3        1          6        1          7        1          9        1

To analyse these data in StatsDirect you must first prepare them in 9 workbook columns as shown above. Alternatively, open the test workbook using the file open function of the file menu. Then select Wei-Lachin from the Survival Analysis section of the analysis menu. Select the column marked "Treatment Gp" when asked for the group identifier. Next, enter the number of repeat times as four. Select "time m1" and "censor m1" for time and censorship for repeat time one. Repeat this selection process for the other three repeat times.

For this example:

Wei-Lachin Analysis

Univariate Generalised Wilcoxon (Gehan)

total cases = 47 by group = 23 24

Observed failures by group = 20 23

repeat time = 1

Wei-Lachin t = -0.527597

Wei-Lachin variance = 0.077575

z = -1.89427

chi-square = 3.588261, P = .0582

Observed failures by group = 14 21

repeat time = 2

Wei-Lachin t = 0.077588

Wei-Lachin variance = 0.056161

z = 0.327397

chi-square = 0.107189, P = .7434

Observed failures by group = 18 19

repeat time = 3

Wei-Lachin t = -0.11483

Wei-Lachin variance = 0.060918

z = -0.465244

chi-square = 0.216452, P = .6418

Observed failures by group = 20 16

repeat time = 4

Wei-Lachin t = 0.335179

Wei-Lachin variance = 0.056281

z = 1.412849

chi-square = 1.996143, P = .1577

Multivariate Generalised Wilcoxon (Gehan)

Covariance matrix (lower triangle):

0.077575
0.026009   0.056161
0.035568   0.020484   0.060918
0.023525   0.016862   0.026842   0.056281

Inverse of covariance matrix (lower triangle):

19.204259
-5.078483   22.22316
-8.40436    -3.176864   25.857118
-2.497583   -3.020025   -7.867237   23.468861

repeat times = 4

chi squared omnibus statistic = 9.242916 P = .0553

stochastic ordering z = -0.30981 one sided P = 0.3784, two sided P = 0.7567

Univariate Log-Rank

total cases = 47 by group = 23 24

Observed failures by group = 20 23

repeat time = 1

Wei-Lachin t = -0.716191

Wei-Lachin variance = 0.153385

z = -1.828676

chi-square = 3.344058, P = .0674

Observed failures by group = 14 21

repeat time = 2

Wei-Lachin t = -0.277786

Wei-Lachin variance = 0.144359

z = -0.731119

chi-square = 0.534536, P = .4647

Observed failures by group = 18 19

repeat time = 3

Wei-Lachin t = -0.372015

Wei-Lachin variance = 0.150764

z = -0.9581

chi-square = 0.917956, P = .338

Observed failures by group = 20 16

repeat time = 4

Wei-Lachin t = 0.619506

Wei-Lachin variance = 0.143437

z = 1.635743

chi-square = 2.675657, P = .1019

Multivariate Log-Rank

Covariance matrix (lower triangle):

0.153385
0.049439   0.144359
0.052895   0.050305   0.150764
0.039073   0.047118   0.052531   0.143437

Inverse of covariance matrix (lower triangle):

7.973385
-1.779359   8.69056
-1.892007   -1.661697   8.575636
-0.894576   -1.761494   -2.079402   8.555558

repeat times = 4

chi squared omnibus statistic = 9.52966, P = .0491

stochastic ordering z = -0.688754, one sided P = 0.2455, two sided P = 0.491

Here the multivariate log-rank test has revealed a statistically significant difference between the treatment groups which was not revealed by any of the individual univariate tests. For more detailed discussion of each result parameter see Wei and Lachin (1984).
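The two multivariate log-rank statistics can be rebuilt from the univariate results printed above. The following sketch is a check on the arithmetic, not StatsDirect's algorithm. It assumes the omnibus statistic is T' S⁻¹ T referred to chi-square on 4 degrees of freedom, and that the stochastic ordering statistic is the sum of the univariate t values divided by the square root of the sum of all elements of the covariance matrix. Both assumptions reproduce the printed values. The small linear solver and the variable names are illustrative.

```python
import math

# Univariate Wei-Lachin t values and covariance matrix (log-rank) from the output
T = [-0.716191, -0.277786, -0.372015, 0.619506]
LOWER = [
    [0.153385],
    [0.049439, 0.144359],
    [0.052895, 0.050305, 0.150764],
    [0.039073, 0.047118, 0.052531, 0.143437],
]
m = len(T)
# Expand the printed lower triangle into the full symmetric matrix
S = [[LOWER[max(i, j)][min(i, j)] for j in range(m)] for i in range(m)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


# Omnibus statistic T' S^-1 T; the closed-form tail below is valid for 4 df only
omnibus = sum(t * y for t, y in zip(T, solve(S, T)))
p_omnibus = math.exp(-omnibus / 2.0) * (1.0 + omnibus / 2.0)

# Stochastic ordering (linear combination) statistic and its two sided P value
z_order = sum(T) / math.sqrt(sum(sum(row) for row in S))
p_two_sided = math.erfc(abs(z_order) / math.sqrt(2.0))

print(omnibus, p_omnibus, z_order, p_two_sided)
```

Because the inputs are rounded to six decimal places, the results should agree with the printed 9.52966 (P = .0491) and z = -0.688754 (two sided P = 0.491) only to about that precision.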


Cox regression

Menu location: Analysis_Survival_Cox Regression.

This function fits Cox's proportional hazards model for survival-time (time-to-event) outcomes on one or more predictors.

Cox regression (or proportional hazards regression) is a method for investigating the effect of several variables upon the time a specified event takes to happen. In the context of an outcome such as death this is known as Cox regression for survival analysis. The method does not assume any particular "survival model" but it is not truly non-parametric because it does assume that the effects of the predictor variables upon survival are constant over time and are additive in one scale. You should not use Cox regression without the guidance of a Statistician.

Provided that the assumptions of Cox regression are met, this function will provide better estimates of survival probabilities and cumulative hazard than those provided by the Kaplan-Meier function.

Hazard and hazard-ratios

Cumulative hazard at a time t is the risk of dying between time 0 and time t, and the survivor function at time t is the probability of surviving to time t (see also Kaplan-Meier estimates).

The coefficients in a Cox regression relate to hazard; a positive coefficient indicates a worse prognosis and a negative coefficient indicates a protective effect of the variable with which it is associated.

The hazards ratio associated with a predictor variable is given by the exponent of its coefficient; this is given with a confidence interval under the "coefficient details" option in StatsDirect. The hazards ratio may also be thought of as the relative death rate, see Armitage and Berry (1994). The interpretation of the hazards ratio depends upon the measurement scale of the predictor variable in question, see Sahai and Kurshid (1996) for further information on relative risk of hazards.
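As a small illustration of "the exponent of its coefficient", the sketch below converts a coefficient and its standard error into a hazard ratio, a Wald-type confidence interval and a Wald z test. The coefficient (0.96102) and standard error (0.385636) are taken from the worked example at the end of this section. The normal-approximation interval is an assumption that happens to reproduce the printed interval, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def hazard_ratio(coef, se, level=0.95):
    """Hazard ratio exp(coef) with a Wald-type confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


coef, se = 0.96102, 0.385636          # "Stage group" row of the example output
hr, lower, upper = hazard_ratio(coef, se)

wald_z = coef / se                    # tests the null hypothesis coef = 0
wald_p = 2.0 * (1.0 - NormalDist().cdf(abs(wald_z)))

print(f"HR = {hr:.6f} ({lower:.6f} to {upper:.6f}), z = {wald_z:.6f}, P = {wald_p:.4f}")
```

A coefficient of zero corresponds to a hazard ratio of one, so an interval for the hazard ratio that excludes one tells the same story as a significant Wald test.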

Time-dependent and fixed covariates

In prospective studies, when individuals are followed over time, the values of covariates may change with time. Covariates can thus be divided into fixed and time-dependent. A covariate is time dependent if the difference between its values for two different subjects changes with time; e.g. serum cholesterol. A covariate is fixed if its values cannot change with time, e.g. sex or race. Lifestyle factors and physiological measurements such as blood pressure are usually time-dependent. Cumulative exposures such as smoking are also time-dependent but are often forced into an imprecise dichotomy, i.e. "exposed" vs. "not-exposed" instead of the more meaningful "time of exposure". There are no hard and fast rules about the handling of time dependent covariates. If you are considering using Cox regression you should seek the help of a Statistician, preferably at the design stage of the investigation.

Model analysis and deviance

A test of the overall statistical significance of the model is given under the "model analysis" option. Here the likelihood chi-square statistic is calculated by comparing the deviance (- 2 * log likelihood) of your model, with all of the covariates you have specified, against the model with all covariates dropped. The individual contribution of covariates to the model can be assessed from the significance test given with each coefficient in the main output; this assumes a reasonably large sample size.

Deviance is minus twice the log of the likelihood ratio for models fitted by maximum likelihood (Hosmer and Lemeshow, 1989 and 1999; Cox and Snell, 1989; Pregibon, 1981). The value of adding a parameter to a Cox model is tested by subtracting the deviance of the model with the new parameter from the deviance of the model without the new parameter; the difference is then tested against a chi-square distribution with degrees of freedom equal to the difference between the degrees of freedom of the old and new models. The model analysis option tests the model you specify against a model with no covariates (a Cox model has no separate intercept term because it is absorbed into the baseline hazard); this tests the combined value of the specified predictors/covariates in the model.
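The deviance comparison described above is a one-line calculation once the two log likelihoods are known. The sketch below uses the log likelihoods printed in the worked example at the end of this section (-207.554801 with no covariates and -203.737609 with the stage covariate). It handles only the case of one added parameter (1 degree of freedom), and the function name is illustrative.

```python
import math


def likelihood_ratio_test_1df(loglik_reduced, loglik_full):
    """Deviance difference between nested models that differ by ONE parameter."""
    chi2 = 2.0 * (loglik_full - loglik_reduced)   # deviance(reduced) - deviance(full)
    p = math.erfc(math.sqrt(chi2 / 2.0))          # chi-square upper tail, 1 df
    return chi2, p


chi2, p = likelihood_ratio_test_1df(-207.554801, -203.737609)
print(f"Deviance (likelihood ratio) chi-square = {chi2:.6f}, P = {p:.4f}")
```

For more than one added parameter the same difference in deviance would be referred to a chi-square distribution with correspondingly more degrees of freedom, which needs a general chi-square tail function rather than the 1 df shortcut used here.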

Some statistical packages offer stepwise Cox regression that performs systematic tests for different combinations of predictors/covariates. Automatic model building procedures such as these can be misleading as they do not consider the real-world importance of each predictor; for this reason StatsDirect does not include stepwise selection.

Survival and cumulative hazard rates

The survival/survivorship function and the cumulative hazard function (as discussed under Kaplan-Meier) are calculated relative to the baseline (lowest value of covariates) at each time point. Cox regression provides a better estimate of these functions than the Kaplan-Meier method when the assumptions of the Cox model are met and the fit of the model is strong.

You are given the option to 'centre continuous covariates'; this makes the survival and hazard functions relative to the mean of the continuous variables rather than to their minimum. The mean is usually the more meaningful reference.

If you have binary/dichotomous predictors in your model you are given the option to calculate survival and cumulative hazards for each variable separately.

Data preparation

· Time-to-event, e.g. time a subject in a trial survived.

· Event / censor code - this must be ≥1 (event(s) happened) or 0 (no event at the end of the study, i.e. "right censored").

· Strata - e.g. centre code for a multi-centre trial. Be careful with your choice of strata; seek the advice of a Statistician.

· Predictors - these are also referred to as covariates, which can be a number of variables that are thought to be related to the event under study. If a predictor is a classifier variable with more than two classes (i.e. ordinal or nominal) then you must first use the dummy variable function to convert it to a series of binary classes.

Technical validation

StatsDirect optimises the log likelihood associated with a Cox regression model until the change in log likelihood with iterations is less than the accuracy that you specify in the dialog box that is displayed just before the calculation takes place (Lawless, 1982; Kalbfleisch and Prentice, 1980; Harris, 1991; Cox and Oakes, 1984; Le, 1997; Hosmer and Lemeshow, 1999).

The calculation options dialog box sets a value (default is 10000) for "SPLITTING RATIO"; this is the ratio in proportionality constant at a time t above which StatsDirect will split your data into more strata and calculate an extended likelihood solution, see Bryson and Johnson (1981).

Ties are handled by Breslow's approximation (Breslow, 1974).

Cox-Snell residuals are calculated as specified by Cox and Oakes (1984). Cox-Snell, Martingale and deviance residuals are calculated as specified by Collett (1994).

Baseline survival and cumulative hazard rates are calculated at each time. Maximum likelihood methods are used, which are iterative when there is more than one death/event at an observed time (Kalbfleisch and Prentice, 1973). Other software may use the less precise Breslow estimates for these functions.

Example

From Armitage and Berry (1994, p. 479).

Test workbook (Survival worksheet: Stage Group, Time, Censor).

The following data represent the survival in days since entry to the trial of patients with diffuse histiocytic lymphoma. Two different groups of patients, those with stage III and those with stage IV disease, are compared.

Stage 3: 6, 19, 32, 42, 42, 43*, 94, 126*, 169*, 207, 211*, 227*, 253, 255*, 270*, 310*, 316*, 335*, 346*

Stage 4: 4, 6, 10, 11, 11, 11, 13, 17, 20, 20, 21, 22, 24, 24, 29, 30, 30, 31, 33, 34, 35, 39, 40, 41*, 43*, 45, 46, 50, 56, 61*, 61*, 63, 68, 82, 85, 88, 89, 90, 93, 104, 110, 134, 137, 160*, 169, 171, 173, 175, 184, 201, 222, 235*, 247*, 260*, 284*, 290*, 291*, 302*, 304*, 341*, 345*

* = censored data (patient still alive or died from an unrelated cause)

To analyse these data in StatsDirect you must first prepare them in three workbook columns as shown below:

Stage group  Time  Censor
1            6     1
1            19    1
1            32    1
1            42    1
1            42    1
1            43    0
1            94    1
1            126   0
1            169   0
1            207   1
1            211   0
1            227   0
1            253   1
1            255   0
1            270   0
1            310   0
1            316   0
1            335   0
1            346   0
2            4     1
2            6     1
2            10    1
2            11    1
2            11    1
2            11    1
2            13    1
2            17    1
2            20    1
2            20    1
2            21    1
2            22    1
2            24    1
2            24    1
2            29    1
2            30    1
2            30    1
2            31    1
2            33    1
2            34    1
2            35    1
2            39    1
2            40    1
2            41    0
2            43    0
2            45    1
2            46    1
2            50    1
2            56    1
2            61    0
2            61    0
2            63    1
2            68    1
2            82    1
2            85    1
2            88    1
2            89    1
2            90    1
2            93    1
2            104   1
2            110   1
2            134   1
2            137   1
2            160   0
2            169   1
2            171   1
2            173   1
2            175   1
2            184   1
2            201   1
2            222   1
2            235   0
2            247   0
2            260   0
2            284   0
2            290   0
2            291   0
2            302   0
2            304   0
2            341   0
2            345   0

Alternatively, open the test workbook using the file open function of the file menu. Then select Cox regression from the survival analysis section of the analysis menu. Select the column marked "Time" when asked for the times and select "Censor" when asked for death/censorship. Click on the cancel button when asked about strata, and when asked about predictors select the column marked "Stage group".

For this example:

Cox (proportional hazards) regression

80 subjects with 54 events

Deviance (likelihood ratio) chi-square = 7.634383 df = 1 P = 0.0057

Stage group b1 = 0.96102 z = 2.492043 P = 0.0127

Cox regression - hazard ratios

Parameter    Hazard ratio  95% CI
Stage group  2.614362      1.227756 to 5.566976

Parameter    Coefficient   Standard Error
Stage group  0.96102       0.385636

Cox regression - model analysis

Log likelihood with no covariates = -207.554801

Log likelihood with all model covariates = -203.737609

Deviance (likelihood ratio) chi-square = 7.634383 df = 1 P = 0.0057

The significance test for the coefficient b1 tests the null hypothesis that it equals zero and thus that its exponent equals one. The confidence interval for exp(b1) is therefore the confidence interval for the relative death rate or hazard ratio; we may therefore infer with 95% confidence that the death rate from stage 4 cancers is approximately 2.6 times, and at least 1.2 times, the death rate from stage 3 cancers.
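The fit itself can be sketched in a few lines for this single-covariate example. The code below is an illustration under stated assumptions and is not StatsDirect's implementation: it maximises the partial log likelihood by Newton-Raphson, handles ties with Breslow's approximation as stated under technical validation, and takes the standard error from the second derivative at the maximum. The stopping rule and the helper names are illustrative choices.

```python
import math

# Survival times copied from the example; a trailing * marks a censored time
STAGE3 = "6 19 32 42 42 43* 94 126* 169* 207 211* 227* 253 255* 270* 310* 316* 335* 346*"
STAGE4 = ("4 6 10 11 11 11 13 17 20 20 21 22 24 24 29 30 30 31 33 34 35 39 40 41* 43* "
          "45 46 50 56 61* 61* 63 68 82 85 88 89 90 93 104 110 134 137 160* 169 171 173 "
          "175 184 201 222 235* 247* 260* 284* 290* 291* 302* 304* 341* 345*")


def parse(text, x):
    """Return (time, event, covariate) rows; event is 1 for a death."""
    return [(int(tok.rstrip("*")), 0 if tok.endswith("*") else 1, x)
            for tok in text.split()]


rows = parse(STAGE3, 1) + parse(STAGE4, 2)   # covariate = stage group (1 or 2)


def breslow_loglik(beta):
    """Breslow partial log likelihood with its first and second derivatives."""
    ll = grad = hess = 0.0
    for t in sorted({time for time, event, _ in rows if event == 1}):
        deaths = [x for time, event, x in rows if time == t and event == 1]
        risk = [x for time, _, x in rows if time >= t]
        s0 = sum(math.exp(beta * x) for x in risk)
        s1 = sum(x * math.exp(beta * x) for x in risk)
        s2 = sum(x * x * math.exp(beta * x) for x in risk)
        d = len(deaths)
        ll += beta * sum(deaths) - d * math.log(s0)
        grad += sum(deaths) - d * s1 / s0
        hess -= d * (s2 / s0 - (s1 / s0) ** 2)
    return ll, grad, hess


beta = 0.0
for _ in range(25):                 # Newton-Raphson iterations
    _, grad, hess = breslow_loglik(beta)
    step = grad / hess
    beta -= step
    if abs(step) < 1e-10:
        break

loglik_null = breslow_loglik(0.0)[0]
loglik_fit, _, hess_fit = breslow_loglik(beta)
se = math.sqrt(-1.0 / hess_fit)     # from the observed information

print(beta, se, loglik_null, loglik_fit, math.exp(beta))
```

If the assumptions hold, the coefficient, standard error and the two log likelihoods should be close to the values in the output above (0.96102, 0.385636, -207.554801 and -203.737609). Coding the stage as 1 and 2 rather than 0 and 1 does not change the coefficient, because only differences between subjects' covariate values enter the partial likelihood.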

...