医学全在线
搜索更多精品课程:
热 门:外科内科学妇产科儿科眼科耳鼻咽喉皮肤性病学骨科学全科医学医学免疫学生理学病理学诊断学急诊医学传染病学医学影像药 学:药理学药物化学药物分析药物毒理学生物技术制药生药学中药学药用植物学方剂学卫生毒理学检 验:理化检验 临床检验基础护 理:外科护理妇产科护理儿科护理 社区护理五官护理护理学内科护理护理管理学中 医:中医基础理论中医学针灸学刺法灸法学口 腔:口腔内科口腔外科口腔正畸口腔修复口腔组织病理生物化学:生物化学细胞生物学病原生物学医学生物学分析化学医用化学其 它:人体解剖学卫生统计学人体寄生虫学仪器分析健康评估流行病学临床麻醉学社会心理学康复医学法医学核医学危重病学中国医史学
您现在的位置: 医学全在线 > 精品课程 > 卫生统计学 > 南方医科大学 > 正文:医学统计学电子教材:Data Preparation
    

医学统计学-电子教材:Data Preparation

医学统计学:电子教材 Data Preparation:ContentDatapreparationTransformationsFillSeriesMeasurementscalesanddatatypesConvertingbetweenunitsLogtransformLogittransformationProbittransformationAngulartransformationCumulativetransformationLadder

Content

PageData preparation

Page Transformations

Page Fill Series

Page Measurement scales and data types

Page Converting between units

Page Log transform

Page Logit transformation

Page Probit transformation

Page Angular transformation

Page Cumulative transformation

Page Ladder of powers transformation

Page Apply function

Page Date and time intervals

Page Ranking

Page Normal scores

Page Standardization

Page Sort data to new column

Page Sort within worksheet

Page Pairwise

Page Pairwise differences

Page Pairwise means

Page Pairwise slopes

Page Text to numbers

Page Random number generation

Page Rotate data block

Page Dummy variables

Page Search and replace

Page Clear missing data

Page Categorise

Page Extract a subset of data

Page Split by group identifier

Page Combine groups and form an identifier

Page Tabulate

Page Detabulate

Page Crosstabs

Arial;Courier New;Symbol;Helvetica;Courier;Tms Rmn;Helv;New York;System;Wingdings;MSMincho ;Batang ;SimSun ;PMingLiU ;MS Gothic ;Dotum ;SimHei ;MingLiU ;Mincho ;Gulim ;Century;AngsanaNew;CordiaNew;Mangal;Latha;Sylfaen;Vrinda;Raavi;Shruti;Sendnya;Gautami;Tunga;Estrangelo Edessa;Kartika;Arial Unicode MS;Tahoma;Verdana;TimesNew Roman CE;TimesNew Roman Cyr;Times New Roman Greek;Times New Roman Tur;Times New Roman (Hebrew);Times New Roman (Arabic);Times New RomanBaltic;Times New Roman (Vietnamese);Arial CE;Arial Cyr;Arial Greek;ArialTur;Arial (Hebrew);Arial (Arabic);Arial Baltic;Arial (Vietnamese);Courier NewCE;Courier New Cyr;Courier New Greek;Courier New Tur;Courier New(Hebrew);Courier New (Arabic);Courier New Baltic;Courier New(Vietnamese);Sylfaen Cyr;Sylfaen Greek;Verdana CE;Verdana Cyr;VerdanaGreek;Verdana Tur;Verdana Baltic;Verdana (Vietnamese);

Data preparation. 3

Transformations. 4

Fill series. 4

Measurement scales and data types. 5

Converting between units. 6

Logarithmic transformation. 8

Logarithmic transformation. 9

Probit transformation. 9

Angular transformation. 10

Cumulative transformation. 10

Ladder of powers transformations. 11

Apply function. 11

Date and time intervals. 13

Ranking.. 14

Normal scores. 15

Standardization. 16

Sort data to new column. 17

Sort within worksheet. 18

Pairwise. 19

Pairwise differences. 20

Pairwise means. 20

Pairwise slopes. 21

Text to numbers. 22

Random number generation. 22

Rotate data block. 26

Dummy variables. 27

Search and replace. 28

Clear missing data.. 31

Categorise. 32

Extract a subset of data.. 33

Split by group identifier.. 34

Combine groups and form an identifier.. 35

Tabulate. 36

Detabulate. 36

Crosstabs. 37

 

Datapreparation

·WORKBOOK

·EXCELLINKS

·MISSINGDATA

·TRANSFORMATIONS

·APPLYFUNCTION

·NORMALSCORES

·STANDARDIZATION

·CONVERTINGBETWEEN UNITS

·RANKING

·SORTINGDATA TO NEW COLUMNS IN WORKSHEETS

·SORTINGEXISTING CELLS WITHIN WORKSHEETS

·DATE& TIME INTERVALS

·PAIRWISE

·RANDOMNUMBERS

·ROTATEDATA BLOCK

·DUMMYVARIABLES

·TEXTTO NUMBERS

·SEARCH& REPLACE

·CLEARMISSING DATA

·CATEGORISE

·EXTRACTVARIABLES

·COMBINEGROUPS AND CREATE A GROUP IDENTDIFIER

·SPLITBY GROUP IDENTIFIER

·TABULATE

·DETABULATE

·CROSSTABS

·FREQUENCIES

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Transformations

Menu location: Data_Transformations.

This section deals with somecommon transformations:

·Log(natural)

·Log (base10)

·Logit

·Probit

·Angular

·Cumulative

·Ladderof powers

Please note that othertransformations can be achieved by applyinga function to a column of data as described below.

Transforming a variable changesits measurement scale. Reasons for transformation include stabilizing variance,linearizing relationships, normalizing distributions and making data easier tohandle numerically (Armitage and Berry,1994; Bland and Altman, 1996c). Details of the above transformations arediscussed under their own titles, here we consider how to deal with othertransformations, for example a power transform:

Power

Example, for square root applyfunction v1^.5 or sqr(v1), for standardized version (v1^.5-1)/.5

Limitation, v1 must be ³ 0.

Standardized form: y = (x^c-1) /c

For c = 0.5 (square root):

Variance of x can be stabilizedespecially with Poisson distributed data.

For c = -1 (reciprocal):

Variance of x can be stabilizedwith distant high outliers, e.g. very long survival times c.f. mainly shorterones.

For c = 2 (square):

Variance can be stabilized whenvar(x) decreases with increasing x and with negatively skewed x.

Fill series

Menu location: Data_FillSeries.

This function fills a series ofrows in a worksheet with numbers according to a formula that you specify.

The default formula is VX+1 witha starting value of 1, which will give a sequence of numbers from 1 to thelength of series that you specify. A formula of VX+10 would give a series ofintegers spaced at 10, i.e. 10, 20 30, 40 etc. Complex non-linear formulae canbe used; see calculatorfor a list of functions that you can use in a formula.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Measurementscales and data types

It is important, in statisticalanalysis, to know about the different scales of measurement, these are:

·INTERVAL
Scale with a fixed and defined interval e.g.temperature or time.

·ORDINAL
Scale for ordering observations from low tohigh with any ties attributed to lack of measurement sensitivity e.g.score from a questionnaire.

·NOMINAL with order
Scale for grouping into categories with order e.g. mild, moderate orsevere. This can be difficult to separate from ordinal.

·NOMINAL without order
Scale for grouping into unique categoriese.g. eye colour.

·DICHOTOMOUS
As for nominal but two categories only e.g.male/female.

In addition to the classificationof measurement scales, other related terms are used to describe types of data:

·CATEGORICAL vs. NUMERICAL (quantitative vs. qualitative)
Data that represent categories, such as dichotomous (two categories) andnominal (more than two categories) observations, are collectively calledcategorical (qualitative). Data that are counted or measured using anumerically defined method are called numerical (quantitative).

·DISCRETE vs. ORDERED CATEGORICAL
Discrete data arise from observations that can only take certain numericalvalues, usually counts such as number of children or number of patientsattending a clinic in a year. Ordered categorical data are sometimes treated asdiscrete data, this is wrong. For example, using the Registrar General'sclassification of social class, it would be wrong to say that class I is fivetimes the socio-economic status as class V, as there is not a strict numericalrelationship between these categories. It follows, therefore, that averagesocial class is a meaningless statistic. Thus, ordered categorical data shouldnot be treated as discrete data for statistical analysis. Discrete data may betreated as ordered categorical data in statistical analysis, but someinformation is lost in doing so.

·CONTINUOUS
Continuous data are numerical data that can theoretically be measured ininfinitely small units. For example, blood pressure is usually measured to thenearest 2mm Hg, but could bemeasured with much greater resolution of difference. The interval measurementscale is intended for continuous data. Sometimes continuous data are givendiscrete values at certain thresholds, for example age a last birthday is adiscrete value but age itself is a continuous quantity; in these situations itis reasonable to treat discrete values as continuous. Remember that informationis lost when continuous data are recorded only in ranges (ordered categories),and the statistical analysis of continuous data is more powerful than that ofcategorical data.

·PERCENTAGES and RATIOS
Percentages or ratios summarise two pieces of information, namely theirconstituent numerator and denominator values. Simple ratios (0 to 1, i.e. thedenominator is the maximum possible value that the numerator can take) can be treatedas continuous data. More difficult to analyse data arise when the ratiorepresents a change, and the value can be negative. Ratios of observationscompared with reference values, e.g. height relative to the mean of a referencepopulation for a given sex and age, are difficult to handle as values may falleither side of 1 (100%).

Many statistical methods areappropriate only for data of certain measurement scales. When selectinga statistical method, it is essential to understand how the data to beanalysed were measured. The best stage of investigation for ponderingmeasurement scales is the design stage, at which the statistical limitationsimposed by certain measurement scales may influence your choice of observationsand methods of measurement.

Note that transforming datachanges their measurement scales.

See also:

Normalscores

Standardization

Ranking

Convertingbetween units

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Convertingbetween units

Converting from 1:

multiply by:

to give 1:

Atmospheres

76

CM of Mercury (0°C)

Atmospheres

29.921

Inches of Mercury (32°F)

Atmospheres

2116.32

Pounds/sq ft

Atmospheres

14.696

Pounds/sq ft

Barrels, oil

42

Gallon (US)

Barrels (API)

0.159

Meter³

BTU (15.56°C)

1055

Joule

BTU/Minute

0.0236

Horsepower

Calories (mean)

4.19

Joule

Centimetres

0.3937

Inches

Cubic centimetres

0.061

Cubic inches

Cubic Centimetres

0.00022

Gallons (British)

Cubic Centimetres

0.00026

Gallons (US)

Cubic Feet

0.037

Cubic Yards (US)

Cubic Feet

6.2288

Gallons (British)

Cubic Feet

7.4805

Gallons (US)

Cubic Feet

28.3162

Litres

Cubic Inches

16.3872

Cubic Cm

Cubic Meters

35.314

Cubic Feet

Cubic Meters

219.969

Gallons (British)

Cubic Meters

264.173

Gallons (US)

Degrees (F)

(Fahrenheit - 32) / 1.8

Degrees (C)

Degrees (C)

(Celsius * 1.8) + 32

Degrees (F)

Feet

0.3048

Meters

Feet of Water (39.2°F)

0.8826

Inches of Mercury (32°F)

Feet of Water (39.2°F)

62.427

Pounds/sq ft

Feet/minute

0.0183

Kilometres/hour

Feet/minute

0.005

Meters/second

Gallons (British)

0.1605

Cubic Ft

Gallons (British)

1.2009

Gallons(US)

Gallons (British)

4.5459

Litres

Gallons (US)

0.1337

Cubic Ft

Gallons (US)

0.8327

Gallons(British)

Gallons (US)

3.7853

Litres

Grams

0.03527

Ounces

Horsepower

1.014

Cheval-Vapeur

Horsepower

745.7

Watts

Inches

2.54

Centimetres

Inches of Mercury (32°F)

0.0334

Atmospheres

Kilograms

2.2046

Pounds

Kilometres

3280

Feet

Kilometres

0.6213

Miles

Kilowatts

56.884

BTU/Minute

Litres

0.0353

Cubic Feet

Litres

0.2199

Gallons (British)

Litres

0.2641

Gallons (US)

Meters

3.2808

Feet

Miles

5280

Feet

Miles

1.6093

Kilometres

Miles

320

Rods

Miles (nautical)

6080

Feet

Millilitres

0.035

Ounces (Fluid-British)

Millilitres

0.0338

Ounces (Fluid-US)

Millimetres

0.039

Inches

Ounces

28.3495

Grams

Ounces (British)

28.4130

Cubic Cm

Ounces (US)

29.5737

Cubic Cm

Pounds

453.5924

Grams

Pounds/sq ft

0.000472

Atmospheres

Pounds/sq in

0.0680

Atmospheres

Pounds/sq in

2.036

Inches of Mercury (32°F)

Quarts (British)

1136.521

Cubic Cm

Quarts (US)

946.3586

Cubic Cm

Tons (long)

1016.047

Kilograms

Tons (long)

2240

Pounds

Tons (long)

1.016

Tons (metric)

Tons (long)

1.12

Tons (short)

Yards

91.44

Centimetres

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Logarithmictransformation

Menu location: Data_Transformations_Log(Natural) and Data_Transformations_Log (Base 10).

Logarithmic transformation can beused to stabilise the variance of a sample (Bland, 2000).

The function named Log (natural)calculates the natural (Naperian, log to the base e) logarithm of the data youselect.

The function named Log (base 10)calculates the common (log to the base 10) logarithm of the data you select.

The results are stored in a newcolumn that is marked Log(natural):<name> or Log(base 10):<name>where <name> is the original column label. Indeterminable values (i.e.log of any data < 0) are marked as missing data unless you opt to add asuitable constant to all data (you are prompted if you need to do this). It isbest to consult with a statistician before using constants in logtransformations.

Effects of log transformation:

·Variance stabilisation.

·Increasing slopes in x in relation to another variable arelinearized.

·Positively skewed distributions of x are normalized.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Logarithmictransformation

Menu location: Data_Transformations_Log(Natural) and Data_Transformations_Log (Base 10).

Logarithmic transformation can beused to stabilise the variance of a sample (Bland, 2000).

The function named Log (natural)calculates the natural (Naperian, log to the base e) logarithm of the data youselect.

The function named Log (base 10)calculates the common (log to the base 10) logarithm of the data you select.

The results are stored in a newcolumn that is marked Log(natural):<name> or Log(base 10):<name>where <name> is the original column label. Indeterminable values (i.e.log of any data < 0) are marked as missing data unless you opt to add asuitable constant to all data (you are prompted if you need to do this). It isbest to consult with a statistician before using constants in logtransformations.

Effects of log transformation:

·Variance stabilisation.

·Increasing slopes in x in relation to another variable arelinearized.

·Positively skewed distributions of x are normalized.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Probittransformation

Menu location: Data_Transformations_Probit.

Probit is a common transformationfor linearizing sigmoid distributions of proportions (Armitage and Berry,1994). The probit is defined as 5 + the 1-p quantile from the standard normaldistribution, where p is a proportion. For example, the number of insectskilled by the log dose of an insecticide might describe a sigmoid relationship,which is a rectangular hyperbolic relationship to the non-log transformed dose.This sort of quantal response situation can be treated as a linear problemafter probit transformation. Probitanalysis uses these principles.

You can supply proportions ordiscrete data for logit transformation. If you specify discrete data thenStatsDirect converts these to proportions by taking each value as a proportionof the maximum of the supplied data. The results are stored in a new columnthat is marked Probit:<name> where <name> is the original columnlabel.

StatsDirect marks indeterminablevalues as missing data, i.e. p=0 or p=1. StatsDirect probit analysis, on theother hand, treats p=0 and p=1 ina more complex way that includes them in the overall analysis.

Copyright © 1990-2006 StatsDirect Limited, alwww.lindalemus.com/rencai/l rights reserved

Download a free 10 dayStatsDirect trial

Angulartransformation

Menu location: Data_Transformations_Angular.

Angular transformation can beused to linearize sigmoid distributions and equalize variances of proportions (Bland, 2000).The angle is defined as arcsin (sqr(p)), where sqr(p) is the square root of aproportion. Logit and probit transformations do not equalize variances butlogistic regression and probit analysis do give special treatment to variancesin the way that they handle expected error distributions. Angular regression isseldom used but angle transformed data can be useful for exploratory dataanalysis by experts.

You can supply proportions ordiscrete data for angular transformation. If you specify discrete data thenStatsDirect converts these to proportions by taking each value as a proportionof the maximum of the supplied data. The results are stored in a new columnthat is marked Angle:<name> where <name> is the original columnlabel.

Indeterminable values are markedas missing data.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Cumulativetransformation

Menu location: Data_Transformations_Cumulative.

This function takes a set of dataand returns the cumulative set in an equal column marked with the labelCumulate:<name> where <name> is the column label of the originaldata. Cumulative data need to have some meaning to the order in which they arederived.

Example:

Data A, become --->

Cumulate:Data A

1

1

3

4

5

9

4

13

7

20

9

29

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Ladder ofpowers transformations

Menulocation: Data_Transformations_Ladder of Powers.

This function provides a seriesof transformations that have increasing power to pull in the right hand tail ofa distribution. The ladder consists of:

1/x²

reciprocal of square (power -2)

1/x

reciprocal (power -1)

ln(x)

natural logarithm

square root (power 0.5)

square (power 2)

For more information see "twww.lindalemus.comransformingdata".

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Apply function

Menu location: Data_ApplyFunction.

This applies an algebraicexpression to any selection of data. The workbook column(s) you select aregiven the variable names V1, V2 ... Vn in the order that they are selected fromleft to right across the workbook. You can then create a new column as afunction of those selected. You do this by entering an algebraic expressionthat can use any of the functions, operators and constants below. For exampleV1/SQR(V2) would produce a column where each row entry was that of the firstselected column divided by the square root of the corresponding row entry inthe second selected column. A maximum of 9 variables/columns can be selected.

The same results can be achievedusing intrinsic functions within a StatsDirect workbook. Large complexfunctions may be handled faster by the intrinsicworkbook functions of StatsDirect.

Constants

PI

3.14159265358979323846 (p)

EE

2.71828182845904523536 (e)

Arithmetic Functions

ABS

absolute value

CLOG

common (base 10) logarithm

CEXP

anti log (base 10)

EXP

anti log (base e)

LOG

natural (base e, Naperian) logarithm

LOGIT

logit: log(p/[1-p]), p=proportion

ALOGIT

antilogit: exp(l)/[1+exp(l)], l=logit

SQR or SQRT

square root

!

factorial (maximum 170.569)

LOG!

log factorial

IZ

normal deviate for a p value

UZ

upper tail p for a normal deviate

LZ

lower tail p for a normal deviate

TRUNC or FIX

integer part of a real number

CINT

real number rounded to nearest integer

INT

real number truncated to integer closest to zero

Please note that the largestfactorial allowed is 170.569398315538748, but you can work with Log factorialsvia the LOG! function, e.g. LOG!(171).

Arithmetic Operators

^

exponentiation (to the power of)

+

addition

-

subtraction

*

multiplication

/

division

\

integer division

Calculations give an order ofpriority to arithmetic operators; this must be considered when enteringexpressions. For example, the result of the expression "6 - 3/2" is4.5 and not 1.5 because division takes priority over subtraction.

Priority of arithmeticoperators in descending order

1. Exponentiation(^)

2. Negation(-X) (Exception = x^-y; i.e. 4^-2 is 0.0625 and not -16)

3. Multiplicationand Division (*, /)

4. IntegerDivision (\)

5. Additionand Subtraction (+, -)

Trigonometric Functions

ARCCOS

arc cosine

ARCCOSH

arc hyperbolic cosine

ARCCOT

arc cotangent

ARCCOTH

arc hyperbolic cotangent

ARCCSC

arc cosecant

ARCCSCH

arc hyperbolic cosecant

ARCTANH

arc hyperbolic tangent

ARCSEC

arc secant

ARCSECH

arc hyperbolic secant

ARCSIN

arc sine

ARCSINH

arc hyperbolic sine

ATN

arc tangent

COS

cosine

COT

cotangent

COTH

hyperbolic cotangent

CSC

cosecant

CSCH

hyperbolic cosecant

SIN

sine

SINH

hyperbolic sine

SECH

hyperbolic secant

SEC

secant

TAN

tangent

TANH

hyperbolic tangent

To convert degrees to radians,multiply degrees by pi/180. To convert radians to degrees, multiply radians by180/pi.

Logical Functions

AND

logical AND

NOT

logical NOT

OR

logical OR

less than

=

equal to

greater than

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Date andtime intervals

Menu location: Data_Date &Time Intervals.

This function enables you tocreate a column of intervals from a column of dates or times. You must havedate or time data in your workbook to do this. The StatsDirect workbookrecognises input such as 1/3/67 or 12:55 as datesand times.

You can use this function to workout the distance of each date or time entry from a point which you specify,this point can be a date, a time or a date with a time after it. You must also specifythe interval that you want StatsDirect to calculate, this can be years, months,weeks, days, hours, minutes or seconds. Positive intervals represent dates ortimes later than the reference point which you entered and negative intervalsreflect earlier dates/times.

Example

Test workbook (Other worksheet:Event).

To working out the interval indays from a reference point of 1/4/96to the dates in the following list you can use the "Date & TimeInterval" function of the "Data" menu.

Event ----->

Event~Days from 01/04/96

01/03/97

334

02/05/97

396

12/12/96

255

01/11/96

214

03/01/97

277

05/05/97

399

07/08/97

493

08/10/96

190

See also Dates and Times inStatsDirect.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Ranking

Menu location: Data_Rank.

This function ranks the workbookdata you select and saves the rankings into a new workbook column labelledRank: Name where Name is the column label of the original data. You cancalculate a correction factor for ties in the ranking; five formulae areoffered for tie correction:

1.S(t^3 - t /12)

2.S(t * (t-1)/2)

3.S(t * (t-1) *(2t+5))

4.S(t * (t-1) *(t-2))

5.S(t * (t-1) *(t+1))

Here t is the number of data tiedat each tie and upper case sigma (S) is the summation across these ties.

The use of tie corrections is acomplex subject upon which learned statisticians sometimes disagree.

Example

Test workbook (Nonparametricworksheet: First Born).

Ranking the following agressivityscores for a sample of firstborn twins gives.

First Born ----->

Rank: First Born

86

8

71

3.5

77

6.5

68

1

91

11.5

72

5

77

6.5

91

11.5

70

2

71

3.5

88

10

87

9

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Normalscores

Menu location: Data_NormalScores.

This function saves the normalscores of workbook data you select into a new workbook column marked Nml Score:Name, where Name is the column label of the original data.

Three different methods forcalculating normal scores are provided:

1. vander Waerden's method (Conover, 1999):

- where s is the normal score for an observation, r is the rank for thatobservation, n is the sample size and F(p) is the pth quantile from the standardnormal distribution.

2. Blom's method (Altman, 1991):

- where s is the normal score for an observation, r is the rank for thatobservation, n is the sample size and F(p) is the pth quantile from the standardnormal distribution.

3. Expectednormal order scores (David, 1981;Royston, 1982; Harter, 1961)

- where s is the normal score for an observation, r is the rank for thatobservation, n is the sample size, f(p) is the standard normal density for p and F(p) is the pthquantile from the standard normal distribution. The solution is found bynumerical integration. Calculation of expected normal order scores is notpractical for very large samples, n of 2500 is the maximum permitted inStatsDirect.

Example

Test workbook (Nonparametricworksheet: First Born).

Scoring the following agressivityscores for a sample of firstborn twins using the first method above gives:

First Born ----->

Nml Score(vdW): First Born

86

0.2934

71

-0.6151

77

0

68

-1.4261

91

1.1984

72

-0.2933

77

0

91

1.1984

70

-1.0201

71

-0.6151

88

0.7363

87

0.5024

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Standardization

Menu location: Data_Standardization

This function saves standardizedvalues of workbook data you select into a new workbook column marked Std.(Method): Name, where Name is the column label of the original data and Methodis one of the methods listed below.

Three different methods forstandardization are provided:

1. (x -mean) / standard deviation

2. x /standard deviation

3. x -mean

Example

Test workbook (Nonparametricworksheet: First Born).

Standardizing the followingagressivity scores for a sample of firstborn twins using the first method abovegives.

First Born --->

Std ((x-mean)/SD): First Born 

86

0.7782 

71

-0.9095

77

-0.2344

68

-1.2470

91

1.3408 

72

-0.7970

77

-0.2344

91

1.3408 

70

-1.0220

71

-0.9095

88

1.0033 

87

0.8907 


Sort data to newcolumn

Menu location: Data_Sort_CreateSorted Columns.

StatsDirect enables you to createsorted copies of workbook columns or to manipulate data within a selected datarange of a workbook. This section deals with creating new sorted workbookcolumns based on existing ones. See also workbooksort.

This function sorts workbook datayou select and saves the sorted data into a new workbook column marked Sort:Name, where Name is the column label of the original data. Sorting may beascending or descending. The sort may also be tied to other data, i.e. the datain column b may be sorted in the in the order of sorting the data in column a.Tied sorting can be repeated for any number of columns.

Example 1

Test workbook (Nonparametricworksheet: First Born).

Sorting the following agressivityscores for a sample of firstborn twins in ascending order gives.

First Born ----->

Sort: First Born

86

68

71

70

77

71

68

71

91

72

72

77

77

77

91

86

70

87

71

88

88

91

87

91

Example 2

Test workbook (Nonparametricworksheet: First Born, Second Born).

Sorting the following agressivityscores for a sample of second born twins by the ascending order of the scoresfor firstborn twins gives.

First Born

Second Born ----->

Sr~Second Born~First Born

86

88

64

71

77

65

77

76

80

68

64

77

91

96

72

72

72

76

77

65

65

91

90

88

70

65

72

71

80

81

88

81

96

87

72

90

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Sort withinworksheet

Menu location: Data_Sort_ManipulateWorksheet.

StatsDirect enables you to createsorted copies of worksheet columns or to manipulate data within a selected datarange of a worksheet. This section deals with sorting data within existingranges of data that you select in a worksheet. See also sortingcolumns.

PLEASE NOTE that for thisfunction you should not select columns by clicking on the headers of thecolumns. Select only the data ranges that you want to sort by dragging themouse across them with the left hand mouse button held down.

This function manipulatesselected workbook data ranges by sorting the data. Sorting may be ascending ordescending and either up/down columns (by row) or from side to side (bycolumn). You select a key which is the cell where you want the sorting to beginand in the row or column upon which your sort order is based. Selecting morethan one key enables you to perform a nested sort, i.e. sort on key one (saythe data in column x) then if there are repeats of the same value in column xlook in the key two column (say column y) to order the repeated valuesaccording to the order of the data in this secondary key and so on.

Example 1

Sorting the following data rangesascending order by column with key as the first data cell (at value 86) gives:

86 ------>

68

71

70

77

71

68

71

91

72

72

77

77

77

91

86

70

87

71

88

88

91

87

91

Example 2

Sorting the following data rangesby the ascending order of the data in the first column by row with key as thefirst data cell in the first column (86) gives:

86

88 ------>

68

64

71

77

70

65

77

76

71

80

68

64

71

77

91

96

72

72

72

72

77

76

77

65

77

65

91

90

86

88

70

65

87

72

71

80

88

81

88

81

91

96

87

72

1

90

Pairwise

·Pairwisedifferences

·Pairwisemeans

·Pairwiseslopes

Menu location: Data_Pairwise.

These functions providetransformations based on pairwise calculations either between columns or withina column, they often provide intermediate steps in non-parametric methods.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Pairwisedifferences

Menu location: Data_Pairwise_Differences.

This function creates a column ofall possible (n*n) pairwise differences from a pair of columns:

X

Y

1

2

2

3

3

6

Differences (X, Y)

-1

-2

-5

0

-1

-4

1

0

-3

All possible (n * n) Xi...(i=1 tok) are compared with Yj...(j=1 to k).

These contrasts form intermediatesteps in a number of non-parametric statistical methods.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Pairwisemeans

Menu location: Data_Pairwise_Means.

This function creates a column ofall possible (n(n+1)/2) pairwise means within a column:

X

2

3

6

Means (Within X)

2

2.5

4

3

4.5

6

All possible (n(n+1)/2) Xi...(i=1to n) are averaged with Xj...(j=i to n).

These values form intermediatesteps in a number of non-parametric statistical methods.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Pairwiseslopes

Menu location: Data_Pairwise_Slopes.

This function creates a column ofall (n(n-1)/2) pairwise slopes from a pair of columns (Y the response variableand X the predictor variable):

X

Y

1

2

2

3

3

6

Slopes (X, Y)

1

2

3

All possible (n(n-1)/2)XiYi...(i=1 to n) are compared with XjYj...(j=i to n) where i<>j. Slopeis (Yi-Yj)/(Xi-Xj).

These values form intermediatesteps in a number of non-parametric statistical methods. For example, innon-parametric linear regression (Theil type: Conover, 1999)the median slope would be 2 in theusual Y = bX +C, where C is the median intercept (i.e. median[Y - median b *X]).

StatsDirect will calculate themedian slope and its confidence interval for you if you select this option whenprompted. The results are stored in the column label of the relevant list ofpairwise slopes. The confidence interval is calculated using the distributionof Kendall's T as described in Conover (Conover, 1999).Please note that the results may differ slightly from Conover's text becauseStatsDirect calculates the inverse of Kendall'sT more accurately than the routines used to calculate Best's widely quoted 1974table for this statistic.

A fuller non-parametriclinear regression function is given in the non-parametric methods andregression sections.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Text tonumbers

Menu location: Data_Text toNumbers.

This function codes text asnumbers to enable analysis. An example of where you might want to do this is aworkbook that has been created from a database where certain categories areentered as text:

Category

Data

TA

2.3

TA

3.4

TB

5.6

TB

4.5

TB

4.9

TA

3.9

TB

3.9

If you enter these data into twocolumns in a StatsDirect workbook, you can convert the Category column tonumbers using the Text to Numbers function from the Data menu. If you selectthe whole of this column by clicking on the header, the first row will have theword "Category" in it. To avoid counting "Category" as oneof your categories when it is really the column title, click on "No"when you are asked if the top row is to be included in the conversion. Theresults will be as follows:

Category

Data

1

2.3

1

3.4

2

5.6

2

4.5

2

4.9

1

3.9

2

3.9

StatsDirect can code textautomatically as in the example above. You may also specify your own numericcode for each different text string by entering any valid number alongside thematching text in a substitution table. The substitution table only appears ifyou click "Yes" when StatsDirect prompts you to specify numbers.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Randomnumber generation

Menu location: Data_RandomNumbers.

This function enables you tocreate one or more series of random numbers from given distributions.

A robust generatorof uniform (pseudo)random numbers is used as the basis for generating deviatesfrom the probability distributions described below. You are given theopportunity to enter your own seed number to be used by the random numbergenerator but you should use the default seed (based upon your computer'sclock) in most cases. Please note that each seed generates its own series andthat series is the same if you use the seed again. You have very little chanceof using the same seed twice if you select the seed that StatsDirect suggests.

These functions are intended forsimulation work; they employ widely cited and debated algorithms (Gentle, 2003).

Uniform (continuousuniform, rectangular)

The continuous uniformdistribution has a constant density function of the interval (a, b) and thus arectangular shape from a to b:

- where a and b lie between minusand plus infinity.

The interval most commonly usedis 0 to 1

Normal (Gaussian)

The commonly used standard normaldistribution (mean of 0 and standard deviation of 1) is one of a family ofnormal distributions defined by the density function:

- where mean m lies betweenminus and plus infinity and standard deviation s is greater than zero.

Algorithm: inversion of thecumulative distribution function (Wichura, 1988;Gentle 2003)

Lognormal

The density of a lognormaldistribution is given by:

- where mean m lies betweenminus and plus infinity and standard deviation s is greater than zero.

The mean of the lognormaldistribution, as opposed to the mean of the underlying normal distribution, isequal to exp(m+s*s/2) and the variance is equal to exp(2m+2s*s)-exp(2m+ s*s).

Algorithm: transformed inversionof the cumulative distribution function (Wichura, 1988;Gentle 2003)

Exponential

The (negative) exponentialdistribution is a special case (shaping parameter of 1) of the gammadistribution. Its density is given by:

- where the parameter l must be greaterthan zero.

Algorithm: transformation (Ahrens & Dieter,1972).

Gamma

The density function of the gammadistribution is given by:

- where the parameters l and r (shapingparameter) must be greater than zero.

G(*) is the gamma function:

,x>0

Please note that gamma deviateswith a shaping parameter of 0.5 are half the square of normal deviates andgamma deviates with a shaping parameter of 1 are exponential deviates.

Algorithm: acceptance-rejectionmethods GD and GS (Ahrens& Dieter, 1974, 1982b).

Binomial

The density function of the binomialdistribution is given by:

- where p lies between 0 and 1 in n ranges.

Algorithm: acceptance-rejectionmethod BTPEC (Kachitvichyanukul& Schmeiser, 1988).

Poisson

The density function of the Poissondistribution is given by:

- where parameter l is greater thanzero.

Algorithm: acceptance-rejection (Ahrens & Dieter,1982a).

Chi-square

The density function of the chi-squaredistribution is given by:

The deviates are calculated asgamma deviates with parameters n/2 and 2, where n is degrees of freedom.

Algorithm: Transformed gammadeviates (Ahrens& Dieter, 1974,1982b; Gentle 2003).

F (variance ratio)

The density function of the F distribution is given by:

The deviates are calculated asnx/dz where n is the numerator degrees of freedom, d is the denominator degreesof freedom, x is a gamma deviate with parameters n/2 and 2 (chi-square with ndegrees of freedom), and z is a gamma deviate with parameters d/2 and 2(chi-square with d degrees of freedom).

Algorithm: Transformed gammadeviates (Ahrens& Dieter, 1974, 1982b; Gentle 2003).

Student's t

The density function of Student's tdistribution is given by:

The deviates are calculated as astandard normal deviate multiplied by the square root of the degrees of freedom(n) divided by a gamma deviate with parameters n/2 and 2 (a chi-square deviatewith n degrees of freedom).

Algorithm: Transformed standardnormal and gamma deviates (Ahrens & Dieter,1974, 1982b; Gentle 2003).

Beta

The density function of the beta distributionis given by:

Algorithm: Acceptance-rejectionmethods BB and BC (Cheng1978).

Logistic

The density function of the betadistribution is given by:

Algorithm: Transformed uniformdeviates (Gentle2003).

Cauchy

Algorithm: Transformed uniformdeviates (Gentle2003).

Weibull

Algorithm: Transformed uniformdeviates (Gentle2003).

Geometric

Algorithm: Transformed Poissonand exponential deviates (Devroye, 1986;Gentle 2003; Ahrens & Dieter, 1972, 1982a).

Negative binomial

Algorithm: Transformed Poissonand gamma deviates (Devroye, 1986;Gentle 2003; Ahrens & Dieter, 1974, 1982b, 1982a).

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Rotate datablock

Menu location: Data_RotateData Block.

This function rotates a selectedblock of data from a StatsDirect workbook in the following way:

Example

1..

2..

3..

1

1.1

0.7

1

2

1.5

0.6

2

3

1.6

0.6

3

4

1.8

0.5

4

...this would become:

1..

2..

3..

4..

1

1.1

1.5

1.6

1.8

2

0.7

0.6

0.6

0.5

3

1

2

3

4

... in other words rows becomecolumns and columns become rows.

The resulting rotation is bestsaved to a new workbook.

You are given the option to keepexisting formulae or to translate them into results before rotation. Pleasenote that if you keep formulae and save your rotated data to another worksheet,the references in the formulae are likely to be incorrect.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Dummyvariables

Menu location: Data_DummyVariables

This function creates dummy (ordesign) variables from one categorical variable.

The reference cell coding modelis used (Kleinbaumet al., 1998):

- the source data may benumerical or text, representing categories. The coding scheme shown above isapplied to your data in reverse alphanumeric order for the k categoriesfound, so for three categories, say race equal to black, white or other, white(being the last in an alphabetical sorting) is coded 1,0,0 which reduces todummy variables X (3) = 1, X (2) = 0.

In order to represent acategorical variable with more than two levels in a regression model you maywish to convert it to a series of dummy variables using this function.

Say a linear regression model isspecified with three predictors; the first and third predictors are continuousdata, and the second predictor is a classifier (categorical data) with threelevels. The second predictor should be converted to two dummy dichotomousvariables (e.g. the example below) and put into a multiple linear regression astwo predictors.

The naming scheme for dummyvariables is the original variable name suffixed with (1) if there are only twocategories, or suffixed with (j+1) where there are j+1 categories giving riseto j dummy variables.

In general form, a regressionmodel where the jth predictor variable is a classifier with klevels can be interpreted as follows, provided the jth variable isconverted to dummy variables:

- where Y is the outcomevariable, b is a regression coefficient, D is a dummy variablefor a classifier variable of k levels and x is a non-classifierpredictor variable.

Example

Group ID --->

Group ID (2)

Group ID (3)

1

0

0

1

0

0

1

0

0

1

0

0

2

1

0

2

1

0

2

1

0

2

1

0

3

0

1

3

0

1

3

0

1

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Search andreplace

Menu location: Data_Search& Replace.

This function searches the activeworksheet for data that match the conditions you specify. When the condition,for example data > 20, is found then you have the option to replace orsimply count the data that match this condition.

Note that you can use the replacefunction to remove cells or rows from a worksheet by leaving the "replacewith" box blank.

Example

The following are data importedfrom an application which uses -99 as the missing data value:

12

1

2

3

4

-99

4

2

1

-99

-99

To replace -99 with StatsDirect missing data youcan use the search function with the "search for x" condition set toequal to -99 and the "replace with" entry set to *. To run thisexample, enter the above data into a workbook column and selectthat column. Then select search from the data menu. Check the "equalto" option, enter -99 inthe "search for x" box and enter * in the "replace with"box. When you click on the OK button the -99 values will be replaced with * andthese will also be displayed as an asterisk (missing data) in the workbook.

The match expression option ofthe StatsDirect search form enables you to select data for an expression whichevaluates as true. You use x to denote the data to match, i.e. x ³ 10 and x £ 20 matches allselected data between 10 and 20 inclusive, you can count or replace the matchesby clicking the relevant button. You can compose an expression using x and anyof the functions, operators and logical expressions below:

Constants

PI

3.14159265358979323846 (p)

EE

2.71828182845904523536 (e)

Arithmetic Functions

ABS

absolute value

CLOG

common (base 10) logarithm

CEXP

anti log (base 10)

EXP

anti log (base e)

LOG or LN

natural (base e, Naperian) logarithm

LOGIT

logit: log(p/[1-p]), p=proportion

ALOGIT

antilogit: exp(l)/[1+exp(l)], l=logit

SQR or SQRT

square root

!

factorial (maximum 170.569)

LOG!

log factorial

IZ

normal deviate for a p value

UZ

upper tail p for a normal deviate

LZ

lower tail p for a normal deviate

TRUNC or FIX

integer part of a real number

CINT

real number rounded to nearest integer

INT

real number truncated to integer closest to zero

Please note that the largestfactorial allowed is 170.569398315538748, but you can work with Log factorialsvia the LOG! function, e.g. LOG!(171).

Arithmetic Operators

^

exponentiation (to the power of)

+

addition

-

subtraction

*

multiplication

/

division

\

integer division

Calculations give an order ofpriority to arithmetic operators, this must be considered when enteringexpressions. For example, the result of the expression "6 - 3/2" is4.5 and not 1.5 because division takes priority over subtraction.

Priority of arithmetic operators in descending order

1.

Exponentiation (^)

2.

Negation (-X) (Exception = x^-y; i.e. 4^-2 is 0.0625 and not -16)

3.

Multiplication and Division (*, /)

4.

Integer Division (\)

5.

Addition and Subtraction (+, -)

Trigonometric Functions

ARCCOS

arc cosine

ARCCOSH

arc hyperbolic cosine

ARCCOT

arc cotangent

ARCCOTH

arc hyperbolic cotangent

ARCCSC

arc cosecant

ARCCSCH

arc hyperbolic cosecant

ARCTANH

arc hyperbolic tangent

ARCSEC

arc secant

ARCSECH

arc hyperbolic secant

ARCSIN

arc sine

ARCSINH

arc hyperbolic sine

ATN

arc tangent

COS

cosine

COT

cotangent

COTH

hyperbolic cotangent

CSC

cosecant

CSCH

hyperbolic cosecant

SIN

sine

SINH

hyperbolic sine

SECH

hyperbolic secant

SEC

secant

TAN

tangent

TANH

hyperbolic tangent

To convert degrees to radians,multiply degrees by pi/180. To convert radians to degrees, multiply radians by180/pi.

Logical Functions

AND

logical AND

NOT

logical NOT

OR

logical OR

less than

=

equal to

greater than

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Clearmissing data

Menu location: Data_ClearMissing Data

This function enables you toclear missing data values from a selection of cells, and to optionally preservethe matching of rows in that selection.

For example, you might want toignore incomplete records. Here, each row is a record for one individual, soyou would normally choose to preserve the matching of rows when prompted:

Sex

Age

Score

1

23

2.3

1

33

3.1

1

19

2

21

1.9

2

43

2

39

5.0

2

26

1.1

Clearing missing data with therow matching option ON would produce:

Sex

Age

Score

1

23

2.3

1

33

3.1

2

21

1.9

2

39

5.0

2

26

1.1

…and clearing missing data withthe row matching option OFF would produce:

Sex

Age

Score

1

23

2.3

1

33

3.1

1

19

1.9

2

21

5.0

2

43

1.1

2

39

2

26

StatsDirect treats the followingnumber, characters and text as missing data:

3E+300

*

.

missing

<just spaces>

You may additionally definemissing numbers or text codes when prompted. This is often useful when you havebeen given data with missing values coded in a specific way, e.g. a score of -1 in the example above.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Categorise

Menu location: Data_Grouping_Categorise.

This function enables you tocategorise any set of data into groups that you specify, for example ages intoage groups.

Typically, a continuousvariable might be divided into categories or groups. Take the IgM variable inthe parametric sheet of the test workbook for example; this has 298observations which you might want to summarise in ranges of values. In order todo this, simply select the Data_Grouping_Categorise menu item then select theIgM column of data. You are presented with different ways to group your datainto bins (intervals) of counts:

·Quartiles
4 bins (< lower quartile; lower quartile to median; median to upperquartile; >= upper quartile)

·Quintiles
5 bins (< first quintile…; >= fourth quintile)

·Deciles
10 bins (<first decile…; >=ninth decile)

·Age groups
one of four common groupings:
<15; 15-19… five yearly bands to 85+
<15; 15-24… ten yearly bands to 85+
<1; 1-4 … five yearly bands to 85+
<1; 1-4… ten yearly bands to 75+

·User-defined
from minimum min, in k intervals of equal size = step:
<min+1*step; >= min+1*step to < min+2*step… in k intervals to >=min+ k*step

Using the IgM example inquartiles:

category

count

< 0.5

56

>= 0.5; < 0.7

67

>= 0.7; < 1

98

>= 1

77

Using the IgM example in 10 intervals of 0.5 from:

category

count

< 0.5

56

>= 0.5; < 1

165

>= 1; < 1.5

54

>= 1.5; < 2

14

>= 2; < 2.5

6

>= 2.5; < 3

2

>= 3; < 3.5

0

>= 3.5; < 4

0

>= 4; < 4.5

0

>= 4.5

1

A quick look at the counts aboveshows a similar picture to that you would see from a histogram,namely that the data are not evenly spreadinto ranges of values, i.e. they are skewed.The text-basedhistogram will give you counts, but note that the bin values in a histogramare the mid-point of the bin and not the cut-off value between bins, i.e. theyare the same as a user-defined bin cut-off values minus half of the step size.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Extract asubset of data

Menu location: Data_Grouping_Extract.

This function enables you toextract subsets of data from a column of numbers, based upon an expression thatrefers to indicators in related rows of worksheet.

For example, you might want toselect only the rows from a score data column for which sex=2 and age>30;i.e. extract into one new variable, all data for females over 30:

Sex

Age

Score

1

23

2.3

1

33

3.1

1

19

3.2

2

21

1.9

2

43

4.2

2

39

5.0

2

26

1.1

Use the Data_Grouping_Extractmenu item with Score as the data variable and Sex and Age and the indicatorvariables to create a new variable with the expression "v1=1 andv2>30":

Score [Sex=2 AND Age>30]

4.2

5

You can compose an expressionusing v1, v2 etc. and any of the functions, operators and logical expressionsdescribed under search andreplace. Here are some examples of expressions for the example of the twoindicators mentioned above:

V1: Sex

V2 Age

"v1=1" (not all of theselected/listed indicator variables have to be used)

"v2<>21 andv2>16"

"v1=2 or v2>25"

"(v1=1 and v2<=25) or(v1=2 and v2<=30)"

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Split bygroup identifier

Menu location: Data_Grouping_Split.

This function splits a datacolumn into other data columns according to one or more group identifier columns.The split data are put into new workbook columns that are marked Name~G=m,Name~G=f etc, where Name is the original combined data column label and G=xrepresent the group identifiers that you selected. It is wise to keepidentifier schemes simple.

Example

Gp

Data --->

Data~Gp=1

Data~Gp=2

Data~Gp=3

1

1.1

1.1

0.7

1.9

1

1.3

1.3

1.0

2.2

1

0.9

0.9

0.6

1.7

2

0.7

1.5

1.1

2

1.0

1.3

3

1.9

1

1.5

1

1.3

2

0.6

3

2.2

3

1.7

2

1.1

Some other statistical packagesrequire group identifier columns for group comparison tests, where all data areexpected to be in a single column. StatsDirect does not do this for simplegroup comparison methods because separate data columns aid comprehension of theanalysis. If you need to share data between StatsDirect and statisticalpackages that require group identifiers then use this function and the combinegroups function to prepare data prior to import/export.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Combinegroups and form an identifier

Menu location: Data_Grouping_Combine.

This function combines data fromthe workbook columns you select and puts them into a single new data columnwith a group identifier column alongside it.

Example

Data~G1

Data~G2

Data~G3 --->

Group ID

Data

1.1

0.7

1.9

Data~G1

1.1

1.3

1.0

2.2

Data~G1

1.3

0.9

0.6

1.7

Data~G1

0.9

1.5

1.1

Data~G1

1.5

1.3

Data~G1

1.3

Data~G2

0.7

Data~G2

1.0

Data~G2

0.6

Data~G2

1.1

Data~G1

1.9

Data~G1

2.2

Data~G1

1.7

Some other statistical packagesrequire group identifier columns for group comparison tests, where all data areexpected to be in a single column. StatsDirect does not do this for simplegroup comparison methods because separate data columns aid comprehension of theanalysis. If you need to share data between StatsDirect and statisticalpackages that require group identifiers then use this function and the groupsplit function to prepare data prior to import/export.

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Tabulate

Menu location: Data_Grouping_Tabulate.

This a simple two way crosstabulation function. If you have two columns of numbers that correspond todifferent classifications of the same individuals then you can use thisfunction to give a two way frequency table for the cross classification.

If you need to stratify by athird classification variable, or you want to perform analyses straight away onyou cross tabulation, then please see the Analysis_Crosstabsfunction instead.

Example

A database of test scores containstwo fields of interest, sex (M=1, F=0) and grade of skin reaction to an antigen(none = 0, weak + = 1, strong + = 2). Here is a list of those fields for 10patients:

Sex

Reaction

0

0

1

1

1

2

0

2

1

2

0

1

0

0

0

1

1

2

1

0

In order to get a crosstabulation of these from StatsDirect you should enter these data in twoworkbook columns. Then choose Tabulate from the Grouping section of the Datamenu.

For this example:

Reaction:0

reaction:1

reaction:2

0

1

2

sex:0

2

2

1

sex:1

1

1

3

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Detabulate

Menu location: Data_Grouping_Detabulate.

This a function to expand a twoway table of counts into a column indicator variable and a row indicatorvariable, i.e. to deconstruct a table back into individual level data. This isthe opposite of crosstabs or the tabulatefunction.

Example

Given the following table:

reaction:0

reaction:1

reaction:2

sex:0

2

2

1

sex:1

1

1

3

You can use theData_Grouping_Detabulate menu item on the two by three table of counts to getback to individual level data with each row representing the classification ofa subject as follows:

Row Classifier

Column Classifier

0

0

1

1

1

2

0

2

1

2

0

1

0

0

0

1

1

2

1

0

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 dayStatsDirect trial

Crosstabs

Menu location: Analysis_Crosstabs.

This a two or three way crosstabulation function. If you have two columns of numbers that correspond todifferent classifications of the same individuals then you can use thisfunction to give a two way frequency table for the cross classification. Thiscan be stratified by a third classification variable.

For two way crosstabs,StatsDirect offers a range of analyses appropriate to the dimensions of thecontingency table. For more information see chi-squaretests and exacttests.

For three way crosstabs, StatsDirectoffers either oddsratio (for case-control studies) or relative risk(for cohort studies) meta-analyses for 2 by 2 by k tables, and generalisedCochran-Mantel-Haenszel tests for r by c by k tables.

Example

A database of test scorescontains two fields of interest, sex (M=1, F=0) and grade of skin reaction toan antigen (none = 0, weak + = 1, strong + = 2). Here is a list of those fieldsfor 10 patients:

Sex

Reaction

0

0

1

1

1

2

0

2

1

2

0

1

0

0

0

1

1

2

1

0

In order to get a cross tabulationof these from StatsDirect you should enter these data in two workbook columns.Then choose crosstabs from the analysis menu.

For this example:

Reaction

0

1

2

Sex

0

2

2

1

1

1

1

3

We could then proceed to an r byc (2 by 3) contingencytable analysis to look for association between sex and reaction to thisantigen:

Contingency table analysis

Observed

2

2

1

5

% of row

40%

40%

20%

% of col

66.67%

66.67%

25%

50%

Observed

1

1

3

5

% of row

20%

20%

60%

% of col

33.33%

33.33%

75%

50%

Total

3

3

4

10

% of n

30%

30%

40%

TOTAL number of cells = 6

WARNING: 6 out of 6 cells haveEXPECTATION < 5

NOMINAL INDEPENDENCE

Chi-square = 1.666667, DF = 2, P= 0.4346

G-square = 1.726092, DF = 2, P =0.4219

Fisher-Freeman-Halton exact P =0.5714

ANOVA

Chi-square for equality of meancolumn scores = 1.5

DF = 2, P = 0.4724

LINEAR TREND

Sample correlation (r) = 0.361158

Chi-square for linear trend (M²)= 1.173913

DF = 1, P = 0.2786

NOMINAL ASSOCIATION

Phi = 0.408248

Pearson's contingency = 0.377964

Cramér's V = 0.408248

ORDINAL

Goodman-Kruskal gamma = 0.555556

Approximate test of gamma = 0: SE= 0.384107, P = 0.1481, 95% CI = -0.197281 to 1.308392

Approximate test of independence:SE = 0.437445, P = 0.2041, 95% CI = -0.301821 to 1.412932

Kendall tau-b = 0.348155

Approximate test of tau-b = 0: SE= 0.275596, P = 0.2065, 95% CI = -0.192002 to 0.888313

Approximate test of independence:SE = 0.274138, P = 0.2041, 95% CI = -0.189145 to 0.885455

...
关于我们 - 联系我们 -版权申明 -诚聘英才 - 网站地图 - 医学论坛 - 医学博客 - 网络课程 - 帮助
医学全在线 版权所有© CopyRight 2006-2046,
皖ICP备06007007号
百度大联盟认证绿色会员可信网站 中网验证
Baidu
map