Statistical Methods and Psychology
In the preface to H. E. Garrett’s Statistics in Psychology and Education, Woodworth distinguishes between different orders of statisticians. “There is, first in order, the mathematician who invents the method for performing a certain type of statistical job. His interest, as a mathematician, is not in the educational, social or psychological problems, but in the problem of devising instruments for handling such matters. He is the tool-maker. Quite another order of statisticians are the computers who, supplied with the mathematician’s formulas, will compute from your data the necessary averages, probable errors and correlation coefficients. Their interest, as computers, lies in the quick and accurate handling of the tools of the trade. But there is a statistician of yet another order, in between the two, whose primary interest is psychological, perhaps, or it may be educational. It is he who has selected the scientific or practical problem, who has organised his attack upon the problem in such fashion that the data obtained can be handled in some sound statistical way. Such a one, in short, must have a discriminating knowledge of the kit of tools which the mathematician has handed him, as well as some skill in their actual use.”
The objective of measurement is accurate quantitative description of the data; objectivity is the aim. Physical scientists have been reluctant to accept measurement in psychology as measurement at all, yet psychology and the social sciences have applied statistical methods to describe the behaviour of their data. The difficulties of psychological measurement are great. Spearman said, “The path of science is paved with achievements of the allegedly unachievable. And in point of fact mathematical treatment is perhaps just the region where psychology has made its most steadfast and surprising advances.”
In the field of mental measurement we find today much application of the normal curve developed by Gauss (1777-1855). In investigating the causes of individual differences, Galton applied the normal curve and invented a number of additional statistical tools, the most important of which is the method of correlation, which he developed with the help of Karl Pearson.
Psychological scaling methods, such as rating scales, paired comparisons, the ranking method and equal-appearing intervals, have helped in the preparation of psychological tests: measures of intelligence, aptitudes, attitudes, personality traits and the like. The close relation between statistics and psychology has already led to the development of a separate science known as psychometry.
A student of educational psychology, therefore, must have sufficient knowledge of elementary statistics to undertake the construction, application and validation of psychological tests. He must understand how to apply multidimensional scaling or factor analysis instead of forcing a composite variable onto a single linear scale. A basic knowledge of a few statistical terms and methods is essential for all students of educational psychology.
(i) Frequency distribution:
Data collected on tests must be organised and summarised. The first step is to classify the scores into groups by tabulating them into a frequency distribution.
The steps in constructing a frequency distribution are:
(1) Determine the range or the gap between the highest and the lowest scores. An illustration is taken from the results of an admission test of 50 students. The highest score is 187 and the lowest 132. The range is 55.
(2) A group interval is to be selected. The commonest group intervals are 3, 5 or 10 units. The number of intervals can be determined by dividing the range (55) by the class interval (5, arbitrarily chosen), which gives 11; adding 1 makes the number of class intervals 12.
(3) Tally each score in the appropriate interval, listing the class intervals from the lowest at the bottom to the highest at the top.
(4) Determine the mid-point of each interval in the frequency distribution.
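The four steps above can be sketched in code. The score list here is hypothetical (the chapter's actual 50 admission-test scores are not reproduced), but it keeps the stated extremes, 187 and 132:

```python
# Hypothetical admission-test scores; highest = 187, lowest = 132 as in the text.
scores = [132, 139, 143, 148, 151, 155, 158, 160, 161, 162,
          163, 165, 168, 171, 174, 178, 181, 184, 187]

rng = max(scores) - min(scores)        # step 1: the range = 187 - 132 = 55
i = 5                                  # step 2: class interval of 5 units
n_intervals = rng // i + 1             # 55 // 5 + 1 = 12 class intervals

base = min(scores) - min(scores) % i   # lowest interval starts at 130
freq = {base + k * i: 0 for k in range(n_intervals)}
for s in scores:                       # step 3: tally each score into its interval
    freq[base + ((s - base) // i) * i] += 1

for start in sorted(freq):             # step 4: midpoint of each interval
    mid = (start + start + i - 1) / 2
    print(f"{start}-{start + i - 1}  midpoint {mid}  f = {freq[start]}")
```

With a real data set the tallied frequencies would then be listed lowest interval at the bottom, as the text directs.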
The illustration of 50 ungrouped scores is:
The above 50 scores grouped into a frequency distribution:
The frequency distribution can also be presented graphically in the form of a distribution curve, a histogram or a frequency polygon.
Measures of Central Tendency:
The central tendency represents the most typical or representative score characterizing the performance of the entire group, and it helps to compare the performance of two or more groups in terms of their typical performance. There are three measures of central tendency: the arithmetic mean or average (M), the median and the mode.
(1) The average or arithmetic mean (M) of ungrouped data is found by adding all the scores and dividing the sum by the number of scores (N). The formula is M = ∑X/N (arithmetic mean calculated from ungrouped data).
(2) The arithmetic mean of data grouped in a frequency distribution is found by a slightly different method.
In this method the fx column is calculated by multiplying the midpoint of each interval by the number of scores (f) on it. The mean is then the sum of the fx column (namely 8040) divided by N (50).
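As a sketch of this fx method in code: the (midpoint, frequency) pairs below are a hypothetical reconstruction, since the chapter's table is not reproduced, but they are chosen to match the stated totals, ∑fx = 8040 and N = 50:

```python
# Hypothetical frequency table: (midpoint x, frequency f) pairs,
# reconstructed to reproduce the chapter's totals (sum fx = 8040, N = 50).
table = [(147, 5), (152, 8), (157, 9), (162, 13),
         (167, 7), (172, 4), (177, 3), (182, 1)]

N = sum(f for _, f in table)              # 50 scores in all
sum_fx = sum(x * f for x, f in table)     # fx column summed: 8040
mean = sum_fx / N                         # 8040 / 50 = 160.8
print(N, sum_fx, mean)
```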
Calculation of mean, median and crude mode from the above table of 50 scores:
Median refers to the midpoint of the series, when the data are arranged in order of size. It is the middle-most score that bisects the distribution, half the cases falling above it, and half below.
The formula for calculating the median from data grouped in a frequency distribution is:
Median = L + ((N/2 − F) / fm) × i
where L = exact lower limit of the class interval in which the median lies,
F = the sum of the frequencies on all intervals below L (the lower limit),
fm = frequency or number of scores within the interval upon which the median falls,
i = length of the class interval.
The mode is the most frequent score. In an ungrouped series of data, the ‘crude’ or ’empirical’ mode is the single score which appears most frequently.
When the frequency distribution is only moderately skewed, the mode may be estimated by the formula: Mode = 3 Mdn – 2 Mean.
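Both the crude mode and the estimated mode can be sketched in a few lines; the ten scores below are a hypothetical illustration:

```python
from collections import Counter

# Hypothetical ungrouped series of 10 scores.
scores = [38, 40, 41, 41, 41, 42, 43, 44, 44, 46]

# Crude (empirical) mode: the single most frequent score.
crude_mode = Counter(scores).most_common(1)[0][0]

mean = sum(scores) / len(scores)        # 420 / 10 = 42.0
s = sorted(scores)
median = (s[4] + s[5]) / 2              # middle of 10 scores: 41.5

# Estimated mode from the empirical relation Mode = 3 Mdn - 2 Mean.
estimated_mode = 3 * median - 2 * mean
print(crude_mode, estimated_mode)
```

The two figures need not agree exactly; the formula gives only an estimate from the mean and median.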
Arithmetical Mean by the Short Method:
Data from Table 1 of 50 scores:
The AM, or assumed mean, is taken at the midpoint of the interval with the largest frequency: 162.00. The column x1 refers to the deviations of the midpoints of the class intervals from the assumed mean, in class-interval units. Thus 5 is the interval deviation of 187 from 162, the midpoint of the largest group.
The deviation of 162, the assumed mean, from its own class interval is 0. Above 162 all deviations are positive, and below 162 all deviations are negative. The fx1 column is found by weighting the deviations by their frequencies (f). From the fx1 column the correction is obtained in the following manner: find the algebraic sum of the plus and minus fx1 values (which is −12) and divide the sum by N (−12/50 = −.24). This gives C, the correction in units of class interval. C is multiplied by the class interval i (−.24 × 5 = −1.20); adding this correction to the assumed mean (162 + (−1.20)) gives the mean, 160.80.
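The short method can be sketched in code. The frequency table is a hypothetical reconstruction (the chapter's table is not reproduced), chosen to reproduce the stated figures: AM = 162, ∑fx1 = −12, N = 50, mean = 160.80:

```python
# Hypothetical (midpoint, f) table reconstructed to match the chapter's figures.
table = [(147, 5), (152, 8), (157, 9), (162, 13),
         (167, 7), (172, 4), (177, 3), (182, 1)]
AM, i = 162, 5                     # assumed mean and class interval

# x1 = deviation of each midpoint from AM, in class-interval units;
# fx1 weights each deviation by its frequency.
N = sum(f for _, f in table)
sum_fx1 = sum(f * (x - AM) // i for x, f in table)   # algebraic sum: -12

C = sum_fx1 / N                    # correction in interval units: -0.24
mean = AM + C * i                  # 162 + (-0.24 * 5) = 160.80
print(sum_fx1, C, mean)
```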
Guidelines for the Use of Various Central Tendencies:
(1) Use the mean:
when the scores are distributed symmetrically around a central point, and when the SD or a correlation coefficient is to be computed.
(2) Use the median:
when the exact midpoint or 50% point is desired and when there are extreme scores.
(3) Use the mode:
when a quick and rough estimate of central tendency is wanted as the typical value of the group.
Measures of Variability (the extent of individual differences around the central tendency):
Variability refers to the ‘scatter’ or ‘spread’ of the scores around their central tendency.
There are 4 measures of variability:
(1) The range
(2) The quartile deviation or Q
(3) The average deviation or A.D. and
(4) The standard deviation, S.D. or σ (sigma).
(1) The range is the interval between the lowest and the highest scores in the group. But it is a crude and unstable measure, as it is determined by two scores only; a single unusually high or unusually low score will affect its size.
(2) The quartile deviation is one half the scale distance between the 75th and 25th percentiles in a frequency distribution. The 25th percentile, or 1st quartile Q1, is the point below which lie 25% of the scores; the 75th percentile, or 3rd quartile Q3, is the point below which lie 75% of the scores. Hence the formula for the quartile deviation is Q = (Q3 − Q1)/2.
(3) Average deviation or Mean deviation:
It is the mean of the deviations of all the separate scores in a series, taken from their mean (or occasionally from the median or mode). For an ungrouped series, AD or MD = ∑|x| / N; for a frequency distribution, where the scores are grouped, AD = ∑(f|x|) / N. The AD is rarely used in modern statistics.
(4) The standard deviation is the measure most used in the construction of tests. “The most serviceable measure of variability is the standard deviation, symbolized by SD or σ (the Greek letter sigma).” An illustration shows how to compute the SD from 10 ungrouped scores (reference: Anastasi).
The original raw scores are designated by a capital X, and a small x is used to refer to the deviation of each score from the group mean. The Greek letter ∑ (capital sigma) denotes summation. The first column gives the data for the computation of the mean and median. The mean is 40; the median is 40.5, falling midway between 40 and 41, with five cases (50 per cent) above the median and five below.
The second column shows how far each score deviates above or below the mean of 40. The sum of these deviations will always equal zero, since the positive and negative deviations around the mean necessarily balance, or cancel each other out (+20 – 20 = 0). If we ignore signs, the absolute deviations can be averaged, obtaining the average deviation, which, however, is not suitable for mathematical analysis because of the arbitrary disregarding of signs.
In the case of the standard deviation, the negative signs are legitimately eliminated by squaring each deviation. The sum of the last column divided by the number of cases, ∑x²/N, is known as the variance or mean square deviation, and is symbolized by σ². The standard deviation is the square root of the variance. This measure is commonly used to compare the variability of different groups.
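The whole computation can be sketched with a hypothetical set of ten scores (not Anastasi's original data), chosen to reproduce the worked figures: mean 40, median 40.5, and deviations summing +20 − 20 = 0:

```python
import math

# Hypothetical raw scores X, matching the worked figures in the text.
X = [48, 46, 44, 41, 41, 40, 39, 38, 34, 29]

N = len(X)
mean = sum(X) / N                       # 400 / 10 = 40
s = sorted(X)
median = (s[4] + s[5]) / 2              # midway between 40 and 41 -> 40.5

x = [xi - mean for xi in X]             # small x: deviations from the mean
assert sum(x) == 0                      # positives and negatives cancel

AD = sum(abs(d) for d in x) / N         # average deviation (signs ignored)
variance = sum(d * d for d in x) / N    # mean square deviation, sigma squared
SD = math.sqrt(variance)                # standard deviation
print(mean, median, AD, variance, round(SD, 2))
```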
In order to understand Q, or the quartile deviation, it is necessary to have an idea of percentiles. Percentile scores are expressed in terms of the percentage of persons in the standardised sample who fall below a given raw score. For example, if 25% of the persons obtain less than 30 on a given arithmetic test, the raw score of 30 corresponds to the 25th percentile (P25).
A percentile shows the individual’s relative position in the standardised sample. The 50th percentile corresponds to the median. Percentiles above 50 represent above-average performance, and percentiles below 50 represent inferior or below-average performance.
The 25th and 75th percentiles are known as the 1st and 3rd quartile points (Q1 and Q3), because they cut off the lowest and highest quarters of the distribution. They provide a convenient way of describing a distribution and comparing it with other distributions. Percentiles are not the same as percentage scores, which are raw scores; percentiles are derived scores, expressed in terms of the percentage of persons falling below certain scores.
The advantages of percentiles are many:
(1) They are easy to compute.
(2) They are universally applicable, whether in case of adults or children.
(3) They are suitable for any type of test —aptitude or personality variables.
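Percentile ranks and the quartile deviation can be sketched directly from these definitions. The 20 scores are hypothetical, and the simple counting definitions here ignore interpolation within class intervals:

```python
# Hypothetical standardisation sample of 20 scores.
scores = sorted([12, 15, 18, 20, 22, 24, 25, 27, 28, 30,
                 31, 33, 35, 36, 38, 40, 43, 45, 48, 52])
N = len(scores)

def percentile_rank(raw):
    """Percentage of persons in the sample who fall below the raw score."""
    return 100 * sum(s < raw for s in scores) / N

def percentile_point(p):
    """Smallest score whose percentile rank is at least p."""
    for s in scores:
        if percentile_rank(s) >= p:
            return s
    return scores[-1]

Q1 = percentile_point(25)        # cuts off the lowest quarter
Q3 = percentile_point(75)        # cuts off the highest quarter
Q = (Q3 - Q1) / 2                # quartile deviation
print(percentile_rank(30), Q1, Q3, Q)
```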
When to use the various measures of variability:
(1) The Range:
(i) The range may be used when the data are too scanty or too scattered to justify computation of a more precise measure of variability.
(ii) When only knowledge of the spread is wanted.
(2) The Q may be used:
when the median is the measure of central tendency, and when the concentration around the median (the middle 50% of the scores) is of interest.
(3) The AD may be used when exact deviation from the mean is to be assessed.
(4) The SD is the most stable index of variability. It is used in almost all experimental work and research.
(i) It is to be used when greatest stability or reliability is sought.
(ii) When coefficient of correlation and other statistics are to be computed.
A student of educational psychology should further have knowledge of concepts like the standard score, the standard error, the null hypothesis, the t-test, and the computation of the correlation coefficient. A very brief account of these concepts follows.
Computation of standard scores is, first of all, a method of scaling scores mostly used in constructing aptitude and achievement tests, in order to render scores on different tests comparable. Standard scores may be computed by either linear or non-linear transformation of the raw scores.
In the case of a linear transformation, they are computed by subtracting a constant (the mean) from each raw score and then dividing the result by another constant (the SD); as such, standard scores retain the exact numerical relations of the original raw scores.
Standard scores are designated as ‘z’ scores. A z score is derived by finding the difference between the individual’s raw score and the mean of the normative group, and then dividing the difference by the SD of the normative group. Non-linear transformations are employed to achieve comparability of dissimilarly shaped distributions.
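This linear z-score transformation is a one-line computation; the raw scores and the normative group's mean and SD below are hypothetical:

```python
# Hypothetical normative-group statistics.
M, SD = 40, 5.29

def z_score(raw):
    # Difference from the group mean, expressed in SD units.
    return (raw - M) / SD

print(round(z_score(48), 2))   # a score above the mean gives a positive z
print(round(z_score(34), 2))   # a score below the mean gives a negative z
```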
Normalized Standard Scores or T Scores:
T scores are normalized standard scores converted into a distribution with a mean of 50 and an SD of 10. In T scaling it is actually the percentile rank of the raw score which is being scaled. The stanine (a contraction of ‘standard nine’) scale is a condensed form of the T scale; stanine scores run from 1 to 9 along the base line of the normal curve, in which the unit is 0.5σ and the median is 5.
T scores should not be confused with standard scores. With respect to the original scores, T scores represent equivalent percentile ranks in a normal distribution. Standard scores, on the other hand, always have the same form of distribution as the raw scores, being simply the original scores expressed in σ units. Standard scores correspond exactly to T scores only when the distribution of raw scores is strictly normal. The standard error is an estimate of how adequately or truly an obtained score represents its true score; it is a way of expressing the reliability of a test other than the reliability coefficient.
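Normalizing via the percentile rank can be sketched as follows: the rank is looked up in the normal curve to obtain a normal deviate, which is then rescaled to a mean of 50 and an SD of 10. The percentile ranks used below are hypothetical illustrations:

```python
from statistics import NormalDist

def t_score(percentile_rank):
    # Normal deviate (z) corresponding to that percentile rank,
    # rescaled to a distribution with mean 50 and SD 10.
    z = NormalDist().inv_cdf(percentile_rank / 100)
    return 50 + 10 * z

print(round(t_score(50)))   # median performance: T = 50
print(round(t_score(84)))   # roughly +1 SD in a normal distribution
```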
The null or chance hypothesis is a tool used to test the significance of differences between population means. It is actually a logical, not a statistical, conclusion. But the rejection of a null hypothesis does not immediately force acceptance of a contrary view. In order to find the significance of a difference, it is necessary to find the standard error of the difference between the two sample means.
The two formulas for the SEs of the means and for the SE of the difference between uncorrelated or independent means are:
σM1 = σ1/√N1, σM2 = σ2/√N2, and σD = √(σM1² + σM2²)
in which σM1 = the SE of the mean of the 1st sample,
σM2 = the SE of the mean of the 2nd sample,
σD = the SE of the difference between the two sample means, and
N1 and N2 = the sizes of the two samples.
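As a sketch, the SE of each mean (σ/√N) and the SE of the difference between two independent means (the square root of the sum of the squared SEs) can be computed from hypothetical sample figures:

```python
import math

# Hypothetical sample statistics: SD and size of each sample.
sigma1, N1 = 12.0, 100
sigma2, N2 = 10.0, 64

SE_M1 = sigma1 / math.sqrt(N1)            # 12 / 10 = 1.20
SE_M2 = sigma2 / math.sqrt(N2)            # 10 / 8  = 1.25
SE_D = math.sqrt(SE_M1**2 + SE_M2**2)     # SE of the difference between means
print(SE_M1, SE_M2, round(SE_D, 3))
```

The observed difference between the two sample means would then be divided by SE_D to judge its significance.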
The t-test, which takes account of the degrees of freedom, is to be used for testing significance in the case of small samples.
The correlation coefficient, designated by the letter ‘r’:
Correlation is a measure of relationship. Relations among abilities can be studied by the method of correlation.
There are different methods of computing correlation:
(1) The “product-moment” coefficient of correlation (r) can be found when the relationship between two sets of measurements is “linear”, i.e., can be described by a straight line. Perfect relationship is expressed by a coefficient of 1.00 and no relationship by a coefficient of .00. Between these two limits fall intermediate positive relationships.
A coefficient of correlation between 1.00 and .00 signifies a positive relationship. Relationships may also be negative, i.e., a high degree of performance in one test may be associated with a low degree in another. When a negative or inverse relationship is perfect, the coefficient of correlation r = –1.00.
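The product-moment coefficient can be sketched from the standard deviation formula, r = ∑xy / (√∑x² · √∑y²), where x and y are deviations from the respective means. The paired scores below are hypothetical:

```python
import math

def pearson_r(X, Y):
    # Product-moment r computed from deviations about the two means.
    N = len(X)
    mx, my = sum(X) / N, sum(Y) / N
    sxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))
    sx = math.sqrt(sum((x - mx) ** 2 for x in X))
    sy = math.sqrt(sum((y - my) ** 2 for y in Y))
    return sxy / (sx * sy)

X = [2, 4, 6, 8, 10]
Y = [1, 3, 5, 7, 9]               # perfectly linear with X

print(pearson_r(X, Y))            # perfect positive relationship
print(pearson_r(X, Y[::-1]))      # perfect inverse relationship
```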
(2) Rank correlation:
Product-moment correlation is rarely computed when the number of cases falls below 25. In such cases a rank correlation may be computed by arranging the scores in order of merit and comparing the two rank orders (for instance, by joining corresponding ranks with straight lines). The product-moment coefficient of correlation is actually a ratio which expresses the extent to which changes in one variable are accompanied by changes in a second variable.
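Rank correlation for such a small group can be sketched with the standard rank-difference formula, rho = 1 − 6∑D² / (N(N² − 1)), where D is the difference between an individual's ranks on the two tests. The two sets of scores are hypothetical, and ties are ignored for simplicity:

```python
def to_ranks(scores):
    # Rank 1 = highest score (order of merit); assumes no tied scores.
    order = sorted(scores, reverse=True)
    return [order.index(s) + 1 for s in scores]

# Hypothetical scores of five pupils on two tests.
test1 = [80, 65, 72, 90, 55]
test2 = [75, 70, 60, 85, 58]

r1, r2 = to_ranks(test1), to_ranks(test2)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))   # sum of squared rank differences
N = len(test1)
rho = 1 - 6 * sum_d2 / (N * (N * N - 1))             # rank-difference formula
print(r1, r2, rho)
```

A rho near 1.00 indicates that the two orders of merit closely agree.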