SOFT NOTES: STATISTICAL TERMS

Statistics - the practice or science of collecting and analyzing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.

Primary data - information that you collect specifically for the purpose of your research project. Primary data are collected by the investigator conducting the research. An advantage of primary data is that they are tailored to your research needs. A disadvantage is that they are expensive to obtain.

Secondary data - data originally collected by someone else, usually for another purpose. Common sources of secondary data for social science include censuses, information collected by government departments, organisational records and data that were originally collected for other research purposes.

Secondary data offers the following advantages:

(1) It is highly convenient to use information which someone else has compiled. There is no need for printing data collection forms, hiring enumerators, editing and tabulating the results, etc. Researchers alone or with some clerical assistance may obtain information from published records compiled by somebody else.

(2) If secondary data are available they are much quicker to obtain than primary data.

(3) Secondary data may be available on some subjects where it would be impossible to collect primary data. For example, census data cannot be collected by an individual or research organization, but can only be obtained from Government publications.

However, two major problems are encountered in using secondary data:

(1) The first is the difficulty of finding data which exactly fit the needs of the present study.

(2) The second problem is finding data which are sufficiently accurate.

Even if the utmost care has been taken in selecting a sample, the results derived from a sample study may not be exactly equal to the true value in the population. The reason is that the estimates are based on a part of the population and not on the whole, and samples are seldom, if ever, perfect miniatures of the population. Hence sampling gives rise to certain errors known as sampling errors (or sampling fluctuations). These errors would not be present in a complete enumeration survey. However, the errors can be controlled: modern sampling theory helps in designing the survey in such a manner that the sampling errors can be made small.

Sampling errors are of two types: biased and unbiased.

1. Biased errors: These errors arise from any bias in selection, estimation, etc. For example, if deliberate sampling has been used in place of simple random sampling, some bias is introduced in the result; such errors are called biased sampling errors.

2. Unbiased errors: These errors arise due to chance differences between the members of the population included in the sample and those not included. An error in statistics is the difference between the value of a statistic and that of the corresponding parameter.

Thus the total sampling error is made up of errors due to bias, if any, and the random sampling error. The essence of bias is that it forms a constant component of error that does not decrease as the number in the sample increases, however large the population. Such error is, therefore, also known as cumulative or non-compensating error. The random sampling error, on the other hand, decreases on average as the size of the sample increases. Such error is, therefore, also known as non-cumulative or compensating error.
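The contrast between the two kinds of error can be checked with a small simulation. The sketch below is only an illustration under assumed values: a normally distributed population with true mean 50 and standard deviation 10, and a constant offset of 2 added to every observation to stand in for bias. The function name mean_abs_error is hypothetical.

```python
import random
import statistics

TRUE_MEAN = 50.0   # assumed population mean (illustrative)
TRUE_SD = 10.0     # assumed population standard deviation (illustrative)


def mean_abs_error(sample_size, offset=0.0, trials=100, seed=1):
    """Average absolute error of the sample mean over repeated samples.

    `offset` is a constant added to every observation; it plays the
    role of a bias that does not depend on the sample size.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        sample = [rng.gauss(TRUE_MEAN, TRUE_SD) + offset
                  for _ in range(sample_size)]
        errors.append(abs(statistics.fmean(sample) - TRUE_MEAN))
    return statistics.fmean(errors)


for n in (25, 400, 2500):
    unbiased = mean_abs_error(n)
    biased = mean_abs_error(n, offset=2.0)
    print(f"n={n:5d}  random error only: {unbiased:.3f}  with bias: {biased:.3f}")
```

With random error alone, the average error should shrink as the sample grows (roughly in proportion to 1/√n). With the offset included, it should level off near the size of the offset however large the sample becomes.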

Bias may arise due to:

(i) inappropriate process of selection;

(ii) inappropriate work during the collection; and

(iii) inappropriate methods of analysis.

(i) Faulty selection. Faulty selection of the sample may give rise to bias in a number of ways, such as:

(a) Deliberate selection of a ‘representative’ sample.

(b) Conscious or unconscious bias in the selection of a ‘random’ sample. The randomness of selection may not really exist, even though the investigator claims to have a random sample, if the desire to obtain a certain result is allowed to influence the selection.

(c) Substitution. Substitution of an item in place of one chosen in a random sample sometimes leads to bias. Thus, if it is decided to interview every 50th householder in the street, it would be inappropriate to interview the 51st or any other householder instead, as the characteristics of the substitute may differ from those of the householder who was originally to be included in the sample.

(d) Non-response. If not all the items to be included in the sample are covered, there will be bias even though no substitution has been attempted. This fault occurs particularly with mailed questionnaires, which are often incompletely returned. Moreover, the information supplied by the informants may also be biased.

(e) An appeal to the vanity of the person questioned may give rise to yet another kind of bias. For example, the question ‘Are you a good student?’ is such that most students would succumb to vanity and answer ‘Yes’.

(ii) Bias due to Faulty Collection of Data. Any consistent error in measurement will give rise to bias whether the measurements are carried out on a sample or on all the units of the population. The danger of error is, however, likely to be greater in sampling work, since the units measured are often smaller. Bias may arise from improper formulation of the decision problem, wrongly defining the population, specifying the wrong decision, securing an inadequate frame, and so on. Biased observations may result from a poorly designed questionnaire, an ill-trained interviewer, failure of a respondent’s memory, etc. Bias in the flow of data may be due to an unorganized collection procedure or faulty editing or coding of responses.

(iii) Bias in Analysis. In addition to bias which arises from faulty process of selection and faulty collection of information, faulty methods of analysis may also introduce bias. Such bias can be avoided by adopting the proper methods of analysis.

If possibilities of bias exist, fully objective conclusions cannot be drawn. The first essential of any sampling or census procedure must, therefore, be the elimination of all sources of bias. The simplest and the only certain way of avoiding bias in the selection process is for the sample to be drawn either entirely at random, or at random subject to restrictions which, while improving the accuracy, are of such a nature that they do not introduce bias in the results. In certain cases, systematic selection may also be permissible.

Statistics - a set of concepts, rules, and procedures that help us to:

organize numerical information in the form of tables, graphs, and charts;
understand statistical techniques underlying decisions that affect our lives and well-being; and
make informed decisions.

Data - facts, observations, and information that come from investigations.

Measurement data, sometimes called quantitative data - the result of using some instrument to measure something (e.g., test score, weight).
Categorical data, also referred to as frequency or qualitative data - things are grouped according to some common property (or properties) and the number of members in each group is recorded (e.g., males/females, vehicle type).

Variable - property of an object or event that can take on different values. For example, college major is a variable that takes on values like mathematics, computer science, English, psychology, etc.

Discrete Variable - a variable with a limited number of values, e.g., gender (male/female) or college class (freshman/sophomore/junior/senior).
Continuous Variable - a variable that can take on many different values, in theory, any value between the lowest and highest points on the measurement scale.
Independent Variable - a variable that is manipulated, measured, or selected by the researcher as an antecedent condition to an observed behavior. In a hypothesized cause-and-effect relationship, the independent variable is the cause and the dependent variable is the outcome or effect.
Dependent Variable - a variable that is not under the experimenter's control -- the data. It is the variable that is observed and measured in response to the independent variable.
Qualitative Variable - a variable based on categorical data.
Quantitative Variable - a variable based on quantitative data.

Graphs - visual display of data used to present frequency distributions so that the shape of the distribution can easily be seen.

Bar graph - a form of graph that uses bars separated by an arbitrary amount of space to represent how often elements within a category occur. The higher the bar, the higher the frequency of occurrence. The underlying measurement scale is discrete (nominal or ordinal-scale data), not continuous.
Histogram - a form of a bar graph used with interval or ratio-scaled data. Unlike the bar graph, bars in a histogram touch with the width of the bars defined by the upper and lower limits of the interval. The measurement scale is continuous, so the lower limit of any one interval is also the upper limit of the previous interval.

Measures of Center - Plotting data in a frequency distribution shows the general shape of the distribution and gives a general sense of how the numbers are bunched. Several statistics can be used to represent the "center" of the distribution. These statistics are commonly referred to as measures of central tendency.

Mode - The mode of a distribution is simply defined as the most frequent or common score in the distribution. The mode is the point or value of X that corresponds to the highest point on the distribution. If the highest frequency is shared by more than one value, the distribution is said to be multimodal. It is not uncommon to see distributions that are bimodal reflecting peaks in scoring at two different points in the distribution.
Median - The median is the score that divides the distribution into halves; half of the scores are above the median and half are below it when the data are arranged in numerical order. The median is also referred to as the score at the 50th percentile in the distribution. The median location of N numbers can be found by the formula (N + 1) / 2. When N is an odd number, the formula yields an integer that gives the position of the median in the numerically ordered distribution. For example, in the distribution of numbers (3 1 5 4 9 9 8) the median location is (7 + 1) / 2 = 4. When applied to the ordered distribution (1 3 4 5 8 9 9), the 4th value, 5, is the median; three scores are above 5 and three are below 5. If there were only 6 values (1 3 4 5 8 9), the median location is (6 + 1) / 2 = 3.5. In this case the median is half-way between the 3rd and 4th scores (4 and 5), or 4.5.
Mean - The mean is the most common measure of central tendency and the one that can be mathematically manipulated. It is defined as the average of a distribution and is equal to ΣX / N. Simply, the mean is computed by summing all the scores in the distribution (ΣX) and dividing that sum by the total number of scores (N). The mean is the balance point in a distribution: if you subtract the mean from each value in the distribution and sum all of these deviation scores, the result will be zero.
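The three measures can be computed for the seven scores used in the median example. This is a minimal sketch using Python's standard statistics module; the variable names are illustrative only.

```python
import statistics

scores = [3, 1, 5, 4, 9, 9, 8]          # the seven scores from the median example

ordered = sorted(scores)                 # [1, 3, 4, 5, 8, 9, 9]
location = (len(ordered) + 1) / 2        # median location: (7 + 1) / 2 = 4.0

mode = statistics.mode(scores)           # most frequent score
median = statistics.median(scores)       # value at the median location
mean = sum(scores) / len(scores)         # sum of the scores divided by N

# With an even number of scores the median is half-way between the two middle ones.
median_even = statistics.median([1, 3, 4, 5, 8, 9])

print(mode, median, round(mean, 2), median_even)   # 9 5 5.57 4.5
```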

Measures of Spread - Although the average value in a distribution is informative about how scores are centered in the distribution, the mean, median, and mode lack context for interpreting those statistics. Measures of variability provide information about the degree to which individual scores are clustered about or deviate from the average value in a distribution.

Range - The simplest measure of variability to compute and understand is the range. The range is the difference between the highest and lowest score in a distribution. Although it is easy to compute, it is not often used as the sole measure of variability due to its instability. Because it is based solely on the most extreme scores in the distribution and does not fully reflect the pattern of variation within a distribution, the range is a very limited measure of variability.
Interquartile Range (IQR) - Provides a measure of the spread of the middle 50% of the scores. The IQR is defined as the 75th percentile minus the 25th percentile. The interquartile range plays an important role in the graphical method known as the boxplot. The advantage of using the IQR is that it is easy to compute and extreme scores in the distribution have much less impact on it. Its strength is also a weakness, however: it suffers as a measure of variability because it discards too much data. Researchers want to study variability while eliminating scores that are likely to be accidents. The boxplot allows for this distinction and is an important tool for exploring data.
Variance - The variance is a measure based on the deviations of individual scores from the mean. As noted in the definition of the mean, however, simply summing the deviations will result in a value of 0. To get around this problem the variance is based on squared deviations of scores about the mean. When the deviations are squared, the rank order and relative distance of scores in the distribution are preserved while negative values are eliminated. Then, to control for the number of subjects in the distribution, the sum of the squared deviations, Σ(X − X̄)², is divided by N (population) or by N − 1 (sample). The result, in effect the average of the squared deviations, is called the variance.
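The measures of spread above can be sketched in a few lines. The eight scores below are hypothetical, chosen so that the mean is exactly 5. Percentile conventions differ between textbooks and software; the "inclusive" method of statistics.quantiles is just one common choice, assumed here for illustration.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical scores, mean = 5

value_range = max(scores) - min(scores)  # highest score minus lowest score

# Quartile conventions differ; method="inclusive" is one common choice.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1                            # 75th percentile minus 25th percentile

mean = statistics.fmean(scores)
deviations = [x - mean for x in scores]
sum_sq = sum(d ** 2 for d in deviations) # sum of squared deviations

pop_variance = sum_sq / len(scores)            # divide by N (population)
sample_variance = sum_sq / (len(scores) - 1)   # divide by N - 1 (sample)

print("range:", value_range)
print("IQR:", iqr)
print("sum of deviations:", sum(deviations))
print("variance (N):", pop_variance)
print("variance (N - 1):", round(sample_variance, 3))
```

The plain deviations sum to zero, which is why they are squared before averaging. Dividing by N − 1 rather than N gives a slightly larger value.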

Bias.

· A measurement procedure or estimator is said to be biased if, on the average, it gives an answer that differs from the truth. The bias is the average (expected) difference between the measurement and the truth. For example, if you get on the scale with clothes on, that biases the measurement to be larger than your true weight (this would be a positive bias). The design of an experiment or of a survey can also lead to bias. Bias can be deliberate, but it is not necessarily so.

Class Boundary.

· A point that is the left endpoint of one class interval, and the right endpoint of another class interval.

Class Interval.

· In plotting a histogram, one starts by dividing the range of values into a set of non-overlapping intervals, called class intervals, in such a way that every datum is contained in some class interval. See the related entries class boundary and endpoint convention.

Cluster Sample.

· In a cluster sample, the sampling unit is a collection of population units, not single population units. For example, techniques for adjusting the U.S. census start with a sample of geographic blocks, then (try to) enumerate all inhabitants of the blocks in the sample to obtain a sample of people. This is an example of a cluster sample.

Discrete Variable.

A quantitative variable whose set of possible values is countable. Typical examples of discrete variables are variables whose possible values are a subset of the integers, such as Social Security numbers, the number of people in a family, ages rounded to the nearest year, etc. Discrete variables are "chunky." C.f. continuous variable. A discrete random variable is one whose set of possible values is countable. A random variable is discrete if and only if its cumulative probability distribution function is a stair-step function; i.e., if it is piecewise constant and only increases by jumps.

Median.

"Middle value" of a list. The smallest number such that at least half the numbers in the list are no greater than it. If the list has an odd number of entries, the median is the middle entry in the list after sorting the list into increasing order. If the list has an even number of entries, the median is the smaller of the two middle numbers after sorting. The median can be estimated from a histogram by finding the smallest number such that the area under the histogram to the left of that number is 50%.

numbers does not make sense. Quantitative variables typically have units of measurement, such as inches, people, or pounds.

Quartiles.

There are three quartiles. The first or lower quartile (LQ) of a list is a number (not necessarily a number in the list) such that at least 1/4 of the numbers in the list are no larger than it, and at least 3/4 of the numbers in the list are no smaller than it. The second quartile is the median. The third or upper quartile (UQ) is a number such that at least 3/4 of the entries in the list are no larger than it, and at least 1/4 of the numbers in the list are no smaller than it. To find the quartiles, first sort the list into increasing order. Find the smallest integer that is at least as big as the number of entries in the list divided by four. Call that integer k. The kth element of the sorted list is the lower quartile. Find the smallest integer that is at least as big as the number of entries in the list divided by two. Call that integer l. The lth element of the sorted list is the median. Find the smallest integer that is at least as large as the number of entries in the list times 3/4. Call that integer m. The mth element of the sorted list is the upper quartile.
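The counting rule just described can be written out directly. This is a sketch of that rule only; the helper names ceil_div and quartiles are hypothetical, and other textbooks and software use different quartile conventions.

```python
def ceil_div(a, b):
    """Smallest integer that is at least a / b (for positive integers)."""
    return -(-a // b)


def quartiles(values):
    """Lower quartile, median and upper quartile by the counting rule above."""
    ordered = sorted(values)
    n = len(ordered)
    lower_pos = ceil_div(n, 4)        # k in the text
    median_pos = ceil_div(n, 2)       # l in the text
    upper_pos = ceil_div(3 * n, 4)    # m in the text
    # Positions are 1-based in the text; Python lists are 0-based.
    return ordered[lower_pos - 1], ordered[median_pos - 1], ordered[upper_pos - 1]


print(quartiles([3, 1, 5, 4, 9, 9, 8]))   # (3, 5, 9)
print(quartiles([1, 3, 4, 5, 8, 9]))      # (3, 4, 8)
```

Note that for the six-value list this rule returns 4 as the median, the smaller of the two middle numbers, whereas the averaging convention gives 4.5.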

Random Sample.

A random sample is a sample whose members are chosen at random from a given population in such a way that the chance of obtaining any particular sample can be computed. The number of units in the sample is called the sample size, often denoted n. The number of units in the population often is denoted N. Random samples can be drawn with or without replacing objects between draws; that is, drawing all n objects in the sample at once (a random sample without replacement), or drawing the objects one at a time, replacing them in the population between draws (a random sample with replacement). In a random sample with replacement, any given member of the population can occur in the sample more than once. In a random sample without replacement, any given member of the population can be in the sample at most once. A random sample without replacement in which every subset of n of the N units in the population is equally likely is also called a simple random sample. The term random sample with replacement denotes a random sample drawn in such a way that every n-tuple of units in the population is equally likely.
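The difference between the two ways of drawing can be illustrated with Python's standard random module. The population of 20 numbered units and the seed are hypothetical, chosen only so that the run is repeatable.

```python
import random

rng = random.Random(42)                  # fixed seed so the run is repeatable
population = list(range(1, 21))          # hypothetical population, N = 20
n = 5                                    # sample size

# Without replacement: each unit can appear at most once.
# Every subset of n units is equally likely, i.e. a simple random sample.
without_replacement = rng.sample(population, n)

# With replacement: each draw is made from the full population,
# so the same unit may be drawn more than once.
with_replacement = rng.choices(population, k=n)

print(without_replacement)
print(with_replacement)
```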

Range.

The range of a set of numbers is the largest value in the set minus the smallest value in the set. Note that as a statistical term, the range is a single number, not a range of numbers.

A cumulative frequency (cumulative relative frequency) is obtained by summing the frequencies (relative frequencies) of all classes up to the specific class. In the case of qualitative variables, cumulative frequencies make sense only for ordinal variables, not for nominal variables.

The qualitative data are presented graphically either as a pie chart or as a horizontal or vertical bar graph.

A pie chart is a disk divided into pie-shaped pieces proportional to the relative frequencies of the classes. To obtain the angle for any class, we multiply its relative frequency by 360 degrees, which corresponds to the complete circle.

A vertical bar graph displays the classes on the horizontal axis and the frequencies (or relative frequencies) of the classes on the vertical axis. The frequency (or relative frequency) of each class is represented by a vertical bar whose height is equal to the frequency (or relative frequency) of the class.

In a bar graph, the bars do not touch each other. In a horizontal bar graph, the classes are displayed on the vertical axis and the frequencies of the classes on the horizontal axis.

IMPORTANCE OF STATISTICS IN BUSINESS

There are three major functions in any business enterprise in which the statistical methods are useful. These are as follows:

(i) The planning of operations: This may relate to either special projects or to the recurring activities of a firm over a specified period.

(ii) The setting up of standards: This may relate to the size of employment, volume of sales, fixation of quality norms for the manufactured product, norms for the daily output, and so forth.

(iii) The function of control: This involves comparison of actual production achieved against the norm or target set earlier. In case the production has fallen short of the target, remedial measures are taken so that such a deficiency does not occur again.

A point worth noting is that although these three functions (planning of operations, setting of standards, and control) are separate, in practice they are very much interrelated.

Different authors have highlighted the importance of Statistics in business. For instance, Croxton and Cowden give numerous uses of Statistics in business such as project planning, budgetary planning and control, inventory planning and control, quality control, marketing, production and personnel administration. Within each of these they have also specified certain areas where Statistics is very relevant. Another author, Irving W. Burr, dealing with the place of statistics in an industrial organisation, specifies a number of areas where statistics is extremely useful. These are: customer wants and market research, development design and specification, purchasing, production, inspection, packaging and shipping, sales and complaints, inventory and maintenance, costs, management control, industrial engineering and research.

Statistical problems arising in the course of business operations are multitudinous. As such, one may do no more than highlight some of the more important ones to emphasise the relevance of statistics to the business world. In the sphere of production, for example, statistics can be useful in various ways.

Statistical quality control methods are used to ensure the production of quality goods. This is achieved by identifying and rejecting defective or substandard goods.

Sales targets can be fixed on the basis of sales forecasts, which are made using various methods of forecasting. Analysis of the sales effected against the targets set earlier would indicate any deficiency in achievement, which may be on account of several causes: (i) the targets were too high and unrealistic, (ii) the salesmen's performance has been poor, (iii) competition has emerged or increased, (iv) the quality of the company's product is poor, and so on. These factors can be further investigated.

Another sphere in business where statistical methods can be used is personnel management. Here, one is concerned with the fixation of wage rates, incentive norms and the performance appraisal of individual employees. The concept of productivity is very relevant here. On the basis of the measurement of productivity, a productivity bonus is awarded to the workers. Comparisons of wages and productivity are undertaken in order to ensure increases in industrial productivity.

Statistical methods could also be used to ascertain the efficacy of a certain product, say, a medicine. For example, suppose a pharmaceutical company has developed a new medicine for the treatment of bronchial asthma. Before launching it on a commercial basis, it wants to ascertain the effectiveness of this medicine. It undertakes an experiment involving the formation of two comparable groups of asthma patients. One group is given the new medicine for a specified period and the other is treated with the usual medicines. Records are maintained for the two groups for the specified period. These records are then analysed to ascertain whether there is any significant difference in the recovery of the two groups. If the difference is statistically significant, the new medicine is commercially launched.
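The notes do not say which test is used to judge whether the difference is significant. One common choice, assumed here purely for illustration, is a two-proportion z-test on the recovery rates of the two groups. The counts below are hypothetical and are not real trial data.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical trial results (not real data).
new_recovered, new_total = 60, 100       # patients on the new medicine
old_recovered, old_total = 45, 100       # patients on the usual medicines

p_new = new_recovered / new_total
p_old = old_recovered / old_total

# Pooled recovery rate under the assumption of no real difference.
p_pool = (new_recovered + old_recovered) / (new_total + old_total)
std_error = sqrt(p_pool * (1 - p_pool) * (1 / new_total + 1 / old_total))

z = (p_new - p_old) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

A small p-value (conventionally below 0.05) would be read as a statistically significant difference between the two groups. The threshold itself is a convention, not something the notes specify.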

LIMITATIONS OF STATISTICS

Statistics has a number of limitations. The pertinent ones are as follows:

(i) There are certain phenomena or concepts where statistics cannot be used. This is because these phenomena or concepts are not amenable to measurement.

For example, beauty, intelligence, courage cannot be quantified. Statistics has no place in all such cases where quantification is not possible.

(ii) Statistics reveal the average behaviour, the normal or the general trend. Applying the 'average' concept to an individual or a particular situation may lead to a wrong conclusion, and sometimes a disastrous one.

For example, one may be misguided when told that the average depth of a river from one bank to the other is four feet, when there may be some points in between where its depth is far more than four feet. On this understanding, one may enter those points having greater depth, which may be hazardous.

(iii) Since statistics are collected for a particular purpose, such data may not be relevant or useful in other situations or cases. For example, secondary data (i.e., data originally collected by someone else) may not be useful for the other person.

(iv) Statistics are not 100 per cent precise as is Mathematics or Accountancy. Those who use statistics should be aware of this limitation.

CHARACTERISTICS OF THE ARITHMETIC MEAN

Some of the important characteristics of the arithmetic mean are:

1. The sum of the deviations of the individual items from the arithmetic mean is always zero. This means Σ(x − x̄) = 0, where x is the value of an item and x̄ is the arithmetic mean. Since the sum of the deviations in the positive direction is equal to the sum of the deviations in the negative direction, the arithmetic mean is regarded as a measure of central tendency.

2. The sum of the squared deviations of the individual items from the arithmetic mean is always minimum. In other words, the sum of the squared deviations taken from any value other than the arithmetic mean will be higher.

3. As the arithmetic mean is based on all the items in a series, a change in the value of any item will lead to a change in the value of the arithmetic mean.

4. In the case of highly skewed distribution, the arithmetic mean may get distorted on account of a few items with extreme values. In such a case, it may cease to be the representative characteristic of the distribution.
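These characteristics can be checked numerically. The sketch below uses the nine-value series from the median example in these notes, whose mean happens to be exactly 17. The helper name sum_squared_deviations and the replacement of 33 by 330 are illustrative assumptions.

```python
import statistics

series = [15, 19, 21, 7, 10, 33, 25, 18, 5]    # nine-value series from these notes
mean = sum(series) / len(series)               # 153 / 9 = 17.0


def sum_squared_deviations(values, centre):
    """Sum of squared deviations of the values from a chosen centre."""
    return sum((x - centre) ** 2 for x in values)


# Characteristic 1: deviations from the mean sum to zero.
print(sum(x - mean for x in series))                    # 0.0

# Characteristic 2: squared deviations are smallest when taken from the mean.
print(sum_squared_deviations(series, mean))             # 638.0
print(sum_squared_deviations(series, 18))               # 647

# Characteristics 3 and 4: one extreme value moves the mean but not the median.
distorted = [330 if x == 33 else x for x in series]
print(sum(distorted) / len(distorted), statistics.median(distorted))   # 50.0 18
```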

MEDIAN

Median is defined as the value of the middle item (or the mean of the values of the two middle items) when the data are arranged in an ascending or descending order of magnitude. Thus, in an ungrouped frequency distribution if the n values are arranged in ascending or descending order of magnitude, the median is the middle value if n is odd. When n is even, the median is the mean of the two middle values.

Suppose we have the following series:

15, 19, 21, 7, 10, 33, 25, 18 and 5

We have to first arrange it in either ascending or descending order. These figures are arranged in an ascending order as follows:

5, 7, 10, 15, 18, 19, 21, 25, 33

Now as the series consists of an odd number of items, to find out the value of the middle item, we use the formula

(n + 1) / 2

where n is the number of items. In this case, n is 9, so

(n + 1) / 2 = (9 + 1) / 2 = 5

that is, the size of the 5th item is the median. This happens to be 18.

Suppose the series consists of one more item, 23. We then have to include 23 in the above series at an appropriate place, that is, between 21 and 25. Thus, the series is now 5, 7, 10, 15, 18, 19, 21, 23, 25, 33. Applying the above formula, the median is the size of the 5.5th item. Here, we have to take the average of the values of the 5th and 6th items. This means an average of 18 and 19, which gives the median as 18.5.

It may be noted that the formula (n + 1) / 2 itself is not the formula for the median; it merely indicates the position of the median, namely, the number of items we have to count until we arrive at the item whose value is the median. In the case of an even number of items in the series, we identify the two items whose values have to be averaged to obtain the median. In the case of a grouped series, the median is calculated by linear interpolation with the help of the following formula:

M = l1 + ((l2 - l1) / f) × (m - c)

Where M = the median

l1 = the lower limit of the class in which the median lies

l2 = the upper limit of the class in which the median lies

f = the frequency of the class in which the median lies

m = the middle item, or the (n + 1)/2th item, where n stands for the total number of items

c = the cumulative frequency of the class preceding the one in which the median lies.
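The interpolation for a grouped series can be sketched as follows, using the standard form M = l1 + ((l2 - l1) / f) × (m - c), which matches the symbols defined above. The function name grouped_median and the four-class series are hypothetical; the classes are assumed to be continuous, with each upper limit equal to the next lower limit.

```python
def grouped_median(classes):
    """Median of a grouped series by linear interpolation.

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples
    in ascending order, with each upper limit equal to the next lower limit.
    """
    n = sum(freq for _, _, freq in classes)
    m = (n + 1) / 2                      # position of the middle item
    c = 0                                # cumulative frequency so far
    for l1, l2, f in classes:
        if f > 0 and c + f >= m:         # the median lies in this class
            return l1 + (l2 - l1) / f * (m - c)
        c += f
    raise ValueError("no observations")


# Hypothetical grouped series: 30 items in four classes of width 10.
example = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]
print(round(grouped_median(example), 2))   # 22.08
```

Here n = 30, so m = 15.5. The cumulative frequencies are 5, 13 and 25, so the median lies in the class 20 to 30, with c = 13 and f = 12.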

The frequency of a particular data value is the number of times the data value occurs. For example, if four students have a score of 80 in mathematics, then the score of 80 is said to have a frequency of 4. The frequency of a data value is often represented by f.

Frequency Distribution: values and their frequency (how often each value occurs).

Here is another example:

Example: Newspapers

These are the numbers of newspapers sold at a local shop over the last 10 days:

22, 20, 18, 23, 20, 25, 22, 20, 18, 20

Let us count how many of each number there is:

Papers Sold	Frequency
18	2
19	0
20	4
21	0
22	2
23	1
24	0
25	1

It is also possible to group the values. Here they are grouped in 5s:

Papers Sold	Frequency
15-19	2
20-24	7
25-29	1
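Both tables can be reproduced by counting, as in the minimal sketch below. It uses the newspaper figures above with Python's standard collections.Counter. The running total in the last column is the cumulative frequency of the grouped classes.

```python
from collections import Counter
from itertools import accumulate

papers_sold = [22, 20, 18, 23, 20, 25, 22, 20, 18, 20]

# Ungrouped frequency distribution: value -> number of occurrences.
frequency = Counter(papers_sold)
for value in range(min(papers_sold), max(papers_sold) + 1):
    print(value, frequency[value])       # Counter returns 0 for missing values

# Grouped in 5s: each value is assigned to the class starting at a multiple of 5.
grouped = Counter((value // 5) * 5 for value in papers_sold)
classes = sorted(grouped)                              # [15, 20, 25]
counts = [grouped[start] for start in classes]         # [2, 7, 1]
cumulative = list(accumulate(counts))                  # [2, 9, 10]

for start, count, running in zip(classes, counts, cumulative):
    print(f"{start}-{start + 4}", count, running)
```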

HISTOGRAMS VS BAR GRAPHS

Bar Graphs are good when your data is in categories (such as "Comedy", "Drama", etc).

But when you have continuous data (such as a person's height) then use a Histogram.

It is best to leave gaps between the bars of a Bar Graph, so it doesn't look like a Histogram.

https://www.mathsisfun.com/data/images/bar-chart-vs-histogram.gif

Pie Chart: a special chart that uses "pie slices" to show relative sizes of data.

Imagine you survey your friends to find the kind of movie they like best:

*Table:* *Favorite Type of Movie*
Comedy	Action	Romance	Drama	SciFi
4	5	6	1	4

You can show the data by this Pie Chart:

It is a really good way to show relative sizes: it is easy to see which movie types are most liked, and which are least liked, at a glance.

How to Make Them Yourself

First, put your data into a table (like above), then add up all the values to get a total:

*Table:* *Favorite Type of Movie*
Comedy	Action	Romance	Drama	SciFi	TOTAL
4	5	6	1	4	20

Next, divide each value by the total and multiply by 100 to get a percent:

Comedy	Action	Romance	Drama	SciFi	TOTAL
4	5	6	1	4	20
4/20 = 20%	5/20 = 25%	6/20 = 30%	1/20 = 5%	4/20 = 20%	100%

Now to figure out how many degrees for each "pie slice" (correctly called a sector).

A Full Circle has 360 degrees, so we do this calculation:

Comedy	Action	Romance	Drama	SciFi	TOTAL
4	5	6	1	4	20
20%	25%	30%	5%	20%	100%
4/20 × 360° = 72°	5/20 × 360° = 90°	6/20 × 360° = 108°	1/20 × 360° = 18°	4/20 × 360° = 72°	360°
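The percentages and sector angles in the tables above can be computed in a few lines. This is a sketch using the movie survey figures; the variable names are illustrative.

```python
movies = {"Comedy": 4, "Action": 5, "Romance": 6, "Drama": 1, "SciFi": 4}

total = sum(movies.values())             # 20 votes in all

# Share of the total as a percentage, and the matching sector angle.
percents = {name: count * 100 / total for name, count in movies.items()}
angles = {name: count * 360 / total for name, count in movies.items()}

for name in movies:
    print(f"{name}: {movies[name]} ({percents[name]:.0f}%), {angles[name]:.0f} degrees")
```

The angles add up to 360 degrees, which is a quick check that no sector has been left out.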

Now you are ready to start drawing!

Draw a circle.

Then use your protractor to measure the degrees of each sector.

Finish up by coloring each sector and giving it a label like "Comedy: 4 (20%)", etc.

(And don't forget a title!)

Another Example

You can use pie charts to show the relative sizes of many things, such as:

what type of car people have,
how many customers a shop has on different days,
how popular different breeds of dogs are, and so on.

Example: Student Grades

Here is how many students got each grade in the recent test:

A	B	C	D
4	12	10	2

And here is the pie chart:

Pie diagrams are at times less effective than bar diagrams for accurate reading and interpretation, particularly when the series is divided into a large number of components or the difference among the components is very small. It is generally inadvisable to attempt to portray a series of more than five or six categories by means of a pie chart. If, for example, there are eight, ten or more categories, it may be very confusing to differentiate the relative values portrayed, especially when several small sectors are of approximately the same size. This type of diagram, although frequently used, appears upon comparison inferior to the simple bar diagram, the divided bar diagram or a group of curves.

SIGNIFICANCE OF DIAGRAMS & GRAPHS

Diagrams and graphs are extremely useful because of the following reasons:

1. Diagrams and graphs are attractive, impressive and save time.

2. They make data representation simple and have universal utility.

3. They make comparison possible and give more information.

GENERAL RULES FOR CONSTRUCTING DIAGRAMS:

The construction of diagrams is an art, which can be acquired through practice. However, observance of some general guidelines can help in making them more attractive and effective. The diagrammatic presentation of statistical facts will be advantageous provided the following rules are observed in drawing diagrams.

A diagram should be neatly drawn and attractive.
The measurements of the geometrical figures used in a diagram should be accurate and proportional.
The size of the diagrams should match the size of the paper.
Every diagram must have a suitable but short heading.
The scale should always be mentioned in the diagram.
Diagrams should be neatly as well as accurately drawn with the help of drawing instruments.
Index must be given for identification so that the reader can easily make out the meaning of the diagram.
Economy in cost and energy should be exercised in drawing a diagram.

Limitations of Diagrammatic Presentation

Diagrams do not show small differences properly.
Diagrams can easily be misused.
Only a skilled artist can draw multi-dimensional diagrams.
Diagrams are of little use in further statistical analysis.
Diagrams are just a supplement to tabulation.
Only a limited set of data can be presented in the form of a diagram.
Diagrammatic presentation of data is a time-consuming process.
Diagrams present only preliminary conclusions.
Diagrammatic presentation of data shows only an estimate of the actual behavior of the variables.

TYPES OF DIAGRAMS

· (a) Line Diagrams

In these diagrams only a line is drawn to represent each value of one variable. The lines may be vertical or horizontal. They are drawn so that their lengths are proportional to the values of the items, which makes comparison easy.

· (b) Simple Bar Diagram

Like line diagrams, these figures are used where a single dimension, i.e. length, can represent the data. The procedure is almost the same, except that the lines are drawn with thickness and so become bars. These can also be drawn either vertically or horizontally. The breadth of the bars should be equal, and so should the distance between them. The breadth and spacing should be chosen according to the space available on the paper.

· (c) Multiple Bar Diagrams

This diagram is used when we have to compare two or more variables. The number of variables may be 2, 3, 4 or more. In the case of 2 variables, a pair of bars is drawn; similarly, in the case of 3 variables, we draw triple bars. The bars are drawn on the same proportionate basis as in simple bar diagrams. The same shade is given to the same item, and the distance between groups of bars is kept constant.

· (d) Sub-divided Bar Diagram

The data presented by a multiple bar diagram can also be presented by this diagram. In this case we add the different variables for a period and draw them on a single bar, as shown in the following examples. The components must be kept in the same order in each bar. This diagram is more effective when the number of components is small, i.e. 3 to 5.

· (e) Percentage Bar Diagram

Like the sub-divided bar diagram, here too the data of one particular period or variable are put on a single bar, but in terms of percentages. Components are kept in the same order in each bar for easy comparison.

· (f) Duo-directional Bar Diagram

In this case the diagram extends on both sides of the base line, i.e. to the left and right, or above and below.

· (g) Broken Bar Diagram

This diagram is used when the value of some variable is very high or very low compared to the others. In this case the bars for the larger items may be shown broken.

MERITS AND DEMERITS OF THE MEAN:

Merits

i) It is rigidly defined

ii) It is easy to understand and easy to calculate

iii) It is based upon all the observations

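iv) It is amenable to algebraic treatment. The mean of the composite series in terms of the means and sizes of the component series is given by

x̄ = (n₁x̄₁ + n₂x̄₂ + ... + nₖx̄ₖ) / (n₁ + n₂ + ... + nₖ)

where x̄ᵢ and nᵢ are the mean and the size of the i-th component series.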

v) Of all the averages, arithmetic mean is affected least by fluctuations of sampling. This property is sometimes described by saying that mean is a stable average.

Thus, we see that arithmetic mean satisfies all the properties laid down by Prof. Yule for an ideal average.
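
The algebraic property in merit (iv) says that the mean of a pooled series equals the size-weighted average of the component means. The sketch below checks this numerically; the function name `combined_mean` and the two small data sets are made up for illustration and do not come from the notes.

```python
def combined_mean(means, sizes):
    """Mean of the pooled series, from the component means and sizes."""
    weighted_total = sum(n * m for m, n in zip(means, sizes))
    return weighted_total / sum(sizes)


# Two hypothetical component series (made-up values for illustration)
group_a = [10, 20, 30]          # mean 20, size 3
group_b = [40, 50]              # mean 45, size 2

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)

# Mean computed from the component means and sizes only
pooled = combined_mean([mean_a, mean_b], [len(group_a), len(group_b)])
# Mean computed directly from all five observations
direct = sum(group_a + group_b) / len(group_a + group_b)
print(pooled, direct)
```

Both routes give the same value, which is why the raw observations are not needed once the component means and sizes are known.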

Demerits

i) Arithmetic mean is affected very much by extreme values. In case of extreme items, arithmetic mean gives a distorted picture of the distribution and no longer remains representative of the distribution.

ii) Arithmetic mean may lead to wrong conclusions if the details of the data from which it is computed are not given. Let us consider the following marks obtained by two students A and B in three tests, viz. the terminal test, the half-yearly examination and the annual examination respectively.

Marks in:	I Test	II Test	III Test	Average marks
A	50%	60%	70%	60%
B	70%	60%	50%	60%

Thus the average marks obtained by each of the two students at the end of the year are 60%. If we are given the average marks alone, we would conclude that the level of intelligence of both students at the end of the year is the same. This is a fallacious conclusion, since we find from the data that student A has improved consistently while student B has deteriorated consistently.

iii) Arithmetic mean cannot be calculated if the extreme class is open, e.g. below 10 or above 70. Moreover, even if a single observation is missing, the mean cannot be calculated.
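
Demerits (i) and (ii) are easy to see numerically. The following sketch uses Python's standard `statistics` module; the marks are those of students A and B, while the `typical` list and its extreme value are hypothetical figures chosen only to illustrate the point.

```python
from statistics import mean

# Marks of students A and B in the three tests
marks_a = [50, 60, 70]   # improving
marks_b = [70, 60, 50]   # deteriorating
print(mean(marks_a), mean(marks_b))   # identical averages hide opposite trends

# Hypothetical data showing the effect of one extreme value
typical = [10, 12, 11, 13]
with_outlier = typical + [500]
print(mean(typical), mean(with_outlier))
```

A single extreme item pulls the mean far away from the bulk of the observations, so it no longer represents the series.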

MERITS AND DEMERITS OF THE MEDIAN:

Merits:

i) It is rigidly defined

ii) It is easily understood and is easy to calculate. In some cases it can be located merely by inspection.

iii) It is not at all affected by extreme values

iv) It can be calculated for distributions with open-end classes.

Demerits:

i) In the case of an even number of observations the median cannot be determined exactly. We merely estimate it by taking the mean of the two middlemost terms.

ii) It is not based on all the observations. For example, the median of 10, 25, 50, 60 and 65 is 50. We can replace the observations 10 and 25 by any two values which are smaller than 50 and the observations 60 and 65 by any two values greater than 50, without affecting the value of median. This property is sometimes described by saying that median is insensitive.

iii) It is not amenable to algebraic treatment

iv) As compared with the mean, it is affected more by fluctuations of sampling.
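
The insensitivity described in demerit (ii) and the even-count estimate in demerit (i) can be checked with the standard `statistics.median` function. The `altered` values below are arbitrary replacements chosen for this sketch.

```python
from statistics import median

original = [10, 25, 50, 60, 65]
# Replace the two smaller values with anything below 50 and the two
# larger values with anything above 50: the median does not move.
altered = [1, 2, 50, 1000, 2000]
print(median(original), median(altered))

# With an even number of observations the median is taken as the
# mean of the two middle values (here 25 and 50).
print(median([10, 25, 50, 60]))
```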

Uses:

i) Median is the only average to be used while dealing with qualitative data which cannot be measured quantitatively but still can be arranged in ascending or descending order of magnitude, e.g. to find the average intelligence or average honesty among a group of people.

ii) It is to be used for determining the typical value in problems concerning wages, distribution of wealth, etc.

MERITS AND DEMERITS OF THE MODE:

Merits

i) Mode is readily comprehensible and easy to calculate. Like median, mode can be found in some cases merely by inspection.

ii) Mode is not at all affected by extreme values

iii) Mode can be conveniently located even if the frequency distribution has classes of unequal magnitude provided the modal class and the classes preceding and succeeding it are of the same magnitude. Open-end classes also do not pose any problem in the determination of mode.

Demerits

i) Mode is ill-defined. It is not always possible to find a clearly defined mode. In some cases, we may come across distributions with two modes. Such distributions are called bi-modal. If a distribution has more than two modes, it is said to be multimodal.

ii) It is not based upon all the observations

iii) It is not capable of further mathematical treatment

iv) As compared with mean, mode is affected to a greater extent by fluctuations of sampling

Uses

Mode is the average to be used to find the ideal size, e.g. in business forecasting, in the manufacture of ready-made garments, shoes, etc.
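
As a rough illustration of both the "ideal size" use and the bi-modal case, the sketch below uses `statistics.multimode`, which returns every value that shares the highest frequency. Both data sets are hypothetical.

```python
from statistics import multimode

# Hypothetical shoe sizes sold in a day: one clear mode
shoe_sizes = [7, 8, 8, 9, 8, 10, 9]
print(multimode(shoe_sizes))

# A bi-modal series: two values share the highest frequency
bimodal = [1, 2, 2, 3, 3, 4]
print(multimode(bimodal))
```

When the returned list has more than one entry, the mode is not clearly defined, which is demerit (i) above.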

MERITS AND DEMERITS OF GEOMETRIC MEAN

Merits:

i) It is rigidly defined

ii) It is based upon all the observations

iii) It is suitable for further mathematical treatment

iv) It is not affected much by fluctuations of sampling

v) It gives comparatively more weight to small items

Demerits:

i) Because of its abstract mathematical character, geometric mean is not easy to understand and to calculate for a non-mathematics student.

ii) If any one of the observations is zero, the geometric mean becomes zero regardless of the magnitude of the other items, and if any of the observations is negative, the geometric mean may become imaginary.

Uses

Geometric mean is used

i) To find the rate of population growth and the rate of interest

ii) In the construction of index numbers
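
A minimal sketch of the geometric mean as the n-th root of the product of n values, applied to a growth-rate problem. The helper `geometric_mean` is written here for illustration (it is not the library function of the same name), and the population figures are hypothetical.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n non-negative values."""
    return math.prod(values) ** (1 / len(values))


# Hypothetical population: 1000 -> 1100 -> 1210, i.e. growth factors of 1.1
growth_factors = [1100 / 1000, 1210 / 1100]
average_factor = geometric_mean(growth_factors)
print(f"average growth per period: {(average_factor - 1) * 100:.1f}%")

print(geometric_mean([2, 8]))        # product 16, square root 4
print(geometric_mean([5, 0, 100]))   # a single zero forces the result to zero
```

The last line shows demerit (ii): one zero observation wipes out the information carried by all the other items.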

The following are the measures of dispersion.

i) Range

ii) Quartile deviation or semi-interquartile range

iii) Mean deviation and

iv) Variance and standard deviation

The range is the difference between the two extreme observations of the data. If A and B are the greatest and the smallest observations respectively in a data set, then its range is A-B.

Range is the simplest but a crude measure of dispersion. Since it is based on two extreme observations which themselves are subject to chance fluctuations, it is not at all a reliable measure of dispersion.

Quartile deviation or semi-interquartile range Q is given by

Q = ½ (Q₃-Q₁)

where Q₁ and Q₃ are the first and third quartiles of distribution respectively.

Quartile deviation is definitely a better measure than the range as it makes use of 50% of the data. But since it ignores the other 50% of the data, it cannot be regarded as a reliable measure.
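
The range and the quartile deviation can be computed as in the sketch below. The observations are hypothetical, and the quartiles come from `statistics.quantiles` with its default interpolation method; other textbooks and packages may locate Q₁ and Q₃ slightly differently.

```python
from statistics import quantiles

# Hypothetical observations
data = [2, 4, 6, 8, 10, 12, 14]

data_range = max(data) - min(data)      # A - B

# quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
# Textbooks differ on how quartiles are interpolated; this uses the
# library's default ("exclusive") method.
q1, _, q3 = quantiles(data, n=4)
quartile_deviation = (q3 - q1) / 2      # Q = 1/2 (Q3 - Q1)

print(data_range, quartile_deviation)
```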

INDEX NUMBERS

Index numbers are a commonly used statistical device for measuring the combined fluctuations in a group of related variables. If we wish to compare the price level of consumer items today with that prevalent ten years ago, we are not interested in comparing the prices of only one item, but in comparing some sort of average price levels. We may wish to compare the present agricultural production or industrial production with that at the time of independence. Here again, we have to consider all items of production, and each item may have undergone a different fractional increase (or even a decrease). How do we obtain a composite measure? This composite measure is provided by index numbers, which may be defined as a device for combining the variations that have occurred in a group of related variables over a period of time, with a view to obtaining a figure that represents the ‘net’ result of the change in the constituent variables.

Index numbers may be classified in terms of the variables that they are intended to measure. In business, index number techniques are most commonly applied to four groups of variables: (i) price, (ii) quantity, (iii) value and (iv) business activity. Thus, we have the index of wholesale prices, index of consumer prices, index of industrial output, index of value of exports, index of business activity, etc. Here we shall be mainly interested in index numbers of prices showing changes with respect to time, although the methods described can be applied to other cases. In general, the present level of prices is compared with the level of prices in the past. The present period is called the current period and some period in the past is called the base period.

Index Numbers:

Index numbers are statistical measures designed to show changes in a variable or group of related variables with respect to time, geographic location or other characteristics such as income, profession, etc. A collection of index numbers for different years, locations, etc., is sometimes called an index series.

Simple Index Number:

A simple index number is a number that measures a relative change in a single variable with respect to a base.

Composite Index Number:

A composite index number is a number that measures the average relative change in a group of related variables with respect to a base.

Types of Index Numbers:

Following types of index numbers are usually used:

Price index Numbers:

Price index numbers measure the relative changes in the prices of commodities between two periods. Prices can be either retail or wholesale.

Quantity Index Numbers:

These index numbers are considered to measure changes in the physical quantity of goods produced, consumed or sold of an item or a group of items.

INDEX NUMBERS are a statistician's way of expressing the difference between two measurements by designating one number as the "base", giving it the value 100 and then expressing the second number as a percentage of the first.

Example: If the population of a town increased from 20,000 in 1988 to 21,000 in 1991, the population in 1991 was 105% of the population in 1988. Therefore, on a 1988 = 100 base, the population index for the town was 105 in 1991.

An "index", as the term is generally used when referring to statistics, is a series of index numbers expressing a series of numbers as percentages of a single number.

Example: the numbers
50 75 90 110
expressed as an index, with the first number as a base, would be
100 150 180 220
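
The conversion of a series into an index can be sketched in a few lines. The function name `index_series` and its `base_position` parameter are assumptions made for this illustration.

```python
def index_series(values, base_position=0):
    """Express each value as a percentage of the base value (base = 100)."""
    base = values[base_position]
    return [value * 100 / base for value in values]


print(index_series([50, 75, 90, 110]))    # first number as the base
print(index_series([20000, 21000]))       # town population, 1988 = 100
```

Choosing a different `base_position` rescales the whole series so that the chosen period takes the value 100.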

Indexes can be used to express comparisons between places, industries, etc. but the most common use is to express changes over a period of time, in which case the index is also a time series or "series". One point in time is designated the base period—it may be a year, month, or any other period—and given the value 100. The index numbers for the measurement (price, quantity, value, etc.) at all other points in time indicate the percentage change from the base period.

If the price, quantity or value has increased by 15% since the base period, the index is 115; if it has fallen 5%, the index is 95. It is important to note that indexes reflect percentage differences relative to the base period and not absolute levels. If the price index for one item is 110 and for another is 105, it means the price of the first has risen by 10% and the price of the second by 5% since the base period, i.e. the first has increased twice as much in percentage terms. It does not mean that the first item is more expensive than the second.

Each index number in a series reflects the percentage change from the base period. It is important not to confuse an index point change and a percentage change between two index numbers in a series.

Example: if the price index for butter was 130 one year and 143 the next year, the index point change would be:
143 – 130 = 13
but the percentage change for the index would be:
((143 – 130) × 100) ÷ 130 = 10%
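
The distinction between an index point change and a percentage change can be made explicit with two small helper functions. Their names are hypothetical and chosen only for this sketch.

```python
def index_point_change(old_index, new_index):
    """Difference between two index numbers, in index points."""
    return new_index - old_index


def percentage_change(old_index, new_index):
    """Change between two index numbers as a percentage of the earlier one."""
    return (new_index - old_index) * 100 / old_index


# Butter price index: 130 one year, 143 the next
print(index_point_change(130, 143))
print(percentage_change(130, 143))
```

The two results coincide only when the earlier index number is exactly 100, i.e. when the comparison starts from the base period.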
