Thursday, April 4, 2019
Methods of Correlation and Regression Analysis
CHAPTER-14

INTRODUCTION

In a data set of a bivariate distribution, there exists a set of pairs of observations, where each pair is expressed with the numerical values of two variables. Stated alternatively, a bivariate distribution is intended for finding or analyzing the relationship between two variables under study. In any scientific study, the basic interest of the researcher is to find out the possible co-movement of two or more variables. For determining such co-movement, there exist two important statistical tools, popularly called correlation analysis and regression analysis. Correlation analysis, simply, is a measure of association between two or more variables, whereas regression analysis examines the nature or direction of association between two variables. Regression analysis proceeds by classifying the variables into two classes, namely the dependent variable and the independent variables. Thus it tries to estimate the average value of one variable (the dependent variable) from the given values of the other variable(s) (i.e., the independent variables). The concern of correlation analysis is just the opposite of that of regression analysis: there the basic focus of the researcher is on measuring the strength of the relationship between the variables. In other words, correlation analysis measures the depth of the relationship between two variables, whereas regression analysis measures the form and direction of that relationship. Again, in regression analysis the dependent variable is considered random or stochastic and the independent variable(s) are assumed to be fixed or non-random.
But in correlation analysis all the variables are treated symmetrically and hence are considered random.

INTRODUCTION TO CORRELATION ANALYSIS

The magnitude of the association or relationship between two variables can be measured by calculating correlation. Correlation analysis can be defined as a quantitative measure of the strength of the relationship that exists between two variables. There are four types of relationship that may exist between two variables. They are:

1. Positive correlation
2. Negative correlation
3. Linear correlation and
4. Non-linear correlation.

1. Positive correlation

Two variables are said to be positively correlated when the movement of one variable leads to the movement of the other variable in the same direction. In other words, there exists a direct relationship between the two variables. For example: the relationship between the height of human beings and their corresponding weight, the income of a person and his expenditure, or the price of a commodity and its supply. In all such cases an increase (or decrease) in the value of one variable leads to an increase (or decrease) in the value of the corresponding other variable. The nature of the positive relationship between the two variables can also be shown graphically. If the data are plotted on the two axes of a graph paper, one will find an upward trend rising from the lower left-hand corner of the graph and spreading up to the upper right-hand corner. One can imagine the supply curve as explained in economic theory.

2. Negative correlation

On the other hand, the correlation between two variables is said to be negative when the movement of one variable leads to a movement in the other variable in the opposite direction. Here there exists an inverse relationship between the two variables.
For example: the volume and pressure of a perfect gas, income and expenditure on food items (Engel's law), or the change in price and the quantity demanded of necessary goods. In all such cases an increase (or decrease) in the value of one variable causes a corresponding decrease (or increase) in the value of the other variable. In the case of negative correlation between two variables, one will find a downward trend from the upper left-hand corner of the graph paper towards the x-axis. One can imagine the demand curve as explained in economic theory.

3. Linear correlation

The correlation between two variables is said to be linear when the points, when drawn on a graph, represent a straight line. Considering two variables X and Y, a straight-line equation can be written as Y = a + bX, where a and b are real numbers. Using the above equation, with constant values of a and b and different values of X, the resulting pairs of X and Y when plotted on a graph sheet will form a straight line. The linear relationship between two variables can be interpreted thus: a change of one unit in one variable (say X) results in a corresponding change in the other variable (say Y) in a fixed proportion. However, such relations rarely exist in management and the social disciplines.

4. Non-linear correlation

A relationship between two variables is said to be non-linear if a unit change in one variable causes the other variable to change in varying proportions. In other words, if X is changed, then the corresponding values of Y will not change in the same proportion. Hence, when the data of X and Y are plotted on a graph paper, one will not get a straight line but rather a curve.
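The contrast between the linear and non-linear cases above can be sketched numerically. The data and the helper function below are illustrative (not from the text); they show that a perfectly linear relation yields a correlation coefficient of exactly 1, while a curved relation yields something below 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    # Pearson's r: covariance scaled by the two standard deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
linear = [3 + 2 * x for x in xs]   # Y = 3 + 2X, a straight line
curved = [x ** 2 for x in xs]      # Y = X^2, a curve

print(round(pearson_r(xs, linear), 4))   # → 1.0
print(round(pearson_r(xs, curved), 4))   # close to, but below, 1
```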
The equation of such a relationship is a polynomial, for example Y = a + bX + cX².

There can also be instances where there does not exist any relationship between two variables, i.e., no correlation can be found between them. Such a relationship is called no correlation. For instance, suppose one wants to compare the growth of population in India with that of road accidents in the United States. Such relations do not exist logically; hence the correlation in such cases is said to be nil.

METHODS OF MEASURING CORRELATION

Correlation between two variables can be measured in the following ways:

1. The Graphical Method (through the scatter diagram)
2. Karl Pearson's coefficient of correlation

1. The Graphical Method

Correlation can be shown graphically by using scatter diagrams. A scatter diagram reveals two important pieces of useful information. Firstly, through this diagram one can observe the pattern between the two variables, which indicates whether there exists some association between them or not. Secondly, if an association between the variables is found, the nature of the relationship between the two (whether the variables are linearly or non-linearly related) can be easily identified.

2. Karl Pearson's coefficient of correlation

Karl Pearson's coefficient of correlation (developed in 1896) measures the linear relationship between two variables under study. Since the relationship is expressed as linear, the two variables change in a fixed proportion. This measure provides the degree of relationship as a real number, independent of the units in which the variables have been expressed, and also indicates the direction of the correlation.

It is known that covariance serves as an absolute measure for determining the correlation between two variables. This measure, like the absolute measures of dispersion, depends upon two things: (i) the number of observations, denoted as n, and (ii) the units of measurement of the variables under study.
The above relationship is explained by assuming a data set consisting of two variables X and Y, denoted in terms of pairs as (Xi, Yi), where i = 1, 2, 3, …, n.

Assumed mean method

The assumed mean method for calculating the coefficient of correlation can be used when the data size is large and it is difficult for the researcher to calculate the mean of the series by the direct method. In such a case, a value from each series is assumed as the mean, and the deviations are calculated from the actual data to that assumed mean; i.e., if X and Y are two series of observations, then dx = X − L and dy = Y − K are the deviation values of the variables X and Y respectively, where L and K are the assumed means of series X and Y. The formula for calculating Karl Pearson's coefficient of correlation then becomes

r = [nΣdxdy − (Σdx)(Σdy)] / √{[nΣdx² − (Σdx)²][nΣdy² − (Σdy)²]}.

The above methods cannot be used to calculate the correlation between two variables when the series of observations are in grouped form, i.e., given as a frequency distribution. In such a case, the formula for calculating Karl Pearson's coefficient of correlation weights each deviation by its class frequency f:

r = [NΣfdxdy − (Σfdx)(Σfdy)] / √{[NΣfdx² − (Σfdx)²][NΣfdy² − (Σfdy)²]},

where N = Σf is the total frequency.

Assumptions of the coefficient of correlation

Karl Pearson's coefficient of correlation is best derived under some assumptions. Following are some properties on which the validity of the coefficient resides.

1. The value of the coefficient of correlation lies between −1 (minus one) and +1 (plus one).

When the two variables considered in a study are in no way related to each other, one can assume that the value of the coefficient of correlation is zero (0). On the other hand, if there exists a perfect relationship between the two variables, implying that all points on the scatter diagram fall on a straight line, then the value of the correlation coefficient (rXY) reaches +1 or −1, depending of course on the direction of the straight line.
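The assumed mean method described above can be sketched as follows. The data and function names are illustrative (not from the text); the point of the sketch is that the shortcut gives exactly the same coefficient as the direct definition, whatever assumed means L and K are chosen.

```python
from math import sqrt

def pearson_r(xs, ys):
    # Direct method: deviations from the true means.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))

def r_assumed_mean(xs, ys, L, K):
    # Shortcut: deviations from assumed means L and K, not the true means.
    n = len(xs)
    dx = [x - L for x in xs]
    dy = [y - K for y in ys]
    num = n * sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy)
    den = ((n * sum(a * a for a in dx) - sum(dx) ** 2) *
           (n * sum(b * b for b in dy) - sum(dy) ** 2))
    return num / sqrt(den)

xs = [10, 20, 30, 40, 50]
ys = [12, 19, 33, 38, 52]
r_direct = pearson_r(xs, ys)
r_shortcut = r_assumed_mean(xs, ys, 30, 30)   # any assumed means work
print(round(r_direct, 4) == round(r_shortcut, 4))   # → True
```

The shortcut works precisely because r is unaffected by a change of origin: subtracting the constants L and K shifts both series without changing the coefficient.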
It will be positive when the slope of the line is positive and negative when the slope of the line is negative. Stated alternatively, if the two variables X and Y are directly related to each other, then the value of the coefficient of correlation will definitely be positive. On the other hand, if there exists an inverse relationship between the two variables, the value of the coefficient will be negative.

2. The value of the coefficient of correlation is independent of change of origin and change of scale of measurement.

To prove this assumption, we change the origin and scale of both the variables. When the origin and scale of the two variables X and Y are changed, the new variables become U = (X − A)/p and V = (Y − B)/q, where the constants A and B measure the change of origin and the positive constants p and q denote the change of scale. Simplifying the resulting equations reveals that rUV = rXY.

RANK CORRELATION COEFFICIENT

In research, no one can predict the nature of the data. The information collected from respondents may be expressed in numbers, may be qualitative, or quite often may be expressed in the form of ranks. The greatest disadvantage of Karl Pearson's coefficient of correlation is that it works best only when the data are quantitative, i.e., expressed in numbers. Generally, when the data are expressed in a qualitative form like fair, good, best, average, excellent, efficient, etc., and/or the data are expressed only in ranks, one has to apply Spearman's method of rank differences for finding out the degree of correlation.
There are three different situations for applying Spearman's rank correlation coefficient:

a. When ranks of both the variables are given
b. When ranks of both the variables are not given and
c. When ranks between two or more observations in a series are equal

Each case can be estimated by using a separate formula.

a. When ranks of both the variables are given

This is the simplest case of calculating the correlation between two series. Here the ranks of both the series are given and no two observations in a series are awarded the same rank. The formula is

RXY = 1 − (6Σd²) / (n(n² − 1)),

where RXY denotes the coefficient of rank correlation between the two series of observations X and Y, d is the difference between the two ranks of an observation, and n is the number of observations in each series. While calculating RXY, one has to arrange the given observations in sequence; then the differences in ranks, d, are calculated.

The result shows a positive correlation between the judgments revealed by both the judges. However, since the value is not very close to 1, it can be said that there exists only a moderate relationship between the ranks assigned by the two judges.

b. When ranks of both the variables are not given

There may be situations where the ranks of the two series are not given. In such cases, each observation in the series is to be ranked first. The selection of the highest value depends on the researcher; in other words, whether the highest value or the lowest value is ranked 1 (one) depends upon the researcher's decision. After ranking the variables, d and d² are calculated and the above formula is applied. The following example will make the concept clear.

The result shows a positive degree of correlation between the grade point average and the total marks obtained by the students.

c. When ranks between two or more observations in a series are equal

In empirical analysis, there is the possibility of assigning the same rank to two or more observations.
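Case (a) can be sketched directly from the formula above. The judge rankings below are hypothetical data of my own, not the worked example from the text.

```python
def spearman_from_ranks(rx, ry):
    # Spearman's formula for given, untied ranks:
    # R = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

judge1 = [1, 2, 3, 4, 5, 6]   # ranks awarded by the first judge
judge2 = [2, 1, 4, 3, 6, 5]   # ranks awarded by the second judge

print(round(spearman_from_ranks(judge1, judge2), 4))   # → 0.8286
```

A value of about 0.83 would indicate fairly strong, though not perfect, agreement between the two judges.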
While ranking observations, there may be situations where more than one observation is assigned an equal rank. Here, the rank to be assigned to each such observation is the average of the ranks these observations would have got had they differed from each other. For example, suppose two observations are tied at 6th place. If we ranked the two observations separately, one would get rank 6 and the other rank 7; thus the rank of each observation will be (6 + 7)/2 = 13/2 = 6.5, and the new rank assigned to each is 6.5. Similarly, more than two observations of a series may be ranked equal; the same averaging technique is applied to get the new ranks. The formula for calculating the rank coefficient of correlation in the equal-ranks case is a little different from the formula derived above. It is

RXY = 1 − 6[Σd² + Σ(mi³ − mi)/12] / (n(n² − 1)),

where d is the difference between the ranks of the two series and mi (i = 1, 2, 3, …) denotes the number of observations for which the ranks are repeated in a series. The example derived below will make the concept clearer.

Interpretation of results of the rank coefficient of correlation

If the value of the rank correlation coefficient RXY is greater than zero (RXY > 0), one set of data series is positively and directly related with the ranks of the other set. In other words, both sets of observations are directly related; an observation in one series scores almost the same rank in the other series.

Whereas, if the result of the rank coefficient of correlation (RXY) is found to be less than zero (RXY < 0), the ranks of the two series are inversely related.

In another condition, suppose that the value of the rank correlation coefficient is exactly +1 (RXY = +1). Then it can be said that there exists a perfect positive correlation between the two series of observations.
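The tie-corrected formula above can be sketched as follows. The data are illustrative and the helper that averages tied ranks is my own; each group of m tied observations shares the average of the ranks it occupies and contributes (m³ − m)/12 to the correction term.

```python
from collections import Counter

def average_ranks(values):
    # Rank in descending order; tied values get the mean of the
    # rank positions they would otherwise occupy.
    ordered = sorted(values, reverse=True)
    positions = {}
    for i, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(i)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_tied(xs, ys):
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = 0.0
    for ranks in (rx, ry):
        for m in Counter(ranks).values():
            correction += (m ** 3 - m) / 12   # zero when m == 1
    return 1 - 6 * (d2 + correction) / (n * (n ** 2 - 1))

xs = [80, 91, 99, 71, 80, 80, 66]   # 80 is tied three times
ys = [75, 68, 50, 68, 80, 62, 68]   # 68 is tied three times

print(round(spearman_tied(xs, ys), 4))   # → -0.3214
```

The three tied 80s occupy rank positions 3, 4 and 5, so each receives rank (3 + 4 + 5)/3 = 4, exactly as the averaging rule in the text prescribes.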
Here each observation in both the series gets exactly equal ranks.

Whereas, if the rank correlation is −1 (RXY = −1), there exists a perfect negative correlation between the ranks of the two series. In such cases an observation which gets the highest rank in one series gets the lowest rank in the other series.

The last possibility is a rank coefficient of correlation of 0 (RXY = 0), which implies that there does not exist any relation between the ranks of the two series of observations.

LINEAR REGRESSION ANALYSIS

When it is established, using the methods of correlation, that two variables (or data series) are correlated with each other, and it is also verified that the expression of such a relationship between the considered variables is theoretically permissible, then the next step in the process of analysis is to predict and/or estimate the value of one variable from the known value of the other variable. This task, in the econometrics literature, is called regression analysis. Literally, the word regression means a backward movement. In the general sense, regression means the estimation and/or prediction of the unknown value of one variable from the known value of the other variable. Hence, it is the study of the dependence of one variable on other variable(s). Prediction or estimation of the relationship between two or more variables is one of the major areas of discussion in almost all branches of knowledge where human activity is involved. Regression, as one of the most important econometric tools, is extensively used in almost all branches of knowledge: in the natural sciences, in the social sciences and also in the physical sciences. But by virtue of the varied nature of most branches of the social sciences (like economics, commerce, etc.)
and the business environment, the basic concern in these disciplines is to establish an econometric (or statistical) relationship between the variables rather than an exact mathematical relationship (the core analysis tool used in the natural sciences). For this reason, if one is able to establish some kind of relationship between two variables (where one variable is considered the dependent variable and the other variable(s) are considered independent variables), then it can be expected that the purpose at hand is half solved.

The credit for the development of this technique lies with Sir Francis Galton, in the year 1877. Galton used the word for the first time in a study in which he estimated the relationship between the heights of fathers and sons. The study ended with the conclusion that tall fathers tend to have tall sons and vice versa. It was also observed, however, that the mean height of the sons of tall fathers was lower than the mean height of their fathers, and the mean height of the sons of short fathers was higher than the mean height of their fathers. This study was published by Galton in his research paper "Regression towards mediocrity in hereditary stature."

Regression as a tool

Econometricians use regression analysis to make quantitative estimates of various theoretical relationships in the literature of the social sciences and management which previously had been entirely theoretical in nature. For example, the famous demand theory of economics says that the quantity demanded of a product will increase when there is a decrease in the price of the commodity, and vice versa, of course with the assumption that all other things are held constant.
Hence, anybody can claim that the quantity demanded of blank DVDs will increase if the price of those DVDs decreases (holding all other factors constant), but few people can actually put numbers into an equation and estimate by how many units the quantity demanded will increase for each reduction in price of Rs. 1/-. To predict the direction of the change, one needs knowledge of economic theory and the general characteristics of the product in question (as the example given relates to economic theory). However, to predict the amount of the change, along with the data set, one needs a way to estimate the relationship. The most frequently used method to estimate such a relationship in econometrics is regression analysis. As already discussed above, regression analysis describes the dependence of one variable on one or more other variables. It is now important to clarify the terms dependent and independent variables that are at the core of regression analysis.

Dependent Variables and Independent Variables

Regression analysis is a statistical technique that attempts to explain movements in one variable, the dependent variable, as a function of movements in a set of other variables, called the independent (or explanatory) variables, through the quantification of a single equation. To make this concept clearer, let us start the discussion by considering a simple example, the generalized demand function of economic theory:

Qd = f(price of the good, income, prices of related goods, expected future price, tastes, number of consumers)   … (1)

Equation (1) states a functional relationship between six factors (on the right-hand side of the equation) and one variable (on the left-hand side). In other words, theoretically, the quantity demanded (Qd) of a good or service depends on six factors: the price of the good itself, the money income of the consumer, the prices of related goods, the expected future price of the product itself, the taste pattern of the consumers and the number of consumers in the market.
In equ ation (1), quantity demanded is the dependent variable and the other six variables are independent variables.Much of economics and business is concerned with cause-and-effect propositions If the price of a good increases by one unit, then the quantity demanded decreases on average by a certain amount, depending on the price elasticity of demand (defined as the piece change in the quantity demanded that is caused by a one percent change in price). Propositions such as these pose an if-then, or causal, relationship that logically postulates a dependent variable (Qd in our example) having movements that are causally determined by movements in a number of specified independent variables (six factors discussed above).The Linear Regression ModelIn the regression stumper, Y is unceasingly represented for dependent variable and X is always represented for the independent variable. Here are three equivalent ways to mathematically describe a linear regression baffle.The simplest single-eq uation linear regression model can be written asThe above equation states that Y, the dependent variable, is a single-equation linear function of variable X, the independent variable. The model is a single-equation model because no equation for X as a function of Y (or any other variable) has been specified. The model is linear because it expresses the relationship of a straight line and if plotted on graph paper, it would be a straight line rather than a curve.The constants expressed in the equation are the coefficients (or parameters) that determine the coordinates of the straight line at any point. in the equation is the constant or intercept term it indicates the value of Y when X equals zero. Thus it is the point on the y-axis where the regression line would intercept the y-axis. Where as, in the equation is the slope coefficient, and it indicates the amount that Y will change when X changes by one unit. 
Figure 1.1 illustrates the relationship between the coefficients and the graphical meaning of the regression equation. As can be seen from the diagram, Equation 1.3 is indeed linear.

The slope, β1, shows the response of Y to a change in X. Since being able to explain and predict changes in the dependent variable is the essential reason for quantifying behavioral relationships, most of the emphasis in regression analysis is on slope coefficients such as β1. In Figure 1.1, for example, if X were to increase from X1 to X2, the value of Y in Equation 1.3 would increase from Y1 to Y2. For linear (i.e., straight-line) regression models, the response in the predicted value of Y due to a change in X is constant and equal to the slope coefficient β1.

We must distinguish between an equation that is linear in the variables and one that is linear in the coefficients (or parameters). This distinction is important because while linear regressions need to be linear in the coefficients, they do not necessarily need to be linear in the variables. An equation is linear in the variables if plotting the function in terms of X and Y generates a straight line.

An equation is linear in the coefficients (or parameters) only if the coefficients (the βs) appear in their simplest form: they are not raised to any powers (other than one), are not multiplied or divided by other coefficients, and do not themselves include some sort of function (like logs or exponents). For example, Equation 1.3 is linear in the coefficients, but Equation 1.5,

Y = β0 + X^β1   … (1.5)

is not linear in the coefficients β0 and β1. Equation 1.5 is not linear because there is no rearrangement of the equation that will make it linear in the coefficients of original interest, β0 and β1. In fact, of all possible equations for a single explanatory variable, only functions of the general form

Y = β0 + β1·f(X)   … (1.6)

are linear in the coefficients β0 and β1. In essence, any sort of configuration of the Xs and Ys can be used and the equation will continue to be linear in the coefficients.
However, even a slight change in the configuration of the βs will cause the equation to become nonlinear in the coefficients. For example, Equation 1.4 (Y = β0 + β1X²) is not linear in the variables but is linear in the coefficients. The reason Equation 1.4 is linear in the coefficients is that if you define f(X) = X², Equation 1.4 fits into the general form of Equation 1.6.

All this is important because if linear regression techniques are to be applied to an equation, that equation must be linear in the coefficients. Linear regression analysis can also be applied to an equation that is nonlinear in the variables, provided the equation can be transformed into one that is linear in the coefficients. When econometricians use the phrase linear regression, they usually mean regression that is linear in the coefficients. The application of regression techniques to equations that are nonlinear in the coefficients will be discussed in Section 7.6.
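The transformation argument above can be sketched as follows. The data are illustrative: a model that is nonlinear in the variables, Y = β0 + β1X², becomes an ordinary linear regression once we define f(X) = X² and regress Y on the transformed regressor.

```python
def fit_line(xs, ys):
    # Ordinary least squares for Y = b0 + b1 * X.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - b1 * mx, b1

xs = [1, 2, 3, 4, 5]
ys = [3 + 2 * x ** 2 for x in xs]   # generated exactly as Y = 3 + 2*X^2
fx = [x ** 2 for x in xs]           # transformed regressor f(X) = X^2

b0, b1 = fit_line(fx, ys)           # regress Y on f(X), not on X
print(b0, b1)                       # → 3.0 2.0
```

Regressing Y on f(X) recovers β0 = 3 and β1 = 2 exactly, because the model is linear in the coefficients even though it is nonlinear in the original variable X.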