Method of correlation analysis: an example. Correlation analysis is ...

In scientific research, there is often a need to find a relationship between outcome (resultant) and factor variables (crop yield and the amount of precipitation, the height and weight of people in groups homogeneous by sex and age, pulse rate and body temperature, and so on).

Factor variables are the characteristics that contribute to changes in the outcome variables associated with them.

The concept of correlation analysis

There are many definitions of the term. From the foregoing, correlation analysis can be described as a method for testing a hypothesis about the statistical significance of the relationship between two or more variables when the researcher can measure them but not change them.

There are other definitions of the concept. Correlation analysis is a method of processing statistical data that consists of studying correlation coefficients between variables. The correlation coefficients for one pair or for many pairs of characteristics are compared to establish statistical relationships between them. Correlation analysis is also defined as a method for studying the statistical dependence between random variables, not necessarily of a strict functional character, in which a change in one random variable leads to a change in the mathematical expectation of the other.

The concept of false (spurious) correlation

When conducting a correlation analysis, bear in mind that it can be applied to any set of characteristics, including ones that are absurd in relation to each other. Sometimes they have no causal relationship with each other at all.

In that case, the correlation is said to be false (spurious).

The tasks of correlation analysis

Based on the above definitions, the following tasks of the method can be formulated: to obtain information about one unknown variable by means of another, and to determine the strength of the relationship between the variables being studied.

Correlation analysis involves determining the relationship between the features being studied, and therefore the tasks of correlation analysis can be supplemented with the following:

  • identification of factors that have the greatest impact on the result;
  • identification of previously unknown causes of relationships;
  • construction of a correlation model and analysis of its parameters;
  • study of the significance of the relationship parameters and their interval estimation.

The relationship between correlation analysis and regression

The method of correlation analysis is often not limited to finding the strength of the relationship between the quantities under study. Sometimes it is supplemented by regression equations, obtained by regression analysis, which describe the relationship between the outcome and factor characteristics. Combined with the analysis under consideration, this forms the method of correlation-regression analysis.

Conditions for use

Outcome indicators depend on one or several factors. The method of correlation analysis can be used when there is a large number of observations of the outcome and factor indicators; the factors studied must be quantitative and documented in specific sources. If the variables follow the normal distribution, the result of the correlation analysis is the Pearson correlation coefficient; if they do not obey this law, the Spearman rank correlation coefficient is used.
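The two coefficients named above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `pearson_r`, `average_ranks` and `spearman_rho` and the sample data are assumptions made for this example, not part of the original text. Spearman's coefficient is computed here as the Pearson coefficient of the ranks, with tied values sharing an average rank.

```python
import math

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either sample is constant.
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print("Pearson: ", round(pearson_r(x, y), 3))
print("Spearman:", round(spearman_rho(x, y), 3))
```

On monotone but nonlinear data such as this, the rank coefficient reaches 1 while the linear coefficient stays below it, which is one reason the rank coefficient is preferred when the normality assumption is doubtful.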

Rules for the selection of correlation analysis factors

When applying this method, it is necessary to determine the factors that affect the outcome indicators. They are selected on the basis that a causal relationship should exist between the indicators. When building a multifactor correlation model, the factors that significantly affect the outcome indicator are selected. Interdependent factors whose pair correlation coefficient exceeds 0.85 should preferably be left out of the model, as should factors whose relationship with the outcome parameter is nonlinear or functional.
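The 0.85 rule for interdependent factors can be checked mechanically. The sketch below is one possible way to do it, assuming the factors are held in a dictionary of equal-length lists; the helper name `collinear_pairs` and the sample values are hypothetical and serve only as an illustration.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(factors, threshold=0.85):
    """Return (name_a, name_b, r) for every factor pair with |r| above threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(factors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical factor observations (not taken from the article).
factors = {
    "rainfall": [10, 12, 15, 18, 20, 25],
    "humidity": [30, 33, 39, 45, 50, 61],  # moves almost in step with rainfall
    "fertilizer": [5, 3, 6, 2, 7, 4],
}

for name_a, name_b, r in collinear_pairs(factors):
    print(f"{name_a} and {name_b} are interdependent: r = {r:.3f}")
```

Which factor of a flagged pair to drop is a judgment for the researcher; the code only reports the pairs.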

Presenting the results

The results of the correlation analysis can be presented in text and graphic form. In the first case they are given as a correlation coefficient, in the second as a scatter diagram.

If there is no correlation between the parameters, the points on the diagram are scattered chaotically. A moderate relationship shows more order: the points lie at a more or less uniform distance from the central (median) line. A strong relationship approaches a straight line, and for r = 1 the scatter diagram is a perfectly straight line. An inverse (negative) correlation runs from the upper left to the lower right, while a direct (positive) correlation runs from the lower left to the upper right.

Three-dimensional representation of the scatter (scattering) diagram

In addition to the traditional 2D scatter diagram, a 3D graphical representation of the correlation analysis is now also used.

A scatter diagram matrix is also used, which displays all pairwise plots in a single figure in matrix form. For n variables, the matrix contains n rows and n columns. The diagram at the intersection of the i-th row and the j-th column plots the variable Xi against Xj. Thus each row and column corresponds to one variable, and each cell displays a scatter diagram of two variables.

Estimating the strength of the relationship

The strength of the correlation is determined from the correlation coefficient (r): strong for r = ±0.7 to ±1, moderate for r = ±0.3 to ±0.699, weak for r = 0 to ±0.299. This classification is not strict. The figure shows a slightly different scheme.
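The thresholds quoted above translate directly into a small lookup. The function name `relationship_strength` is an assumption for this sketch, and since the classification is not strict, the cut-off values are best treated as adjustable defaults rather than fixed constants.

```python
def relationship_strength(r, strong=0.7, moderate=0.3):
    """Label a correlation coefficient using the bands quoted in the text."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)  # the sign only gives the direction, not the strength
    if magnitude >= strong:
        return "strong"
    if magnitude >= moderate:
        return "moderate"
    return "weak"

for value in (0.716, -0.45, 0.1):
    direction = "direct" if value > 0 else "inverse"
    print(f"r = {value:+.3f}: {relationship_strength(value)}, {direction}")
```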

An example of the application of the method of correlation analysis

A notable study was carried out in the UK on the relationship between smoking and lung cancer, using correlation analysis. The observations are presented below.

Initial data for the correlation analysis

Professional group | Smoking | Mortality
Farmers, foresters and fishermen | 77 | 84
Miners and quarry workers | 137 | 116
Manufacturers of gas, coke and chemicals | 117 | 123
Manufacturers of glass and ceramics | 94 | 128
Workers of furnaces, forge, casting and rolling mills | 116 | 155
Workers of electrical engineering and electronics | 102 | 101
Engineering and related professions | 111 | 118
Woodworking production | 93 | 113
Leather goods | 88 | 104
Textile workers | 102 | 88
Manufacturers of work clothes | 91 | 104
Food, drink and tobacco workers | 104 | 129
Manufacturers of paper and printing | 107 | 86
Manufacturers of other products | 112 | 96
Builders | 113 | 144
Artists and Decorators | 110 | 139
Drivers of stationary engines, cranes, etc. | 125 | 113
Workers not included elsewhere | 133 | 146
Transport and Communications Workers | 115 | 128
Warehouse workers, storekeepers, packers and workers of filling machines | 105 | 115
Office workers | 87 | 79
Sellers | 91 | 85
Employees of the sport and recreation service | 100 | 120
Administrators and managers | 76 | 60
Professionals, technicians and artists | 66 | 51

We begin the correlation analysis. For clarity, it is best to start with the graphical method, so we construct a scatter diagram.

It shows a direct (positive) relationship. However, it is difficult to draw an unambiguous conclusion from the graphical method alone, so we continue the correlation analysis. An example of calculating the correlation coefficient is presented below.

Using software (MS Excel, for example, is described below), we find that the correlation coefficient is 0.716, which indicates a strong relationship between the parameters studied. We determine the statistical reliability of this value from the corresponding table: subtracting 2 from the 25 pairs of values gives 23 degrees of freedom, and on that line of the table we find the critical r for p = 0.01 (a stricter threshold because these are medical data; in other cases p = 0.05 is sufficient), which is 0.51 for this analysis. Since r is greater than the critical r, the value of the correlation coefficient is considered statistically reliable.
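The calculation can be reproduced outside a spreadsheet. The following sketch recomputes r from the 25 pairs of values in the table and repeats the significance check. One value is an assumption brought in from outside the article: the two-tailed critical Student's t for 23 degrees of freedom at p = 0.01 is taken as 2.807 from a standard table, and the critical r is derived from it with the usual relation r = t / sqrt(t² + df).

```python
import math

# Values from the table above, in row order.
smoking = [77, 137, 117, 94, 116, 102, 111, 93, 88, 102, 91, 104, 107,
           112, 113, 110, 125, 133, 115, 105, 87, 91, 100, 76, 66]
mortality = [84, 116, 123, 128, 155, 101, 118, 113, 104, 88, 104, 129, 86,
             96, 144, 139, 113, 146, 128, 115, 79, 85, 120, 60, 51]

n = len(smoking)
mean_s = sum(smoking) / n
mean_m = sum(mortality) / n

# Sums of squared deviations and of cross products.
sxy = sum((s - mean_s) * (m - mean_m) for s, m in zip(smoking, mortality))
sxx = sum((s - mean_s) ** 2 for s in smoking)
syy = sum((m - mean_m) ** 2 for m in mortality)

r = sxy / math.sqrt(sxx * syy)

# Significance check with n - 2 degrees of freedom.
df = n - 2
t_stat = r * math.sqrt(df / (1 - r * r))

# Assumption: table value for two-tailed p = 0.01 and df = 23.
T_CRITICAL = 2.807
r_critical = T_CRITICAL / math.sqrt(T_CRITICAL ** 2 + df)

print(f"r = {r:.3f}")                    # r = 0.716
print(f"degrees of freedom = {df}")      # degrees of freedom = 23
print(f"t = {t_stat:.2f}, critical t = {T_CRITICAL}")
print(f"critical r = {r_critical:.2f}")  # critical r = 0.51
print("statistically reliable" if abs(r) > r_critical else "not reliable")
```

Comparing r with the critical r and comparing t with the critical t are the same test expressed in two ways, so both lead to the same conclusion.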

The use of software in conducting a correlation analysis

The described type of statistical data processing can be carried out using software, in particular MS Excel. Correlation analysis in Excel involves calculating the following parameters using functions:

1. The correlation coefficient is determined using the CORREL(array1, array2) function, where array1 and array2 are the cell ranges holding the values of the outcome and factor variables.

The linear correlation coefficient is also called the Pearson correlation coefficient, and therefore, starting with Excel 2007, you can use the PEARSON function with the same arrays.

A graphical representation of the correlation analysis in Excel is produced from the "Charts" panel by selecting the "Scatter" chart type.

After indicating the initial data, we obtain a graph.

2. Evaluation of the significance of the pair correlation coefficient using Student's t-test. The calculated value of the t-statistic is compared with the tabular (critical) value taken from the corresponding table for the given significance level and number of degrees of freedom. The critical value is obtained using the TINV(probability, deg_freedom) function.

3. Matrix of pair correlation coefficients. The analysis is performed using the "Data Analysis" tool, in which "Correlation" is selected. The pair correlation coefficients are evaluated statistically by comparing their absolute values with the tabular (critical) value. If the absolute value of a calculated pair correlation coefficient exceeds the critical value, we can say, with the given degree of probability, that the null hypothesis of no linear relationship is rejected, that is, the linear relationship is statistically significant.
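For readers working outside Excel, the matrix of pair correlation coefficients from step 3 can be approximated in plain Python. This is only a sketch of the idea, not a reproduction of the Excel tool: the helper names `pearson_r` and `correlation_matrix` and the three sample columns are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Square matrix of pair correlations, rows and columns in dict order."""
    names = list(columns)
    return [[pearson_r(columns[a], columns[b]) for b in names] for a in names]

# Hypothetical observations: two factors and one outcome variable.
data = {
    "x1": [2, 4, 6, 8, 10],
    "x2": [1, 3, 2, 5, 4],
    "y": [3, 7, 8, 13, 14],
}

matrix = correlation_matrix(data)
print("    " + "  ".join(name.rjust(6) for name in data))
for name, row in zip(data, matrix):
    print(name.ljust(4) + "  ".join(f"{value:6.3f}" for value in row))
```

The matrix is symmetric with ones on the diagonal, so only the entries above or below the diagonal need to be compared with the critical value.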

Conclusion

Using correlation analysis in scientific research makes it possible to determine the relationship between various factors and outcome indicators. It must be borne in mind that a high correlation coefficient can also be obtained from an absurd pair or set of data, so this kind of analysis must be performed on a sufficiently large data set.

After obtaining the calculated value of r, it is advisable to compare it with the critical r to confirm the statistical reliability of the value. Correlation analysis can be performed manually using formulas or with software tools, in particular MS Excel, where a scatter diagram can also be constructed to visualize the relationship between the factors studied and the outcome characteristic.
