# correlation
Calculate confidence intervals for correlation coefficients, including Pearson's r, Kendall's tau, Spearman's rho, and custom correlation measures.
## Methodology
Two approaches are offered for calculating the confidence intervals: a parametric approach based on a normal approximation, and a nonparametric approach based on bootstrapping.
### Parametric Approach
Let r\_hat be the correlation coefficient computed from the sample. Under the Fisher transformation
```
z = ln((1+r)/(1-r))/2,
```
z approximately follows a normal distribution with mean z(r\_hat) and variance sigma^2, where sigma^2 equals 1/(n-3), 0.437/(n-4), and (1+r\_hat^2/2)/(n-3) for Pearson's r, Kendall's tau, and Spearman's rho, respectively (see Refs. [1, 2] for details). Here n is the array length, i.e. the number of paired observations.
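The (1-alpha) confidence interval (CI) for r is
```
(T(z_lower), T(z_upper)),
```
where T is the inverse of the transformation above,
```
T(x) = (exp(2x) - 1) / (exp(2x) + 1),
```
and the endpoints on the z scale are
```
z_lower = z(r_hat) - z_(1-alpha/2) * sigma,
z_upper = z(r_hat) + z_(1-alpha/2) * sigma.
```
Here z\_(1-alpha/2) is the (1-alpha/2) quantile of the standard normal distribution, and sigma is the square root of the variance given above.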
This normal approximation works when the absolute value of the coefficient is less than 1 for Pearson's r, 0.8 for Kendall's tau, and 0.95 for Spearman's rho.
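The parametric interval described in this section can be sketched in a few lines of standard-library Python. This is only an illustration of the formulas, not the package's actual implementation: the names `fisher_ci` and `_z_variance` and the `method` strings are hypothetical, and `math.atanh` / `math.tanh` are used because they coincide with the transformation z and its inverse T.

```python
import math
from statistics import NormalDist


def _z_variance(r_hat, n, method):
    """Variance of z for each coefficient, as listed in the text (hypothetical helper)."""
    if method == "pearson":
        return 1.0 / (n - 3)
    if method == "kendall":
        return 0.437 / (n - 4)
    if method == "spearman":
        return (1.0 + r_hat ** 2 / 2.0) / (n - 3)
    raise ValueError(f"unknown method: {method!r}")


def fisher_ci(r_hat, n, method="pearson", alpha=0.05):
    """Return the (1-alpha) CI for a correlation via the normal approximation."""
    z = math.atanh(r_hat)                      # z = ln((1+r)/(1-r))/2
    sigma = math.sqrt(_z_variance(r_hat, n, method))
    q = NormalDist().inv_cdf(1 - alpha / 2)    # standard normal quantile z_(1-alpha/2)
    # T(x) = (exp(2x) - 1) / (exp(2x) + 1) is tanh(x)
    return math.tanh(z - q * sigma), math.tanh(z + q * sigma)


# Spearman's rho of about -0.1 from n = 2000 pairs:
# the interval is roughly (-0.143, -0.056).
lower, upper = fisher_ci(-0.1, 2000, method="spearman")
```

Because the interval is built on the z scale and mapped back through T, it is not symmetric around r\_hat, and its endpoints always stay inside (-1, 1).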
### Nonparametric Approach
For the nonparametric approach, we use a naive bootstrap method:
* Draw n pairs (x\_i, y\_i) with replacement from the original (paired) samples, and calculate the correlation coefficient of the resampled data.
* Repeat this process a large number of times (5000 by default).
* Take the alpha/2 and (1-alpha/2) quantiles of the resulting correlation coefficients as the endpoints of the (1-alpha) CI for r.
## References
[1] Bonett, Douglas G., and Thomas A. Wright. "Sample size requirements for estimating Pearson, Kendall and Spearman correlations." Psychometrika 65, no. 1 (2000): 23-28.
[2] Bishara, Anthony J., and James B. Hittner. "Confidence intervals for correlations when data are not normal." Behavior Research Methods 49, no. 1 (2017): 294-309.
## Installation
```
pip install correlation
```
or
```
conda install -c wangxiangwen correlation
```
## Example Usage
```python
>>> import correlation
>>> a, b = list(range(2000)), list(range(200, 0, -1)) * 10
>>> correlation.corr(a, b, method='spearman_rho')
(-0.0999987624920335, # correlation coefficient
-0.14330929583811683, # lower endpoint of CI
-0.056305939127336606, # upper endpoint of CI
7.446171861744971e-06) # p-value
```