bootstrap-CIHT-0.1.1



Description

Confidence intervals & hypothesis testing using bootstrap samples
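The summary above can be illustrated with a minimal NumPy sketch of a percentile bootstrap confidence interval for the mean of one group. It is not taken from the package's source. The function name `percentile_bootstrap_ci` is hypothetical. The defaults (`samples=10000`, `alpha=0.05`) only mirror the documented `Bootstrap_CIHT` parameters.

The sketch assumes a two-sided interval is read off at the alpha/2 and 1 - alpha/2 percentiles of the bootstrapped means. That is consistent with the `2-sided CI limits: 2.5 97.5` line in the README's example output, but it has not been checked against the implementation.

```python
import numpy as np


def percentile_bootstrap_ci(data, samples=10000, alpha=0.05, seed=None):
    """Hypothetical helper: percentile bootstrap CI for one group's mean.

    Draws `samples` resamples of `data` with replacement, records the mean
    of each, and returns those means plus the (alpha/2, 1 - alpha/2)
    percentiles as a (lower, upper) tuple.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # One row of indices per bootstrap resample, each the size of the data
    idx = rng.integers(0, data.size, size=(samples, data.size))
    boot_means = data[idx].mean(axis=1)
    # e.g. alpha=0.05 -> 2.5th and 97.5th percentiles
    lower, upper = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return boot_means, (lower, upper)


# Usage on synthetic data (not the README's dataset)
rng = np.random.default_rng(42)
values = rng.normal(loc=0.0, scale=0.5, size=200)
boot_means, ci = percentile_bootstrap_ci(values, samples=5000, seed=0)
print(f"95% CI for the mean: ({ci[0]:.4f}, {ci[1]:.4f})")
```

Building all resamples at once with a 2-D index array keeps the work inside NumPy. The cost is holding `samples * len(data)` values in memory. For large datasets, looping over resamples one at a time trades speed for memory.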
| Attribute | Value |
|---|---|
| Operating system | - |
| File name | bootstrap-CIHT-0.1.1 |
| Name | bootstrap-CIHT |
| Library version | 0.1.1 |
| Maintainer | [] |
| Maintainer email | [] |
| Author | Erik Sorensen |
| Author email | ens59327@gmail.com |
| Homepage | https://github.com/ensst6/bootstrap_CIHT |
| URL | https://pypi.org/project/bootstrap-CIHT/ |
| License | - |
# bootstrap-CIHT

A Python module to calculate confidence intervals and perform hypothesis tests on the mean or proportion of one or two groups using bootstrap sampling. Selected (hopefully useful) visualizations of the data and null distribution are provided via [matplotlib](https://matplotlib.org).

## Index

- [Description](#description)
- [Parameters](#parameters)
- [Attributes](#attributes)
- [Methods](#methods)
- [Usage](#usage)
- [Dependencies](#depend)
- [Installation](#install)
- [Example](#example)
- [Additional Tests](#addtests)
- [History](#history)
- [License](#license)

### <a id="description"></a>Description

*class* bootstrap_CIHT.**Bootstrap_CIHT**_(data, data_col, num_vars=1, null_mean=0.0, group_col=None, group1=None, group2=None, samples=10000, alpha=0.05, h_sides=2, h1_dir=None)_

#### <a id="parameters"></a>Parameters

**data (Pandas dataframe):** Unprocessed user data. Should contain at least a column with the data to analyze. For proportions, data should be coded as 0 or 1. For a two-group comparison, it should also have a column containing group membership labels.

**data_col (string):** Name of column containing the data to be analyzed

**num_vars (int):** Number of groups (1 or 2); default 1

**null_mean (float):** For a one-group test, the comparison mean or proportion; default 0.0. Should be left at 0.0 for two-group comparisons.

**group_col (string):** For a two-group comparison, name of the column with group labels

**group1 (int, bool, string):** For a two-group comparison, label for the first (control) group

**group2 (int, bool, string):** For a two-group comparison, label for the second (experiment) group

**samples (int):** Number of bootstrap samples; default 10000

**alpha (float):** Sets the size of the confidence interval (100\*(1-alpha)); default 0.05

**h_sides (int):** One- or two-sided confidence interval & hypothesis test; default 2

**h1_dir (string):** Inequality direction for a 1-sided alternative hypothesis:

- 'greater' -> experiment parameter > control parameter
- 'less' -> experiment parameter < control parameter
- None (default) -> 2-sided comparison

#### <a id="attributes"></a>Attributes:

**df (Pandas dataframe):** Dataframe derived from user-supplied `data` for use in analysis

#### <a id="methods"></a>Methods:

**fill_data**_()_

Extract data for analysis from the user-supplied dataframe. Performed automatically on instantiation.

_**Args:**_ None

_**Returns:**_

**df (Pandas dataframe):** Dataframe containing the data to be analyzed ("data" column) and, for a two-group comparison, group membership ("group" column)

**get_bootstrap_sample**_()_

Create sampling distributions using bootstrapping.

_**Args:**_ None

_**Returns:**_

_For one-group analysis:_

**experiment_mean (numpy array):** Bootstrapped means of the group of interest

_For two-group comparison:_

**experiment_mean (numpy array):** Bootstrapped means for experiment group

**control_mean (numpy array):** Bootstrapped means for control group

**diffs (numpy array):** Differences in means of bootstrapped samples

**calculate_CI**_(means)_

Compute and print 100\*(1-alpha) confidence intervals for the relevant sampling distribution. For a one-group case, this is for the mean of the data. For a two-group case, it is for the difference in means.

**_Args:_**

**means (numpy array):** Contains either the bootstrapped mean (for a single group) or difference in means (for two groups)

_**Returns:**_

**CI (tuple):** Lower and upper confidence limits

**calculate_hypothesis_test**_(means)_

Calculate the p-value for a 1- or 2-sided hypothesis test comparing the sample mean to the null mean. For two-group comparisons, the difference in means is compared to a null mean of zero. Plots the observed statistic on a histogram of the null distribution.

**_Args:_**

**means (numpy array):** Contains either the bootstrapped mean (for a single group) or difference in means (for two groups)

_**Returns:**_

**p_value (float):** Probability, under the null hypothesis, of a statistic at least as extreme as the one observed

**plot_hist_CI**_(means, CI, bins=10)_

Plot the sampling distribution with the confidence limits as vertical red lines. For the single-group case, also plots the null mean as a vertical green line.

**_Args:_**

**means (numpy array):** Contains either the bootstrapped mean (for a single group) or difference in means (for two groups)

**CI (tuple):** Confidence limits

**bins (int):** Number of bins in the histogram; default 10

**_Returns:_** None

**plot_distribs**_(experiment_mean, control_mean, bins=10)_

For a two-group comparison, plot histograms of the control and experimental sampling distributions on a single graph.

**_Args:_**

**experiment_mean (np array):** Bootstrapped samples of mean for experiment group

**control_mean (np array):** Bootstrapped samples of mean for control group

**bins (int):** Number of bins in the histogram; default 10

**_Returns:_** None

### <a id="usage"></a>Usage

#### <a id="depend"></a>Dependencies

- `numpy : 1.19.1`
- `pandas : 1.1.3`
- `matplotlib : 3.3.1`

Additionally, for unit tests: `scipy 1.5.2`

Developed & tested in `Python 3.8.5`

#### <a id="install"></a>Installation

Assuming working `numpy`, `pandas`, and `matplotlib`:

`> pip install bootstrap_CIHT`

#### <a id="example"></a>Example

This uses a 'toy' dataset (available in the `tests` folder) with two groups: "success" and "failure". The success group is 1000 samples from a normal distribution with mean 0.0 and standard deviation 0.5. The failure group is 1000 samples from a normal distribution with mean 1.0 and standard deviation 0.5. Note that each pop-up plot window will need to be closed for the script to continue.

```python
>>> import pandas as pd
>>> import numpy as np
>>> from bootstrap_CIHT import Bootstrap_CIHT as BS
>>> np.random.seed(42)  # for reproducibility
>>> df = pd.read_csv('nml2.csv')
>>> bs1 = BS(df, 'numbers', num_vars=2, group_col='outcome', group1='failure', group2='success')
>>> bs1.df.head()  # the processed data
       data    group
0  0.248357  success
1 -0.069132  success
2  0.323844  success
3  0.761515  success
4 -0.117077  success
>>> bs1.df.tail()
          data    group
1995  1.535075  failure
1996  0.986739  failure
1997  0.559063  failure
1998  0.918467  failure
1999  0.627549  failure
>>> exp_mean, ctrl_mean, diffs = bs1.get_bootstrap_sample()
>>> CIs = bs1.calculate_CI(diffs)
2-sided CI limits: 2.5 97.5
95% 2-sided CI for differences in means of numbers: (-1.0697323775784746, -0.9825440488702097)
>>> p_val = bs1.calculate_hypothesis_test(diffs)
experimental group: success; mean: 0.0097
control group: failure; mean: 1.0354
difference of means, sampling distribution SD: -1.0258, 0.0223
H0: Difference in means success vs failure = 0
H1: Difference in means success vs failure <> 0
p-value: 0.000000
>>> bs1.plot_hist_CI(diffs, CIs, bins=20)
>>> bs1.plot_distribs(exp_mean, ctrl_mean, bins=20)
```

The plot produced by `calculate_hypothesis_test` is:

![Figure 1](./images/Null+mean.png)

The plot from `plot_hist_CI` is:

![Figure 2](./images/diffs.png)

Finally, `plot_distribs` produces:

![Figure 3](./images/distribs.png)

#### <a id="addtests"></a>Additional Tests

The `tests` folder contains unit tests in `test.py`. These use the same toy data as above and are run via:

`> python -m unittest test.py`

There are also four other files (`test0.py`, `test1.py`, `test2.py`, `test3.py`) that reproduce exercises from a Udacity Introduction to Statistics course I previously took. The source `.csv` files are included, as are the Jupyter notebooks with the original exercises and results.

Note that the exercise in `test0.py` appears to produce results that vary considerably with each re-sampling, so you may get inconsistent CIs and p-values.

### <a id="history"></a>History

Created April 12, 2021

### <a id="license"></a>License

[Licensed](license.md) under the [MIT License](https://spdx.org/licenses/MIT.html).
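The two-group comparison that `calculate_hypothesis_test` reports has two steps:

- bootstrap the difference in means;
- compare the observed difference with a null distribution centered at zero.

Both steps can be sketched in plain NumPy. This is an illustration under stated assumptions, not the package's code, and the function name `bootstrap_diff_test` is hypothetical. The null distribution is modelled as a normal curve with mean 0 and the standard deviation of the bootstrapped differences. That is one common choice and is assumed here, not confirmed from the source. The `alternative` values mirror the `h1_dir` options ('greater', 'less', or two-sided).

```python
import numpy as np


def bootstrap_diff_test(control, experiment, samples=10000, alternative="two-sided", seed=None):
    """Hypothetical helper: bootstrap test on (experiment mean - control mean).

    Returns the observed difference, the bootstrapped differences, and a
    p-value from a normal null distribution centred at 0 (an assumption).
    """
    rng = np.random.default_rng(seed)
    control = np.asarray(control, dtype=float)
    experiment = np.asarray(experiment, dtype=float)

    # Resample each group independently, with replacement
    c_idx = rng.integers(0, control.size, size=(samples, control.size))
    e_idx = rng.integers(0, experiment.size, size=(samples, experiment.size))
    diffs = experiment[e_idx].mean(axis=1) - control[c_idx].mean(axis=1)

    observed = experiment.mean() - control.mean()

    # Assumed null: difference of 0, spread taken from the bootstrap SD
    null_vals = rng.normal(0.0, diffs.std(), samples)

    if alternative == "greater":    # H1: experiment > control
        p_value = np.mean(null_vals >= observed)
    elif alternative == "less":     # H1: experiment < control
        p_value = np.mean(null_vals <= observed)
    else:                           # two-sided: |null| at least as extreme
        p_value = np.mean(np.abs(null_vals) >= abs(observed))
    return observed, diffs, float(p_value)


# Synthetic data shaped like the README's toy example (not the actual CSV)
rng = np.random.default_rng(1)
failure = rng.normal(1.0, 0.5, 1000)   # control group
success = rng.normal(0.0, 0.5, 1000)   # experiment group
observed, diffs, p = bootstrap_diff_test(failure, success, samples=2000, seed=2)
print(f"observed difference: {observed:.4f}, SD: {diffs.std():.4f}, p-value: {p:.6f}")
```

The two-sided branch compares absolute values. That is only valid because the assumed null distribution is symmetric about zero.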


Requirements

| Name | Value |
|---|---|
| numpy | - |
| pandas | - |
| matplotlib | - |


Required language

| Name | Value |
|---|---|
| Python | >=3.6 |


Installation


Install the bootstrap-CIHT-0.1.1 whl package:

    pip install bootstrap-CIHT-0.1.1.whl


Install the bootstrap-CIHT-0.1.1 tar.gz package:

    pip install bootstrap-CIHT-0.1.1.tar.gz