معرفی شرکت ها

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

examines the relationships between a new feature and everyother feature in a dataset

ویژگی	مقدار
سیستم عامل	-
نام فایل	feature-test-0.1.21
نام	feature-test
نسخه کتابخانه	0.1.21
نگهدارنده	[]
ایمیل نگهدارنده	[]
نویسنده	Thomas Wilmean
ایمیل نویسنده	twileman@redhat.com
آدرس صفحه اصلی	https://gitlab.cee.redhat.com/twileman/feature_test
آدرس اینترنتی	https://pypi.org/project/feature-test/
مجوز	MIT License

# feature_test This package provides tools to test one feature against all other features in a dataset. It is intended to determine whether a new feature is a good candidate for addition to an established dataset used in a machine learning model. The benefit of this package is that instead of adding a feature to a dataset, running it through a long training and evaluation process, and then interpreting results, the feature can be quickly tested using feature_test; enabling users to quickly discard unuseful features and move forward with potentially impactful ones. This package provides the following tools to aid this determination. - Correlation analysis - Calculate the correlations - Identify highly correlated feature - Categorize correlation - Chi-Square tests - Calculate the chi-square statistic and p-vlaue - Categorize the chi-square result - Recursive feature elimination (RFE) - Rank features by their importance - Lasso regularization coefficients - Calculate the coefficients of a linear model using Lasso regularization - Ridge regularization coefficients - Calculate the coefficients of a linear model using Ridge Regularization - Decision tree coefficients - Calculate the coefficients for each feature using a decision tree model * Special thanks to the AREN team for their guidance and Annie Tran for the open sourcing her feature test code. ## Installation python3 -m pip install feature-test ## utils.Utils class utils.Utils(X, columns) A collection of functions that enables users to get data on a dataframe or adjust it for testing. Parameters: <ul> <li>X: pandas.DataFrame <ul> <li>A pandas.DataFrame containing the dataset </ul> <li> columns: List <ul> <li>A list of strings correlating to features in a dataset. </ul></ul> Methods _______________________________________________________________________________________ <table> <tr> <td>get_columns(X: pd.DataFrame)</td> <td>Returns a list of column names in a dataframe.</td> </tr> <tr> <td>exclude_columns(X: pd.DataFrame, columns: List)</td> <td>Exclude a list of columns from a dataframe.</td> </tr> </table> ## tests.Correlation class tests.correlation(X, new_feature) A suite of functions that calculates and reports the correlation coefficient between features and all other features in a dataset. Parameters: <ul> <li>X: pandas.DataFrame <ul> <li>A pandas.DataFrame containing the dataset </ul> <li> new_feature: string <ul> <li>A string indicating the column in X to test against all other features </ul></ul> Methods _______________________________________________________________________________________ <table> <tr> <td>calc_corr(X: pd.DataFrame, new_feature: str)</td> <td>Returns a dataframe with the correlation values for each new_feature/feature combination.</td> </tr> <tr> <td>similar_corr(X: pd.DataFrame, new_feature: str)</td> <td>Returns a list of features highly correlated with the new_feature.</td> </tr> <tr> <td>categorize_correlations(X: pd.DataFrame, correlation_threshold: float = 0.6)</td> <td>Returns a dataframe with the correlations categorized. Possible values are high, medium, and low.</td> </tr> <tr> <td>get_correlations(X: pd.DataFrame, new_feature: str)</td> <td>Returns a dataframe of new_feature/feature combinations, their correlations, and their correlation category.</td> </tr> </table> ## tests.ChiSquare class tests.ChiSquare(X, new_feature) Calculates the chi-squared statistic of the new feature against each categorical feature in a dataset. Also categorizes the chi-square result based on the p-value and effect size as measured by cramers v. Parameters: <ul> <li>X: pandas.DataFrame <ul> <li>A pandas.DataFrame containing the dataset </ul> <li> new_feature: string <ul> <li>A string indicating the column in X to test against all other features </ul></ul> Methods _______________________________________________________________________________________ <table> <tr> <td>calc_chi_sq(X: pd.DataFrame, new_feature: str)</td> <td>Returns a dataframe that includes the chi-square result categorization.</td> </tr> </table> ## tests.FeatureSelection class tests.FeatureSelection(X, target) Parameters: <ul> <li>X: pandas.DataFrame <ul> <li>A pandas.DataFrame containing the dataset </ul> <li> target: string <ul> <li>A string indicating the prediction column </ul></ul> Methods _______________________________________________________________________________________ <table> <tr> <td>rfe_rankings(X: pd.DataFrame, target: str, classifier=None)</td> <td>Returns a dataframe that includes the recursive feature elimination feature ranking.</td> </tr> <tr> <td>lasso_rankings(X: pd.DataFrame, target: str)</td> <td>Returns a dataframe that includes the linear model coefficients for features after lasso regularization.</td> </tr> <tr> <td>ridge_coefficients(X: pd.DataFrame, target: str)</td> <td>Returns a dataframe that includes the linear model coefficients for features after ridge regularization.</td> </tr> <tr> <td>dtree_coefficients(X: pd.DataFrame, target: str)</td> <td>Returns a dataframe that includes the decision tree model coefficients for features.</td> </tr> <tr> <td>run_feature_classifiers(X: pd.DataFrame, target: str)</td> <td>Returns a dataframe that includes the the results for rfe_rankings, lasso_rankings, ridge_rankings, and dtree_rankings.</td> </tr> </table> ## Examples _______________________________________________________________________________________ ```python from pandas import util from feature_test.utils import Utils from feature_test.feature_tests import Correlation, FeatureSelection, ChiSq # Create a test dataset df= util.testing.makeDataFrame() df.head() ``` | feature | A | B | C | D | |---|---|---|---|---| | lRhANYYD2r | 0.572559 | -1.409978 | 0.687618 | -0.923502 | | YzYG07kY1O | 0.145629 | -1.446946 | -0.003526 | 0.304385 | | cT3KK078Gt | -1.007378 | 1.263980 | 1.107897 | 0.844689 | | JW4Kg2EGVo | 0.536701 | -1.477372 | -0.866873 | 1.539458 | | 2mucO1cf2Z | -1.101875 | 0.518555 | 0.384916 | -0.031403 | ```python c = Utils.get_columns(df) ``` ['A', 'B', 'C', 'D'] ```python corr_df = Correlation.calc_corr(df, 'A') corr_df ``` | feature_1 | feature_2 | corr | |-----------|-----------|------| | A | B | 0.081662 | | A | C | 0.203858 | | A | D | 0.064999 | ```python rep_df = Correlation.get_correlations(df, 'D') rep_df ``` | feature_1 | feature_2 | corr | corr_cat | |-----------|-----------|------|-----------| | A | B | 0.071466 | low | | A | C | 0.105306 | low | | A | D | 0.121130 | low | ```python ChiSq.calc_chi_sq(df, 'A') ``` | feature_1 | feature_2 | chi_sq_cat | |-----------|-----------|-----------| | A | B | NOT SIGNIFICANT | | A | C | NOT SIGNIFICANT | | A | D | NOT SIGNIFICANT | ```python FeatureSelection.rfe_rankings(df, 'A') ``` | feature | rfe_rank | |-----------|-----------| | B | 3 | | C | 1 | | D | 2 | ```python FeatureSelection.lasso_rankings(df, 'A') ``` | feature | lasso_coef | lasso_importance | |-----------|-----------|-----------| | B | 0.0 | 2.0 | | C | 0.0 | 2.0 | | D | -0.0 | 2.0 | ```python FeatureSelection.ridge_rankings(df, 'A') ``` | feature | ridge_coef | ridge_importance |-----------|-----------|-----------| | B | 0.050871 | 3.0 | | C | 0.096362 | 1.0 | | D | -0.095220 | 2.0 | ```python FeatureSelection.dtree_rankings(df, 'A') ``` | feature | random_forest_coefficient | random_forest_importance | |-----------|-----------|-----------| | B | 0.323091 | 2.0 | | C | 0.269673 | 3.0 | | D | 0.407236 | 1.0 | ```python FeatureSelection.dtree_rankings(df, 'A') ``` | feature | rfe_rank | lasso_coef | lasso_importance | ridge_coef | ridge_importance | random_forest_coefficient | random_forest_importance | |-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------| | B | 3 | 0.0 | 2.0 | 0.050871 | 3.0 | 0.323091 | 2.0 | C | 1 | 0.0 | 2.0 | 0.096362 | 1.0 | 0.269673 | 3.0 | D | 2 | -0.0 | 2.0 | -0.095220 | 2.0 | 0.407236 | 1.0

نیازمندی

مقدار	نام
>=1.21.5	numpy
>=1.1.2	pandas
>=1.5.2	scipy
-	scikit-learn

زبان مورد نیاز

مقدار	نام
>=3.6	Python

نحوه نصب

نصب پکیج whl feature-test-0.1.21:

pip install feature-test-0.1.21.whl

نصب پکیج tar.gz feature-test-0.1.21:

pip install feature-test-0.1.21.tar.gz