معرفی شرکت ها

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

Intuitive way of using fuzz matching in pandas

ویژگی	مقدار
سیستم عامل	-
نام فایل	a-pandas-ex-fuzz-0.13
نام	a-pandas-ex-fuzz
نسخه کتابخانه	0.13
نگهدارنده	[]
ایمیل نگهدارنده	[]
نویسنده	Johannes Fischer
ایمیل نویسنده	<aulasparticularesdealemaosp@gmail.com>
آدرس صفحه اصلی	https://github.com/hansalemaos/a_pandas_ex_fuzz
آدرس اینترنتی	https://pypi.org/project/a-pandas-ex-fuzz/
مجوز	MIT

## Intuitive way of using fuzz matching in pandas ### Updates **05.10.2022** - Added compare rows ### Installation ```python #Try it first like this: #rapidfuzz is a lot faster than fuzzywuzzy, but I had some problems installing it, #even with Visual C++ 2019 redistributable installed a-pandas-ex-fuzz will try to import this module first pip install a-pandas-ex-plode-tool pip install a-pandas-ex-df-to-string pip install rapidfuzz #https://github.com/maxbachmann/RapidFuzz pip install --no-deps a-pandas-ex-fuzz #if rapidfuzz does not work, use: pip install a-pandas-ex-plode-tool pip install a-pandas-ex-df-to-string pip install fuzzywuzzy pip install --no-deps a-pandas-ex-fuzz #Or if you want to try to install everything: pip install a-pandas-ex-fuzz ``` ### Compare values in column against each other: pandas.Series.s_fuzz_all_values_in_col() ```python from a_pandas_ex_fuzz import pd_add_fuzzy_matching pd_add_fuzzy_matching() #adds three new methods to pd. import pandas as pd df = pd.read_csv( "https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv" ) df11 = df.Name.s_fuzz_all_values_in_column( limit=5, merge_with_series=True, partial_full_weighted="weighted" ) df22 = df.Name.s_fuzz_all_values_in_column( limit=2, merge_with_series=False, partial_full_weighted="full" ) df33 = df.Name.s_fuzz_all_values_in_column( limit=1, merge_with_series=True, partial_full_weighted="partial" ) df22 0 Braund... 70.833333 477 Cann, ... 63.829787 1 Angle,... 55.445545 518 Astor,... 53.061224 2 Sinkko... 79.069767 747 Honkan... 77.272727 3 Futrel... 77.142857 137 Potter... 52.873563 4 Gilles... 84.615385 722 Saunde... 77.777778 5 Bracke... 77.777778 221 Scanla... 76.470588 6 O'Brie... 65.116279 552 Maisne... 58.536585 7 Goodwi... 68.852459 386 Palsso... 67.857143 8 Rosblo... 62.068966 254 Hockin... 59.52381 9 Nasser... 74.074074 122 Astor,... 58.536585 fuzz_index_1 0 37 1 700 2 216 3 879 4 12 5 468 6 464 7 374 8 774 9 700 Parameters: df: [pd.Series] limit: int How many results do you want to have? Each result will have 3 columns [string, match, position in column] (default=5) partial_full_weighted: str weighted = fuzz.WRatio full = fuzz.ratio partial = fuzz.partial_ratio (default="weighted") merge_with_series: str (default=True) Returns: pd.DataFrame ``` ### Compare values in column against list: pandas.Series.s_fuzz_from_list() ```python from a_pandas_ex_fuzz import pd_add_fuzzy_matching pd_add_fuzzy_matching() #adds three new methods to pd. import pandas as pd df = pd.read_csv( "https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv" ) df111 = df.Name.s_fuzz_from_list( list_to_compare=["Johannes", "Paulo", "Kevin"], limit=2, merge_with_series=True, partial_full_weighted="partial", ) df222 = df.Name.s_fuzz_from_list( list_to_compare=["John", "Johannes", "Paulo", "Kevin"], limit=3, merge_with_series=False, partial_full_weighted="full", ) df333 = df.Name.s_fuzz_from_list( list_to_compare=["Maria", "Anna"], limit=1, merge_with_series=False, partial_full_weighted="partial", ) df333 fuzz_string_0 fuzz_match_0 fuzz_index_0 0 Maria 60.0 0 1 Maria 44.444444 0 2 Anna 75.0 1 3 Maria 40.0 0 4 Maria 40.0 0 .. ... ... ... 886 Maria 40.0 0 887 Maria 80.0 0 888 Maria 60.0 0 889 Maria 40.0 0 890 Maria 60.0 0 [891 rows x 3 columns] Parameters: df: [pd.Series] list_to_compare: list The strings you want to be compared limit: int How many results do you want to have? Each result will have 3 columns [string, match, position in column] (default=5) partial_full_weighted: str weighted = fuzz.WRatio full = fuzz.ratio partial = fuzz.partial_ratio (default="weighted") merge_with_series: str (default=True) Returns: pd.DataFrame ``` ### Compare values in column against list: pandas.Series.s_fuzz_one_word() ```python from a_pandas_ex_fuzz import pd_add_fuzzy_matching pd_add_fuzzy_matching() #adds three new methods to pd. import pandas as pd df = pd.read_csv( "https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv" ) df1 = df.Name.s_fuzz_one_word( word_to_search="Karolina", partial_full_weighted="weighted" ) df2 = df.Name.s_fuzz_one_word(word_to_search="Karolina", partial_full_weighted="full") df3 = df.Name.s_fuzz_one_word( word_to_search="Karolina", partial_full_weighted="partial" ) df1 Name fuzz_string_0 \ 0 Braund, Mr. Owen Harris Karolina 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) Karolina 2 Heikkinen, Miss. Laina Karolina 3 Futrelle, Mrs. Jacques Heath (Lily May Peel) Karolina 4 Allen, Mr. William Henry Karolina 5 Moran, Mr. James Karolina 6 McCarthy, Mr. Timothy J Karolina 7 Palsson, Master. Gosta Leonard Karolina 8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) Karolina 9 Nasser, Mrs. Nicholas (Adele Achem) Karolina fuzz_match_0 0 41.538462 1 33.750000 2 60.000000 3 33.750000 4 42.750000 5 30.000000 6 27.692308 7 45.000000 8 45.600000 9 42.750000 df2 Name fuzz_string_0 \ 0 Braund, Mr. Owen Harris Karolina 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) Karolina 2 Heikkinen, Miss. Laina Karolina 3 Futrelle, Mrs. Jacques Heath (Lily May Peel) Karolina 4 Allen, Mr. William Henry Karolina 5 Moran, Mr. James Karolina 6 McCarthy, Mr. Timothy J Karolina 7 Palsson, Master. Gosta Leonard Karolina 8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) Karolina 9 Nasser, Mrs. Nicholas (Adele Achem) Karolina fuzz_match_0 0 32.258065 1 17.241379 2 33.333333 3 15.686275 4 31.250000 5 25.000000 6 19.354839 7 31.578947 8 21.428571 9 23.809524 df3 Name fuzz_string_0 \ 0 Braund, Mr. Owen Harris Karolina 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) Karolina 2 Heikkinen, Miss. Laina Karolina 3 Futrelle, Mrs. Jacques Heath (Lily May Peel) Karolina 4 Allen, Mr. William Henry Karolina 5 Moran, Mr. James Karolina 6 McCarthy, Mr. Timothy J Karolina 7 Palsson, Master. Gosta Leonard Karolina 8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) Karolina 9 Nasser, Mrs. Nicholas (Adele Achem) Karolina fuzz_match_0 0 46.153846 1 37.500000 2 66.666667 3 37.500000 4 46.153846 5 33.333333 6 30.769231 7 50.000000 8 50.000000 9 40.000000 Parameters: df: [pd.Series] word_to_search: str partial_full_weighted: str weighted = fuzz.WRatio full = fuzz.ratio partial = fuzz.partial_ratio (default="weighted") Returns: pd.DataFrame ``` ### pandas.Series.ds_fuzz_compare_row_to_others/ pandas.DataFrame.ds_fuzz_compare_row_to_others ```python from a_pandas_ex_fuzz import pd_add_fuzzy_matching pd_add_fuzzy_matching() import pandas as pd df = pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv") df.ds_fuzz_compare_row_to_others(2,loc_or_iloc='iloc', partial_full_weighted='full', sort_values=True) Out[4]: PassengerId Survived Pclass ... Cabin Embarked aa_fuzz_match 2 3 1 3 ... NaN S 100.000000 216 217 1 3 ... NaN S 90.816327 816 817 0 3 ... NaN S 88.118812 382 383 0 3 ... NaN S 83.769634 400 401 1 3 ... NaN S 83.769634 .. ... ... ... ... ... ... ... 745 746 0 1 ... B22 S 54.450262 556 557 1 1 ... A16 C 53.744493 581 582 1 1 ... C68 C 53.456221 669 670 1 1 ... C126 S 52.132701 307 308 1 1 ... C65 C 51.612903 [891 rows x 13 columns] ```

نیازمندی

مقدار	نام
-	a-pandas-ex-df-to-string
-	a-pandas-ex-plode-tool
-	fuzzywuzzy
-	pandas
-	rapidfuzz

نحوه نصب

نصب پکیج whl a-pandas-ex-fuzz-0.13:

pip install a-pandas-ex-fuzz-0.13.whl

نصب پکیج tar.gz a-pandas-ex-fuzz-0.13:

pip install a-pandas-ex-fuzz-0.13.tar.gz