# ChunkDot
Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K most similar items for a large number of items: it chunks the item matrix representation (embeddings) and uses Numba to accelerate the calculations.
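The key idea is that the full N x N similarity matrix is never materialized: the embeddings are processed in chunks and only the top K similarities per item are kept in a sparse matrix. Below is a rough, single-threaded NumPy sketch of that idea, for illustration only; it is not ChunkDot's actual implementation, which parallelizes the chunk loop with Numba.
```python
import numpy as np
from scipy.sparse import csr_matrix


def cosine_top_k_sketch(embeddings, top_k=50, chunk_size=1000):
    """Illustrative chunked top-K cosine similarity (not ChunkDot's code)."""
    # Normalize rows so a plain dot product yields cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = normed.shape[0]
    rows, cols, data = [], [], []
    for start in range(0, n, chunk_size):
        chunk = normed[start:start + chunk_size]
        # Similarities of this chunk against all items, shape (chunk, n).
        sims = chunk @ normed.T
        # Unsorted indices of the top_k largest similarities per row.
        top = np.argpartition(sims, -top_k, axis=1)[:, -top_k:]
        for i, idx in enumerate(top):
            rows.extend([start + i] * top_k)
            cols.extend(idx)
            data.extend(sims[i, idx])
    # Only n * top_k values are ever stored, never the full n x n matrix.
    return csr_matrix((data, (rows, cols)), shape=(n, n))
```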
## Related blog posts
- [Cosine Similarity for 1 Trillion Pairs of Vectors](https://pub.towardsai.net/cosine-similarity-for-1-trillion-pairs-of-vectors-11f6a1ed6458)
- [Bulk Similarity Calculations for Sparse Embeddings](https://pub.towardsai.net/scale-up-bulk-similarity-calculations-for-sparse-embeddings-fb3ecb624727)
## Usage
```bash
pip install -U chunkdot
```
### Dense embeddings
Calculate the 50 most similar and the 50 most dissimilar items for each of 100K items.
```python
import numpy as np
from chunkdot import cosine_similarity_top_k
embeddings = np.random.randn(100000, 256)
# use all of your system's available memory
cosine_similarity_top_k(embeddings, top_k=50)
# the 50 most dissimilar items, capping memory usage at 20GB
cosine_similarity_top_k(embeddings, top_k=-50, max_memory=20E9)
```
```
<100000x100000 sparse matrix of type '<class 'numpy.float64'>'
with 5000000 stored elements in Compressed Sparse Row format>
```
```python
# with progress bar
cosine_similarity_top_k(embeddings, top_k=50, show_progress=True)
```
```
100%|███████████████████████████████████████████████████████████████| 129.0/129 [01:04<00:00, 1.80it/s]
<100000x100000 sparse matrix of type '<class 'numpy.float64'>'
with 5000000 stored elements in Compressed Sparse Row format>
```
Execution time
```python
from timeit import timeit
import numpy as np
from chunkdot import cosine_similarity_top_k
embeddings = np.random.randn(100000, 256)
timeit(lambda: cosine_similarity_top_k(embeddings, top_k=50, max_memory=20E9), number=1)
```
```
58.611996899999994
```
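Since ChunkDot relies on Numba for parallelization, the number of threads used for the calculation should be controllable through Numba's threading API. This is an assumption based on the Numba dependency, not a documented ChunkDot setting:
```python
import numba

# Assumption: ChunkDot's Numba-compiled kernels honor Numba's
# global thread setting; cap the calculation at 4 threads.
numba.set_num_threads(4)
```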
### Sparse embeddings
Calculate the 50 most similar and the 50 most dissimilar items for each of 100K items, where each item is represented by a 10K-dimensional vector and the embeddings matrix has a density of 0.005.
```python
from scipy import sparse
from chunkdot import cosine_similarity_top_k
embeddings = sparse.rand(100000, 10000, density=0.005)
# use all of your system's available memory
cosine_similarity_top_k(embeddings, top_k=50)
# the 50 most dissimilar items, capping memory usage at 20GB
cosine_similarity_top_k(embeddings, top_k=-50, max_memory=20E9)
```
```
<100000x100000 sparse matrix of type '<class 'numpy.float64'>'
with 5000000 stored elements in Compressed Sparse Row format>
```
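In practice, sparse embeddings like these often come from text vectorization. As an illustration (using scikit-learn, which is not a ChunkDot dependency), a TF-IDF matrix is already in the CSR format that `cosine_similarity_top_k` accepts:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from chunkdot import cosine_similarity_top_k

documents = [
    "chunked cosine similarity for dense matrices",
    "chunked cosine similarity for sparse matrices",
    "numba accelerates the calculations",
    "the item matrix is processed in chunks",
]
# TfidfVectorizer returns a SciPy CSR matrix of TF-IDF weights.
embeddings = TfidfVectorizer().fit_transform(documents)
# The 2 most similar documents for each document.
cosine_similarity_top_k(embeddings, top_k=2)
```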
Execution time
```python
from timeit import timeit
from scipy import sparse
from chunkdot import cosine_similarity_top_k
embeddings = sparse.rand(100000, 10000, density=0.005)
timeit(lambda: cosine_similarity_top_k(embeddings, top_k=50, max_memory=20E9), number=1)
```
```
51.87472256699999
```