معرفی شرکت ها


elasticmagic-qf-attrs-0.1.4


Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

-
ویژگی مقدار
سیستم عامل -
نام فایل elasticmagic-qf-attrs-0.1.4
نام elasticmagic-qf-attrs
نسخه کتابخانه 0.1.4
نگهدارنده ['Alexander Koval']
ایمیل نگهدارنده ['kovalidis@gmail.com']
نویسنده Alexander Koval
ایمیل نویسنده kovalidis@gmail.com
آدرس صفحه اصلی https://github.com/anti-social/elasticmagic-qf-attrs
آدرس اینترنتی https://pypi.org/project/elasticmagic-qf-attrs/
مجوز Apache-2.0
[![Test and build](https://github.com/anti-social/elasticmagic-qf-attrs/actions/workflows/python.yaml/badge.svg)](https://github.com/anti-social/elasticmagic-qf-attrs/actions/workflows/python.yaml) Elasticmagic query filters for attributes ========================================= Library to store, filter and build facets for custom attributes The problem ----------- Each attribute pair can be stored in the index as a nested document. We can use following mapping for that: ```yaml attrs: type: nested properties: attr_id: type: integer # usually only one of the next fields should be populated value_int: type: integer value_bool: type: boolean value_float: type: float ``` This makes it possible to filter documents by an attribute id and its value (for example we want to find all the documents with `attr_id = 1234` and `value = 5678`): ```yaml query: bool: filter: - nested: path: attrs query: bool: must: - term: attrs.attr_id: 1234 - term: attrs.value_int: 5678 ``` It is also possible to build a facets for all attributes at once: ```yaml aggs: attrs_nested: nested: path: attrs aggs: attrs: terms: field: attrs.attr_id aggs: values: field: attrs.value_int ``` or for a single attribute: ```yaml aggs: attrs_nested: nested: path: attrs aggs: attr_1234: filter: term: attrs.attr_id: 1234 aggs: values: field: attrs.value_int ``` But nested documents have some drawbacks. Every nested document is stored in the index as different document. For instance, next document will be stored as 5 lucene documents: ```yaml name: "I'm a document with nested attributes" attrs: - attr_id: 1 value_int: 42 - attr_id: 2 value_int: 43 - attr_id: 3 value_bool: true - attr_id: 4 value_float: 99.9 ``` Nested queries are slow by itself: > In particular, joins should be avoided. nested can make queries several times slower and parent-child relations can make queries hundreds of times slower. > - https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-search-speed.html#_document_modeling But what is worse regular queries are also slower when there are nested documents in the index. It is because of all the fields of main documents becomes sparse. This in turn degrades performance of all filters and accesses to doc_values. The solution ------------ The idea is to encode pair of an attribute id and a corresponding value into a single value. If our attribute ids are 32-bit integers and all value types also fit into 32 bits we can store them as a single 64-bit value. So our mapping can be: ```yaml attrs: type: object properties: int: type: long bool: type: long float: type: long ``` Document with encoded attributes: ```yaml name: "I'm a document with packed attributes" attrs: # (1 << 32) | 42 - int: 0x1_0000002a # (2 << 32) | 43 - int: 0x2_0000002b # (3 << 1) | 1 - bool: 0x7 # (4 << 32) | {integer representation of 99.9} # (4 << 32) | struct.unpack('=I', struct.pack('=f', 99.9))[0] - float: 0x4_42c7cccd ``` Now with a bit of bit magic we can emulate nested queries. Filtering by attribute id `1234` with value `5678`: ```yaml query: bool: filter: - term: attrs.int: 0x4d2_0000162e ``` Building facet for all attribute values: ```yaml aggs: attrs_int: terms: field: attrs.int # specify big enough aggregation size # so all flat attrite values should fit size: 10000 ``` One more step that client should do is to decode and group values by attribute id. How to use it in python ----------------------- ```python from elasticsearch import Elasticsearch from elasticmagic import Cluster, Document, Field from elasticmagic.types import List, Long from elasticmagic.ext.queryfilter import QueryFilter from elasticmagic_qf_attrs import AttrBoolFacetFilter from elasticmagic_qf_attrs import AttrIntFacetFilter from elasticmagic_qf_attrs import AttrRangeFacetFilter from elasticmagic_qf_attrs.util import merge_attr_value_bool from elasticmagic_qf_attrs.util import merge_attr_value_float from elasticmagic_qf_attrs.util import merge_attr_value_int # Specify document class AttrsDocument(Document): __doc_type__ = 'attrs' ints = Field(List(Long)) bools = Field(List(Long)) floats = Field(List(Long)) # Create an index index_name = 'test-attrs' client = Elasticsearch() client.indices.create(index=index_name) cluster = Cluster(client) index = cluster.get_index(index_name) index.put_mapping(AttrsDocument) # Index example document index.add([ AttrsDocument( ints=[ merge_attr_value_int(1, 42), merge_attr_value_int(2, 43), ], bools=[merge_attr_value_bool(3, True)], floats=[merge_attr_value_float(4, 99.9)], ), ], refresh=True) # Define a query filter class AttrsQueryFilter(QueryFilter): ints = AttrIntFacetFilter(AttrsDocument.ints, alias='a') bools = AttrBoolFacetFilter(AttrsDocument.bools, alias='a') ranges = AttrRangeFacetFilter(AttrsDocument.floats, alias='a') # Now we can build facets qf = AttrsQueryFilter() sq = index.search_query() sq = qf.apply(sq, {}) res = sq.get_result() assert res.total == 1 qf_res = qf.process_result(res) # And finally lets print results for attr_id, facet in qf_res.ints.facets.items(): print(f'> {attr_id}:') for facet_value in facet.all_values: print(f' {facet_value.value}: ({facet_value.count_text})') for attr_id, facet in qf_res.bools.facets.items(): print(f'> {attr_id}:') for facet_value in facet.all_values: print(f' {facet_value.value}: ({facet_value.count_text})') for attr_id, facet in qf_res.ranges.facets.items(): print(f'> {attr_id}: ({facet.count})') # Also we can filter documents: qf = AttrsQueryFilter() sq = index.search_query() sq = qf.apply( sq, { 'a1': '42', 'a3': 'true', 'a4__lte': '100', } ) res = sq.get_result() assert res.total == 1 qf = AttrsQueryFilter() sq = index.search_query() sq = qf.apply( sq, { 'a4__gte': '100', } ) res = sq.get_result() assert res.total == 0 ``` This script should print: ``` > 1: 42: (1) > 2: 43: (1) > 3: True: (1) > 4: (1) ```


نیازمندی

مقدار نام
>=0.1.0-beta.1,<0.2.0 elasticmagic


زبان مورد نیاز

مقدار نام
>=3.7,<4.0 Python


نحوه نصب


نصب پکیج whl elasticmagic-qf-attrs-0.1.4:

    pip install elasticmagic-qf-attrs-0.1.4.whl


نصب پکیج tar.gz elasticmagic-qf-attrs-0.1.4:

    pip install elasticmagic-qf-attrs-0.1.4.tar.gz