

ftm-columnstore-0.1.1



Description

Column store implementation for ftm data based on clickhouse
Attribute Value
Operating system -
File name ftm-columnstore-0.1.1
Name ftm-columnstore
Library version 0.1.1
Maintainer []
Maintainer email []
Author Simon Wörpel
Author email simon.woerpel@medienrevolte.de
Home page https://github.com/simonwoerpel/ftm-columnstore
URL https://pypi.org/project/ftm-columnstore/
License MIT
# ftm-columnstore

This library provides methods to store, fetch and list entities formatted as [`followthemoney`](https://github.com/alephdata/followthemoney) data as datasets stored in a column store backend using [clickhouse](https://clickhouse.com/).

This roughly follows the functionality and features of [followthemoney-store](https://github.com/alephdata/followthemoney-store), but with a huge performance benefit when writing and querying data.

`FtM` data is stored in one table in [statements](#statements) format.

**Minimum Python version: 3.10**

## Usage

Set up a running clickhouse instance (pointed to via the `DATABASE_URI` env var, default: `localhost`). For development purposes this could work:

    make clickhouse

Then initialize the required table schema:

    ftmcs init

Or drop existing data and recreate:

    ftmcs init --recreate

To test if it's working, run a raw query:

    ftmcs query "SHOW TABLES"

When using the `make clickhouse` command, you can play around with SQL queries in your browser: http://127.0.0.1:8123/play

### Command-line usage

```bash
# Insert a bunch of FtM entities into a store:
cat ftm-entities.ijson | ftmcs write -d my_dataset
# Re-create the entities in aggregated form:
ftmcs iterate -d my_dataset | alephclient write-entities -f my_dataset
```

```
Usage: ftmcs [OPTIONS] COMMAND [ARGS]...

  Store FollowTheMoney object data in a column store (Clickhouse)

Options:
  --log-level TEXT  Set logging level  [default: info]
  --uri TEXT        Database connection URI  [default: localhost]
  --table TEXT      Database table  [default: ftm]
  --help            Show this message and exit.

Commands:
  delete        Delete dataset or complete store
  fingerprints  Generate fingerprint statements as csv from json entities...
  flatten       Turn json entities from `infile` into statements in csv...
  init          Initialize database and table.
  iterate       Iterate entities
  list          List datasets in a store
  query         Execute raw query and print result (csv format) to outfile
  statements    Dump all statements as csv
  write         Write json entities from `infile` to store.
```

### Python Library

```python
from ftm_columnstore import get_dataset

dataset = get_dataset("us-ofac")
dataset.store.put(entity)
entity = dataset.store.get("entity-id")
```

The bulk writer behaves the same as in [`followthemoney-store`](https://github.com/alephdata/followthemoney):

```python
from ftm_columnstore import get_dataset

dataset = get_dataset("us-ofac")
bulk = dataset.store.bulk()
for entity in many_entities:
    bulk.put(entity)
bulk.flush()
```

#### Querying entities

Building these queries involves some unintuitive steps, because turning the statements back into `FtM` entities is a bit hacky here, but from the top level it feels quite nice:

```python
from ftm_columnstore.query import EntityQuery

q = EntityQuery().where(schema="Person")

# queries are always streaming result iterators
entities = [e for e in q]

# querying for properties:
q = EntityQuery().where(schema="Payment", amount__gte=1000)
```

## Benchmarks

[OpenSanctions](https://opensanctions.org) dataset

**tl;dr** For a small dataset, the original `followthemoney-store` implementation is faster, as the columnstore implementation has the statements overhead. Benefits come from the explicit querying features.

```bash
curl -s https://data.opensanctions.org/datasets/latest/all/entities.ftm.json\?`date '+%s'` > opensanctions.ftm.ijson
```

All tests were run on the same Dell XPS laptop, 16 GB, 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz.

### write

`ftm-columnstore`: 79s

```bash
date +"%H:%M:%S" ; ftmcs write -d opensanctions -i opensanctions.ftm.ijson ; date +"%H:%M:%S"
15:44:51
ftm_columnstore.dataset [INFO] [opensanctions] Write: 100000 entities with 532100 statements.
# ...
ftm_columnstore.dataset [INFO] [opensanctions] Write: 536131 entities with 5565069 statements.
15:46:10
```

`followthemoney-store` (postgres): 50s

```bash
date +"%H:%M:%S" ; ftm store write -d opensanctions -i opensanctions.ftm.ijson ; date +"%H:%M:%S"
15:46:55
ftmstore [INFO] Write [opensanctions]: 10000 entities
# ...
ftmstore [INFO] Write [opensanctions]: 530000 entities
15:47:45
```

### iterate

`followthemoney-store` (postgres): 46s

```bash
date +"%H:%M:%S" ; ftm store iterate -d opensanctions > /dev/null ; date +"%H:%M:%S"
15:48:34
15:49:20
```

`ftm-columnstore`: 87s

```bash
date +"%H:%M:%S" ; ftmcs iterate -d opensanctions > /dev/null ; date +"%H:%M:%S"
15:49:27
15:50:54
```
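
The statements format mentioned in the introduction, and the round trip behind `ftmcs flatten` and `ftmcs iterate`, can be pictured with a small standalone sketch. It does not use this library. The row layout `(dataset, entity_id, schema, prop, value)` and the helper names `flatten` and `aggregate` are assumptions chosen for illustration, not the actual table schema or API of `ftm-columnstore`.

```python
from collections import defaultdict
from typing import Any, Iterable, Iterator

# Hypothetical statement row: (dataset, entity_id, schema, prop, value).
# The real column layout used by ftm-columnstore may differ.
Statement = tuple[str, str, str, str, str]


def flatten(dataset: str, entity: dict[str, Any]) -> Iterator[Statement]:
    """Yield one statement per property value of an FtM-style entity dict."""
    for prop, values in entity.get("properties", {}).items():
        for value in values:
            yield (dataset, entity["id"], entity["schema"], prop, value)


def aggregate(statements: Iterable[Statement]) -> list[dict[str, Any]]:
    """Group statements by entity id and rebuild the entity dicts."""
    entities: dict[str, dict[str, Any]] = {}
    for _dataset, entity_id, schema, prop, value in statements:
        entity = entities.setdefault(
            entity_id,
            {"id": entity_id, "schema": schema, "properties": defaultdict(list)},
        )
        entity["properties"][prop].append(value)
    # Convert the defaultdicts back into plain dicts for the result.
    return [{**e, "properties": dict(e["properties"])} for e in entities.values()]


person = {
    "id": "jane-doe",
    "schema": "Person",
    "properties": {"name": ["Jane Doe", "J. Doe"], "nationality": ["us"]},
}
rows = list(flatten("my_dataset", person))  # one row per property value
rebuilt = aggregate(rows)                   # statements back to entities
```

Storing one row per property value is what produces the statements overhead noted in the benchmarks: the write log above reports roughly ten statements per entity.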


Requirements

Version Name
- Click
- banal
- clickhouse-driver[numpy]
- followthemoney
- libindic-soundex
- libindic-utils
- metaphone
- nomenklatura
<1.24 numpy
- orjson
- pandas
- pyicu
- structlog


Installation


Install the ftm-columnstore-0.1.1 whl package:

    pip install ftm-columnstore-0.1.1.whl


Install the ftm-columnstore-0.1.1 tar.gz package:

    pip install ftm-columnstore-0.1.1.tar.gz
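
After either install command, a quick check from Python can confirm that the package is visible to the interpreter and that the interpreter meets the stated minimum version of 3.10. This is a minimal sketch using only the standard library. The helper name `installed_version` is made up for this example, and the distribution name `ftm-columnstore` is taken from the metadata above.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The README states a minimum Python version of 3.10.
python_ok = sys.version_info >= (3, 10)
version = installed_version("ftm-columnstore")
print(f"Python >= 3.10: {python_ok}; ftm-columnstore: {version or 'not installed'}")
```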