colmet-0.6.8



Description

A utility to monitor job resources in an HPC environment, especially OAR
| Attribute        | Value |
|------------------|-------|
| Operating system | OS Independent |
| Filename         | colmet-0.6.8 |
| Name             | colmet |
| Version          | 0.6.8 |
| Maintainer       | Salem Harrache |
| Maintainer email | salem.harrache@inria.fr |
| Author           | Philippe Le Brouster, Olivier Richard |
| Author email     | philippe.le-brouster@imag.fr, olivier.richard@imag.fr |
| Homepage         | http://oar.imag.fr/ |
| Project URL      | https://pypi.org/project/colmet/ |
| License          | GNU GPL |
# Colmet - Collecting metrics about jobs running in a distributed environment

## Introduction

Colmet is a monitoring tool that collects metrics about jobs running in a distributed environment, especially on clusters and grids. It currently provides several backends:

- Input backends:
  - taskstats: fetch task metrics from the Linux kernel
  - rapl: Intel processors realtime consumption metrics
  - perfhw: perf_event counters
  - jobproc: get info from /proc
  - ipmipower: get power metrics from IPMI
  - temperature: get temperatures from /sys/class/thermal
  - infiniband: get Infiniband/Omni-Path network metrics
  - lustre: get Lustre FS stats
- Output backends:
  - elasticsearch: store the metrics in Elasticsearch indexes
  - hdf5: store the metrics on the filesystem
  - stdout: display the metrics on the terminal

It uses ZeroMQ to transport the metrics across the network. It is currently bound to the [OAR](http://oar.imag.fr) RJMS.

A Grafana [sample dashboard](./graph/grafana) is provided for the Elasticsearch backend.
Here are some snapshots:

![](./screenshot1.png)
![](./screenshot2.png)

## Installation

### Requirements

- a Linux kernel that supports:
  - Taskstats
  - intel_rapl (for the RAPL backend)
  - perf_event (for the perfhw backend)
  - ipmi_devintf (for the IPMI backend)
- Python 2.7 or newer
- python-zmq 2.2.0 or newer
- python-tables 3.3.0 or newer
- python-pyinotify 0.9.3-2 or newer
- python-requests
- For the Elasticsearch output backend (recommended for sites with > 50 nodes):
  - an Elasticsearch server
  - a Grafana server (for visualization)
- For the RAPL input backend:
  - libpowercap, powercap-utils (https://github.com/powercap/powercap)
- For the Infiniband backend:
  - the `perfquery` command line tool
- For the ipmipower backend:
  - the `ipmi-oem` command line tool (FreeIPMI) or another configurable command

### Installation

You can install, upgrade, or uninstall colmet with these commands:

```
$ pip install [--user] colmet
$ pip install [--user] --upgrade colmet
$ pip uninstall colmet
```

Or from git (latest development version):

```
$ pip install [--user] git+https://github.com/oar-team/colmet.git
```

Or if you have already pulled the sources:

```
$ pip install [--user] path/to/sources
```

### Usage

For the nodes:

```
sudo colmet-node -vvv --zeromq-uri tcp://127.0.0.1:5556
```

For the collector:

```
# Simple local HDF5 file collect:
colmet-collector -vvv \
    --zeromq-bind-uri tcp://127.0.0.1:5556 \
    --hdf5-filepath /data/colmet.hdf5 \
    --hdf5-complevel 9
```

```
# Collector with an Elasticsearch backend:
colmet-collector -vvv \
    --zeromq-bind-uri tcp://192.168.0.1:5556 \
    --buffer-size 5000 \
    --sample-period 3 \
    --elastic-host http://192.168.0.2:9200 \
    --elastic-index-prefix colmet_dahu_ 2>>/var/log/colmet_err.log >> /var/log/colmet.log
```

You will see the number of counters retrieved in the debug log.
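The collector invocations above can also be assembled programmatically, for example when generating launch commands for several clusters. A minimal sketch in Python (the helper name and defaults are mine; only the flags come from the examples above):

```python
def collector_cmd(bind_uri, hdf5_path, complevel=9):
    """Build the argument vector for a local HDF5 colmet-collector run.

    Only the flags shown in the usage examples are used; complevel
    defaults to the maximum compression level (9) used above.
    """
    return [
        "colmet-collector", "-vvv",
        "--zeromq-bind-uri", bind_uri,
        "--hdf5-filepath", hdf5_path,
        "--hdf5-complevel", str(complevel),
    ]

print(" ".join(collector_cmd("tcp://127.0.0.1:5556", "/data/colmet.hdf5")))
```

Such a list can be passed directly to `subprocess.Popen` to supervise the collector from a deployment script.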
For more information, please refer to the help of these scripts (`--help`).

### Notes about backends

Some input backends need external libraries that must be compiled and installed beforehand:

```
# For the perfhw backend:
cd colmet/node/backends/lib_perf_hw/ && make && cp lib_perf_hw.so /usr/local/lib/
# For the rapl backend:
cd colmet/node/backends/lib_rapl/ && make && cp lib_rapl.so /usr/local/lib/
```

Here's a complete colmet-node start-up process, with perfhw, rapl and more backends:

```
export LIB_PERFHW_PATH=/usr/local/lib/lib_perf_hw.so
export LIB_RAPL_PATH=/applis/site/colmet/lib_rapl.so
colmet-node -vvv --zeromq-uri tcp://192.168.0.1:5556 \
    --cpuset_rootpath /dev/cpuset/oar \
    --enable-infiniband --omnipath \
    --enable-lustre \
    --enable-perfhw --perfhw-list instructions cache_misses page_faults cpu_cycles cache_references \
    --enable-RAPL \
    --enable-jobproc \
    --enable-ipmipower >> /var/log/colmet.log 2>&1
```

#### RAPL - Running Average Power Limit (Intel)

RAPL is a feature of recent Intel processors that makes it possible to know the power consumption of the CPU in real time.

Usage: start colmet-node with the option `--enable-RAPL`.

A file named RAPL_mapping.[timestamp].csv is created in the working directory. It establishes the correspondence between `counter_1`, `counter_2`, etc. from the collected data and the actual name of each metric, as well as the package and zone (core / uncore / dram) of the processor the metric refers to.

If a given counter is not supported by the hardware, the metric name will be "`counter_not_supported_by_hardware`" and `0` values will appear in the collected data; `-1` values in the collected data mean there is no counter mapped to the column.

#### Perfhw

This provides metrics collected using the [perf_event_open](http://man7.org/linux/man-pages/man2/perf_event_open.2.html) interface.
Usage: start colmet-node with the option `--enable-perfhw`.

Optionally choose the metrics you want (max 5 metrics) using the `--perfhw-list` option followed by a space-separated list of metrics.

Example: `--enable-perfhw --perfhw-list instructions cpu_cycles cache_misses`

A file named perfhw_mapping.[timestamp].csv is created in the working directory. It establishes the correspondence between `counter_1`, `counter_2`, etc. from the collected data and the actual name of each metric.

Available metrics (refer to the perf_event_open documentation for their meaning):

```
cpu_cycles instructions cache_references cache_misses branch_instructions
branch_misses bus_cycles ref_cpu_cycles cache_l1d cache_ll cache_dtlb
cache_itlb cache_bpu cache_node cache_op_read cache_op_prefetch
cache_result_access cpu_clock task_clock page_faults context_switches
cpu_migrations page_faults_min page_faults_maj alignment_faults
emulation_faults dummy bpf_output
```

#### Temperature

This backend gets temperatures from `/sys/class/thermal/thermal_zone*/temp`.

Usage: start colmet-node with the option `--enable-temperature`.

A file named temperature_mapping.[timestamp].csv is created in the working directory. It establishes the correspondence between `counter_1`, `counter_2`, etc. from the collected data and the actual name of each metric.
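The sysfs files the temperature backend polls can be inspected directly. A minimal standard-library sketch of that read path (this is not colmet's actual implementation; it only assumes the usual sysfs convention that these files report millidegrees Celsius):

```python
import glob

def read_temperatures():
    """Read temperatures in degrees Celsius from the same sysfs files
    the temperature backend polls: /sys/class/thermal/thermal_zone*/temp.
    Returns an empty dict on machines without thermal zones."""
    temps = {}
    for path in sorted(glob.glob("/sys/class/thermal/thermal_zone*/temp")):
        zone = path.split("/")[-2]  # e.g. "thermal_zone0"
        try:
            with open(path) as f:
                # sysfs reports millidegrees, e.g. 45000 -> 45.0 C
                temps[zone] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # a zone may be unreadable or disappear
    return temps

print(read_temperatures())
```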
Colmet CHANGELOG
================

Version 0.6.8
-------------

- Added nvidia GPU support

Version 0.6.7
-------------

- bugfix: glob import missing in procstats

Version 0.6.6
-------------

- Added --no-check-certificates option for the elastic backend
- Added involved jobs and new metrics in jobprocstats

Version 0.6.4
-------------

- Added HTTP auth support for the elasticsearch backend

Version 0.6.3
-------------

Released on September 4th 2020

- Bugfixes in the lustrestats and jobprocstats backends

Version 0.6.2
-------------

Released on September 3rd 2020

- Python package fix

Version 0.6.1
-------------

Released on September 3rd 2020

- New input backends: lustre, infiniband, temperature, rapl, perfhw, ipmipower, jobproc
- New output backend: elasticsearch
- Example Grafana dashboard for the Elasticsearch backend
- Added "involved_jobs" value for metrics that are global to a node (job 0)
- Bugfix for "dictionary changed size during iteration"

Version 0.5.4
-------------

Released on January 19th 2018

- hdf5 extractor script for the OAR RESTful API
- Added infiniband backend
- Added lustre backend
- Fixed cpuset_rootpath default always appended

Version 0.5.3
-------------

Released on April 29th 2015

- Removed an unnecessary lock from the collector to avoid colmet waiting forever
- Removed the (async) zmq eventloop and added ``--sample-period`` to the collector.
- Fixed some bugs in the hdf file handling

Version 0.5.2
-------------

Released on Apr 2nd 2015

- Fixed a Python syntax error

Version 0.5.1
-------------

Released on Apr 2nd 2015

- Fixed an error about the missing ``requirements.txt`` file in the sdist package

Version 0.5.0
-------------

Released on Apr 2nd 2015

- Don't run colmet as a daemon anymore
- Maintained compatibility with zmq 3.x/4.x
- Dropped ``--zeromq-swap`` (swap was dropped from zmq 3.x)
- Handled the zmq name change from HWM to SNDHWM and RCVHWM
- Fixed requirements
- Dropped Python 2.6 support

Version 0.4.0
-------------

- Saved metrics in a new HDF5 file if colmet is reloaded, in order to avoid HDF5 data corruption
- Handled the HUP signal to reload ``colmet-collector``
- Removed the ``hiwater_rss`` and ``hiwater_vm`` collected metrics

Version 0.3.1
-------------

- New metrics ``hiwater_rss`` and ``hiwater_vm`` for taskstats
- Works with pyinotify 0.8
- Added a ``--disable-procstats`` option to disable the procstats backend

Version 0.3.0
-------------

- Divided the colmet package into three parts:
  - colmet-node: retrieves data from taskstats and procstats and sends it to collectors with ZeroMQ
  - colmet-collector: a collector that stores data received over ZeroMQ in an HDF5 file
  - colmet-common: common colmet parts
- Added some parameters to the ZeroMQ backend to prevent a memory overflow
- Simplified the command line interface
- Dropped the rrd backend because it was not yet working
- Added a ``--buffer-size`` option for the collector to define the maximum number of counters that colmet should queue in memory before pushing them to the output backend
- Handled SIGTERM and SIGINT to terminate colmet properly

Version 0.2.0
-------------

- Added options to enable hdf5 compression
- Support for multiple jobs by cgroup path scanning
- Used inotify events for job list updates
- Don't filter packets if no job_id range was specified, especially with the zeromq backend
- Wait for the cgroup_path folder creation before scanning the list of jobs
- Added procstat for node monitoring through a fictive job with 0 as identifier
- Used absolute times for measurements, not the delay between measurements, to avoid drift of the measurement time
- Added a workaround for when a new cgroup is created without any process in it (monitoring is suspended until a process is launched)

Version 0.0.1
-------------

- Conception


Requirements

Name
- tables
- pyinotify
- pyzmq
- requests


How to install


Installing the colmet-0.6.8 whl package:

    pip install colmet-0.6.8.whl


Installing the colmet-0.6.8 tar.gz package:

    pip install colmet-0.6.8.tar.gz
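After installing from either package, you can sanity-check that the package is visible to the current interpreter. A generic standard-library check (not part of colmet's own documentation):

```python
import importlib.util

def is_installed(module_name):
    """Return True if module_name can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None

# Prints True only after a successful colmet install:
print(is_installed("colmet"))
```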