# `cogex`
`cogex` is a tool for managing Python extractors for Cognite Data Fusion. It provides
utilities for initializing a new extractor project and building self-contained executables of
Python-based extractors.
## Important note for users running `pyenv`
`pyenv` is a neat tool for managing Python installations.
Since `cogex` uses PyInstaller to build executables, we need Python to be installed with a shared
instance of `libpython`, which `pyenv` does not do by default. To fix this, make sure to add the
`--enable-shared` flag when installing new Python versions with `pyenv`, like so:
```bash
env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.9.0
```
You can read more about it in the [PyInstaller documentation](https://pyinstaller.readthedocs.io/en/stable/development/venv.html#pyenv-and-pyinstaller).
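
To check whether an existing interpreter was built with a shared `libpython`, one option is to query the standard-library `sysconfig` module. The following is a minimal sketch: the helper name `has_shared_libpython` is hypothetical and not part of `cogex`, and it assumes a POSIX build, since the `Py_ENABLE_SHARED` variable may be undefined (`None`) on other platforms such as Windows.

```python
import sysconfig


def has_shared_libpython() -> bool:
    """Return True if this interpreter reports a shared-library build.

    Hypothetical helper for illustration only; not part of cogex.
    """
    # Py_ENABLE_SHARED is 1 for builds configured with --enable-shared and
    # 0 otherwise. It can be None where the variable is not defined, which
    # this sketch treats as "not shared".
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


if __name__ == "__main__":
    print("shared libpython:", has_shared_libpython())
```

If this reports `False` for a `pyenv`-managed interpreter, reinstalling that version with the `--enable-shared` flag as shown above is the likely fix.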
## Overview of features
### Start a new extractor project
To start a new extractor project, move to the desired directory and run
```bash
cogex init
```
You will first be prompted for some information, after which `cogex` initializes the new project.
### Add dependencies
Extractor projects initialized with `cogex` use `poetry` for managing dependencies. Running
`cogex init` will automatically install the Cognite SDK and extractor-utils framework, but if your
extractor needs any other dependencies, add them using `poetry`, like so:
```bash
poetry add requests
```
### Type checking and code style
It is recommended that you run code checkers on your extractor, in particular:
* `black` is an opinionated code formatter that enforces a consistent code style throughout
your project. This helps avoid unnecessary changes and minimizes PR diffs.
* `isort` is a tool that sorts your imports, also contributing to a consistent code style and
minimal PR diffs.
* `mypy` is a static type checker for Python that catches type errors in your code that would
otherwise go unnoticed until they suddenly break your extractor in production.
`cogex` will install all of these and automatically run them on every commit. If for some
reason you need to commit despite one of these checks failing, you can run `git commit --no-verify`,
although this is not recommended.
### Build and package an extractor project
It is not always an option to rely on a Python installation on the machine where your extractor
will be deployed. For those scenarios it is useful to package the extractor, including its
dependencies and the Python runtime, into a single self-contained executable. To do this, run
```bash
cogex build
```
This will create a new executable (for the operating system you ran `cogex build` from) in the
`dist` directory.
### Creating a new version of your extractor
To keep track of which version of the code base is running at a given deployment it is very useful
to version your extractor. When releasing a new version, run
```bash
poetry version [patch/minor/major]
```
to automatically bump the corresponding version number. Note that this only updates the version
number in `pyproject.toml`. When running `cogex build`, this new version number will be propagated
through the rest of the code base.
Any extractor project should follow semantic versioning, which means you should bump
* `patch` for any minor bug fixes or improvements
* `minor` for new features or bigger improvements that __don't__ break compatibility
* `major` for new features or improvements that break compatibility with previous versions, in
  other words for those scenarios where the new version is not a drop-in replacement for an old
  version. For example:
  - When adding a new required config field
  - When removing a config field
  - When changing defaults in a way that could break existing deployments
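
To make the bump rules concrete, here is a minimal sketch of how each part changes a plain `MAJOR.MINOR.PATCH` version string. The `bump` function is a hypothetical illustration of the semantic versioning rules, not how `poetry` or `cogex` implement them, and it assumes a version with exactly three numeric components (no pre-release or build suffixes).

```python
def bump(version: str, part: str) -> str:
    """Return `version` with the given part bumped, semantic-versioning style.

    Hypothetical helper for illustration only; assumes "MAJOR.MINOR.PATCH".
    """
    major, minor, patch = (int(piece) for piece in version.split("."))

    if part == "major":
        # A breaking change resets both minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # A backwards-compatible feature resets patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # A small fix only increments patch.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


if __name__ == "__main__":
    for part in ("patch", "minor", "major"):
        print(part, "->", bump("1.2.3", part))
```

For instance, `bump("1.2.3", "minor")` returns `"1.3.0"`: the lower-order component is reset whenever a higher-order one is incremented.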