Django Elastic Migrations
=========================
`django-elastic-migrations`_ is a Django app for creating, indexing and changing schemas of Elasticsearch indexes.

.. image:: https://travis-ci.com/HBS-HBX/django-elastic-migrations.svg?branch=master
   :target: https://travis-ci.com/HBS-HBX/django-elastic-migrations
   :alt: Build Status

.. image:: https://codecov.io/gh/HBS-HBX/django-elastic-migrations/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/HBS-HBX/django-elastic-migrations
   :alt: codecov

.. _django-elastic-migrations: https://pypi.org/project/django-elastic-migrations/

Overview
--------
Elastic has given us basic Python tools for working with its search indexes:

* `elasticsearch-py`_, a Python interface to Elasticsearch's REST API
* `elasticsearch-dsl-py`_, a Django-esque way of declaring Elasticsearch schemas,
  built upon `elasticsearch-py`_
Django Elastic Migrations adapts these tools into a Django app which also:
* Provides Django management commands for ``list``\ ing indexes, as well as performing
``create``, ``update``, ``activate`` and ``drop`` actions on them
* Implements concurrent bulk indexing powered by python ``multiprocessing``
* Gives Django test hooks for Elasticsearch
* Records a history of all actions that change Elasticsearch indexes
* Supports AWS Elasticsearch 6.0, 6.1 (6.2 TBD; see `#3 support elasticsearch-dsl 6.2`_)
* Enables having two or more servers share the same Elasticsearch cluster
.. _elasticsearch-py: https://github.com/elastic/elasticsearch-py
.. _elasticsearch-dsl-py: https://github.com/elastic/elasticsearch-dsl-py
.. _#3 support elasticsearch-dsl 6.2: https://github.com/HBS-HBX/django-elastic-migrations/issues/3
Models
^^^^^^
Django Elastic Migrations comes with three Django models:
**Index**, **IndexVersion**, and **IndexAction**.
*
**Index** - a logical reference to an Elasticsearch index.
Each ``Index`` points to multiple ``IndexVersions``, each of which contains
a snapshot of that ``Index`` schema at a particular time. Each ``Index`` has an
*active* ``IndexVersion`` to which all actions are directed.
*
**IndexVersion** - a snapshot of an Elasticsearch ``Index`` schema at a particular
point in time. The Elasticsearch index name is the name of the *Index* plus the
primary key id of the ``IndexVersion`` model, e.g. ``movies-1``. When the schema is
changed, a new ``IndexVersion`` is added with name ``movies-2``, etc.
*
**IndexAction** - a record of a change that impacts an ``Index``, such as updating
the index or changing which ``IndexVersion`` is active in an ``Index``.
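
The naming rule described for ``IndexVersion`` can be sketched in a few lines of plain Python. The helper names ``index_version_name`` and ``split_index_version_name`` are hypothetical and are not part of the package's API; they only illustrate how a base name and an ``IndexVersion`` primary key combine into an Elasticsearch index name.

.. code-block:: python

    def index_version_name(base_name, version_pk):
        """Build an index name from an Index base name and an IndexVersion pk."""
        return "{}-{}".format(base_name, version_pk)


    def split_index_version_name(name):
        """Split 'movies-2' into ('movies', 2).

        A name without a numeric suffix is treated as a base name,
        so its version is None.
        """
        base, sep, suffix = name.rpartition("-")
        if sep and suffix.isdigit():
            return base, int(suffix)
        return name, None


    print(index_version_name("movies", 1))       # movies-1
    print(split_index_version_name("movies-2"))  # ('movies', 2)

This mirrors the way most management commands accept either a base name such as ``movies`` or a specific version name such as ``movies-2``.
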
Management Commands
^^^^^^^^^^^^^^^^^^^
Use ``./manage.py es --help`` to see the list of all of these commands.
Read Only Commands
~~~~~~~~~~~~~~~~~~
* ``./manage.py es_list``
* help: For each *Index*\ , list activation status and doc
count for each of its *IndexVersions*
* usage: ``./manage.py es_list``
Action Commands
~~~~~~~~~~~~~~~
These management commands add an Action record in the database,
so that the history of each *Index* is recorded.
* ``./manage.py es_create`` - create a new index.
* ``./manage.py es_activate`` - *activate* a new ``IndexVersion``. All
  updates and reads for that ``Index`` will then go to that version.
* ``./manage.py es_update`` - update the documents in the index.
* ``./manage.py es_clear`` - remove the documents from an index.
* ``./manage.py es_drop`` - drop an index.
* ``./manage.py es_dangerous_reset`` - erase elasticsearch and reset the
Django Elastic Migrations models.
For each of these, use ``--help`` to see the details.
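
If you want to drive these commands from a script, a small wrapper can assemble the command lines first and run them only on request. This is a minimal sketch, not something shipped with the package: ``es_command`` and ``run_es_command`` are hypothetical helpers, and the only command names and flags used are ones that appear in this README.

.. code-block:: python

    import subprocess


    def es_command(action, *args, manage_py="./manage.py"):
        """Return the argv list for an ``es_<action>`` management command."""
        return [manage_py, "es_{}".format(action)] + list(args)


    def run_es_command(action, *args, dry_run=True):
        """Print the command line; execute it only when dry_run is False."""
        argv = es_command(action, *args)
        print(" ".join(argv))
        if not dry_run:
            # raises subprocess.CalledProcessError on a nonzero exit code
            subprocess.check_call(argv)
        return argv


    run_es_command("create", "movies")             # prints: ./manage.py es_create movies
    run_es_command("update", "movies", "--newer")  # prints: ./manage.py es_update movies --newer

Keeping ``dry_run=True`` as the default makes it harder to change an index by accident while experimenting.
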
Usage
^^^^^
Installation
~~~~~~~~~~~~
#. ``pip install django-elastic-migrations``; see `django-elastic-migrations`_ on PyPI
#. Put a reference to this package in your ``requirements.txt``
#. Ensure that a valid ``elasticsearch-dsl-py`` version is accessible, and configure
the path to your configured Elasticsearch singleton client in your django settings:
``DJANGO_ELASTIC_MIGRATIONS_ES_CLIENT = "tests.es_config.ES_CLIENT"``.
There should only be one ``ES_CLIENT`` instantiated in your application.
#. Add ``django_elastic_migrations`` to ``INSTALLED_APPS`` in your Django
settings file
#. Add the following information to your Django settings file:
::
DJANGO_ELASTIC_MIGRATIONS_ES_CLIENT = "path.to.your.singleton.ES_CLIENT"
# optional, any unique number for your releases to associate with indexes
DJANGO_ELASTIC_MIGRATIONS_GET_CODEBASE_ID = subprocess.check_output(['git', 'describe', "--tags"]).strip()
# optional, can be used to have multiple servers share the same
# elasticsearch instance without conflicting
DJANGO_ELASTIC_MIGRATIONS_ENVIRONMENT_PREFIX = "qa1_"
#. Create the ``django_elastic_migrations`` tables by running ``./manage.py migrate``
#. Create a ``DEMIndex``:

::

    from elasticsearch_dsl import Text

    from django_elastic_migrations.indexes import DEMIndex, DEMDocType
    from .models import Movie

    MoviesIndex = DEMIndex('movies')


    @MoviesIndex.doc_type
    class MovieSearchDoc(DEMDocType):
        # a plain Text field; substitute your own analyzed field definition
        text = Text()

        @classmethod
        def get_queryset(cls, last_updated_datetime=None):
            """
            return a queryset or a sliceable list of items to pass to
            get_reindex_iterator
            """
            qs = Movie.objects.all()
            if last_updated_datetime:
                # filter() returns a new queryset, so the result must be reassigned
                qs = qs.filter(last_modified__gt=last_updated_datetime)
            return qs

        @classmethod
        def get_reindex_iterator(cls, queryset):
            # one action dict per item; replace the sample text with real fields
            return [
                MovieSearchDoc(
                    text="a little sample text").to_dict(
                    include_meta=True) for g in queryset]

#. Add your new index to ``DJANGO_ELASTIC_MIGRATIONS_INDEXES`` in ``settings/common.py``
#. Run ``./manage.py es_list`` to see the index as available:
::
./manage.py es_list
Available Index Definitions:
+----------------------+-------------------------------------+---------+--------+-------+-----------+
| Index Base Name | Index Version Name | Created | Active | Docs | Tag |
+======================+=====================================+=========+========+=======+===========+
| movies | | 0 | 0 | 0 | Current |
| | | | | | (not |
| | | | | | created) |
+----------------------+-------------------------------------+---------+--------+-------+-----------+
Reminder: an index version name looks like 'my_index-4', and its base index name
looks like 'my_index'. Most Django Elastic Migrations management commands
take the base name (in which case the activated version is used)
or the specific index version name.
#. Create the ``movies`` index in elasticsearch with ``./manage.py es_create movies``:
::
$> ./manage.py es_create movies
The doc type for index 'movies' changed; created a new index version
'movies-1' in elasticsearch.
$> ./manage.py es_list
Available Index Definitions:
+----------------------+-------------------------------------+---------+--------+-------+-----------+
| Index Base Name | Index Version Name | Created | Active | Docs | Tag |
+======================+=====================================+=========+========+=======+===========+
| movies | movies-1 | 1 | 0 | 0 | 07.11.005 |
| | | | | | -93-gd101 |
| | | | | | a1f |
+----------------------+-------------------------------------+---------+--------+-------+-----------+
Reminder: an index version name looks like 'my_index-4', and its base index name
looks like 'my_index'. Most Django Elastic Migrations management commands
take the base name (in which case the activated version is used)
or the specific index version name.
#. Activate the ``movies-1`` index version, so all updates and reads go to it.
::
./manage.py es_activate movies
For index 'movies', activating 'movies-1' because you said so.
#. Assuming you have implemented ``get_reindex_iterator``, you can call
``./manage.py es_update`` to update the index.
::
$> ./manage.py es_update movies
Handling update of index 'movies' using its active index version 'movies-1'
Checking the last time update was called:
- index version: movies-1
- update date: never
Getting Reindex Iterator...
Completed with indexing movies-1
$> ./manage.py es_list
Available Index Definitions:
+----------------------+-------------------------------------+---------+--------+-------+-----------+
| Index Base Name | Index Version Name | Created | Active | Docs | Tag |
+======================+=====================================+=========+========+=======+===========+
| movies | movies-1 | 1 | 1 | 3 | 07.11.005 |
| | | | | | -93-gd101 |
| | | | | | a1f |
+----------------------+-------------------------------------+---------+--------+-------+-----------+
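
The settings example in the installation steps calls ``git describe --tags`` directly. Under Python 3 that call returns bytes, and it raises an exception when git or tags are unavailable, for example in a container built without the repository. The following is a minimal sketch of a more forgiving variant; the function name ``get_codebase_id`` and the ``"unknown"`` fallback are assumptions made for illustration.

.. code-block:: python

    import subprocess


    def get_codebase_id(default="unknown"):
        """Return ``git describe --tags`` as text, or ``default`` on failure."""
        try:
            output = subprocess.check_output(
                ["git", "describe", "--tags"], stderr=subprocess.DEVNULL
            )
        except (OSError, subprocess.CalledProcessError):
            # OSError: git is not installed; CalledProcessError: no repo or no tags
            return default
        return output.decode("utf-8").strip()


    # in your Django settings file
    DJANGO_ELASTIC_MIGRATIONS_GET_CODEBASE_ID = get_codebase_id()

With this variant the ``Tag`` column shown by ``es_list`` would presumably hold either the git description or the fallback string.
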
Deployment
^^^^^^^^^^
* Creating and updating a new index schema can happen before you deploy.
For example, if your app servers are running with the ``movies-1`` index activated, and you
have a new version of the schema you'd like to pre-index, then log into another
server and run ``./manage.py es_create movies`` followed by
``./manage.py es_update movies --newer``. This will update documents in all ``movies``
indexes that are newer than the active one.
* After deploying, you can run
``./manage.py es_activate movies`` to activate the latest version. Be sure to cycle your
gunicorn workers to ensure the change is caught by your app servers.
* During deployment, if ``get_reindex_iterator`` is implemented so that it takes the
  datetime of the last reindex into account, then you can call
  ``./manage.py es_update movies --resume``, and it will index *only those documents that have
  changed since the last reindexing*. This way you can do most of the indexing ahead of time,
  and only reindex a small portion at the time of the deployment.
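
The idea behind ``--resume`` can be illustrated without Django or Elasticsearch. The sketch below is a plain-Python stand-in for a queryset filter such as ``last_modified__gt`` from the usage example; the function ``select_for_reindex`` and the sample records are hypothetical.

.. code-block:: python

    from datetime import datetime


    def select_for_reindex(records, last_updated_datetime=None):
        """Return every record on a first run, otherwise only the changed ones."""
        if last_updated_datetime is None:
            return list(records)
        return [r for r in records if r["last_modified"] > last_updated_datetime]


    movies = [
        {"title": "Inception", "last_modified": datetime(2018, 8, 1)},
        {"title": "Closer", "last_modified": datetime(2018, 11, 1)},
    ]

    # pretend the previous update ran on 2018-09-01
    changed = select_for_reindex(movies, last_updated_datetime=datetime(2018, 9, 1))
    print([m["title"] for m in changed])  # ['Closer']

Only the record modified after the previous update is selected, which is why a resumed update at deployment time can be much shorter than the initial one.
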
Django Testing
^^^^^^^^^^^^^^
#. (optional) update ``DJANGO_ELASTIC_MIGRATIONS_ENVIRONMENT_PREFIX`` in
your Django settings. The default test prefix is ``test_``. Every
test will create its own indexes.
::

    import sys

    if 'test' in sys.argv:
        DJANGO_ELASTIC_MIGRATIONS_ENVIRONMENT_PREFIX = 'test_'

#. Override TestCase - ``test_utilities.py``
.. code-block:: python

    from django.test import TestCase

    from django_elastic_migrations import DEMIndexManager


    class MyTestCase(TestCase):

        def _pre_setup(self):
            DEMIndexManager.test_pre_setup()
            super(MyTestCase, self)._pre_setup()

        def _post_teardown(self):
            DEMIndexManager.test_post_teardown()
            super(MyTestCase, self)._post_teardown()

Excluding from Django's ``dumpdata`` command
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When calling `django's dumpdata command <https://docs.djangoproject.com/en/2.0/ref/django-admin/#dumpdata>`_\,
you likely will want to exclude the database tables used in this app:
::
from django.core.management import call_command
params = {
'database': 'default',
'exclude': [
# we don't want to include django_elastic_migrations in dumpdata,
# because it's environment specific
'django_elastic_migrations.index',
'django_elastic_migrations.indexversion',
'django_elastic_migrations.indexaction'
],
'indent': 3,
'output': 'path/to/my/file.json'
}
call_command('dumpdata', **params)
An example of this is included with the
`moviegen management command`_.
.. _moviegen management command: https://github.com/HBS-HBX/django-elastic-migrations/blob/master/tests/management/commands/moviegen.py
Tuning Bulk Indexing Parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
By default, ``./manage.py es_update`` will divide the result of
``DEMDocType.get_queryset()`` into batches of size ``DEMDocType.BATCH_SIZE``.
Override this number to change the batch size.
There are many configurable parameters to Elasticsearch's `bulk updater <https://elasticsearch-py.readthedocs.io/en/master/helpers.html?highlight=bulk#elasticsearch.helpers.streaming_bulk>`_.
To provide a custom value, override ``DEMDocType.get_bulk_indexing_kwargs()``
and return the kwargs you would like to customize.
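
A rough sketch of both overrides follows. So that it runs without Elasticsearch installed, it uses a stand-in base class, ``DocTypeStandIn``, in place of ``DEMDocType``; the stand-in, its ``batches`` helper, its placeholder default batch size, and the use of ``@classmethod`` are all assumptions made for illustration. ``chunk_size`` and ``max_retries`` are keyword arguments of ``elasticsearch.helpers.streaming_bulk``.

.. code-block:: python

    class DocTypeStandIn(object):
        """Stand-in for DEMDocType; not the package's real implementation."""

        BATCH_SIZE = 1000  # placeholder, not the package's actual default

        @classmethod
        def get_bulk_indexing_kwargs(cls):
            return {}

        @classmethod
        def batches(cls, items):
            """Yield consecutive slices of at most BATCH_SIZE items."""
            for start in range(0, len(items), cls.BATCH_SIZE):
                yield items[start:start + cls.BATCH_SIZE]


    class MovieSearchDoc(DocTypeStandIn):
        BATCH_SIZE = 2  # deliberately tiny so the batching is visible

        @classmethod
        def get_bulk_indexing_kwargs(cls):
            # only the values you want to change need to be returned
            return {"chunk_size": 500, "max_retries": 3}


    print(list(MovieSearchDoc.batches([1, 2, 3, 4, 5])))  # [[1, 2], [3, 4], [5]]

In a real project the subclass would derive from ``DEMDocType`` instead, and the batching itself would be handled by ``es_update``.
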
Development
-----------
This project uses ``make`` to manage the build process. Type ``make help``
to see the available ``make`` targets.
Elasticsearch Docker Compose
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``docker-compose -f local.yml up``
`See docs/docker_setup for more info <./docs/docker_setup.rst>`_
Requirements
^^^^^^^^^^^^
This project uses `pip-tools`_. The ``requirements.txt`` files are generated
and pinned to latest versions with ``make upgrade``:
* run ``make requirements`` to run the pip install.
* run ``make upgrade`` to upgrade the dependencies of the requirements to the latest
versions. This process also excludes ``django`` and ``elasticsearch-dsl``
from the ``requirements/test.txt`` so they can be injected with different
versions by tox during matrix testing.
.. _pip-tools: https://github.com/jazzband/pip-tools
Populating Local ``tests_movies`` Database Table With Data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It may be helpful for you to populate a local database with Movies test
data to experiment with using ``django-elastic-migrations``. First,
migrate the database:
``./manage.py migrate --run-syncdb --settings=test_settings``
Next, load the basic fixtures:
``./manage.py loaddata tests/100films.json``
You may wish to add more movies to the database. A management command
has been created for this purpose. Get a `Free OMDB API key here <https://www.omdbapi.com/apikey.aspx>`_\ ,
then run a query like this (replace ``MYAPIKEY`` with yours):
.. code-block::
$> ./manage.py moviegen --title="Inception" --api-key="MYAPIKEY"
{'actors': 'Leonardo DiCaprio, Joseph Gordon-Levitt, Ellen Page, Tom Hardy',
'awards': 'Won 4 Oscars. Another 152 wins & 204 nominations.',
'boxoffice': '$292,568,851',
'country': 'USA, UK',
'director': 'Christopher Nolan',
'dvd': '07 Dec 2010',
'genre': 'Action, Adventure, Sci-Fi',
'imdbid': 'tt1375666',
'imdbrating': '8.8',
'imdbvotes': '1,721,888',
'language': 'English, Japanese, French',
'metascore': '74',
'plot': 'A thief, who steals corporate secrets through the use of '
'dream-sharing technology, is given the inverse task of planting an '
'idea into the mind of a CEO.',
'poster': 'https://m.media-amazon.com/images/M/MV5BMjAxMzY3NjcxNF5BMl5BanBnXkFtZTcwNTI5OTM0Mw@@._V1_SX300.jpg',
'production': 'Warner Bros. Pictures',
'rated': 'PG-13',
'ratings': [{'Source': 'Internet Movie Database', 'Value': '8.8/10'},
{'Source': 'Rotten Tomatoes', 'Value': '86%'},
{'Source': 'Metacritic', 'Value': '74/100'}],
'released': '16 Jul 2010',
'response': 'True',
'runtime': 148,
'title': 'Inception',
'type': 'movie',
'website': 'http://inceptionmovie.warnerbros.com/',
'writer': 'Christopher Nolan',
'year': '2010'}
To save the movie to the database, use the ``--save`` flag. The ``--noprint``
option suppresses the JSON output. If you add
``OMDB_API_KEY=MYAPIKEY`` to your environment variables, you don't have
to specify the key each time:
.. code-block::
$ ./manage.py moviegen --title "Closer" --noprint --save
Saved 1 new movie(s) to the database: Closer
Now that it's been saved to the database, you may want to create a fixture,
so you can get back to this state in the future.
.. code-block::
$ ./manage.py moviegen --makefixture=tests/myfixture.json
dumping fixture data to tests/myfixture.json ...
[...........................................................................]
Later, you can restore this database with the regular ``loaddata`` command:
.. code-block::
$ ./manage.py loaddata tests/myfixture.json
Installed 101 object(s) from 1 fixture(s)
There are already 100 films available using ``loaddata`` as follows:
.. code-block::
$ ./manage.py loaddata tests/100films.json
Running Tests Locally
^^^^^^^^^^^^^^^^^^^^^
Run ``make test``. To run all tests and quality checks locally,
run ``make test-all``.
To run only the linters, run ``make quality``. Note that if any of the
linters returns a nonzero exit code, tox reports an ``InvocationError``
at the end. See `tox's documentation for InvocationError`_ for more information.
We use ``edx_lint`` to compile ``pylintrc``. To update the rules,
change ``pylintrc_tweaks`` and run ``make pylintrc``.
.. _tox's documentation for InvocationError: https://tox.readthedocs.io/en/latest/example/general.html#understanding-invocationerror-exit-codes
Cutting a New Version
^^^^^^^^^^^^^^^^^^^^^
* optional: run ``make upgrade`` to update dependencies
* bump version in `django_elastic_migrations/__init__.py <https://github.com/HBS-HBX/django-elastic-migrations/blob/master/django_elastic_migrations/__init__.py#L13>`_.
* update `CHANGELOG.rst <https://github.com/HBS-HBX/django-elastic-migrations/blob/master/CHANGELOG.rst>`_.
* ``make clean``
* ``python3 setup.py sdist bdist_wheel``
* ``twine check dist/django-elastic-migrations-*.tar.gz`` to see if there are any syntax mistakes before tagging
* submit PR bumping the version
* ensure test matrix is passing on travis and merge PR
* pull changes to master
* ``make clean``
* ``python3 setup.py sdist bdist_wheel``
* ``twine check dist/django-elastic-migrations-*.tar.gz`` to see if there are any syntax mistakes before tagging
* ``twine upload -r testpypi dist/django-elastic-migrations-*.tar.gz``
* `Check it at https://test.pypi.org/project/django-elastic-migrations/ <https://test.pypi.org/project/django-elastic-migrations/>`_
* ``python3 setup.py tag`` to tag the new version
* ``twine upload -r pypi dist/django-elastic-migrations-*.tar.gz``
* `Update new release at https://github.com/HBS-HBX/django-elastic-migrations/releases <https://github.com/HBS-HBX/django-elastic-migrations/releases/>`_
Changelog
---------
0.8.2 (2018-11-20)
^^^^^^^^^^^^^^^^^^
* fix `#59 twine check error in 0.8.1 <https://github.com/HBS-HBX/django-elastic-migrations/issues/59>`_
0.8.1 (2018-11-19)
^^^^^^^^^^^^^^^^^^
* fix `#50 add test coverage for es_list <https://github.com/HBS-HBX/django-elastic-migrations/issues/50>`_
* fix `#58 ignore indexes with a dot in name in es_list --es-only and es_dangerous_reset <https://github.com/HBS-HBX/django-elastic-migrations/issues/58>`_
0.8.0 (2018-11-13)
^^^^^^^^^^^^^^^^^^
* fix `#6 support Django 2 <https://github.com/HBS-HBX/django-elastic-migrations/issues/6>`_
* fix `#43 remove es_deactivate <https://github.com/HBS-HBX/django-elastic-migrations/issues/43>`_
* fix `#44 add django 1.10 and 1.11 to test matrix <https://github.com/HBS-HBX/django-elastic-migrations/issues/44>`_
* fix `#45 remove support for python 2 <https://github.com/HBS-HBX/django-elastic-migrations/issues/45>`_
* In practice, Python 2 may work, but it is removed from the test matrix and won't be updated
0.7.8 (2018-11-13)
^^^^^^^^^^^^^^^^^^
* fix `#7 Convert Readme to rst for pypi <https://github.com/HBS-HBX/django-elastic-migrations/issues/7>`_
* first release on PyPI
* update project dependencies
0.7.7 (2018-09-17)
^^^^^^^^^^^^^^^^^^
* fix `#41 stack trace when indexing in py3 <https://github.com/HBS-HBX/django-elastic-migrations/issues/41>`_
0.7.6 (2018-09-11)
^^^^^^^^^^^^^^^^^^
* fix `#36 es_update --start flag broken <https://github.com/HBS-HBX/django-elastic-migrations/issues/39>`_
0.7.5 (2018-08-20)
^^^^^^^^^^^^^^^^^^
* fix `#35 open multiprocessing log in context handler <https://github.com/HBS-HBX/django-elastic-migrations/issues/35>`_
0.7.4 (2018-08-15)
^^^^^^^^^^^^^^^^^^
* fix `#33 error when nothing to resume using --resume <https://github.com/HBS-HBX/django-elastic-migrations/issues/33>`_
0.7.3 (2018-08-14)
^^^^^^^^^^^^^^^^^^
* fix #31 es_update movies --newer --workers does not store worker information
0.7.2 (2018-08-13)
^^^^^^^^^^^^^^^^^^
* fix #21 wrong batch update total using multiprocessing in 0.7.1
* fix #23 KeyError _index_version_name in es_update --newer
* address #25 use pks for queryset inside workers #29
0.7.1 (2018-08-07)
^^^^^^^^^^^^^^^^^^
* fixed gh #8 es_dangerous_reset --es-only to sync database to ES
* fixed gh #17 make es_dangerous_reset remove dem models
* improved test coverage
* added tests for ``es_create --es-only``
* added ``IndexVersion.hard_delete()`` (not called by default)
* added ``hard_delete`` flag to ``DropIndexAction``
* added ``hard_delete`` flag to ``DEMIndexManager.test_post_teardown()``
* updated ``__str__()`` of ``IndexAction`` to be more descriptive
0.7.0 (2018-08-06)
^^^^^^^^^^^^^^^^^^
* fixed gh #5: "add python 3 support and tests"
0.6.1 (2018-08-03)
^^^^^^^^^^^^^^^^^^
* fixed gh #9: "using elasticsearch-dsl 6.1, TypeError in DEMIndex.save"
0.6.0 (2018-08-01)
^^^^^^^^^^^^^^^^^^
* Added test structure for py2 - GH #2
* Renamed default log handler from ``django-elastic-migrations`` to ``django_elastic_migrations``
0.5.3 (2018-07-23)
^^^^^^^^^^^^^^^^^^
* First basic release