Django Elastic Migrations
`django-elastic-migrations`_ is a Django app for creating, indexing and changing schemas of Elasticsearch indexes.
Elastic has given us basic python tools for working with its search indexes:
* `elasticsearch-py`_, a python interface to elasticsearch's REST API
* `elasticsearch-dsl-py`_, a Django-esque way of declaring Elasticsearch schemas,
built upon `elasticsearch-py`_
Django Elastic Migrations adapts these tools into a Django app which also:
* Provides Django management commands for ``list``\ ing indexes, as well as performing
``create``, ``update``, ``activate`` and ``drop`` actions on them
* Implements concurrent bulk indexing powered by python ``multiprocessing``
* Gives Django test hooks for Elasticsearch
* Records a history of all actions that change Elasticsearch indexes
* Supports AWS Elasticsearch 6.0, 6.1 (6.2 TBD; see `#3 support elasticsearch-dsl 6.2`_)
* Enables having two or more servers share the same Elasticsearch cluster
Django Elastic Migrations comes with three Django models,
**Index**, **IndexVersion**, and **IndexAction**:
**Index** - a logical reference to an Elasticsearch index.
Each ``Index`` points to multiple ``IndexVersions``, each of which contains
a snapshot of that ``Index`` schema at a particular time. Each ``Index`` has an
*active* ``IndexVersion`` to which all actions are directed.
**IndexVersion** - a snapshot of an Elasticsearch ``Index`` schema at a particular
point in time. The Elasticsearch index name is the name of the *Index* plus the
primary key id of the ``IndexVersion`` model, e.g. ``movies-1``. When the schema is
changed, a new ``IndexVersion`` is added with name ``movies-2``, etc.
**IndexAction** - a record of a change that impacts an ``Index``, such as updating
the index or changing which ``IndexVersion`` is active in an ``Index``.
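The relationship between the three models can be sketched in plain Python. This is a simplified illustration, not the app's actual Django models: the class, attribute and method names below (``IndexSketch``, ``IndexVersionSketch``, ``create_version``, ``activate``) are hypothetical, and only the naming rule (base name plus version id, e.g. ``movies-1``) is taken from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IndexVersionSketch:
    """Hypothetical stand-in for IndexVersion: one schema snapshot."""
    base_name: str
    pk: int

    @property
    def name(self) -> str:
        # Elasticsearch index name = base index name + "-" + primary key
        return f"{self.base_name}-{self.pk}"


@dataclass
class IndexSketch:
    """Hypothetical stand-in for Index: a logical index with many versions."""
    name: str
    versions: List[IndexVersionSketch] = field(default_factory=list)
    active: Optional[IndexVersionSketch] = None
    # plays the role of IndexAction rows: a history of changes
    actions: List[str] = field(default_factory=list)

    def create_version(self) -> IndexVersionSketch:
        version = IndexVersionSketch(self.name, len(self.versions) + 1)
        self.versions.append(version)
        self.actions.append(f"create {version.name}")
        return version

    def activate(self, version: IndexVersionSketch) -> None:
        self.active = version
        self.actions.append(f"activate {version.name}")


movies = IndexSketch("movies")
v1 = movies.create_version()
movies.activate(v1)
v2 = movies.create_version()  # schema changed: new version, not yet active
print(v1.name, v2.name, movies.active.name)  # movies-1 movies-2 movies-1
```

Creating a second version does not change which version is active; activation is a separate, recorded step.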
Management Commands
Use ``./ es --help`` to see the list of all of these commands.
Read Only Commands
* ``./ es_list``
* help: For each *Index*\ , list activation status and doc
count for each of its *IndexVersions*
* usage: ``./ es_list``
Action Commands
These management commands add an Action record in the database,
so that the history of each *Index* is recorded.
* ``./ es_create`` - create a new index.
* ``./ es_activate`` - *activate* a new ``IndexVersion``. All
updates and reads for that ``Index`` will then go to that version.
* ``./ es_update`` - update the documents in the index.
* ``./ es_clear`` - remove the documents from an index.
* ``./ es_drop`` - drop an index.
* ``./ es_dangerous_reset`` - erase elasticsearch and reset the
Django Elastic Migrations models.
For each of these, use ``--help`` to see the details.
#. ``pip install django-elastic-migrations``; see `django-elastic-migrations`_ on PyPI
#. Put a reference to this package in your ``requirements.txt``
#. Ensure that a valid ``elasticsearch-dsl-py`` version is accessible, and configure
the path to your configured Elasticsearch singleton client in your django settings:
There should only be one ``ES_CLIENT`` instantiated in your application.
#. Add ``django_elastic_migrations`` to ``INSTALLED_APPS`` in your Django
settings file
#. Add the following information to your Django settings file:
import subprocess

# optional, any unique identifier for your releases to associate with indexes
DJANGO_ELASTIC_MIGRATIONS_GET_CODEBASE_ID = subprocess.check_output(['git', 'describe', '--tags']).strip()
# optional, can be used to have multiple servers share the same
# elasticsearch instance without conflicting
#. Create the ``django_elastic_migrations`` tables by running ``./ migrate``
#. Create a ``DEMIndex``:
from django_elastic_migrations.indexes import DEMIndex, DEMDocType
from .models import Movie
from elasticsearch_dsl import Text
MoviesIndex = DEMIndex('movies')
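@MoviesIndex.doc_type  # assumption: attaches the doc type to the index, as elasticsearch-dsl's Index.doc_type does
class MovieSearchDoc(DEMDocType):
    text = Text()

    @classmethod
    def get_queryset(cls, last_updated_datetime=None):
        """Return a queryset or a sliceable list of items to pass to get_reindex_iterator."""
        qs = Movie.objects.all()
        if last_updated_datetime:
            # assumption: Movie has a ``last_modified`` datetime field
            qs = qs.filter(last_modified__gt=last_updated_datetime)
        return qs

    @classmethod
    def get_reindex_iterator(cls, queryset):
        # build one indexable dict (including metadata) per item in the queryset
        return [
            MovieSearchDoc(text="a little sample text").to_dict(include_meta=True)
            for movie in queryset
        ]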
#. Add your new index to DJANGO_ELASTIC_MIGRATIONS_INDEXES in settings/
#. Run ``./ es_list`` to see the index as available:
./ es_list
Available Index Definitions:
| Index Base Name | Index Version Name | Created | Active | Docs | Tag |
| movies | | 0 | 0 | 0 | Current |
| | | | | | (not |
| | | | | | created) |
Reminder: an index version name looks like 'my_index-4', and its base index name
looks like 'my_index'. Most Django Elastic Migrations management commands
take the base name (in which case the activated version is used)
or the specific index version name.
#. Create the ``movies`` index in elasticsearch with ``./ es_create movies``:
$> ./ es_create movies
The doc type for index 'movies' changed; created a new index version
'movies-1' in elasticsearch.
$> ./ es_list
Available Index Definitions:
| Index Base Name | Index Version Name | Created | Active | Docs | Tag |
| movies | movies-1 | 1 | 0 | 0 | 07.11.005 |
| | | | | | -93-gd101 |
| | | | | | a1f |
Reminder: an index version name looks like 'my_index-4', and its base index name
looks like 'my_index'. Most Django Elastic Migrations management commands
take the base name (in which case the activated version is used)
or the specific index version name.
#. Activate the ``movies-1`` index version, so all updates and reads go to it.
./ es_activate movies
For index 'movies', activating 'movies-1' because you said so.
#. Assuming you have implemented ``get_reindex_iterator``, you can call
``./ es_update`` to update the index.
$> ./ es_update movies
Handling update of index 'movies' using its active index version 'movies-1'
Checking the last time update was called:
- index version: movies-1
- update date: never
Getting Reindex Iterator...
Completed with indexing movies-1
$> ./ es_list
Available Index Definitions:
| Index Base Name | Index Version Name | Created | Active | Docs | Tag |
| movies | movies-1 | 1 | 1 | 3 | 07.11.005 |
| | | | | | -93-gd101 |
| | | | | | a1f |
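The reminder printed by ``es_list`` distinguishes a base index name (``my_index``) from an index version name (``my_index-4``). The following sketch shows one way such a name could be split into its two parts. It is an illustration only, not the package's own parsing code, and ``split_index_name`` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# A version name is assumed to be "<base name>-<integer id>".
_VERSION_NAME = re.compile(r"^(?P<base>.+)-(?P<version>\d+)$")


def split_index_name(name: str) -> Tuple[str, Optional[int]]:
    """Return (base_name, version_id); version_id is None for a base name."""
    match = _VERSION_NAME.match(name)
    if match:
        return match.group("base"), int(match.group("version"))
    return name, None


print(split_index_name("movies"))      # ('movies', None)
print(split_index_name("movies-1"))    # ('movies', 1)
print(split_index_name("my_index-4"))  # ('my_index', 4)
```

When only the base name is given, a command would fall back to the active version, as the reminder describes.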
* Creating and updating a new index schema can happen before you deploy.
For example, if your app servers are running with the ``movies-1`` index activated, and you
have a new version of the schema you'd like to pre-index, then log into another
server and run ``./ es_create movies`` followed by
``./ es_update movies --newer``. This will update documents in all ``movies``
indexes that are newer than the active one.
* After deploying, you can run
``./ es_activate movies`` to activate the latest version. Be sure to cycle your
gunicorn workers to ensure the change is caught by your app servers.
* During deployment, if ``get_queryset`` is implemented so that it filters on the
``last_updated_datetime`` it receives, then you can call
``./ es_update movies --resume``, and it will index *only those documents that have
changed since the last reindexing*. This way you can do most of the indexing ahead of time,
and only reindex a portion at the time of the deployment.
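The idea behind ``--resume`` can be sketched without Django: given the datetime of the last update, only rows modified after it are selected for reindexing. This is a minimal illustration under assumptions: ``MovieRow``, its ``last_modified`` field and ``rows_to_reindex`` are hypothetical names, and the real filtering would happen in your own ``get_queryset(last_updated_datetime=...)`` implementation.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional


class MovieRow(NamedTuple):
    """Hypothetical stand-in for a database row with a modification time."""
    title: str
    last_modified: datetime


def rows_to_reindex(
    rows: Iterable[MovieRow],
    last_updated_datetime: Optional[datetime] = None,
) -> List[MovieRow]:
    """Select everything on a first run, otherwise only rows changed since."""
    if last_updated_datetime is None:
        return list(rows)
    return [row for row in rows if row.last_modified > last_updated_datetime]


rows = [
    MovieRow("Inception", datetime(2018, 8, 1)),
    MovieRow("Closer", datetime(2018, 8, 10)),
]
last_update = datetime(2018, 8, 5)

print([r.title for r in rows_to_reindex(rows)])               # ['Inception', 'Closer']
print([r.title for r in rows_to_reindex(rows, last_update)])  # ['Closer']
```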
Django Testing
your Django settings. The default test prefix is ``test_``. Every
test will create its own indexes.
if 'test' in sys.argv:
#. Override ``TestCase``:
.. code-block::
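from django.test import TestCase

from django_elastic_migrations import DEMIndexManager


class MyTestCase(TestCase):

    def _pre_setup(self):
        # assumption: test_pre_setup() is the counterpart of test_post_teardown()
        DEMIndexManager.test_pre_setup()
        super(MyTestCase, self)._pre_setup()

    def _post_teardown(self):
        # clean up the indexes created for this test
        DEMIndexManager.test_post_teardown()
        super(MyTestCase, self)._post_teardown()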
Excluding from Django's ``dumpdata`` command
When calling `Django's dumpdata command <>`_\,
you will likely want to exclude the database tables used by this app:
from django.core.management import call_command

params = {
    'database': 'default',
    'exclude': [
        # we don't want to include django_elastic_migrations in dumpdata,
        # because it's environment specific
        'django_elastic_migrations',
    ],
    'indent': 3,
    'output': 'path/to/my/file.json'
}
call_command('dumpdata', **params)
An example of this is included with the
`moviegen management command`_.
Tuning Bulk Indexing Parameters
By default, ``/ es_update`` will divide the result of
``DEMDocType.get_queryset()`` into batches of size ``DocType.BATCH_SIZE``.
Override this number to change the batch size.
There are many configurable parameters to Elasticsearch's `bulk updater <>`_.
To provide a custom value, override ``DEMDocType.get_bulk_indexing_kwargs()``
and return the kwargs you would like to customize.
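To make the effect of the batch size concrete, here is a small sketch of splitting a sliceable collection into batches. It is not the package's implementation; ``iter_batches`` is a hypothetical helper, and ``BATCH_SIZE`` here is just a local constant standing in for the class attribute mentioned above.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

BATCH_SIZE = 3  # illustrative value only


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most ``batch_size`` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


print(list(iter_batches(list(range(7)), BATCH_SIZE)))
# [[0, 1, 2], [3, 4, 5], [6]]
```

Slicing works the same way on Django querysets, which is why ``get_queryset`` may return either a queryset or a sliceable list. A larger batch size means fewer, bigger bulk requests.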
This project uses ``make`` to manage the build process. Type ``make help``
to see the available ``make`` targets.
Elasticsearch Docker Compose
``docker-compose -f local.yml up``
`See docs/docker_setup for more info <./docs/docker_setup.rst>`_
This project uses `pip-tools`_. The ``requirements.txt`` files are generated
and pinned to latest versions with ``make upgrade``:
* run ``make requirements`` to run the pip install.
* run ``make upgrade`` to upgrade the dependencies of the requirements to the latest
versions. This process also excludes ``django`` and ``elasticsearch-dsl``
from the ``requirements/test.txt`` so they can be injected with different
versions by tox during matrix testing.
Populating Local ``tests_movies`` Database Table With Data
It may be helpful for you to populate a local database with Movies test
data to experiment with using ``django-elastic-migrations``. First,
migrate the database:
``./ migrate --run-syncdb --settings=test_settings``
Next, load the basic fixtures:
``./ loaddata tests/100films.json``
You may wish to add more movies to the database. A management command
has been created for this purpose. Get a `Free OMDB API key here <>`_\ ,
then run a query like this (replace ``MYAPIKEY`` with yours):
.. code-block::
$> ./ moviegen --title="Inception" --api-key="MYAPIKEY"
{'actors': 'Leonardo DiCaprio, Joseph Gordon-Levitt, Ellen Page, Tom Hardy',
'awards': 'Won 4 Oscars. Another 152 wins & 204 nominations.',
'boxoffice': '$292,568,851',
'country': 'USA, UK',
'director': 'Christopher Nolan',
'dvd': '07 Dec 2010',
'genre': 'Action, Adventure, Sci-Fi',
'imdbid': 'tt1375666',
'imdbrating': '8.8',
'imdbvotes': '1,721,888',
'language': 'English, Japanese, French',
'metascore': '74',
'plot': 'A thief, who steals corporate secrets through the use of '
'dream-sharing technology, is given the inverse task of planting an '
'idea into the mind of a CEO.',
'poster': '',
'production': 'Warner Bros. Pictures',
'rated': 'PG-13',
'ratings': [{'Source': 'Internet Movie Database', 'Value': '8.8/10'},
{'Source': 'Rotten Tomatoes', 'Value': '86%'},
{'Source': 'Metacritic', 'Value': '74/100'}],
'released': '16 Jul 2010',
'response': 'True',
'runtime': 148,
'title': 'Inception',
'type': 'movie',
'website': '',
'writer': 'Christopher Nolan',
'year': '2010'}
To save the movie to the database, use the ``--save`` flag. The ``--noprint``
option is also useful for suppressing the JSON output. If you add
``OMDB_API_KEY=MYAPIKEY`` to your environment variables, you don't have
to specify the key each time:
.. code-block::
$ ./ moviegen --title "Closer" --noprint --save
Saved 1 new movie(s) to the database: Closer
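The fallback from the ``--api-key`` option to the ``OMDB_API_KEY`` environment variable can be sketched with ``argparse``. This is an assumption about how such a fallback might be wired, not the actual ``moviegen`` source; ``build_parser`` is a hypothetical helper, and only the option names come from the commands shown above.

```python
import argparse
import os
from typing import Mapping, Optional


def build_parser(environ: Optional[Mapping[str, str]] = None) -> argparse.ArgumentParser:
    """Build a parser whose --api-key defaults to OMDB_API_KEY if it is set."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="moviegen-sketch")
    parser.add_argument("--title")
    parser.add_argument("--api-key", default=environ.get("OMDB_API_KEY"))
    parser.add_argument("--save", action="store_true")
    parser.add_argument("--noprint", action="store_true")
    return parser


# The key comes from the (simulated) environment when it is not on the command line.
args = build_parser({"OMDB_API_KEY": "MYAPIKEY"}).parse_args(
    ["--title", "Closer", "--noprint", "--save"]
)
print(args.api_key, args.save, args.noprint)  # MYAPIKEY True True
```

An explicit ``--api-key`` on the command line still takes precedence over the environment value.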
Now that it's been saved to the database, you may want to create a fixture,
so you can get back to this state in the future.
.. code-block::
$ ./ moviegen --makefixture=tests/myfixture.json
dumping fixture data to tests/myfixture.json ...
Later, you can restore this database with the regular ``loaddata`` command:
.. code-block::
$ ./ loaddata tests/myfixture.json
Installed 101 object(s) from 1 fixture(s)
There are already 100 films available using ``loaddata`` as follows:
.. code-block::
$ ./ loaddata tests/100films.json
Running Tests Locally
Run ``make test``. To run all tests and quality checks locally,
run ``make test-all``.
To run only the linters, use ``make quality``. Note that if any of the
linters returns a nonzero exit code, tox reports an ``InvocationError``
at the end. See `tox's documentation for InvocationError`_ for more information.
We use ``edx_lint`` to compile ``pylintrc``. To update the rules,
change ``pylintrc_tweaks`` and run ``make pylintrc``.
Cutting a New Version
* optional: run ``make update`` to update dependencies
* bump version in `django_elastic_migrations/ <>`_.
* update `CHANGELOG.rst <>`_.
* ``make clean``
* ``python3 sdist bdist_wheel``
* ``twine check dist/django-elastic-migrations-*.tar.gz`` to see if there are any syntax mistakes before tagging
* submit PR bumping the version
* ensure test matrix is passing on travis and merge PR
* pull changes to master
* ``make clean``
* ``python3 sdist bdist_wheel``
* ``twine check dist/django-elastic-migrations-*.tar.gz`` to see if there are any syntax mistakes before tagging
* ``twine upload -r testpypi dist/django-elastic-migrations-*.tar.gz``
* `Check it at <>`_
* ``python3 tag`` to tag the new version
* ``twine upload -r pypi dist/django-elastic-migrations-*.tar.gz``
* `Update new release at <>`_
0.8.2 (2018-11-20)
* fix `#59 twine check error in 0.8.1 <>`_
0.8.1 (2018-11-19)
* fix `#50 add test coverage for es_list <>`_
* fix `#58 ignore indexes with a dot in name in es_list --es-only and es_dangerous_reset <>`_
0.8.0 (2018-11-13)
* fix `#6 support Django 2 <>`_
* fix `#43 remove es_deactivate <>`_
* fix `#44 add django 1.10 and 1.11 to test matrix <>`_
* fix `#45 remove support for python 2 <>`_
* In practice, Python 2 may work, but it is removed from the test matrix and won't be updated
0.7.8 (2018-11-13)
* fix `#7 Convert Readme to rst for pypi <>`_
* first release on PyPI
* update project dependencies
0.7.7 (2018-09-17)
* fix `#41 stack trace when indexing in py3 <>`_
0.7.6 (2018-09-11)
* fix `#36 es_update --start flag broken <>`_
0.7.5 (2018-08-20)
* fix `#35 open multiprocessing log in context handler <>`_
0.7.4 (2018-08-15)
* fix `#33 error when nothing to resume using --resume <>`_
0.7.3 (2018-08-14)
* fix #31 es_update movies --newer --workers does not store worker information
0.7.2 (2018-08-13)
* fix #21 wrong batch update total using multiprocessing in 0.7.1
* fix #23 KeyError _index_version_name in es_update --newer
* address #25 use pks for queryset inside workers #29
0.7.1 (2018-08-07)
* fixed gh #8 es_dangerous_reset --es-only to sync database to ES
* fixed gh #17 make es_dangerous_reset remove dem models
* improved test coverage
* added tests for ``es_create --es-only``
* added ``IndexVersion.hard_delete()`` (not called by default)
* added ``hard_delete`` flag to ``DropIndexAction``
* added ``hard_delete`` flag to ``DEMIndexManager.test_post_teardown()``
* updated ``__str__()`` of ``IndexAction`` to be more descriptive
0.7.0 (2018-08-06)
* fixed gh #5: "add python 3 support and tests"
0.6.1 (2018-08-03)
* fixed gh #9: "using elasticsearch-dsl 6.1, TypeError in"
0.6.0 (2018-08-01)
* Added test structure for py2 - GH #2
* Renamed default log handler from ``django-elastic-migrations`` to ``django_elastic_migrations``
0.5.3 (2018-07-23)
* First basic release