diceware-list
=============
|bdg-build| `sources <https://github.com/ulif/diceware-list>`_ | `issues <https://github.com/ulif/diceware-list/issues>`_
.. |bdg-build| image:: https://travis-ci.org/ulif/diceware-list.svg?branch=master
:target: https://travis-ci.org/ulif/diceware-list
:alt: Build Status
Create and check wordlists for use with `diceware`_.
This is not a `diceware`_ implementation, but a helper to create and check
appropriate wordlists.
Currently, we provide three scripts:
- `diceware-list` creates wordlists based on input lists
- `wlflakes` checks existing wordlists for flaws
- `wldownload` downloads Android wordlists
Install
--------
Install latest release from pypi_ ::
(venv) $ pip install diceware-list
or clone the repository from GitHub::
$ git clone https://github.com/ulif/diceware-list.git
Please consider using `virtualenv`_ for deployment.
In an active `virtualenv` you can install an executable script of
`diceware-list` by running::
(venv) $ python setup.py install
(venv) $ diceware-list --help
usage: diceware-list [-h] [-l LENGTH] [-n] [--ascii] [-d SIDES] [-k]
[--use-416] [-p {none,short,long}] [-v] [--version]
DICTFILE [DICTFILE ...]
Create Wordlists: `diceware-list`
---------------------------------
The `diceware-list` script creates new lists out of given ones::
$ diceware-list -n -l 7776 /usr/share/dict/words
11111 a
11112 a's
...
12353 capt
12354 cara
12355 carl
...
66663 zoos
66664 éclat
66665 élan
66666 épée
The main target of `diceware-list` is to provide "good"
wordlists. Wordlists are considered "good" if they
- contain enough terms for use with a certain diceware application
(for instance 6^5 = 7776 terms if used with five six-sided dice)
- contain terms as short as possible (to reduce typing)
- (optionally) contain no words with non-ASCII chars (to enable use
with non-localized keyboards)
- (optionally) are a `prefix code`_, i.e. no complete word in the list is
a prefix of another word in the list
- contain no offending terms
The last point is hard to solve technically (hints welcome!), but
`diceware-list` can help you follow the other design rules.
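
As an illustration of the prefix-code rule, here is a minimal Python sketch.
The helper name ``find_prefix_violations`` is made up for this example and is
not part of the `diceware-list` API; it only shows the idea behind such a check.

.. code-block:: python

    def find_prefix_violations(words):
        """Return (prefix, word) pairs where `prefix` starts another entry.

        Hypothetical helper for illustration only.
        """
        ordered = sorted(set(words))
        violations = []
        for i, word in enumerate(ordered):
            # In sorted order, all entries starting with `word` follow it
            # directly, so we can stop at the first one that does not match.
            for other in ordered[i + 1:]:
                if not other.startswith(word):
                    break
                violations.append((word, other))
        return violations


    print(find_prefix_violations(["air", "port", "airport"]))
    # [('air', 'airport')]

A list is a prefix code exactly when this function returns an empty list.
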
The wordlists generated by `diceware-list` are not meant to be kept
secret. You might put them on the internet, publish them on Facebook or
print them in the New York Times. Instead, the security of the
`diceware`_ technique relies on the entropy or (in this case)
"randomness" of your dice, computer, etc.
In other words: Your passphrases will not be safe because of hiding
your wordlist. They will be safe because there are so many possible
combinations of words you can pick from your wordlist. That means:
longer lists are more secure than shorter ones (if really used to full
extent by your source of randomness with `diceware`_), but hidden
lists are *not* more secure than public ones.
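
The effect of list length can be put into numbers. The sketch below assumes
that every word is picked uniformly at random and independently of the others;
under that assumption a passphrase of ``k`` words from a list of ``N`` terms
carries ``k * log2(N)`` bits of entropy. The function name is hypothetical and
not taken from `diceware-list`.

.. code-block:: python

    import math


    def passphrase_entropy_bits(list_length, num_words):
        """Entropy in bits of `num_words` uniform picks from `list_length` terms."""
        return num_words * math.log2(list_length)


    # Six words from a 7776-term list versus six words from a 10000-term list.
    print(round(passphrase_entropy_bits(7776, 6), 1))   # 77.5
    print(round(passphrase_entropy_bits(10000, 6), 1))  # 79.7

Nothing in this calculation depends on the list being secret, only on its
length and on the quality of the random source.
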
Usage
-----
First, you need a file with words as "dictionary". On typical Debian
systems such files can be found in ``/usr/share/dict/``.
This file can then be fed to `diceware-list` to create a wordlist
suitable for use with diceware::
$ diceware-list /usr/share/dict/words
!
!!
!!!
...
alan
alana
alar
...
zzz
zzzz
By default all input words are filtered and output. Using the `-l` option you
can request a certain length of the output wordlist. If an input list provides
more terms than needed, we will pick a subset. If there are not enough terms in
the input list, an error is raised.
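
How a subset might be picked can be sketched as follows. This is an
assumption-laden illustration, not the actual `diceware-list` code: it prefers
short terms and picks randomly among the terms of maximum width, as described
in the changelog entry for version 0.2. The name ``pick_shortest`` and the
``ValueError`` are choices made for this example.

.. code-block:: python

    import random


    def pick_shortest(terms, length, rng=random):
        """Pick `length` terms, preferring short ones (hypothetical helper)."""
        unique = sorted(set(terms), key=len)
        if len(unique) < length:
            raise ValueError("not enough terms in input list")
        # Width of the longest term we are forced to accept.
        max_width = len(unique[length - 1])
        shorter = [t for t in unique if len(t) < max_width]
        widest = [t for t in unique if len(t) == max_width]
        # Keep all shorter terms, fill up randomly with maximum-width terms.
        picked = shorter + rng.sample(widest, length - len(shorter))
        return sorted(picked)


    print(pick_shortest(["a", "bb", "cc", "dd", "eeee"], 3))

The output always contains ``a`` plus two of the three two-letter terms; which
two depends on the random source.
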
With `-n` you can tell `diceware-list` to put numbers into each line,
representing dice throws [#]_ ::
$ diceware-list -n -l 7776 /usr/share/dict/words
11111 !
11112 !!
...
12353 alan
12354 alana
12355 alar
...
66665 zzz
66666 zzzz
If you create a wordlist for use with non-standard dice, for instance
10-sided dice, you can set the number of sides with `-d`::
$ diceware-list -n -d 10 -l 10000 /usr/share/dict/words
1-1-1-1 aol
1-1-1-2 aachen
1-1-1-3 aaron
...
10-10-10-8 zoomed
10-10-10-9 zooms
10-10-10-10 zoos
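
The relation between dice sides, number of throws and list length can be
illustrated with a short Python sketch. The function ``dice_labels`` and its
``sep`` parameter are hypothetical and only mimic the numbering shown in the
examples above; they are not the `diceware-list` implementation.

.. code-block:: python

    from itertools import product


    def dice_labels(sides, throws, sep="-"):
        """Return all throw labels for `throws` dice with `sides` sides each."""
        faces = [str(n) for n in range(1, sides + 1)]
        # product() enumerates every combination in counting order.
        return [sep.join(combo) for combo in product(faces, repeat=throws)]


    labels = dice_labels(10, 4)
    print(len(labels), labels[0], labels[-1])
    # 10000 1-1-1-1 10-10-10-10

    labels = dice_labels(6, 5, sep="")
    print(len(labels), labels[0], labels[-1])
    # 7776 11111 66666

The number of labels is always ``sides ** throws``, which is why a list for
four 10-sided dice needs 10000 terms.
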
The `--ascii` option filters out terms that contain non-ASCII
characters. This can help in generating non-English wordlists that
are usable with regular English keyboards.
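
What such a filter does can be sketched in a few lines of Python. The helper
``ascii_only`` is an illustrative name, not a function provided by
`diceware-list`.

.. code-block:: python

    def ascii_only(terms):
        """Keep only terms made up entirely of ASCII characters."""
        return [t for t in terms if all(ord(ch) < 128 for ch in t)]


    print(ascii_only(["zoos", "éclat", "élan"]))
    # ['zoos']
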
The verbose option `--verbose` can be given multiple times to increase
verbosity.
See `--help` for other options.
`diceware-list` loosely follows the recommendations given on
http://diceware.com/ by Mr. Reinhold.
Check wordlists: wlflakes
-------------------------
Find flakes in wordlists.
::
$ wlflakes mywordlist.txt
No output means: no problems detected.
`wlflakes` looks for prefix flakes, i.e. it checks whether any line in the
given file is the beginning of another line.
::
$ cat wordlist.txt
air
port
airport
$ wlflakes wordlist.txt
wordlist.txt:3: E1 "air" from line 1 is a prefix of "airport"
Duplicate entries are also reported:
::
$ cat wordlist.txt
air
port
air
$ wlflakes wordlist.txt
wordlist.txt:1: E1 "air" from line 1 is a prefix of "air"
wordlist.txt:1: E2 "air" appears multiple times
More checks offered by `wlflakes`:
Warnings:
- show terms containing non-ASCII chars
- show list entries that are too short (and thus easier to brute-force than to guess)
`wlflakes` also supports ``--help`` or ``-h`` to list all supported options.
Handle Android wordlists: wldownload
------------------------------------
Android wordlists are a nice source for wordlists. They can be downloaded from
public repositories::
$ wldownload --raw -v
Starting download of Android wordlist file.
Fetching wordlist from https://android.googlesource.com/platform/pack...
Done.
`wldownload` downloads these lists and helps to transform them into lists
usable for diceware. Be aware that terms are written to stdout by
default (and Android wordlists easily contain more than 100,000 terms)::
$ wldownload > mylist
$ cat mylist
the
to
...
yt
yuk
Use shell redirects or ``--outfile`` to send the terms somewhere other than
stdout.
You can request non-English wordlists using ``--lang`` or ``-l`` with a
language code like ``cs`` or ``de``. Use ``--lang-codes`` to list all supported
language codes.
The ``--no-offensive`` flag suppresses terms marked as possibly offensive.
Testing
-------
In a clone of the sources you can run tests like this::
(venv) $ python setup.py test
This command will download all required packages, especially
`py.test`_.
You can also install `py.test`_ manually with `pip`_::
(venv)$ pip install pytest
(venv)$ pip install -e .
and afterwards run tests like so::
(venv)$ py.test
If you also install `tox`::
(venv)$ pip install tox
then you can run all tests for all supported platforms at once::
(venv)$ tox
Coverage
--------
To get a coverage report, you can use the respective `tox` target::
(venv)$ tox -e cov
Or you use the common `coverage` tool::
(venv)$ pip install coverage
(venv)$ coverage run setup.py test
(venv)$ coverage report --include="diceware_list.py,libwordlist.py"
.. [#] The wordlist length in this case should be
``(number-of-sides-per-dice)`` raised to the power of
``(number-of-dicethrows)``, for instance 6**5 = 7776 for five
six-sided dice or a single six-sided die thrown five times.
.. _diceware: http://diceware.com/
.. _pip: https://pip.pypa.io/en/latest/
.. _`prefix code`: https://en.wikipedia.org/wiki/Prefix_code
.. _py.test: https://pytest.org/
.. _pypi: https://pypi.python.org/
.. _virtualenv: https://virtualenv.pypa.io/
Changes
*******
2.1.dev0 (unreleased)
=====================
- Support also Python 3.6, 3.7 and `pypy3`.
- `diceware-list` allows limiting the set of allowed chars (``-c`` or
``--chars``).
- `diceware-list` now allows uppercase chars in terms on request (``-u`` or
``--allow-uppercase``).
- `wlflakes` now checks for non-ASCII chars in lists and for terms that are
too short and therefore easy to brute-force.
- Added functions to compute entropy of wordlists and their alphabet chars.
- Fixed #4: terms differing only in upper/lower case led to double entries in
the result list.
2.0 (2018-01-23)
================
- Add new `wldownload` command. This is a tool for handling Android wordlists
(download, uncompress, parse).
- Add new `wlflakes` command. This is a tool for checking existing
wordlists for consistency, cryptoflakes, etc.
- The `diceware-list` option `-l` has no default any more. If the option
is not set, all suitable terms are output.
1.0 (2017-02-09)
================
- The ``dicewarekit.txt`` list is not included in generated lists by
default from now on. You can request inclusion with new option
`use-kit`. The old option `no-kit` is not supported any more.
- In numbered output, separate digits by ``-`` to distinguish numbers
with more than one digit. Needed at least when generating wordlists
for dice with more than 9 sides.
- Rename `-s` option to `-d` (as in ``dice-sides``).
- Logging output now registered under name `libwordlist`.
- Added new module `libwordlist` containing the API parts of `diceware-list`.
- New `--version` option.
- New `--prefix` option. If set, a prefix code is generated, i.e. a list that
contains no item which is a prefix of another list item.
- Claim support for Python 3.6.
- Restructure package: all single scripts are now part of a package.
0.3 (2016-07-25)
================
- Install script as `diceware-list` instead of `diceware_list`.
- Allow `--sides` option to support dice that do not have six sides.
0.2 (2016-03-18)
================
- Allow `-v` option multiple times for increased verbosity.
- Pick maximum width terms randomly. Until that change we included all
shorter entries and additionally the (alphabetically) first entries
of maximum width. Now, we pick a random set of these maximum width
entries for the result list.
- Claim support for Python 3.5.
0.1 (2016-02-09)
================
- Initial release.