# Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties

Expectigrad is a first-order stochastic optimization method that fixes the
[known divergence issue](https://arxiv.org/abs/1904.09237)
of Adam, RMSProp, and related adaptive methods while offering better performance on
well-known deep learning benchmarks.
Expectigrad introduces two innovations to adaptive gradient methods:
- **Arithmetic RMS:** Computes the true RMS of all past gradients instead of an exponential moving average (EMA).
This makes Expectigrad resistant to the divergence issue above and, in theory, less susceptible to
gradient noise.
- **Outer momentum:** Applies momentum _after_ adapting the step sizes, not
before.
This reduces bias in the updates by preserving the
[superposition property](https://en.wikipedia.org/wiki/Superposition_principle).
See [the paper](https://arxiv.org/abs/2010.01356) for more details.
PyTorch, TensorFlow 1.x, and TensorFlow 2.x are all supported.
See [installation](#installation) and [usage](#usage) below to get started.
### Pseudocode
> *The algorithm pseudocode appears as an image in the rendered README and is not reproduced here; see [the paper](https://arxiv.org/abs/2010.01356) for the full listing.*
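In its place, here is a minimal NumPy sketch of one Expectigrad step, reconstructed from the description above and the paper. It is illustrative only: the state initialization (`s`, `n`, `m` at zero, `t` at 1) and the exact placement of `eps` are assumptions based on the paper, and the framework implementations in this repository are authoritative.
```python
import numpy as np

def expectigrad_step(theta, grad, s, n, m, t, lr=0.001, beta=0.9, eps=1e-8):
    """One Expectigrad update (sketch; see arXiv:2010.01356 for the algorithm)."""
    s = s + grad ** 2                    # running SUM of squared gradients, not an EMA
    n = n + (grad != 0)                  # sparse counter: increments only where nonzero
    rms = np.sqrt(s / np.maximum(n, 1))  # arithmetic RMS (guard n=0; there s=0 too)
    update = grad / (eps + rms)          # adapt the step sizes first...
    m = beta * m + (1.0 - beta) * update # ...then apply "outer" momentum
    theta = theta - lr * m / (1.0 - beta ** t)  # bias-corrected step
    return theta, s, n, m
```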
### Citing
If you use this code for published work, please cite [the original paper](https://arxiv.org/abs/2010.01356):
```
@article{daley2020expectigrad,
  title={Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties},
  author={Daley, Brett and Amato, Christopher},
  journal={arXiv preprint arXiv:2010.01356},
  year={2020}
}
```
---
## Installation
Use pip to quickly install Expectigrad:
```
pip install expectigrad
```
Or you can clone this repository and install manually:
```
git clone https://github.com/brett-daley/expectigrad.git
cd expectigrad
pip install -e .
```
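Either way, a quick import confirms the install:
```
python -c "import expectigrad"
```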
## Usage
PyTorch and both versions of TensorFlow are supported.
Refer to the code snippets below to instantiate the optimizer for your deep learning framework.
### PyTorch
```python
import expectigrad
expectigrad.pytorch.Expectigrad(
    params, lr=0.001, beta=0.9, eps=1e-8, sparse_counter=True
)
```
| Argument | Type | Description |
| --- | :-: | --- |
| params | (`iterable`) | Iterable of parameters to optimize or dicts defining parameter groups. |
| lr | (`float`) | The learning rate, a scale factor applied to each optimizer step. Default: `0.001` |
| beta | (`float`) | The decay rate for Expectigrad's bias-corrected, "outer" momentum. Must be in the interval [0, 1). Default: `0.9` |
| eps | (`float`) | A small constant added to the denominator for numerical stability. Must be greater than 0. Default: `1e-8` |
| sparse_counter | (`bool`) | If True, Expectigrad's counter increments only where the gradient is nonzero. If False, the counter increments unconditionally. Default: `True` |
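For context, here is a minimal training loop using the constructor above; the linear model and synthetic batch are placeholders, not part of the library:
```python
import torch
import expectigrad

# Placeholder model and synthetic batch, just to show the optimizer in a loop.
model = torch.nn.Linear(10, 1)
optimizer = expectigrad.pytorch.Expectigrad(
    model.parameters(), lr=0.001, beta=0.9, eps=1e-8, sparse_counter=True
)

x, y = torch.randn(32, 10), torch.randn(32, 1)
for _ in range(100):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()  # one Expectigrad update
```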
---
### TensorFlow 1.x
```python
import expectigrad
expectigrad.tensorflow1.ExpectigradOptimizer(
    learning_rate=0.001, beta=0.9, epsilon=1e-8, sparse_counter=True,
    use_locking=False, name='Expectigrad'
)
```
| Argument | Type | Description |
| --- | :-: | --- |
| learning_rate | | The learning rate, a scale factor applied to each optimizer step. Can be a float, `tf.keras.optimizers.schedules.LearningRateSchedule`, `Tensor`, or callable that takes no arguments and returns the value to use. Default: `0.001` |
| beta | (`float`) | The decay rate for Expectigrad's bias-corrected, "outer" momentum. Must be in the interval [0, 1). Default: `0.9` |
| epsilon | (`float`) | A small constant added to the denominator for numerical stability. Must be greater than 0. Default: `1e-8` |
| sparse_counter | (`bool`) | If True, Expectigrad's counter increments only where the gradient is nonzero. If False, the counter increments unconditionally. Default: `True` |
| use_locking | (`bool`) | If True, use locks to prevent concurrent updates to variables. Default: `False` |
| name | (`str`) | Optional name for the operations created when applying gradients. Default: `'Expectigrad'` |
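A minimal graph-mode sketch follows, assuming `ExpectigradOptimizer` follows the standard `tf.train.Optimizer` interface (suggested by its `use_locking` and `name` arguments); the linear model and synthetic data are placeholders:
```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x
import expectigrad

# Placeholder graph: a linear model on synthetic data.
x = tf.placeholder(tf.float32, shape=[None, 10])
y = tf.placeholder(tf.float32, shape=[None, 1])
loss = tf.losses.mean_squared_error(y, tf.layers.dense(x, 1))

optimizer = expectigrad.tensorflow1.ExpectigradOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(loss)  # standard tf.train.Optimizer API

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = {x: np.random.randn(32, 10).astype(np.float32),
             y: np.random.randn(32, 1).astype(np.float32)}
    for _ in range(100):
        sess.run(train_op, feed_dict=batch)
```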
---
### TensorFlow 2.x
```python
import expectigrad
expectigrad.tensorflow2.Expectigrad(
    learning_rate=0.001, beta=0.9, epsilon=1e-8, sparse_counter=True,
    name='Expectigrad', **kwargs
)
```
| Argument | Type | Description |
| --- | :-: | --- |
| learning_rate | | The learning rate, a scale factor applied to each optimizer step. Can be a float, `tf.keras.optimizers.schedules.LearningRateSchedule`, `Tensor`, or callable that takes no arguments and returns the value to use. Default: `0.001` |
| beta | (`float`) | The decay rate for Expectigrad's bias-corrected, "outer" momentum. Must be in the interval [0, 1). Default: `0.9` |
| epsilon | (`float`) | A small constant added to the denominator for numerical stability. Must be greater than 0. Default: `1e-8` |
| sparse_counter | (`bool`) | If True, Expectigrad's counter increments only where the gradient is nonzero. If False, the counter increments unconditionally. Default: `True` |
| name | (`str`) | Optional name for the operations created when applying gradients. Default: `'Expectigrad'` |
| `**kwargs` | | Keyword arguments. Allowed to be one of {`clipnorm`, `clipvalue`, `lr`, `decay`}. `clipnorm` is gradient clipping by norm; `clipvalue` is gradient clipping by value; `decay` is included for backward compatibility to allow time-inverse decay of the learning rate; `lr` is included for backward compatibility, but `learning_rate` is preferred. |
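For context, here is a minimal Keras sketch, assuming `Expectigrad` subclasses `tf.keras.optimizers.Optimizer` (suggested by the constructor above); the model and synthetic data are placeholders:
```python
import tensorflow as tf  # TensorFlow 2.x
import expectigrad

# Placeholder Keras model on synthetic data.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
model.compile(
    optimizer=expectigrad.tensorflow2.Expectigrad(learning_rate=0.001),
    loss='mse',
)

x = tf.random.normal([256, 10])
y = tf.random.normal([256, 1])
model.fit(x, y, epochs=3, batch_size=32)
```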
---