# AMOEBA-6422
## Environment Setup
1. Install dependencies
- Install postgresql-12 (https://www.postgresql.org/download/)
- Download demo data from [dropbox](https://www.dropbox.com/s/jw5719fycsxq5n6/data.zip?dl=0) and unzip it to `data` folder
- Start postgresql server with data folder: `/usr/lib/postgresql/12/bin/pg_ctl -D ./data start`
- If you install postgresql-12 from apt, stop it via `sudo systemctl stop postgresql.service` and give permission to the lock file via `sudo chmod o+w /var/run/postgresql`
- Install dependencies
- `sudo apt-get install python3-dev libmysqlclient-dev` (if not Debian/Ubuntu, see https://github.com/PyMySQL/mysqlclient#install for more details)
- `pip install -r requirements.txt` for python dependencies
2. Setup environment
- Installation of `direnv` is needed. And run `eval "$(direnv hook bash)"` to setup the environment (replace `bash` with `zsh` or any shell you're using).
- Build the `MUTATOR` of `AMOEBA` (only necessary if you make changes to the `MUTATOR`)
```
cd /workspace/calcite-fuzzing && ./gradlew build -x core:checkstyleMain -x test -x core:checkstyleTest -x core:forbiddenApisTest -x core:autostyleJavaCheck
```
## Quick Start
`AMOEBA` is configurable, a launch command template looks like the following:
```bash
$ timeout {total_timeout} ./test_driver.py --workers {num_workers} --output {outputfolder} --queries {num_queries_per_worker} --rewriter ./calcite-fuzzing --dbms={dbms_undertest} --validate --num_loops={num_feedbackloops} --feedback={conf_feedback} --dbconf=db_conf_demo.json --query_timeout {per_query_timeout}
```
You can customize the value of the following options:
- {total_timeout}: timeout for the entire run of `AMOEBA` (unit is seconds)
- {workers}: number of parallel workers to invoke `GENERATOR` and `MUTATOR`
- {output}: location to store the intermediate results and bug reports
- {queries}: number of base queries that are generated by each `GENERATOR` instance
- {dbms}: DBMS that `AMOEBA` will evaluate on (i.e., `postgresql` or `cockroachdb`)
- {num_loops}: number of feedback loops
- {validate}: a boolean argument that decides whether to invoke the `VALIDATOR` after generating the equivalent query pairs
- {feedback}: what types of feedbacks to utilize (i.e., `both`, `none`, `mutator`, or `validator`)
- {query_timeout}: timeout for executing each query (unit is seconds)
For example, you can launch `AMOEBA` with the following command:
```shell
timeout 3600 ./test_driver.py --workers 1 --output /home/postgres/exp/1 --queries 200 --rewriter ./calcite-fuzzing --dbms=postgresql --validate --num_loops=100 --feedback=none --dbconf=db_conf_demo.json --query_timeout 30
```
If `AMOEBA` is working correctly, you should expect to see the following progress information is printed:
```bash
start query generator
['mutator.py --prob_table=/home/postgres/test/190156/prob_table_190156.json --db_info=/workspace/amoeba_conf/db_conf_demo.json -s seq --queries 100 1>/home/postgres/test/190156/0/log_sa0 2>/home/postgres/test/190156/0/input.sql']
finish query generator
start query rewriter
['java -cp calcite-core-1.22.0-SNAPSHOT-tests.jar:./*:. org.apache.calcite.test.Transformer /home/postgres/test/190156/0']
finish query rewriter
start validator
begin compare plan cost of equivalent queries
compare plan cost /home/postgres/test/190156/0/out/q20.sql
find plan diff /home/postgres/test/190156/0/out/q20.sql
compare plan cost /home/postgres/test/190156/0/out/q13.sql
find plan diff /home/postgres/test/190156/0/out/q13.sql
compare plan cost /home/postgres/test/190156/0/out/q18.sql
compare plan cost /home/postgres/test/190156/0/out/q23.sql
```
This example command should complete within 20 minutes. You can check the generated intermediate results in `/home/postgres/exp/1`. If `AMOEBA` discovers potential performance bugs, the generated bug report will live at `/home/postgres/exp/1/bugs.md`.
The shortcut CTRL+C can be used to terminate `AMOEBA` manually. Otherwise, `AMOEBA` will terminate either after a specified experiment timeout is reached or after a specified number of base queries have been examined. The option `total_timeout` controls the experiment timeout. The options `workers`, `queries`, and `num_loops` altogether determine the number of base queries that `AMOEBA` is going to examine.
## Note
- calcite part is based on Apache Calcite 1.22 and the only new class is added on `core/src/test/java/org/apache/calcite/test/Transformer.java`
- need to remove all dependencies in this repo and ask users to download it as needed.
- grant appropriate permission to your user to access the database. ref: https://stackoverflow.com/a/23934693/10180666