This programme is to automatically generate alpha factors and filter
relatively good factors with back-testing methods. Time consuming parts
are optimized with ``numba`` package.
Dependencies
------------
- python >= 3.5
- pandas >= 0.22.0
- numpy >= 1.14.0
- RNWS >= 0.2.1
- numba >= 0.38.0
- single_factor_model>=0.3.0
- IPython 5.1.0
- empyrical
- alphalens
Note: It is best to use the latest version of ``llvmlite`` in order to
make ``numba`` work properly. Otherwise it may couse a kernel-dies
situation.
Example
-------
load packages and read in data
------------------------------
.. code:: python
from alpha_factory import generator_class,get_memory_use_pct,clean
from RNWS import read
import numpy as np
import pandas as pd
start=20180101
end=20180331
factor_path='.'
frame_path='.'
df=pd.read_csv(frame_path+'/frames.csv')
## read in data
re=read.read_df('./re',file_pattern='re',start=start,end=end)
cap=read.read_df('./cap',file_pattern='cap',header=0,dat_col='cap',start=start,end=end)
open_price,close,vwap,adj,high,low,volume,sus=read.read_df('./mkt_data',file_pattern='mkt',start=start,end=end,header=0,dat_col=['open','close','vwap','adjfactor','high','low','volume','sus'])
ind1,ind2,ind3=read.read_df('./ind',file_pattern='ind',start=start,end=end,header=0,dat_col=['level1','level2','level3'])
inx_weight=read.read_df('./ZZ800_weight','Stk_ZZ800',start=start,end=end,header=None,inx_col=1,dat_col=3)
Note:\ ``frames`` contains columns as:
``df_name,equation,dependency,type``, where ``type`` includes
``df,cap,group``. In this case ``frames.csv`` have ``df_name``:
``re,cap,open_price,close,vwap,high,low,volume,ind1,ind2,ind3``.
You can also read data by using ``pd.read_csv`` directly depending on
how you store your data.
start to generate
-----------------
.. code:: python
parms={'re':close.mul(adj).pct_change()
,'cap':cap
,'open_price':open_price
,'close':close
,'vwap':vwap
,'high':high
,'low':low
,'volume':volume
,'ind1':ind1
,'ind2':ind2
,'ind3':ind3}
with generator_class(df,factor_path,**parms) as gen:
gen.generator(batch_size=3,name_start='a')
gen.generator(batch_size=3,name_start='a')
gen.output_df(path=frame_path+'/frames_new.csv')
continue to generate with existing frames and factors
-----------------------------------------------------
.. code:: python
with generator_class(df,factor_path,**parms) as gen:
gen.reload_df(path=frame_path+'/frames_new.csv')
gen.reload_factors(align=True)
clean()
for i in range(5):
gen.generator(batch_size=2,name_start='a')
print('step %d memory usage:\t %.1f%% \n'%(i,get_memory_use_pct()))
if get_memory_use_pct()>80:
break
gen.output_df(path=frame_path+'/frames_new2.csv')
Note: It is very important to ``align`` all factors and initial
dataframes before generating.
you can also choose how to store your factors by setting
``store_method``
backtesting with stratified sampling approach and ic-ir meansure after generation
---------------------------------------------------------------------------------
.. code:: python
data_box_param={'ind':ind1
,'price':vwap*adjfactor
,'sus':sus
,'ind_weight':inx_weight
,'path':'./databox'
}
back_test_param={'sharpe_ratio_thresh':3
,'n':5
,'out_path':'.'
,'back_end':'loky'
,'n_jobs':6
,'detail_root_path':None
,'double_side_cost':0.003
,'rf':0.03
}
icir_param={'ir_thresh':0.4
,'out_path':'.'
,'back_end':'loky'
,'n_jobs':6
}
with generator_class(df,factor_path,**parms) as gen:
for i in range(5):
gen.generator(batch_size=2,name_start='a')
gen.output_df(path=frame_path+'/frames_new.csv')
gen.getOrCreate_databox(**data_box_param)
gen.back_test(**back_test_param)
gen.icir(**icir_param)
clean()
if get_memory_use_pct()>90:
print('Memory exceeded')
break
To temporarily save (and reload) factor data you can use
``create_tmp_memory`` and ``reload_tmp_memory`` methods. This is usually
used before ``back_test`` and ``icir`` to release more memory for
parallel running.
generate script of factors
--------------------------
.. code:: python
from alpha_factory import write_file
import pandas as pd
df2=pd.read_csv(frame_path+'/frames_new.csv')
write_file(df2,'script.py')
locate a factor
---------------
.. code:: python
from alpha_factory.utilise import get_factor_path
factor_name='a0'
path=get_factor_path(factor_path,factor_name)
only when ``storage_method='byTime'``
use your own functions
----------------------
To use your own functions you need to append your code in class
``functions`` from ``basic_functions.py`` in the sourse file and also
append the corresponding names in ``functions.csv`` from ``data`` file
in the sourse file.
After that you can set ``debug=True`` in ``generator`` function to check
if there is any bug from all those functions. If indeed there is, a new
embeded ipython would be activated to help you find out what is going on
in the loop.