The Moderate Risk-Taking Factor

Since this factor requires minute-level price and volume data, the full dataset is too large to store and download conveniently. I therefore take only 299 stocks and their 2021 data for a simple implementation.
Data Preparation

```python
import pandas as pd
import json
import os
import zipfile
import numpy as np
from tqdm import tqdm
from mytools import backtest
import warnings

warnings.filterwarnings("ignore")
```
```python
with open("../data/stock_pool.json", 'r') as f:
    stock_pool = json.load(f)
```
```python
def dataloader(stock_code):
    """Load one stock's minute bars from the zip archive and add minute returns."""
    with zipfile.ZipFile("../data/mins.zip", 'r') as zfile:
        f = zfile.open(f'mins/{stock_code}.csv')
        df = pd.read_csv(f)
    # Minute-to-minute return within each trading day (first minute of each day is NaN)
    df['rtn'] = df.groupby('date').apply(
        lambda x: (x['close'] - x['close'].shift(1)) / x['close'].shift(1)
    ).reset_index(drop=True)
    df['date'] = pd.to_datetime(df['date'])
    return df

df = dataloader("000001.SZ")
df.head()
```
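One practical note: `dataloader` reopens `mins.zip` on every call, which becomes noticeable once we loop over the whole pool later. A minimal sketch of a loader that reuses a single `ZipFile` handle (`make_dataloader` is a hypothetical helper of my own, assuming the same archive layout):

```python
import zipfile
import pandas as pd

def make_dataloader(zip_path="../data/mins.zip"):
    """Return a loader bound to one open ZipFile handle (illustrative variant)."""
    zfile = zipfile.ZipFile(zip_path, 'r')

    def load(stock_code):
        with zfile.open(f'mins/{stock_code}.csv') as f:
            df = pd.read_csv(f)
        # Same per-day minute return as dataloader(), written with pct_change
        df['rtn'] = df.groupby('date')['close'].pct_change()
        df['date'] = pd.to_datetime(df['date'])
        return df

    return load
```

Usage would be `load = make_dataloader()` followed by `load('000001.SZ')` inside the loop; the output should match `dataloader` while avoiding the repeated archive open.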
|   | close | volume | stock_code | date | hour | minute | rtn |
|---|-------|--------|------------|------|------|--------|-----|
| 0 | 2853.1013 | 3887508 | 000001.SZ | 2021-01-04 | 9 | 31 | NaN |
| 1 | 2847.0500 | 1843936 | 000001.SZ | 2021-01-04 | 9 | 32 | -0.002121 |
| 2 | 2850.0757 | 1673800 | 000001.SZ | 2021-01-04 | 9 | 33 | 0.001063 |
| 3 | 2824.3584 | 2422714 | 000001.SZ | 2021-01-04 | 9 | 34 | -0.009023 |
| 4 | 2813.7690 | 2531900 | 000001.SZ | 2021-01-04 | 9 | 35 | -0.003749 |
```python
daily_stock_data = pd.read_csv("../data/daily_stock_data.csv")
daily_stock_data['date'] = pd.to_datetime(daily_stock_data['date'])
daily_stock_data['rtn'] = (daily_stock_data['close'] - daily_stock_data['pre_close']) / daily_stock_data['pre_close']
daily_stock_data.head(5)
```
|   | stock_code | date | open | high | low | close | pre_close | volume | rtn |
|---|-----------|------|------|------|-----|-------|-----------|--------|-----|
| 0 | 000001.SZ | 2021-12-31 | 16.86 | 16.90 | 16.40 | 16.48 | 16.82 | 1750760.89 | -0.020214 |
| 1 | 000001.SZ | 2021-12-30 | 16.76 | 16.95 | 16.72 | 16.82 | 16.75 | 796663.60 | 0.004179 |
| 2 | 000001.SZ | 2021-12-29 | 17.16 | 17.16 | 16.70 | 16.75 | 17.17 | 1469373.98 | -0.024461 |
| 3 | 000001.SZ | 2021-12-28 | 17.22 | 17.33 | 17.09 | 17.17 | 17.22 | 1126638.91 | -0.002904 |
| 4 | 000001.SZ | 2021-12-27 | 17.33 | 17.35 | 17.16 | 17.22 | 17.31 | 731118.99 | -0.005199 |
Computing Forward Returns

```python
dsd = {}
for key in tqdm(['high', 'open', 'low', 'close', 'volume']):
    dsd[key] = pd.pivot(daily_stock_data, index='date', columns='stock_code', values=key)

# Next-day return that the factor is expected to predict
dsd['pred_rtn'] = (dsd['close'].shift(-1) - dsd['close']) / dsd['close']
pred_rtn_na = dsd['pred_rtn'].isna()

# Zero out returns that could not actually be earned:
# (1) the stock is suspended on the next day (zero volume)
vol0 = dsd['volume'].shift(-1) == 0
dsd['pred_rtn'][vol0 & (~pred_rtn_na)] = 0

# (2) the next day trades at a single price without rising (e.g. stuck on a price limit)
yz = dsd['high'].shift(-1) == dsd['low'].shift(-1)
zt = ~(dsd['close'].shift(-1) > dsd['close'])
dsd['pred_rtn'][yz & zt & (~pred_rtn_na)] = 0

pred_rtn = dsd['pred_rtn'].stack().reset_index().rename(columns={0: 'pred_rtn'})
```
Factor Calculation

The idea behind the factor:

1. Compute the minute-level change in trading volume: $\Delta volume_t = volume_t - volume_{t-1}$.
2. Flag each day's "surge moments" as the minutes whose volume change exceeds the day's mean plus one standard deviation (a toy walk-through follows below):
$$s_t = \begin{cases} 1, & \Delta volume_t > \overline{\Delta volume} + \sigma(\Delta volume) \\ 0, & \text{otherwise} \end{cases}$$
3. For each surge moment, take the mean and standard deviation of the returns over the following five minutes as the "dazzling return" $r_s$ and "dazzling volatility" $\sigma_s$ of the market reaction triggered by that surge.
4. Average the $r_s$ and $\sigma_s$ of all surge moments on day $t$ to obtain the "daily dazzling return" $r_s^t$ and "daily dazzling volatility" $\sigma_s^t$.
5. Take the cross-sectional mean of each daily indicator as that day's "moderate level", compute each stock's distance from this market level, then take the mean of that distance over the month as the "monthly mean dazzling indicator" and its standard deviation as the "monthly stability dazzling indicator".
Finally:

$$
\begin{aligned}
\text{monthly dazzling return} &= \text{monthly mean dazzling return} + \text{monthly stability dazzling return} \\
\text{monthly dazzling volatility} &= \text{monthly mean dazzling volatility} + \text{monthly stability dazzling volatility}
\end{aligned}
$$

The moderate risk-taking factor used below is the sum of the monthly dazzling return and the monthly dazzling volatility.
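Before coding it up for real, here is a toy walk-through of steps 1-3 on a single day of made-up minute bars (the numbers are illustrative only and have nothing to do with the actual data; the 5-minute windows are taken exactly as in the implementation that follows):

```python
import pandas as pd

# Ten made-up minute bars for a single trading day (purely illustrative numbers)
toy = pd.DataFrame({
    'volume': [100, 120, 90, 400, 110, 95, 105, 500, 130, 100],
    'rtn':    [0.001, -0.002, 0.000, 0.004, 0.001, -0.001, 0.000, 0.006, 0.002, -0.001],
})

# Step 1: minute-to-minute volume change
toy['volume_delta'] = toy['volume'].diff()

# Step 2: surge moments = minutes whose volume change exceeds the day's mean + 1 std
threshold = toy['volume_delta'].mean() + toy['volume_delta'].std()
toy['surge'] = (toy['volume_delta'] > threshold).astype(int)

# Step 3: dazzling return / volatility = mean / std of the following 5 minutes' returns
# (surges within the last 5 minutes of the day have no forward window and stay NaN)
toy['rtn_m5'] = toy['rtn'].rolling(5).mean().shift(-5)
toy['rtn_s5'] = toy['rtn'].rolling(5).std().shift(-5)

print(toy.loc[toy['surge'] == 1, ['volume_delta', 'rtn_m5', 'rtn_s5']])
```

In this toy day, the two large volume jumps (+310 and +395) are the only minutes above the mean + std threshold, so only they get a dazzling return and volatility.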
Surge Moments

```python
def find_surge_time(stk_data):
    """
    Identify the "surge moments" within each trading day, i.e. the minutes whose
    volume increase exceeds that day's mean + std of volume increases.

    Args:
        stk_data: minute-level bars for one or more stocks

    Returns:
        the same frame with `volume_delta`, `up_bound` and a 0/1 `surge` flag added
    """
    stk_data['volume_delta'] = stk_data.groupby(['stock_code', 'date']) \
        .apply(lambda x: x['volume'] - x['volume'].shift(1)).reset_index(drop=True)
    up_bound = stk_data.groupby(['stock_code', 'date'])['volume_delta'] \
        .apply(lambda x: x.mean() + x.std()).reset_index() \
        .rename(columns={"volume_delta": 'up_bound'})
    stk_data = pd.merge(stk_data, up_bound, on=['stock_code', 'date'], how="left")
    stk_data['surge'] = 0
    stk_data.loc[stk_data['volume_delta'] > stk_data['up_bound'], 'surge'] = 1
    return stk_data
```
```python
ls = []
for stock_code in tqdm(stock_pool):
    stk_data = dataloader(stock_code)
    ls.append(stk_data)
stk_data = pd.concat(ls).reset_index(drop=True)
stk_data = find_surge_time(stk_data)
```
Factor Calculation

```python
def calculate_moderate_risk_factor(stk_data0):
    """
    Compute the moderate risk-taking factor.

    Args:
        stk_data0: minute-level stock data carrying the `surge` flag
    """
    def monthly_excellent_factor(stk_data, aspect):
        """Aggregate one surge-moment indicator (`aspect`) into its monthly factor."""
        # Keep only surge moments; non-surge minutes do not enter the daily average
        stk_data.loc[stk_data['surge'] == 0, aspect] = np.nan
        fac_ex = stk_data.groupby(['stock_code', 'date'], group_keys=False)[aspect] \
            .apply(lambda x: x.mean()).to_frame() \
            .rename(columns={aspect: 'excellent'}).reset_index()
        # Cross-sectional mean = the market's "moderate level" on that day
        market_level = fac_ex.groupby('date')['excellent'] \
            .mean().to_frame().rename(columns={'excellent': 'market_level'})
        fac_ex = pd.merge(fac_ex, market_level, on="date", how='left')
        fac_ex['moderate'] = abs(fac_ex['excellent'] - fac_ex['market_level'])
        fac_ex = fac_ex.set_index('date')
        # 20-day (roughly monthly) rolling mean and std of the distance to the market level
        factor = pd.DataFrame()
        factor['moderate_mean'] = fac_ex.groupby('stock_code')['moderate'].rolling(20).mean()
        factor['moderate_std'] = fac_ex.groupby('stock_code')['moderate'].rolling(20).std()
        factor['factor'] = factor['moderate_mean'] + factor['moderate_std']
        return factor[['factor']]

    stk_data = stk_data0.copy()
    # Mean and std of returns over the 5 minutes following each bar
    stk_data['rtn_m5'] = stk_data.groupby(['stock_code', 'date'], group_keys=False)['rtn'] \
        .apply(lambda x: x.rolling(5).mean().shift(-5))
    stk_data['rtn_s5'] = stk_data.groupby(['stock_code', 'date'], group_keys=False)['rtn'] \
        .apply(lambda x: x.rolling(5).std().shift(-5))
    fac_ex_ret = monthly_excellent_factor(stk_data, "rtn_m5")
    fac_ex_vol = monthly_excellent_factor(stk_data, "rtn_s5")
    # Final factor = monthly dazzling return + monthly dazzling volatility
    factor = (fac_ex_ret['factor'] + fac_ex_vol['factor']).reset_index()
    return factor
```
Factor Data Processing

```python
factor = calculate_moderate_risk_factor(stk_data)
factor = factor.dropna()
factor = pd.merge(factor, pred_rtn, on=['date', 'stock_code'], how='left')
factor = factor[~factor['pred_rtn'].isna()] \
    .rename(columns={'factor': "moderate_risk_factor", 'date': "close_date"})
factor = backtest.winsorize_factor(factor, 'moderate_risk_factor')
factor.head(5)
```
|      | stock_code | close_date | moderate_risk_factor | pred_rtn |
|------|-----------|------------|----------------------|----------|
| 5651 | 000001.SZ | 2021-01-29 | 0.000551 | 0.063231 |
| 5652 | 000002.SZ | 2021-01-29 | 0.001112 | 0.010076 |
| 5653 | 000004.SZ | 2021-01-29 | 0.000724 | -0.055281 |
| 5654 | 000005.SZ | 2021-01-29 | 0.001220 | -0.093458 |
| 5655 | 000006.SZ | 2021-01-29 | 0.000634 | 0.014403 |
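`winsorize_factor` is a helper from my `mytools.backtest` module, which is not shown here. A common way to winsorize a factor is cross-sectional median-absolute-deviation clipping; a minimal sketch of that idea (the 3 × 1.4826 × MAD bound is an assumption, not necessarily what `winsorize_factor` does):

```python
import pandas as pd

def winsorize_mad(df, col, n=3.0, by='close_date'):
    """Clip `col` to median ± n * 1.4826 * MAD within each trading-day cross-section."""
    def clip_one(x):
        med = x.median()
        mad = 1.4826 * (x - med).abs().median()
        return x.clip(med - n * mad, med + n * mad)

    out = df.copy()
    out[col] = out.groupby(by)[col].transform(clip_one)
    return out
```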
Factor Testing

```python
res_dict = backtest.fama_macbeth(factor, 'moderate_risk_factor')
fama_macbeth_res = pd.DataFrame([res_dict])
fama_macbeth_res
```
|   | fac_name | t | p | pos_count | neg_count |
|---|----------|---|---|-----------|-----------|
| 0 | moderate_risk_factor | 0.189631 | 0.849773 | 113 | 109 |
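`fama_macbeth` is also a helper from `mytools`. For reference, the standard Fama-MacBeth procedure it presumably follows runs a cross-sectional regression of next-period returns on the factor for each date and then t-tests the time series of slope coefficients. A rough sketch under that assumption (column names follow the frame above; my helper's internals may differ):

```python
import pandas as pd
from scipy import stats

def fama_macbeth_sketch(df, fac_name, rtn_col='pred_rtn', date_col='close_date'):
    """Daily cross-sectional OLS of next-period return on the factor, then a t-test on the slopes."""
    slopes = []
    for _, cross in df.groupby(date_col):
        x = cross[fac_name] - cross[fac_name].mean()   # demeaned factor exposure
        y = cross[rtn_col]
        slopes.append((x * y).sum() / (x ** 2).sum())  # univariate OLS slope for this date
    slopes = pd.Series(slopes)
    t_stat, p_value = stats.ttest_1samp(slopes, 0.0)
    return {'fac_name': fac_name, 't': t_stat, 'p': p_value,
            'pos_count': int((slopes > 0).sum()),
            'neg_count': int((slopes < 0).sum())}
```

Calling `fama_macbeth_sketch(factor, 'moderate_risk_factor')` should give numbers in the same spirit as the table above, though not necessarily identical to `backtest.fama_macbeth`.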
```python
group_rtns, group_cum_rtns = backtest.group_return_analysis(factor, 'moderate_risk_factor')
```
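`group_return_analysis` presumably sorts stocks into quantile groups by factor value on each date and tracks each group's average next-day return. A minimal sketch of that idea (5 groups and equal weights are my assumptions):

```python
import pandas as pd

def group_returns_sketch(df, fac_name, n_groups=5, rtn_col='pred_rtn', date_col='close_date'):
    """Mean next-period return of each factor quantile group, per date (equal weights)."""
    df = df.copy()
    # Rank first so pd.qcut does not fail on ties at the bin edges
    df['group'] = df.groupby(date_col)[fac_name] \
        .transform(lambda x: pd.qcut(x.rank(method='first'), n_groups, labels=False))
    group_rtns = df.groupby([date_col, 'group'])[rtn_col].mean().unstack('group')
    group_cum_rtns = (1 + group_rtns).cumprod()
    return group_rtns, group_cum_rtns
```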
Overall the factor is positively oriented, but over the backtest window I chose it is not an effective factor.
In the Fama-MacBeth test the return it delivers is essentially zero and not statistically significant.
The grouped backtest shows returns that are high in the extreme groups and low in the middle, which leaves room for further refinement.
However, the backtest window is too short and the universe is only about 300 stocks, so the factor's true effectiveness cannot be judged; the result may simply reflect market style. It should be retested over a longer period and a larger universe.
```python
rtn, evaluate_result = backtest.backtest_1week_nstock(factor, 'moderate_risk_factor')
```
|   | sharpe_ratio | max_drawdown | max_drawdown_start | max_drawdown_end | sortino_ratio | annual_return | annual_volatility | section |
|---|--------------|--------------|--------------------|------------------|---------------|---------------|-------------------|---------|
| 0 | 1.930454 | 0.131483 | 2021-09-10 | 2021-11-04 | 2.777997 | 0.388784 | 0.178471 | Sum |
| 1 | 1.930454 | 0.131483 | 2021-09-10 | 2021-11-04 | 2.777997 | 0.388784 | 0.178471 | 2021 |
Judging by the strategy metrics, the result is actually decent: the Sharpe ratio is close to 2 and the drawdown is fairly small. The overall return is somewhat higher than the average of the 300 stocks.
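For reference, the headline numbers in the table follow the usual definitions. A sketch of how Sharpe, Sortino, annualized return/volatility and max drawdown can be computed from a daily return series (252 trading days and a zero risk-free rate are my assumptions and may differ from what `backtest_1week_nstock` uses):

```python
import numpy as np
import pandas as pd

def evaluate_sketch(daily_rtn: pd.Series) -> dict:
    """Basic performance metrics from a daily return series (textbook definitions)."""
    ann_return = (1 + daily_rtn).prod() ** (252 / len(daily_rtn)) - 1
    ann_vol = daily_rtn.std() * np.sqrt(252)
    sharpe = daily_rtn.mean() / daily_rtn.std() * np.sqrt(252)
    downside = daily_rtn[daily_rtn < 0].std() * np.sqrt(252)
    sortino = daily_rtn.mean() * 252 / downside
    cum_rtn = (1 + daily_rtn).cumprod()
    drawdown = 1 - cum_rtn / cum_rtn.cummax()
    return {'sharpe_ratio': sharpe, 'sortino_ratio': sortino,
            'annual_return': ann_return, 'annual_volatility': ann_vol,
            'max_drawdown': drawdown.max()}
```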
```python
# Equal-weight average daily return of the sample stocks as a naive market benchmark
market_rtn = daily_stock_data.groupby('date')['rtn'].mean().to_frame().rename(columns={'rtn': 'market_rtn'})
rtn = pd.merge(rtn, market_rtn, right_index=True, left_index=True, how="left")
rtn['market_cum_rtn'] = (1 + rtn['market_rtn']).cumprod()
```
```python
rtn[['cum_rtn', 'market_cum_rtn']].plot()
```