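This happens when the column is of type int or float: on import, an empty string is stored as 0 by default, whereas a varchar column keeps the empty string.
There are generally two fixes. The first is to replace every empty field with \N: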
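import re

# Note: 'a' appends, so delete test_new before running the script again.
with open("F:\\factor_db\\test_new", 'a') as g, \
        open("F:\\factor_db\\test_old.txt", 'r') as f:
    for line in f:
        # An empty last field shows up as a tab directly before the newline.
        line = line.replace('\t\n', '\t\\N\n')
        # Distinct runs of tabs, longest first, so that a long run is
        # rewritten as a whole before any shorter run is handled.
        runs = sorted(set(re.findall('\t+', line)), key=len, reverse=True)
        for run in runs:
            # n consecutive tabs enclose n-1 empty fields; a single tab is left as-is.
            line = line.replace(run, '\t\\N' * (len(run) - 1) + '\t')
        g.write(line)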
test_old.txt is the text file exported directly from MySQL, and test_new is the processed output. After that, load it directly:
load data infile 'F:/factor_db/test_new' into table t_stock_factor_barra fields terminated by '\t' lines terminated by '\r\n' ignore 1 lines
The empty fields then arrive in the database as NULL.
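The same replacement can also be sketched by splitting each line into fields instead of counting tab runs. This is only a minimal sketch: `fill_nulls` is a hypothetical helper name, not part of the original script, and it assumes that fields never contain an embedded tab. Unlike the regex version above, it also catches an empty first field.

```python
def fill_nulls(line):
    """Return the line with every empty tab-separated field replaced by \\N."""
    # Strip the line ending first so that an empty last field is seen as ''.
    body = line.rstrip("\r\n")
    fields = body.split("\t")
    # Rebuild the line, writing \N wherever a field was empty.
    return "\t".join(f if f != "" else "\\N" for f in fields) + "\n"
```

It could be called once per line in the read/write loop in place of the regex logic. Note that a completely blank line would come out as a single \N field.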
The second approach is the recommended one:
load data infile 'F:/factor_db/barra_test' into table t_stock_factor_barra fields terminated by '\t' lines terminated by '\r\n' ignore 1 lines
(`full_insID`,`date`,@`beta`,@`book_to_price_ratio`,@`earnings_yield`,@`growth`,@`leverage`,@`liquidity`,@`momentum`,@`non_linear_size`,@`residual_volatility`,@`size`)
set
`beta` = NULLif(@beta,''),
`book_to_price_ratio` = NULLif(@book_to_price_ratio,''),
`earnings_yield` = NULLif(@earnings_yield,''),
`growth` = NULLif(@growth,''),
`leverage` = NULLif(@leverage,''),
`liquidity` = NULLif(@liquidity,''),
`momentum` = NULLif(@momentum,''),
`non_linear_size` = NULLif(@non_linear_size,''),
`residual_volatility` = NULLif(@residual_volatility,''),
`size` = NULLif(@size,'')
;
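F:/factor_db/barra_test is the file to import.
(`full_insID`,`date`,@`beta`,@`book_to_price_ratio`,@`earnings_yield`,@`growth`,@`leverage`,@`liquidity`,@`momentum`,@`non_linear_size`,@`residual_volatility`,@`size`)
is the list of column names in the table. They must be written in order, matching the order of the fields in the file.
A name prefixed with @ marks a column whose empty strings should be stored as NULL: the field is read into a user variable first, and the matching `column_name` = NULLif(@column_name,'') line in the set clause finishes the job.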
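With many columns, writing the column list and the set clause by hand is repetitive, so the statement can be generated. The following is a minimal sketch, assuming the same delimiters as the statement above. `build_load_sql` and its parameters are hypothetical names introduced for illustration, and the path and table name are pasted into the string without any escaping, so it is only suitable for trusted input.

```python
def build_load_sql(path, table, columns, nullable):
    """Build a LOAD DATA INFILE statement as a string.

    columns  -- column names in the same order as the fields in the file
    nullable -- names of columns whose empty strings should become NULL
    """
    nullable = set(nullable)
    # Nullable columns are read into user variables (@`name`) first.
    targets = ", ".join(f"@`{c}`" if c in nullable else f"`{c}`" for c in columns)
    # One NULLIF assignment per nullable column, in file order.
    assignments = ",\n".join(
        f"`{c}` = NULLIF(@`{c}`, '')" for c in columns if c in nullable
    )
    sql = (
        f"load data infile '{path}' into table {table} "
        "fields terminated by '\\t' lines terminated by '\\r\\n' ignore 1 lines\n"
        f"({targets})"
    )
    # Emit the set clause only when there is something to assign.
    if assignments:
        sql += "\nset\n" + assignments
    return sql + ";"
```

Passing the full column list and the ten factor columns as `nullable` should produce a statement of the same shape as the hand-written one above.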