Reading and Processing Folded Pulsar Data -- 2


For the initial data inspection, FITS files can be opened with astropy.io.fits, both to view a brief summary of the file and for the later data processing.

from astropy.io import fits
hdu = fits.open("xxxxxx.fits")
hdu.info()

A folded data file contains five HDUs, as listed below:

Name      Type
PRIMARY   PrimaryHDU
HISTORY   BinTableHDU
PSRPARAM  BinTableHDU
POLYCO    BinTableHDU
SUBINT    BinTableHDU

PRIMARY is the file header and contains parameters such as the number of bits per sample, the telescope name, the receiver ID and the observation time. It is the only HDU of type PrimaryHDU; all remaining HDUs are of type BinTableHDU, and each BinTableHDU consists of a header, columns and data.

HISTORY records the data-processing history.

PSRPARAM contains the pulsar parameter information.

POLYCO holds the polyco parameter table used for each data segment. For a description of the pulse-phase prediction parameters and algorithm, see the Tempo documentation.

SUBINT: the data section.
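
Besides indexing by number, each of these extensions can also be addressed by its EXTNAME. A minimal sketch (TELESCOP is a standard keyword of the PSRFITS primary header):

from astropy.io import fits
hdu = fits.open("xxxxxx.fits")
print(hdu[0].header["TELESCOP"])     # telescope name from the PRIMARY header
print(hdu["SUBINT"].header["NBIN"])  # an extension addressed by its EXTNAME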

The summary of any HDU other than PRIMARY can be viewed with the following commands, for example for the last one (the data HDU):

from astropy.io import fits
import numpy as np
hdu = fits.open("xxxxxx.fits")
print(hdu[4].header)
print(hdu[4].columns)

The output is as follows:

XTENSION= 'BINTABLE' / ***** Subintegration data *****
BITPIX = 8 / N/A
NAXIS = 2 / 2-dimensional binary table
NAXIS1 = 65628 / width of table in bytes
NAXIS2 = 1415 / Number of rows in table (NSUBINT)
PCOUNT = 0 / size of special data area
GCOUNT = 1 / one data group (required keyword)
TFIELDS = 11 / Number of fields per row
TTYPE1 = 'INDEXVAL' / Optionally used if INT_TYPE != TIME
TFORM1 = '1D' / Double
TTYPE2 = 'TSUBINT' / Length of subintegration
TFORM2 = '1D' / Double
TTYPE3 = 'OFFS_SUB' / Offset from Start of subint centre
TFORM3 = '1D' / Double
TTYPE5 = 'AUX_DM' / additional DM (ionosphere, corona, etc.)
TFORM5 = '1D' / Double
TTYPE6 = 'AUX_RM' / additional RM (ionosphere, corona, etc.)
TFORM6 = '1D' / Double
TTYPE7 = 'DAT_FREQ' / [MHz] Centre frequency for each channel
TFORM7 = 'D' / NCHAN doubles
TTYPE8 = 'DAT_WTS' / Weights for each channel
TFORM8 = 'E' / NCHAN floats
TTYPE9 = 'DAT_OFFS' / Data offset for each channel
TFORM9 = '4E' / NCHAN*NPOL floats
TTYPE10 = 'DAT_SCL' / Data scale factor (outval=dataval*scl + offs)
TFORM10 = '4E' / NCHAN*NPOL floats
TTYPE11 = 'DATA' / Subint data table
TFORM11 = '32768I' / I (Fold) or B (1-8 bit) Search
EPOCHS = 'VALID' / Epoch convention (VALID, MIDTIME, STT_MJD)
INT_TYPE= 'TIME' / Time axis (TIME, BINPHSPERI, BINLNGASC, etc)
INT_UNIT= 'SEC' / Unit of time axis (SEC, PHS (0-1), DEG)
SCALE = 'Jansky' / Intensity units (FluxDen/RefFlux/Jansky)
POL_TYPE= 'AABBCRCI' / Polarisation identifier (e.g., AABBCRCI, AA+BB)
NPOL = 4 / Nr of polarisations
TBIN = 4.9152E-05 / [s] Time per bin or sample
NBIN = 8192 / Nr of bins (PSR/CAL mode; else 1)
NBIN_PRD= '*' / Nr of bins/pulse period (for gated data)
PHS_OFFS= '*' / Phase offset of bin 0 for gated data
NBITS = 8 / Nr of bits/datum (SEARCH mode data, else 1)
ZERO_OFF= '*' / Zero offset for SEARCH-mode data
SIGNINT = 0 / 1 for signed ints in SEARCH-mode data, else 0
NSUBOFFS= '*' / Subint offset (Contiguous SEARCH-mode files)
NCHAN = 1 / Number of channels/sub-bands in this file
CHAN_BW = 500. / [MHz] Channel/sub-band width
REFFREQ = 1400. / [MHz] Reference frequency
DM = 279. / [cm-3 pc] DM for post-detection dedisperion
RM = -12.76 / [rad m-2] RM for post-detection deFaraday
NCHNOFFS= '*' / Channel/sub-band offset for split files
NSBLK = 1024 / Samples/row (SEARCH mode, else 1)
NSTOT = '*' / Total number of samples (SEARCH mode, else 1)
EXTNAME = 'SUBINT' / name of this binary table extension
TUNIT2 = 's' / Units of field
TUNIT3 = 's' / Units of field
TUNIT5 = 'CM-3'
TUNIT6 = 'RAD'
TUNIT7 = 'MHz' / Units of field
TDIM11 = '(8192,1,4)' / (NBIN,NCHAN,NPOL) or (NCHAN,NPOL,NSBLK*NBITS/8)
TUNIT11 = 'Jy' / Units of subint data
EXTVER = 1 / auto assigned by template parser
TTYPE4 = 'PERIOD' / label for field
TFORM4 = '1D' / format of field
TUNIT4 = 's' / Spin period in seconds

ColDefs(
    name = 'INDEXVAL'; format = '1D'
    name = 'TSUBINT'; format = '1D'; unit = 's'
    name = 'OFFS_SUB'; format = '1D'; unit = 's'
    name = 'PERIOD'; format = '1D'; unit = 's'
    name = 'AUX_DM'; format = '1D'; unit = 'CM-3'
    name = 'AUX_RM'; format = '1D'; unit = 'RAD'
    name = 'DAT_FREQ'; format = 'D'; unit = 'MHz'
    name = 'DAT_WTS'; format = 'E'
    name = 'DAT_OFFS'; format = '4E'
    name = 'DAT_SCL'; format = '4E'
    name = 'DATA'; format = '32768I'; unit = 'Jy'; dim = '(8192,1,4)'
)

The SUBINT data contain the columns INDEXVAL, TSUBINT, OFFS_SUB, PERIOD, AUX_DM, AUX_RM, DAT_FREQ, DAT_WTS, DAT_OFFS, DAT_SCL and DATA.

An individual column can be inspected with the following commands, for example OFFS_SUB:

from astropy.io import fits
import numpy as np
hdu = fits.open("xxxxxx.fits")
print(hdu[4].data["OFFS_SUB"])
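
The DATA column can be read in the same way; astropy returns it as a multi-dimensional array whose per-row shape follows the TDIM11 keyword shown above. A small sketch (the expected shape is an assumption based on the example header):

from astropy.io import fits
hdu = fits.open("xxxxxx.fits")
subint = hdu["SUBINT"]
npol = subint.header["NPOL"]
nchan = subint.header["NCHAN"]
nbin = subint.header["NBIN"]
data = subint.data["DATA"]
print(data.shape)  # expected (NSUBINT, NPOL, NCHAN, NBIN), e.g. (1415, 4, 1, 8192)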

Concatenating data

Below is a program written to concatenate folded data files.

import os
import argparse
import numpy as np
from astropy.io import fits




def get_col_name(hdu):
    """Return the column names of the SUBINT table (HDU 4)."""
    return [col.name for col in hdu[4].columns]


def get_coldata(fitsname):
    """Read all SUBINT columns of a folded pulsar FITS file into a dict."""
    hdu = fits.open(fitsname)  ## read data
    colname = get_col_name(hdu)
    coldata = {}
    for name in colname:
        coldata[name] = hdu[4].data[name]
    return coldata


def header_set(header, name, value):
    """Change the value of a keyword in a FITS header."""
    header[name] = value
    return header


def columns_set(columns, name, attrib, value):
    """Change an attribute of a column in a FITS ColDefs object."""
    columns = fits.ColDefs(columns)
    columns.change_attrib(col_name=name, attrib=attrib, new_value=value)
    return columns

### input parameters

parser = argparse.ArgumentParser(prog='psradd.py',
                                 description='Concatenate folded pulsar FITS files')
parser.add_argument('-f', '--filename', required=True, nargs='+', help='the input file names')
parser.add_argument('-o', '--outfile', default='Addfile', help='the output file name')
args = parser.parse_args()

filename = args.filename
outfile = args.outfile
if outfile == "Addfile":
    # default output name: "Addfile" plus the extension of the first input file
    out_ext = filename[0].split(".")[-1]
    outfile = outfile + "." + out_ext
if os.path.exists(outfile):
    os.remove(outfile)




hdu1 = fits.open(filename[0])

## collect the SUBINT column data of all input files in lists
colname = get_col_name(hdu1)
coldata_a = {name: [] for name in colname}

nsub = 0

print("input files:", filename)

for i in range(len(filename)):
    ## append the column data of every file and count the sub-integrations
    coldata = get_coldata(filename[i])
    nsubi = len(coldata["DATA"])
    for name in colname:
        coldata_a[name].extend(coldata[name])
    nsub += nsubi

for name in colname:
    coldata_a[name] = np.array(coldata_a[name])

header0 = hdu1[0].header
colheader = hdu1[4].header
### primary header of the output file
primary_hdu = fits.PrimaryHDU(header=header0)

### HISTORY, PSRPARAM and POLYCO tables
tab1 = hdu1[1]
tab2 = hdu1[2]
tab3 = hdu1[3]
tab1_header = tab1.header
tab1_col = fits.ColDefs(tab1.columns)
## update the number of sub-integrations recorded in the HISTORY table
tab1_col["NSUB"].array = np.array([nsub] * len(tab1.data), dtype='int32')
newtab1 = fits.BinTableHDU.from_columns(tab1_col, header=tab1_header)

## set the new number of sub-integrations (rows) in the SUBINT header
colheader = header_set(colheader, "NAXIS2", nsub)

## build the new SUBINT columns
clumns_new = fits.ColDefs(hdu1[4].columns)
nbin = hdu1[4].header["NBIN"]  ## number of samples (bins) per pulse
tbin = hdu1[4].header["TBIN"]  ## [s] time per bin or sample
total_time = round(nsub * nbin * tbin)
dt = hdu1[1].data['TBIN'][-1]  ## sampling interval from the last HISTORY row
tsubint = nbin * dt            ## length of one sub-integration
offs_sub_init = tsubint / 2    ## OFFS_SUB of the first sub-integration

print("tsubint", tbin, tsubint)
## assign the concatenated data to the new columns
for name in colname:
    if name == 'OFFS_SUB':
        ## recompute OFFS_SUB: offset of each sub-integration centre
        ## from the start of the data
        offs_sub = []
        for j in np.arange(0, len(coldata_a[name])):
            offs_sub.append(offs_sub_init + j * tsubint)
        clumns_new[name].array = np.array(offs_sub)
    else:
        clumns_new[name].array = coldata_a[name]


### assemble the output FITS file
newHdu = fits.BinTableHDU.from_columns(clumns_new, header=colheader)
outputdata = fits.HDUList([primary_hdu, newtab1, tab2, tab3, newHdu])

print("#### write into %s ####" % outfile)
outputdata.writeto(outfile)
hdu1.close(output_verify="exception", verbose=True, closed=True)

Program usage

Assume the program is named psradd.py.

python psradd.py -f file1.Fp file2.Fp file3.Fp -o file_all.Fp

-f file1 file2 ...   names of the input files
-o filename          name of the output file

Notes

When concatenating the data, the POLYCO section is left unmodified, which may cause errors in some cases; this part will be revised later. For the SUBINT section, the data are concatenated and the OFFS_SUB column is recomputed. OFFS_SUB is the time interval between the centre of each pulse (sub-integration) and the start of the data recording. The procedure is: obtain the sampling interval dt, the number of samples per pulse sequence nbin, and the sub-integration index i. The first value is init_offs_sub = nbin*dt/2, and the OFFS_SUB of sub-integration i is i*dt*nbin + init_offs_sub, as in the sketch below.
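
The recomputed OFFS_SUB values therefore form a simple arithmetic sequence. A minimal sketch of the same calculation, using the example values from the header shown earlier (TBIN, NBIN, NSUBINT):

import numpy as np

dt = 4.9152e-05  # sampling interval TBIN [s], from the example header
nbin = 8192      # samples (bins) per pulse sequence, NBIN
nsub = 1415      # number of sub-integrations, NSUBINT

tsubint = nbin * dt                                   # length of one sub-integration
init_offs_sub = tsubint / 2                           # centre of the first sub-integration
offs_sub = init_offs_sub + np.arange(nsub) * tsubint  # OFFS_SUB for each sub-integration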

Fold-mode data are stored as 16-bit signed integers, with the pulse profiles for the different channels and polarisations in consecutive locations. Before conversion to integers, for each sub-integration the mean channel value (averaged over bins and polarisations) is subtracted from the channel data, and the remainder is scaled so that the values in the DATA array cover the full available range (-32768 to 32767). The original observed power is reconstructed as:

Real value = DATA value * DAT_SCL + DAT_OFFS
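
As a hedged sketch, this reconstruction can be applied to the whole DATA array with numpy broadcasting. The (NSUBINT, NPOL, NCHAN, NBIN) axis order and the ordering of DAT_SCL/DAT_OFFS over polarisation and channel are assumptions; with NCHAN = 1, as in the example above, the channel/polarisation ordering does not matter.

import numpy as np
from astropy.io import fits

hdu = fits.open("xxxxxx.fits")
subint = hdu["SUBINT"]
nsub, npol, nchan, nbin = subint.data["DATA"].shape

data = subint.data["DATA"].astype(np.float64)
scl = subint.data["DAT_SCL"].reshape(nsub, npol, nchan, 1)    # one scale per (pol, chan)
offs = subint.data["DAT_OFFS"].reshape(nsub, npol, nchan, 1)  # one offset per (pol, chan)

real = data * scl + offs  # Real value = DATA value * DAT_SCL + DAT_OFFS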

When the pam -r phase command is used, the phase rotation is applied to each individual pulse (sub-integration) separately and does not take the previous or next pulse into account. If a complete pulse profile lies within a single pulse sequence, it looks like the figure below:

[Figure 1: a complete pulse profile contained within a single sub-integration]
If a pulse falls across two pulse sequences, a single pulse sequence looks like the figure below and the pulse appears shifted. In that case, when analysing single-pulse energies, for example for nulling or mode switching studies, pam -r and similar commands should not be used, and the psradd command in PSRCHIVE should also be used with caution.

[Figure 2: a pulse split across two sub-integrations]

15 December 2021. Perhaps a later version of PSRCHIVE will fix the problem described above.
