Python Bioinformatics Exercises


1 Create nN strings in a file

Create a file containing 10 strings. In each string, the first half is n and the second half is N, and each string is 2 characters longer than the previous one.

 Example:

 nN

 nnNN

 …

 nnnnnNNNNN

 …

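# open(file_name, mode='r') returns a file object; mode selects how the file is used:
# 'r' read-only, 'w' write (creates or truncates the file), 'a' append
with open('n.txt', 'w') as n:  # the with block closes the file automatically
    for i in range(1, 11):  # i = 1..10 gives the 10 strings the task asks for
        k = 1
        while k <= i:
            n.write('n')
            k = k + 1
        k = 1
        while k <= i:
            n.write('N')
            k = k + 1

        n.write('\n')  # '\n' is the newline character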
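
The same lines can be built more compactly with string repetition. The sketch below is an alternative to the loop above; the helper name `nN_lines` and the output file name `n_repeat.txt` are assumptions made for illustration only.

```python
def nN_lines(count=10):
    """Return `count` strings; string i is i lowercase 'n' followed by i uppercase 'N'."""
    return ['n' * i + 'N' * i for i in range(1, count + 1)]


if __name__ == '__main__':
    # hypothetical output file name, chosen so that n.txt is not overwritten
    with open('n_repeat.txt', 'w') as handle:
        for line in nN_lines():
            handle.write(line + '\n')
```

Keeping the string construction in a function that returns a list makes it easy to check the result before anything is written to disk.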

2 Count the frequency of every 3 bp combination

Calculate the frequency of every 3 bp combination in the given genome sequence:

CTCCTGAACAACCCCACTGTACTTCCCACTGCTCCTGAACAACCCCACTGTACTTCCCACTGCTCCTGAACAACCCCACTGTACTTCCCACTGCTCCTGAACAACCAGTCGATATCCACAAACACAGAAACAACCCTTCGCAGCCTGGCCACACACATCATTCCACAACACATAGGACTCCCCCACAAACACAGAAACAACCCTTCGCAGCCTGGTCACACACATCATTCCACAACACATAGGACTCCCCCACAAACGTAATGGAGAGGTTGCAATAACCCATAAAATCACAATTAATAATAGTAGTGTTGCATATACCGACACAGACAGCACAAGTGGACGTATGACAAGACACACAAATAGTAGCACACAAAAGCAAAGCAAAAAGCATAGCACAAT

Output the frequencies as key=>value pairs, sorted by key in alphabetical order, for example:

AAA 10

AAC 2

import re  # regular expressions, used to cut the sequence into 3 bp chunks
from collections import Counter

# named seq rather than str so the built-in str type is not shadowed
seq = "CTCCTGAACAACCCCACTGTACTTCCCACTGCTCCTGAACAACCCCACTGTACTTCCCACTGCTCCTGAACAACCCCACTGTACTTCCCACTGCTCCTGAACAACCAGTCGATATCCACAAACACAGAAACAACCCTTCGCAGCCTGGCCACACACATCATTCCACAACACATAGGACTCCCCCACAAACACAGAAACAACCCTTCGCAGCCTGGTCACACACATCATTCCACAACACATAGGACTCCCCCACAAACGTAATGGAGAGGTTGCAATAACCCATAAAATCACAATTAATAATAGTAGTGTTGCATATACCGACACAGACAGCACAAGTGGACGTATGACAAGACACACAAATAGTAGCACACAAAAGCAAAGCAAAAAGCATAGCACAAT"
print(len(seq))  # if the length is not a multiple of 3, findall below drops the trailing 1-2 bases

# findall returns consecutive, non-overlapping 3 bp chunks as a list;
# counting this list counts whole chunks rather than single A/T/C/G characters
triplets = re.findall(".{3}", seq)

result = Counter(triplets)  # Counter takes the list and counts each element; it is a dict subclass
sort_key_result = dict(sorted(result.items()))  # sort by key in ascending order
print(result)  # counts for each 3 bp chunk
print(sort_key_result)  # the same counts, sorted by key
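
The task asks for one key/value pair per line (as in `AAA 10`) rather than a printed dict, and "3bp" could also be read as every overlapping window instead of consecutive chunks. Here is a minimal sketch covering both readings. The helper `count_kmers` and the short demo sequence are made up for illustration and are not part of the exercise.

```python
from collections import Counter


def count_kmers(seq, k=3, overlapping=False):
    """Count k-length substrings of seq.

    overlapping=False steps k bases at a time (consecutive chunks);
    overlapping=True slides the window one base at a time.
    """
    step = 1 if overlapping else k
    # stopping at len(seq) - k skips any incomplete trailing chunk
    return Counter(seq[i:i + k] for i in range(0, len(seq) - k + 1, step))


if __name__ == '__main__':
    demo = "AAACCCAAAT"  # made-up demo sequence, not the genome sequence above
    counts = count_kmers(demo)
    for kmer in sorted(counts):  # keys in alphabetical order
        print(kmer, counts[kmer])
```

Which reading is intended depends on the exercise. The chunked version matches the regular-expression approach used above, while the sliding window is the usual k-mer count.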

3 Compare two sequences and set matching positions to 1 in a 2D matrix

We have two sequences:

S1: ATGATAGCAGTGAAATGGG

S2: GATAGCAGTGAAACGGGCA

Build a two-dimensional array whose side length equals the sequence length.

For array[i][j]: if S1[i] equals S2[j], array[i][j]=1; otherwise array[i][j]=0.

Write the array to a single file in a human-readable format.

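import numpy as np
import pandas as pd  # writing .xlsx also requires an Excel engine such as openpyxl

S1 = list('ATGATAGCAGTGAAATGGG')
print(S1)
S2 = list('GATAGCAGTGAAACGGGCA')
print(S2)

print(len(S1))  # sequence length, which sets the size of the 2D array

compare = [[0] * len(S2) for _ in range(len(S1))]  # len(S1) x len(S2) array of zeros

for i in range(len(S1)):
    for j in range(len(S2)):
        if S1[i] == S2[j]:
            compare[i][j] = 1

print(compare)

a = np.array(compare)  # convert the nested list to a NumPy array
print(a)

data_df = pd.DataFrame(a)  # step 1: convert the ndarray to a DataFrame
data_df.columns = S2  # column labels: the bases of S2
data_df.index = S1  # row labels: the bases of S1

# step 2: create compare.xlsx; the with block saves and closes the workbook
# (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('compare.xlsx') as writer:
    # step 3: write data_df to the sheet 'page_1'; other DataFrames could go to further sheets such as 'page_2'
    # float_format sets the precision of floats and has no effect on this integer matrix
    data_df.to_excel(writer, sheet_name='page_1', float_format='%.5f')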
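
The nested loops can also be expressed with NumPy broadcasting, and a plain-text file is another human-readable output that needs no Excel engine. This is a sketch under assumptions: the function name `match_matrix` and the file name `compare.txt` are illustrative and do not come from the exercise.

```python
import numpy as np


def match_matrix(s1, s2):
    """Return a len(s1) x len(s2) integer array with 1 where s1[i] == s2[j]."""
    a = np.array(list(s1))
    b = np.array(list(s2))
    # a[:, None] has shape (len(s1), 1); comparing it with b[None, :] broadcasts to 2D
    return (a[:, None] == b[None, :]).astype(int)


if __name__ == '__main__':
    S1 = 'ATGATAGCAGTGAAATGGG'
    S2 = 'GATAGCAGTGAAACGGGCA'
    m = match_matrix(S1, S2)
    # hypothetical output file: space-separated 0/1 values, one row per base of S1
    np.savetxt('compare.txt', m, fmt='%d')
```

The text file drops the row and column labels that the Excel sheet carries, so it suits a quick visual check more than labelled inspection.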

To be continued, with more exercises to come…
