rosalind练习题二十五

# Problem
# For a collection of strings, a larger string containing every one of the smaller strings as a substring is called a superstring.
# By the assumption of parsimony, a shortest possible superstring over a collection of reads serves as a candidate chromosome.

# Given: At most 50 DNA strings of approximately equal length, not exceeding 1 kbp, in FASTA format (which represent reads deriving from the same strand of a single linear chromosome).The dataset is guaranteed to satisfy the following condition: there exists a unique way to reconstruct the entire chromosome from these reads by gluing together pairs of reads that overlap by more than half their length.
# Return: A shortest superstring containing all the given strings (thus corresponding to a reconstructed chromosome).

# Sample Dataset
# >Rosalind_56
# ATTAGACCTG
# >Rosalind_57
# CCTGCCGGAA
# >Rosalind_58
# AGACCTGCCG
# >Rosalind_59
# GCCGGAATAC
# Sample Output
# ATTAGACCTGCCGGAATAC

# 这道题是要我们将给定的DNA序列拼接起来,得到一个包含所有序列的最短超级字符串。可用贪心算法。

# 计算字符串之间的重叠部分长度
def overlap(s, t):
    max_len = min(len(s), len(t))
    for i in range(max_len, 0, -1):
        if s[-i:] == t[:i]:
            return i
    return 0


# 查找下一个要拼接的字符串
def find_next_string(current_string, unused_strings):
    overlap_lengths = [overlap(current_string, s) for s in unused_strings]
    next_index = overlap_lengths.index(max(overlap_lengths))
    return unused_strings.pop(next_index)


# 使用贪心算法拼接字符串
def shortest_superstring(strings):
    superstring = strings.pop(0)
    while strings:
        next_string = find_next_string(superstring, strings)
        overlap_len = overlap(superstring, next_string)
        superstring += next_string[overlap_len:]
    return superstring


# 读取文件
filename = "input7.txt"
with open(filename, 'r') as f:
    lines = [line.strip() for line in f.readlines()]

# 提取字符串列表
strings = []
current_string = ''
for line in lines:
    if line.startswith('>'):
        strings.append(current_string)
        current_string = ''
    else:
        current_string += line
strings.append(current_string)

# 寻找最短的超级字符串
superstring = shortest_superstring(strings)

# 输出结果
print(superstring)
 

你可能感兴趣的:(算法,python)