Parsing SRT with Python

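I have recently been watching Desperate_Housewives_-_Season_1, but Qiyi only offers Chinese subtitles, which is a real shortcoming for those of us who want to practice English listening. I searched online and could not find a suitable tool for displaying external subtitles. Since I happen to be learning Python, I figured it is better to rely on myself than on others, and decided to write one.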

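Everything needs a plan. Mine is as follows:
1. Analyze the SRT file format;
2. Extract the timing information and the text to display. This is the most important part, and the best approach is to use Python's regular expressions to pull out the relevant information;
3. Display the text with pyosd, similar to the lyrics display in the QQ Music player.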
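Step 2 can be sketched with a single regular expression. This is only a minimal illustration, not the code used later in this post: the names `TIMESTAMP_RE` and `parse_timestamp_line` are made up for the example, and it assumes the timing line follows the plain `hh:mm:ss,mmm --> hh:mm:ss,mmm` layout.

```python
import re

# Matches a timing line such as "00:00:20,000 --> 00:00:24,400".
TIMESTAMP_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


def parse_timestamp_line(line):
    """Return (start_ms, end_ms) for a timing line, or None for any other line."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        return None
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
    start_ms = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
    end_ms = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
    return start_ms, end_ms


print(parse_timestamp_line("00:00:20,000 --> 00:00:24,400\r\n"))  # (20000, 24400)
print(parse_timestamp_line("Hello, world"))                       # None
```

Working in milliseconds keeps the fractional part of each timestamp, so the duration is simply `end_ms - start_ms`.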

For a description of SRT, see http://en.wikipedia.org/wiki/SubRip. I often deal with external subtitles at work, so I already had some familiarity with the format. Wikipedia describes it as follows:
The SubRip file format is "perhaps the most basic of all subtitle formats." SubRip files are named with the extension .srt, and contain formatted plain text. The time format used is hours:minutes:seconds,milliseconds. The decimal separator used is the comma, since the program was written in France. The line break used is often the CR+LF pair. Subtitles are numbered sequentially, starting at 1.

    Subtitle number          // the index: the subtitle's sequence number
    Start time --> End time        // start and end times; the duration can be computed from them
    Text of subtitle (one or more lines)     // the subtitle text
    Blank line           // an empty line ends the entry
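The four-part layout above suggests reading the file one blank-line-separated block at a time. The following is a rough sketch under that assumption. `Cue` and `parse_srt` are hypothetical names for this example, and malformed blocks are skipped rather than repaired.

```python
from typing import List, NamedTuple
import re


class Cue(NamedTuple):
    index: int
    start: str  # raw "hh:mm:ss,mmm" text, left unparsed here
    end: str
    text: str


def parse_srt(content: str) -> List[Cue]:
    """Split SRT text into cues, one per blank-line-separated block."""
    # Drop a leading byte-order mark and normalize CR+LF (and bare CR) to LF.
    content = content.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.split("\n")
        index_line = lines[0].strip()
        if len(lines) < 3 or not index_line.isdecimal() or "-->" not in lines[1]:
            continue  # skip malformed blocks instead of guessing
        start, end = (part.strip() for part in lines[1].split("-->", 1))
        cues.append(Cue(int(index_line), start, end, "\n".join(lines[2:])))
    return cues


SAMPLE = (
    "1\r\n00:00:20,000 --> 00:00:24,400\r\nFirst line\r\nsecond line\r\n\r\n"
    "2\r\n00:00:24,600 --> 00:00:27,800\r\nAnother subtitle\r\n"
)

for cue in parse_srt(SAMPLE):
    print(cue.index, cue.start, cue.end, repr(cue.text))
```

Grouping by block keeps multi-line subtitle text together with its timestamps, which a purely line-by-line scan cannot do.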

Here is the code so far. It is very rough and I am still revising it; only part of the functionality is implemented:

import re
import time

import pyosd


class srtParsing():
    """Rough line-by-line SRT reader that shows subtitle text with pyosd."""

    def __init__(self):
        self.duration = 0  # length of the current subtitle, in whole seconds
        # One shared OSD window, so that the timeout set from a timestamp
        # line applies to the text displayed right after it.
        self.osd = pyosd.osd()
        self.osd.set_pos(pyosd.POS_BOT)
        self.osd.set_colour("YELLOW")
        self.osd.set_align(1)  # 1 is xosd's value for center alignment
        #self.osd.set_shadow_offset(10)
        self.osd.set_vertical_offset(100)

    def srtGetIndex(self, line):
        # An index line contains nothing but digits.
        if re.match(r'^\d+$', line.strip()):
            print(line)

    def srtSplitTime(self, stamp):
        # "00:00:20,000" -> (0, 0, 20, 0)
        hour, minute, rest = stamp.strip().split(':')
        sec, mis = rest.split(',')
        return int(hour), int(minute), int(sec), int(mis)

    def srtGetTimeStamp(self, line):
        if not re.search(r'-->', line):
            return
        print(line)
        # Not named "time", which would shadow the time module.
        parts = line.split('-->')

        # START TIME:
        hour_start, minute_start, sec_start, mis_start = self.srtSplitTime(parts[0])

        # END TIME:
        hour_end, minute_end, sec_end, mis_end = self.srtSplitTime(parts[1])
        print("end-->h:%d m:%d s:%d,mis:%d" % (hour_end, minute_end, sec_end, mis_end))

        time_start = hour_start * 60 * 60 + minute_start * 60 + sec_start
        print("start time :%d" % time_start)
        time_end = hour_end * 60 * 60 + minute_end * 60 + sec_end
        print("end time:%d" % time_end)

        # The milliseconds are parsed but not used yet, so the duration
        # is only accurate to whole seconds.
        self.duration = time_end - time_start
        print(self.duration)
        self.osd.set_timeout(self.duration)

    def srtGetSubInfo(self, line):
        # Rough heuristic: any line starting with an ASCII letter is subtitle
        # text. Lines starting with a quote, a dash or a tag are missed.
        if re.search(r'^[a-zA-Z]', line):
            print(line)
            self.osd.display(line.rstrip())
            self.osd.wait_until_no_display()


if __name__ == "__main__":
    print(time.time())
    srt = srtParsing()
    with open("/home/workspace/subtitle/src/dh.srt") as f:
        for line in f:
            srt.srtGetTimeStamp(line)
            srt.srtGetSubInfo(line)

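The next step is timing control: the program needs to read the system time and compare it with each subtitle's timestamps, so that the display can be controlled precisely.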
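One possible shape for that timing loop is sketched below. It is not implemented in this post yet, and it rests on several assumptions: the cues are already parsed into `(start_seconds, end_seconds, text)` tuples sorted by start time, playback starts at the moment `play` is called, and `show` is a placeholder callback (in the real program it might wrap the pyosd display call). The `clock` and `sleep` parameters exist only so that the loop can be tried without waiting in real time.

```python
import time


def play(cues, show, clock=time.monotonic, sleep=time.sleep):
    """Call show(text) for each cue once the clock reaches its start time.

    cues: iterable of (start_seconds, end_seconds, text), sorted by start.
    """
    origin = clock()  # assumed to coincide with the start of the video
    for start, end, text in cues:
        delay = start - (clock() - origin)
        if delay > 0:
            sleep(delay)  # wait until this cue is due
        show(text)
        remaining = end - (clock() - origin)
        if remaining > 0:
            sleep(remaining)  # keep the text visible until its end time
        show("")  # clear the display


if __name__ == "__main__":
    demo = [(0.2, 0.6, "Hello"), (0.8, 1.0, "Goodbye")]
    play(demo, show=print)
```

Measuring each wait against a fixed origin, rather than sleeping for each duration in turn, keeps small delays from accumulating over a long file.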
