5t4rk

恶意代码--pdf文件格式解析与恶意攻击脚本分享

0x01 pdf文件

至于整个格式adobe公司已经有相应的公开文档，同时网上的资料也很多都能查阅到。

比如文件格式 :

http://www.2cto.com/Article/201011/77380.html

http://baike.baidu.com/link?url=xxI9mMNs28qKwpqJ0OX-fApWoubQH2hw9qq-GdvtYse9qnFIEvcFTc8NSViu9iBPa4w0w83QULjThRQznoPDSK

还有网上也有对应的封装好的接口，开源的代码支持目前主流平台。

开源代码 :

windows http://www.pdflib.com/binaries/PDFlib/705/PDFlib-Lite-7.0.5p3.zip

linux http://www.pdflib.com/binaries/PDFlib/705/PDFlib-Lite-7.0.5p3.tar.gz

mac os http://www.pdflib.com/binaries/PDFlib/705/PDFlib-Lite-7.0.5p3.dmg

0x02 格式解析

python脚本(国外牛人写的)

作者

#!/usr/bin/python

__description__ = 'pdf-parser, use it to parse a PDF document'
__author__ = 'Didier Stevens'
__version__ = '0.6.5'
__date__ = '2016/07/27'
__minimum_python_version__ = (2, 5, 1)
__maximum_python_version__ = (3, 4, 3)

"""
Source code put in public domain by Didier Stevens, no Copyright
https://DidierStevens.com
Use at your own risk

History:
  2008/05/02: continue
  2008/05/03: continue
  2008/06/02: streams
  2008/10/19: refactor, grep & extract functionality
  2008/10/20: reference
  2008/10/21: cleanup
  2008/11/12: V0.3 dictionary parser
  2008/11/13: option elements
  2008/11/14: continue
  2009/05/05: added /ASCIIHexDecode support (thanks Justin Prosco)
  2009/05/11: V0.3.1 updated usage, added --verbose and --extract
  2009/07/16: V0.3.2 Added Canonicalize (thanks Justin Prosco)
  2009/07/18: bugfix EqualCanonical
  2009/07/24: V0.3.3 Added --hash option
  2009/07/25: EqualCanonical for option --type, added option --nocanonicalizedoutput
  2009/07/28: V0.3.4 Added ASCII85Decode support
  2009/08/01: V0.3.5 Updated ASCIIHexDecode to support whitespace obfuscation
  2009/08/30: V0.3.6 TestPythonVersion
  2010/01/08: V0.3.7 Added RLE and LZW support (thanks pARODY); added dump option
  2010/01/09: Fixed parsing of incomplete startxref
  2010/09/22: V0.3.8 Changed dump option, updated PrettyPrint, added debug option
  2011/12/17: fixed bugs empty objects
  2012/03/11: V0.3.9 fixed bugs double nested [] in PrettyPrintSub (thanks kurt)
  2013/01/11: V0.3.10 Extract and dump bug fixes by Priit; added content option
  2013/02/16: Performance improvement in cPDFTokenizer by using StringIO for token building by Christophe Vandeplas; xrange replaced with range
  2013/02/16: V0.4.0 added http/https support; added error handling for missing file or URL; ; added support for ZIP file with password 'infected'
  2013/03/13: V0.4.1 fixes for Python 3
  2013/04/11: V0.4.2 modified PrettyPrintSub for strings with unprintable characters
  2013/05/04: Added options searchstream, unfiltered, casesensitive, regex
  2013/09/18: V0.4.3 fixed regression bug -w option
  2014/09/25: V0.5.0 added option -g
  2014/09/29: Added PrintGenerateObject and PrintOutputObject
  2014/12/05: V0.6.0 Added YARA support
  2014/12/09: cleanup, refactoring
  2014/12/13: Python 3 fixes
  2015/01/11: Added support for multiple YARA rule files; added request to search in trailer
  2015/01/31: V0.6.1 Added optionyarastrings
  2015/02/09: Added decoders
  2015/04/05: V0.6.2 Added generateembedded
  2015/04/06: fixed bug reported by Kurt for stream produced by Ghostscript where endstream is not preceded by whitespace; fixed prettyprint bug
  2015/04/24: V0.6.3 when option dump's filename is -, content is dumped to stdout
  2015/08/12: V0.6.4 option hash now also calculates hashes of streams when selecting or searching objects; and displays hexasciidump first line
  2016/07/27: V0.6.5 bugfix whitespace 0x00 0x0C after stream 0x0D 0x0A reported by @mr_me

Todo:
  - handle printf todo
  - support for JS hex string EC61C64349DB8D88AF0523C4C06E0F4D.pdf.vir

"""

import re
import optparse
import zlib
import binascii
import hashlib
import sys
import zipfile
import time
import os
if sys.version_info[0] >= 3:
    from io import StringIO
    import urllib.request
    urllib23 = urllib.request
else:
    from cStringIO import StringIO
    import urllib2
    urllib23 = urllib2
try:
    import yara
except:
    pass

CHAR_WHITESPACE = 1
CHAR_DELIMITER = 2
CHAR_REGULAR = 3

CONTEXT_NONE = 1
CONTEXT_OBJ = 2
CONTEXT_XREF = 3
CONTEXT_TRAILER = 4

PDF_ELEMENT_COMMENT = 1
PDF_ELEMENT_INDIRECT_OBJECT = 2
PDF_ELEMENT_XREF = 3
PDF_ELEMENT_TRAILER = 4
PDF_ELEMENT_STARTXREF = 5
PDF_ELEMENT_MALFORMED = 6

dumplinelength = 16

#Convert 2 Bytes If Python 3
def C2BIP3(string):
    if sys.version_info[0] > 2:
        return bytes([ord(x) for x in string])
    else:
        return string

# CIC: Call If Callable
def CIC(expression):
    if callable(expression):
        return expression()
    else:
        return expression

# IFF: IF Function
def IFF(expression, valueTrue, valueFalse):
    if expression:
        return CIC(valueTrue)
    else:
        return CIC(valueFalse)

def Timestamp(epoch=None):
    if epoch == None:
        localTime = time.localtime()
    else:
        localTime = time.localtime(epoch)
    return '%04d%02d%02d-%02d%02d%02d' % localTime[0:6]

def CopyWithoutWhiteSpace(content):
    result = []
    for token in content:
        if token[0] != CHAR_WHITESPACE:
            result.append(token)
    return result

def Obj2Str(content):
    return ''.join(map(lambda x: repr(x[1])[1:-1], CopyWithoutWhiteSpace(content)))

class cPDFDocument:
    def __init__(self, file):
        self.file = file
        if file.lower().startswith('http://') or file.lower().startswith('https://'):
            try:
                if sys.hexversion >= 0x020601F0:
                    self.infile = urllib23.urlopen(file, timeout=5)
                else:
                    self.infile = urllib23.urlopen(file)
            except urllib23.HTTPError:
                print('Error accessing URL %s' % file)
                print(sys.exc_info()[1])
                sys.exit()
        elif file.lower().endswith('.zip'):
            try:
                self.zipfile = zipfile.ZipFile(file, 'r')
                self.infile = self.zipfile.open(self.zipfile.infolist()[0], 'r', C2BIP3('infected'))
            except:
                print('Error opening file %s' % file)
                print(sys.exc_info()[1])
                sys.exit()
        else:
            try:
                self.infile = open(file, 'rb')
            except:
                print('Error opening file %s' % file)
                print(sys.exc_info()[1])
                sys.exit()
        self.ungetted = []
        self.position = -1

    def byte(self):
        if len(self.ungetted) != 0:
            self.position += 1
            return self.ungetted.pop()
        inbyte = self.infile.read(1)
        if not inbyte or inbyte == '':
            self.infile.close()
            return None
        self.position += 1
        return ord(inbyte)

    def unget(self, byte):
        self.position -= 1
        self.ungetted.append(byte)

def CharacterClass(byte):
    if byte == 0 or byte == 9 or byte == 10 or byte == 12 or byte == 13 or byte == 32:
        return CHAR_WHITESPACE
    if byte == 0x28 or byte == 0x29 or byte == 0x3C or byte == 0x3E or byte == 0x5B or byte == 0x5D or byte == 0x7B or byte == 0x7D or byte == 0x2F or byte == 0x25:
        return CHAR_DELIMITER
    return CHAR_REGULAR

def IsNumeric(str):
    return re.match('^[0-9]+', str)

class cPDFTokenizer:
    def __init__(self, file):
        self.oPDF = cPDFDocument(file)
        self.ungetted = []

    def Token(self):
        if len(self.ungetted) != 0:
            return self.ungetted.pop()
        if self.oPDF == None:
            return None
        self.byte = self.oPDF.byte()
        if self.byte == None:
            self.oPDF = None
            return None
        elif CharacterClass(self.byte) == CHAR_WHITESPACE:
            file_str = StringIO()
            while self.byte != None and CharacterClass(self.byte) == CHAR_WHITESPACE:
                file_str.write(chr(self.byte))
                self.byte = self.oPDF.byte()
            if self.byte != None:
                self.oPDF.unget(self.byte)
            else:
                self.oPDF = None
            self.token = file_str.getvalue()
            return (CHAR_WHITESPACE, self.token)
        elif CharacterClass(self.byte) == CHAR_REGULAR:
            file_str = StringIO()
            while self.byte != None and CharacterClass(self.byte) == CHAR_REGULAR:
                file_str.write(chr(self.byte))
                self.byte = self.oPDF.byte()
            if self.byte != None:
                self.oPDF.unget(self.byte)
            else:
                self.oPDF = None
            self.token = file_str.getvalue()
            return (CHAR_REGULAR, self.token)
        else:
            if self.byte == 0x3C:
                self.byte = self.oPDF.byte()
                if self.byte == 0x3C:
                    return (CHAR_DELIMITER, '<<')
                else:
                    self.oPDF.unget(self.byte)
                    return (CHAR_DELIMITER, '<')
            elif self.byte == 0x3E:
                self.byte = self.oPDF.byte()
                if self.byte == 0x3E:
                    return (CHAR_DELIMITER, '>>')
                else:
                    self.oPDF.unget(self.byte)
                    return (CHAR_DELIMITER, '>')
            elif self.byte == 0x25:
                file_str = StringIO()
                while self.byte != None:
                    file_str.write(chr(self.byte))
                    if self.byte == 10 or self.byte == 13:
                        self.byte = self.oPDF.byte()
                        break
                    self.byte = self.oPDF.byte()
                if self.byte != None:
                    if self.byte == 10:
                        file_str.write(chr(self.byte))
                    else:
                        self.oPDF.unget(self.byte)
                else:
                    self.oPDF = None
                self.token = file_str.getvalue()
                return (CHAR_DELIMITER, self.token)
            return (CHAR_DELIMITER, chr(self.byte))

    def TokenIgnoreWhiteSpace(self):
        token = self.Token()
        while token != None and token[0] == CHAR_WHITESPACE:
            token = self.Token()
        return token

    def unget(self, byte):
        self.ungetted.append(byte)

class cPDFParser:
    def __init__(self, file, verbose=False, extract=None):
        self.context = CONTEXT_NONE
        self.content = []
        self.oPDFTokenizer = cPDFTokenizer(file)
        self.verbose = verbose
        self.extract = extract

    def GetObject(self):
        while True:
            if self.context == CONTEXT_OBJ:
                self.token = self.oPDFTokenizer.Token()
            else:
                self.token = self.oPDFTokenizer.TokenIgnoreWhiteSpace()
            if self.token:
                if self.token[0] == CHAR_DELIMITER:
                    if self.token[1][0] == '%':
                        if self.context == CONTEXT_OBJ:
                            self.content.append(self.token)
                        else:
                            return cPDFElementComment(self.token[1])
                    elif self.token[1] == '/':
                        self.token2 = self.oPDFTokenizer.Token()
                        if self.token2[0] == CHAR_REGULAR:
                            if self.context != CONTEXT_NONE:
                                self.content.append((CHAR_DELIMITER, self.token[1] + self.token2[1]))
                            elif self.verbose:
                                print('todo 1: %s' % (self.token[1] + self.token2[1]))
                        else:
                            self.oPDFTokenizer.unget(self.token2)
                            if self.context != CONTEXT_NONE:
                                self.content.append(self.token)
                            elif self.verbose:
                                print('todo 2: %d %s' % (self.token[0], repr(self.token[1])))
                    elif self.context != CONTEXT_NONE:
                        self.content.append(self.token)
                    elif self.verbose:
                        print('todo 3: %d %s' % (self.token[0], repr(self.token[1])))
                elif self.token[0] == CHAR_WHITESPACE:
                    if self.context != CONTEXT_NONE:
                        self.content.append(self.token)
                    elif self.verbose:
                        print('todo 4: %d %s' % (self.token[0], repr(self.token[1])))
                else:
                    if self.context == CONTEXT_OBJ:
                        if self.token[1] == 'endobj':
                            self.oPDFElementIndirectObject = cPDFElementIndirectObject(self.objectId, self.objectVersion, self.content)
                            self.context = CONTEXT_NONE
                            self.content = []
                            return self.oPDFElementIndirectObject
                        else:
                            self.content.append(self.token)
                    elif self.context == CONTEXT_TRAILER:
                        if self.token[1] == 'startxref' or self.token[1] == 'xref':
                            self.oPDFElementTrailer = cPDFElementTrailer(self.content)
                            self.oPDFTokenizer.unget(self.token)
                            self.context = CONTEXT_NONE
                            self.content = []
                            return self.oPDFElementTrailer
                        else:
                            self.content.append(self.token)
                    elif self.context == CONTEXT_XREF:
                        if self.token[1] == 'trailer' or self.token[1] == 'xref':
                            self.oPDFElementXref = cPDFElementXref(self.content)
                            self.oPDFTokenizer.unget(self.token)
                            self.context = CONTEXT_NONE
                            self.content = []
                            return self.oPDFElementXref
                        else:
                            self.content.append(self.token)
                    else:
                        if IsNumeric(self.token[1]):
                            self.token2 = self.oPDFTokenizer.TokenIgnoreWhiteSpace()
                            if IsNumeric(self.token2[1]):
                                self.token3 = self.oPDFTokenizer.TokenIgnoreWhiteSpace()
                                if self.token3[1] == 'obj':
                                    self.objectId = eval(self.token[1])
                                    self.objectVersion = eval(self.token2[1])
                                    self.context = CONTEXT_OBJ
                                else:
                                    self.oPDFTokenizer.unget(self.token3)
                                    self.oPDFTokenizer.unget(self.token2)
                                    if self.verbose:
                                        print('todo 6: %d %s' % (self.token[0], repr(self.token[1])))
                            else:
                                self.oPDFTokenizer.unget(self.token2)
                                if self.verbose:
                                    print('todo 7: %d %s' % (self.token[0], repr(self.token[1])))
                        elif self.token[1] == 'trailer':
                            self.context = CONTEXT_TRAILER
                            self.content = [self.token]
                        elif self.token[1] == 'xref':
                            self.context = CONTEXT_XREF
                            self.content = [self.token]
                        elif self.token[1] == 'startxref':
                            self.token2 = self.oPDFTokenizer.TokenIgnoreWhiteSpace()
                            if self.token2 and IsNumeric(self.token2[1]):
                                return cPDFElementStartxref(eval(self.token2[1]))
                            else:
                                self.oPDFTokenizer.unget(self.token2)
                                if self.verbose:
                                    print('todo 9: %d %s' % (self.token[0], repr(self.token[1])))
                        elif self.extract:
                            self.bytes = ''
                            while self.token:
                                self.bytes += self.token[1]
                                self.token = self.oPDFTokenizer.Token()
                            return cPDFElementMalformed(self.bytes)
                        elif self.verbose:
                            print('todo 10: %d %s' % (self.token[0], repr(self.token[1])))
            else:
                break

class cPDFElementComment:
    def __init__(self, comment):
        self.type = PDF_ELEMENT_COMMENT
        self.comment = comment
#                        if re.match('^%PDF-[0-9]\.[0-9]', self.token[1]):
#                            print(repr(self.token[1]))
#                        elif re.match('^%%EOF', self.token[1]):
#                            print(repr(self.token[1]))

class cPDFElementXref:
    def __init__(self, content):
        self.type = PDF_ELEMENT_XREF
        self.content = content

class cPDFElementTrailer:
    def __init__(self, content):
        self.type = PDF_ELEMENT_TRAILER
        self.content = content

    def Contains(self, keyword):
        data = ''
        for i in range(0, len(self.content)):
            if self.content[i][1] == 'stream':
                break
            else:
                data += Canonicalize(self.content[i][1])
        return data.upper().find(keyword.upper()) != -1

def IIf(expr, truepart, falsepart):
    if expr:
        return truepart
    else:
        return falsepart

class cPDFElementIndirectObject:
    def __init__(self, id, version, content):
        self.type = PDF_ELEMENT_INDIRECT_OBJECT
        self.id = id
        self.version = version
        self.content = content
        #fix stream for Ghostscript bug reported by Kurt
        if self.ContainsStream():
            position = len(self.content) - 1
            if position < 0:
                return
            while self.content[position][0] == CHAR_WHITESPACE and position >= 0:
                position -= 1
            if position < 0:
                return
            if self.content[position][0] != CHAR_REGULAR:
                return
            if self.content[position][1] == 'endstream':
                return
            if not self.content[position][1].endswith('endstream'):
                return
            self.content = self.content[0:position] + [(self.content[position][0], self.content[position][1][:-len('endstream')])] + [(self.content[position][0], 'endstream')] + self.content[position+1:]

    def GetType(self):
        content = CopyWithoutWhiteSpace(self.content)
        dictionary = 0
        for i in range(0, len(content)):
            if content[i][0] == CHAR_DELIMITER and content[i][1] == '<<':
                dictionary += 1
            if content[i][0] == CHAR_DELIMITER and content[i][1] == '>>':
                dictionary -= 1
            if dictionary == 1 and content[i][0] == CHAR_DELIMITER and EqualCanonical(content[i][1], '/Type') and i < len(content) - 1:
                return content[i+1][1]
        return ''

    def GetReferences(self):
        content = CopyWithoutWhiteSpace(self.content)
        references = []
        for i in range(0, len(content)):
            if i > 1 and content[i][0] == CHAR_REGULAR and content[i][1] == 'R' and content[i-2][0] == CHAR_REGULAR and IsNumeric(content[i-2][1]) and content[i-1][0] == CHAR_REGULAR and IsNumeric(content[i-1][1]):
                references.append((content[i-2][1], content[i-1][1], content[i][1]))
        return references

    def References(self, index):
        for ref in self.GetReferences():
            if ref[0] == index:
                return True
        return False

    def ContainsStream(self):
        for i in range(0, len(self.content)):
            if self.content[i][0] == CHAR_REGULAR and self.content[i][1] == 'stream':
                return self.content[0:i]
        return False

    def Contains(self, keyword):
        data = ''
        for i in range(0, len(self.content)):
            if self.content[i][1] == 'stream':
                break
            else:
                data += Canonicalize(self.content[i][1])
        return data.upper().find(keyword.upper()) != -1

    def StreamContains(self, keyword, filter, casesensitive, regex):
        if not self.ContainsStream():
            return False
        streamData = self.Stream(filter)
        if filter and streamData == 'No filters':
            streamData = self.Stream(False)
        if regex:
            return re.search(keyword, streamData, IIf(casesensitive, 0, re.I))
        elif casesensitive:
            return keyword in streamData
        else:
            return keyword.lower() in streamData.lower()

    def Stream(self, filter=True):
        state = 'start'
        countDirectories = 0
        data = ''
        filters = []
        for i in range(0, len(self.content)):
            if state == 'start':
                if self.content[i][0] == CHAR_DELIMITER and self.content[i][1] == '<<':
                    countDirectories += 1
                if self.content[i][0] == CHAR_DELIMITER and self.content[i][1] == '>>':
                    countDirectories -= 1
                if countDirectories == 1 and self.content[i][0] == CHAR_DELIMITER and EqualCanonical(self.content[i][1], '/Filter'):
                    state = 'filter'
                elif countDirectories == 0 and self.content[i][0] == CHAR_REGULAR and self.content[i][1] == 'stream':
                    state = 'stream-whitespace'
            elif state == 'filter':
                if self.content[i][0] == CHAR_DELIMITER and self.content[i][1][0] == '/':
                    filters = [self.content[i][1]]
                    state = 'search-stream'
                elif self.content[i][0] == CHAR_DELIMITER and self.content[i][1] == '[':
                    state = 'filter-list'
            elif state == 'filter-list':
                if self.content[i][0] == CHAR_DELIMITER and self.content[i][1][0] == '/':
                    filters.append(self.content[i][1])
                elif self.content[i][0] == CHAR_DELIMITER and self.content[i][1] == ']':
                    state = 'search-stream'
            elif state == 'search-stream':
                if self.content[i][0] == CHAR_REGULAR and self.content[i][1] == 'stream':
                    state = 'stream-whitespace'
            elif state == 'stream-whitespace':
                if self.content[i][0] == CHAR_WHITESPACE:
                    whitespace = self.content[i][1]
                    if whitespace.startswith('\x0D\x0A') and len(whitespace) > 2:
                        data += whitespace[2:]
                    elif whitespace.startswith('\x0A') and len(whitespace) > 1:
                        data += whitespace[1:]
                else:
                    data += self.content[i][1]
                state = 'stream-concat'
            elif state == 'stream-concat':
                if self.content[i][0] == CHAR_REGULAR and self.content[i][1] == 'endstream':
                    if filter:
                        return self.Decompress(data, filters)
                    else:
                        return data
                else:
                    data += self.content[i][1]
            else:
                return 'Unexpected filter state'
        return filters

    def Decompress(self, data, filters):
        for filter in filters:
            if EqualCanonical(filter, '/FlateDecode') or EqualCanonical(filter, '/Fl'):
                try:
                    data = FlateDecode(data)
                except zlib.error as e:
                    message = 'FlateDecode decompress failed'
                    if len(data) > 0 and ord(data[0]) & 0x0F != 8:
                        message += ', unexpected compression method: %02x' % ord(data[0])
                    return message + '. zlib.error %s' % e.message
            elif EqualCanonical(filter, '/ASCIIHexDecode') or EqualCanonical(filter, '/AHx'):
                try:
                    data = ASCIIHexDecode(data)
                except:
                    return 'ASCIIHexDecode decompress failed'
            elif EqualCanonical(filter, '/ASCII85Decode') or EqualCanonical(filter, '/A85'):
                try:
                    data = ASCII85Decode(data.rstrip('>'))
                except:
                    return 'ASCII85Decode decompress failed'
            elif EqualCanonical(filter, '/LZWDecode') or EqualCanonical(filter, '/LZW'):
                try:
                    data = LZWDecode(data)
                except:
                    return 'LZWDecode decompress failed'
            elif EqualCanonical(filter, '/RunLengthDecode') or EqualCanonical(filter, '/R'):
                try:
                    data = RunLengthDecode(data)
                except:
                    return 'RunLengthDecode decompress failed'
#            elif i.startswith('/CC')                        # CCITTFaxDecode
#            elif i.startswith('/DCT')                       # DCTDecode
            else:
                return 'Unsupported filter: %s' % repr(filters)
        if len(filters) == 0:
            return 'No filters'
        else:
            return data

    def StreamYARAMatch(self, rules, decoders, decoderoptions, filter):
        if not self.ContainsStream():
            return None
        streamData = self.Stream(filter)
        if filter and streamData == 'No filters':
            streamData = self.Stream(False)

        oDecoders = [cIdentity(streamData, None)]
        for cDecoder in decoders:
            try:
                oDecoder = cDecoder(streamData, decoderoptions)
                oDecoders.append(oDecoder)
            except Exception as e:
                print('Error instantiating decoder: %s' % cDecoder.name)
                raise e
        results = []
        for oDecoder in oDecoders:
            while oDecoder.Available():
                yaraResults = rules.match(data=oDecoder.Decode())
                if yaraResults != []:
                    results.append([oDecoder.Name(), yaraResults])

        return results

class cPDFElementStartxref:
    def __init__(self, index):
        self.type = PDF_ELEMENT_STARTXREF
        self.index = index

class cPDFElementMalformed:
    def __init__(self, content):
        self.type = PDF_ELEMENT_MALFORMED
        self.content = content

def TrimLWhiteSpace(data):
    while data != [] and data[0][0] == CHAR_WHITESPACE:
        data = data[1:]
    return data

def TrimRWhiteSpace(data):
    while data != [] and data[-1][0] == CHAR_WHITESPACE:
        data = data[:-1]
    return data

class cPDFParseDictionary:
    def __init__(self, content, nocanonicalizedoutput):
        self.content = content
        self.nocanonicalizedoutput = nocanonicalizedoutput
        dataTrimmed = TrimLWhiteSpace(TrimRWhiteSpace(self.content))
        if dataTrimmed == []:
            self.parsed = None
        elif self.isOpenDictionary(dataTrimmed[0]) and self.isCloseDictionary(dataTrimmed[-1]):
            self.parsed = self.ParseDictionary(dataTrimmed)[0]
        else:
            self.parsed = None

    def isOpenDictionary(self, token):
        return token[0] == CHAR_DELIMITER and token[1] == '<<'

    def isCloseDictionary(self, token):
        return token[0] == CHAR_DELIMITER and token[1] == '>>'

    def ParseDictionary(self, tokens):
        state = 0 # start
        dictionary = []
        while tokens != []:
            if state == 0:
                if self.isOpenDictionary(tokens[0]):
                    state = 1
                else:
                    return None, tokens
            elif state == 1:
                if self.isOpenDictionary(tokens[0]):
                    pass
                elif self.isCloseDictionary(tokens[0]):
                    return dictionary, tokens
                elif tokens[0][0] != CHAR_WHITESPACE:
                    key = ConditionalCanonicalize(tokens[0][1], self.nocanonicalizedoutput)
                    value = []
                    state = 2
            elif state == 2:
                if self.isOpenDictionary(tokens[0]):
                    value, tokens = self.ParseDictionary(tokens)
                    dictionary.append((key, value))
                    state = 1
                elif self.isCloseDictionary(tokens[0]):
                    dictionary.append((key, value))
                    return dictionary, tokens
                elif value == [] and tokens[0][0] == CHAR_WHITESPACE:
                    pass
                elif value == [] and tokens[0][1] == '[':
                    value.append(tokens[0][1])
                elif value != [] and value[0] == '[' and tokens[0][1] != ']':
                    value.append(tokens[0][1])
                elif value != [] and value[0] == '[' and tokens[0][1] == ']':
                    value.append(tokens[0][1])
                    dictionary.append((key, value))
                    value = []
                    state = 1
                elif value != [] and tokens[0][1][0] == '/':
                    dictionary.append((key, value))
                    key = ConditionalCanonicalize(tokens[0][1], self.nocanonicalizedoutput)
                    value = []
                    state = 2
                else:
                    value.append(ConditionalCanonicalize(tokens[0][1], self.nocanonicalizedoutput))
            tokens = tokens[1:]

    def Retrieve(self):
        return self.parsed

    def PrettyPrintSub(self, prefix, dictionary):
        if dictionary != None:
            print('%s<<' % prefix)
            for e in dictionary:
                if e[1] == []:
                    print('%s  %s' % (prefix, e[0]))
                elif type(e[1][0]) == type(''):
                    if len(e[1]) == 3 and IsNumeric(e[1][0]) and e[1][1] == '0' and e[1][2] == 'R':
                        joiner = ' '
                    else:
                        joiner = ''
                    value = joiner.join(e[1]).strip()
                    reprValue = repr(value)
                    if "'" + value + "'" != reprValue:
                        value = reprValue
                    print('%s  %s %s' % (prefix, e[0], value))
                else:
                    print('%s  %s' % (prefix, e[0]))
                    self.PrettyPrintSub(prefix + '    ', e[1])
            print('%s>>' % prefix)

    def PrettyPrint(self, prefix):
        self.PrettyPrintSub(prefix, self.parsed)

    def Get(self, select):
        for key, value in self.parsed:
            if key == select:
                return value
        return None

def FormatOutput(data, raw):
    if raw:
        if type(data) == type([]):
            return ''.join(map(lambda x: x[1], data))
        else:
            return data
    else:
        return repr(data)

#Fix for http://bugs.python.org/issue11395
def StdoutWriteChunked(data):
    if sys.version_info[0] > 2:
        sys.stdout.buffer.write(data)
    else:
        while data != '':
            sys.stdout.write(data[0:10000])
            try:
                sys.stdout.flush()
            except IOError:
                return
            data = data[10000:]

def IfWIN32SetBinary(io):
    if sys.platform == 'win32':
        import msvcrt
        msvcrt.setmode(io.fileno(), os.O_BINARY)

def PrintOutputObject(object, options):
    if options.dump == '-':
        filtered = object.Stream(options.filter == True)
        if filtered == []:
            filtered = ''
        IfWIN32SetBinary(sys.stdout)
        StdoutWriteChunked(filtered)
        return

    print('obj %d %d' % (object.id, object.version))
    print(' Type: %s' % ConditionalCanonicalize(object.GetType(), options.nocanonicalizedoutput))
    print(' Referencing: %s' % ', '.join(map(lambda x: '%s %s %s' % x, object.GetReferences())))
    dataPrecedingStream = object.ContainsStream()
    oPDFParseDictionary = None
    if dataPrecedingStream:
        print(' Contains stream')
        if options.debug:
            print(' %s' % FormatOutput(dataPrecedingStream, options.raw))
        oPDFParseDictionary = cPDFParseDictionary(dataPrecedingStream, options.nocanonicalizedoutput)
        if options.hash:
            streamContent = object.Stream(False)
            print('  unfiltered')
            print('   len: %6d md5: %s' % (len(streamContent), hashlib.md5(streamContent).hexdigest()))
            print('   %s' % HexAsciiDumpLine(streamContent))
            streamContent = object.Stream(True)
            print('  filtered')
            print('   len: %6d md5: %s' % (len(streamContent), hashlib.md5(streamContent).hexdigest()))
            print('   %s' % HexAsciiDumpLine(streamContent))
            streamContent = None
    else:
        if options.debug or options.raw:
            print(' %s' % FormatOutput(object.content, options.raw))
        oPDFParseDictionary = cPDFParseDictionary(object.content, options.nocanonicalizedoutput)
    print('')
    oPDFParseDictionary.PrettyPrint('  ')
    print('')
    if options.filter and not options.dump:
        filtered = object.Stream()
        if filtered == []:
            print(' %s' % FormatOutput(object.content, options.raw))
        else:
            print(' %s' % FormatOutput(filtered, options.raw))
    if options.content:
        if object.ContainsStream():
            stream = object.Stream(False)
            if stream != []:
                print(' %s' % FormatOutput(stream, options.raw))
        else:
            print(''.join([token[1] for token in object.content]))


    if options.dump:
        filtered = object.Stream(options.filter == True)
        if filtered == []:
            filtered = ''
        try:
            fDump = open(options.dump, 'wb')
            try:
                fDump.write(C2BIP3(filtered))
            except:
                print('Error writing file %s' % options.dump)
            fDump.close()
        except:
            print('Error writing file %s' % options.dump)
    print('')
    return

def Canonicalize(sIn):
    if sIn == '':
        return sIn
    elif sIn[0] != '/':
        return sIn
    elif sIn.find('#') == -1:
        return sIn
    else:
        i = 0
        iLen = len(sIn)
        sCanonical = ''
        while i < iLen:
            if sIn[i] == '#' and i < iLen - 2:
                try:
                    sCanonical += chr(int(sIn[i+1:i+3], 16))
                    i += 2
                except:
                    sCanonical += sIn[i]
            else:
                sCanonical += sIn[i]
            i += 1
        return sCanonical

def EqualCanonical(s1, s2):
    return Canonicalize(s1) == s2

def ConditionalCanonicalize(sIn, nocanonicalizedoutput):
    if nocanonicalizedoutput:
        return sIn
    else:
        return Canonicalize(sIn)

# http://code.google.com/p/pdfminerr/source/browse/trunk/pdfminer/pdfminer/ascii85.py
def ASCII85Decode(data):
  import struct
  n = b = 0
  out = ''
  for c in data:
    if '!' <= c and c <= 'u':
      n += 1
      b = b*85+(ord(c)-33)
      if n == 5:
        out += struct.pack('>L',b)
        n = b = 0
    elif c == 'z':
      assert n == 0
      out += '\0\0\0\0'
    elif c == '~':
      if n:
        for _ in range(5-n):
          b = b*85+84
        out += struct.pack('>L',b)[:n-1]
      break
  return out

def ASCIIHexDecode(data):
    return binascii.unhexlify(''.join([c for c in data if c not in ' \t\n\r']).rstrip('>'))

def FlateDecode(data):
    return zlib.decompress(C2BIP3(data))

def RunLengthDecode(data):
    f = StringIO(data)
    decompressed = ''
    runLength = ord(f.read(1))
    while runLength:
        if runLength < 128:
            decompressed += f.read(runLength + 1)
        if runLength > 128:
            decompressed += f.read(1) * (257 - runLength)
        if runLength == 128:
            break
        runLength = ord(f.read(1))
#    return sub(r'(\d+)(\D)', lambda m: m.group(2) * int(m.group(1)), data)
    return decompressed

#### LZW code sourced from pdfminer
# Copyright (c) 2004-2009 Yusuke Shinyama 
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software,
# and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

class LZWDecoder(object):
    def __init__(self, fp):
        self.fp = fp
        self.buff = 0
        self.bpos = 8
        self.nbits = 9
        self.table = None
        self.prevbuf = None
        return

    def readbits(self, bits):
        v = 0
        while 1:
            # the number of remaining bits we can get from the current buffer.
            r = 8-self.bpos
            if bits <= r:
                # |-----8-bits-----|
                # |-bpos-|-bits-|  |
                # |      |----r----|
                v = (v<>(r-bits)) & ((1<>\s*$', '', dictionary)
                dictionary = dictionary.strip()
                print("    oPDF.stream2(%d, %d, %s, %s, 'f')" % (objectId, object.version, repr(decompressed.rstrip()), repr(dictionary)))
        else:
            print('    oPDF.stream(%d, %d, %s, %s)' % (objectId, object.version, repr(object.Stream(False).rstrip()), repr(re.sub('/Length\s+\d+', '/Length %d', FormatOutput(dataPrecedingStream, True)).strip())))
    else:
        print('    oPDF.indirectobject(%d, %d, %s)' % (objectId, object.version, repr(FormatOutput(object.content, True).strip())))

def PrintObject(object, options):
    if options.generate:
        PrintGenerateObject(object, options)
    else:
        PrintOutputObject(object, options)

def File2Strings(filename):
    try:
        f = open(filename, 'r')
    except:
        return None
    try:
        return map(lambda line:line.rstrip('\n'), f.readlines())
    except:
        return None
    finally:
        f.close()

def ProcessAt(argument):
    if argument.startswith('@'):
        strings = File2Strings(argument[1:])
        if strings == None:
            raise Exception('Error reading %s' % argument)
        else:
            return strings
    else:
        return [argument]

def YARACompile(fileordirname):
    dFilepaths = {}
    if os.path.isdir(fileordirname):
        for root, dirs, files in os.walk(fileordirname):
            for file in files:
                filename = os.path.join(root, file)
                dFilepaths[filename] = filename
    else:
        for filename in ProcessAt(fileordirname):
            dFilepaths[filename] = filename
    return yara.compile(filepaths=dFilepaths)

def AddDecoder(cClass):
    global decoders

    decoders.append(cClass)

class cDecoderParent():
    pass

def LoadDecoders(decoders, verbose):
    if decoders == '':
        return
    scriptPath = os.path.dirname(sys.argv[0])
    for decoder in sum(map(ProcessAt, decoders.split(',')), []):
        try:
            if not decoder.lower().endswith('.py'):
                decoder += '.py'
            if os.path.dirname(decoder) == '':
                if not os.path.exists(decoder):
                    scriptDecoder = os.path.join(scriptPath, decoder)
                    if os.path.exists(scriptDecoder):
                        decoder = scriptDecoder
            exec(open(decoder, 'r').read(), globals(), globals())
        except Exception as e:
            print('Error loading decoder: %s' % decoder)
            if verbose:
                raise e

class cIdentity(cDecoderParent):
    name = 'Identity function decoder'

    def __init__(self, stream, options):
        self.stream = stream
        self.options = options
        self.available = True

    def Available(self):
        return self.available

    def Decode(self):
        self.available = False
        return self.stream

    def Name(self):
        return ''

def DecodeFunction(decoders, options, stream):
    if decoders == []:
        return stream
    return decoders[0](stream, options.decoderoptions).Decode()

class cDumpStream():
    def __init__(self):
        self.text = ''

    def Addline(self, line):
        if line != '':
            self.text += line + '\n'

    def Content(self):
        return self.text

def HexDump(data):
    oDumpStream = cDumpStream()
    hexDump = ''
    for i, b in enumerate(data):
        if i % dumplinelength == 0 and hexDump != '':
            oDumpStream.Addline(hexDump)
            hexDump = ''
        hexDump += IFF(hexDump == '', '', ' ') + '%02X' % ord(b)
    oDumpStream.Addline(hexDump)
    return oDumpStream.Content()

def CombineHexAscii(hexDump, asciiDump):
    if hexDump == '':
        return ''
    return hexDump + '  ' + (' ' * (3 * (dumplinelength - len(asciiDump)))) + asciiDump

def HexAsciiDump(data):
    oDumpStream = cDumpStream()
    hexDump = ''
    asciiDump = ''
    for i, b in enumerate(data):
        if i % dumplinelength == 0:
            if hexDump != '':
                oDumpStream.Addline(CombineHexAscii(hexDump, asciiDump))
            hexDump = '%08X:' % i
            asciiDump = ''
        hexDump+= ' %02X' % ord(b)
        asciiDump += IFF(ord(b) >= 32, b, '.')
    oDumpStream.Addline(CombineHexAscii(hexDump, asciiDump))
    return oDumpStream.Content()

def HexAsciiDumpLine(data):
    return HexAsciiDump(data[0:16])[10:-1]
    
def Main():
    """pdf-parser, use it to parse a PDF document
    """

    global decoders

    oParser = optparse.OptionParser(usage='usage: %prog [options] pdf-file|zip-file|url\n' + __description__, version='%prog ' + __version__)
    oParser.add_option('-s', '--search', help='string to search in indirect objects (except streams)')
    oParser.add_option('-f', '--filter', action='store_true', default=False, help='pass stream object through filters (FlateDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode and RunLengthDecode only)')
    oParser.add_option('-o', '--object', help='id of indirect object to select (version independent)')
    oParser.add_option('-r', '--reference', help='id of indirect object being referenced (version independent)')
    oParser.add_option('-e', '--elements', help='type of elements to select (cxtsi)')
    oParser.add_option('-w', '--raw', action='store_true', default=False, help='raw output for data and filters')
    oParser.add_option('-a', '--stats', action='store_true', default=False, help='display stats for pdf document')
    oParser.add_option('-t', '--type', help='type of indirect object to select')
    oParser.add_option('-v', '--verbose', action='store_true', default=False, help='display malformed PDF elements')
    oParser.add_option('-x', '--extract', help='filename to extract malformed content to')
    oParser.add_option('-H', '--hash', action='store_true', default=False, help='display hash of objects')
    oParser.add_option('-n', '--nocanonicalizedoutput', action='store_true', default=False, help='do not canonicalize the output')
    oParser.add_option('-d', '--dump', help='filename to dump stream content to')
    oParser.add_option('-D', '--debug', action='store_true', default=False, help='display debug info')
    oParser.add_option('-c', '--content', action='store_true', default=False, help='display the content for objects without streams or with streams without filters')
    oParser.add_option('--searchstream', help='string to search in streams')
    oParser.add_option('--unfiltered', action='store_true', default=False, help='search in unfiltered streams')
    oParser.add_option('--casesensitive', action='store_true', default=False, help='case sensitive search in streams')
    oParser.add_option('--regex', action='store_true', default=False, help='use regex to search in streams')
    oParser.add_option('-g', '--generate', action='store_true', default=False, help='generate a Python program that creates the parsed PDF file')
    oParser.add_option('--generateembedded', type=int, default=0, help='generate a Python program that embeds the selected indirect object as a file')
    oParser.add_option('-y', '--yara', help='YARA rule (or directory or @file) to check streams (can be used with option --unfiltered)')
    oParser.add_option('--yarastrings', action='store_true', default=False, help='Print YARA strings')
    oParser.add_option('--decoders', type=str, default='', help='decoders to load (separate decoders with a comma , ; @file supported)')
    oParser.add_option('--decoderoptions', type=str, default='', help='options for the decoder')
    (options, args) = oParser.parse_args()

    if len(args) != 1:
        oParser.print_help()
        print('')
        print('  %s' % __description__)
        print('  Source code put in the public domain by Didier Stevens, no Copyright')
        print('  Use at your own risk')
        print('  https://DidierStevens.com')

    else:
        decoders = []
        LoadDecoders(options.decoders, True)

        oPDFParser = cPDFParser(args[0], options.verbose, options.extract)
        cntComment = 0
        cntXref = 0
        cntTrailer = 0
        cntStartXref = 0
        cntIndirectObject = 0
        dicObjectTypes = {}

        selectComment = False
        selectXref = False
        selectTrailer = False
        selectStartXref = False
        selectIndirectObject = False
        if options.elements:
            for c in options.elements:
                if c == 'c':
                    selectComment = True
                elif c == 'x':
                    selectXref = True
                elif c == 't':
                    selectTrailer = True
                elif c == 's':
                    selectStartXref = True
                elif c == 'i':
                    selectIndirectObject = True
                else:
                    print('Error: unknown --elements value %s' % c)
                    return
        else:
            selectIndirectObject = True
            if not options.search and not options.object and not options.reference and not options.type and not options.searchstream:
                selectComment = True
                selectXref = True
                selectTrailer = True
                selectStartXref = True
            if options.search:
                selectTrailer = True

        if options.type == '-':
            optionsType = ''
        else:
            optionsType = options.type

        if options.generate or options.generateembedded != 0:
            savedRoot = ['1', '0', 'R']
            print('#!/usr/bin/python')
            print('')
            print('"""')
            print('')
            print('Program generated by pdf-parser.py by Didier Stevens')
            print('https://DidierStevens.com')
            print('Use at your own risk')
            print('')
            print('Input PDF file: %s' % args[0])
            print('This Python program was created on: %s' % Timestamp())
            print('')
            print('"""')
            print('')
            print('import mPDF')
            print('import sys')
            print('')
            print('def Main():')
            print('    if len(sys.argv) != 2:')
            print("        print('Usage: %s pdf-file' % sys.argv[0])")
            print('        return')
            print('    oPDF = mPDF.cPDF(sys.argv[1])')

        if options.generateembedded != 0:
            print("    oPDF.header('1.1')")
            print(r"    oPDF.comment('\xd0\xd0\xd0\xd0')")
            print(r"    oPDF.indirectobject(1, 0, '<<\r\n /Type /Catalog\r\n /Outlines 2 0 R\r\n /Pages 3 0 R\r\n /Names << /EmbeddedFiles << /Names [(test.bin) 7 0 R] >> >>\r\n>>')")
            print(r"    oPDF.indirectobject(2, 0, '<<\r\n /Type /Outlines\r\n /Count 0\r\n>>')")
            print(r"    oPDF.indirectobject(3, 0, '<<\r\n /Type /Pages\r\n /Kids [4 0 R]\r\n /Count 1\r\n>>')")
            print(r"    oPDF.indirectobject(4, 0, '<<\r\n /Type /Page\r\n /Parent 3 0 R\r\n /MediaBox [0 0 612 792]\r\n /Contents 5 0 R\r\n /Resources <<\r\n             /ProcSet [/PDF /Text]\r\n             /Font << /F1 6 0 R >>\r\n            >>\r\n>>')")
            print(r"    oPDF.stream(5, 0, 'BT /F1 12 Tf 70 700 Td 15 TL (This PDF document embeds file test.bin) Tj ET', '<< /Length %d >>')")
            print(r"    oPDF.indirectobject(6, 0, '<<\r\n /Type /Font\r\n /Subtype /Type1\r\n /Name /F1\r\n /BaseFont /Helvetica\r\n /Encoding /MacRomanEncoding\r\n>>')")
            print(r"    oPDF.indirectobject(7, 0, '<<\r\n /Type /Filespec\r\n /F (test.bin)\r\n /EF << /F 8 0 R >>\r\n>>')")

        if options.yara != None:
            if not 'yara' in sys.modules:
                print('Error: option yara requires the YARA Python module.')
                return
            rules = YARACompile(options.yara)

        while True:
            object = oPDFParser.GetObject()
            if object != None:
                if options.stats:
                    if object.type == PDF_ELEMENT_COMMENT:
                        cntComment += 1
                    elif object.type == PDF_ELEMENT_XREF:
                        cntXref += 1
                    elif object.type == PDF_ELEMENT_TRAILER:
                        cntTrailer += 1
                    elif object.type == PDF_ELEMENT_STARTXREF:
                        cntStartXref += 1
                    elif object.type == PDF_ELEMENT_INDIRECT_OBJECT:
                        cntIndirectObject += 1
                        type1 = object.GetType()
                        if not type1 in dicObjectTypes:
                            dicObjectTypes[type1] = [object.id]
                        else:
                            dicObjectTypes[type1].append(object.id)
                else:
                    if object.type == PDF_ELEMENT_COMMENT and selectComment:
                        if options.generate:
                            comment = object.comment[1:].rstrip()
                            if re.match('PDF-\d\.\d', comment):
                                print("    oPDF.header('%s')" % comment[4:])
                            elif comment != '%EOF':
                                print('    oPDF.comment(%s)' % repr(comment))
                        elif options.yara == None and options.generateembedded == 0:
                            print('PDF Comment %s' % FormatOutput(object.comment, options.raw))
                            print('')
                    elif object.type == PDF_ELEMENT_XREF and selectXref:
                        if not options.generate and options.yara == None and options.generateembedded == 0:
                            if options.debug:
                                print('xref %s' % FormatOutput(object.content, options.raw))
                            else:
                                print('xref')
                            print('')
                    elif object.type == PDF_ELEMENT_TRAILER and selectTrailer:
                        oPDFParseDictionary = cPDFParseDictionary(object.content[1:], options.nocanonicalizedoutput)
                        if options.generate:
                            result = oPDFParseDictionary.Get('/Root')
                            if result != None:
                                savedRoot = result
                        elif options.yara == None and options.generateembedded == 0:
                            if not options.search or options.search and object.Contains(options.search):
                                if oPDFParseDictionary == None:
                                    print('trailer %s' % FormatOutput(object.content, options.raw))
                                else:
                                    print('trailer')
                                    oPDFParseDictionary.PrettyPrint('  ')
                                print('')
                    elif object.type == PDF_ELEMENT_STARTXREF and selectStartXref:
                        if not options.generate and options.yara == None and options.generateembedded == 0:
                            print('startxref %d' % object.index)
                            print('')
                    elif object.type == PDF_ELEMENT_INDIRECT_OBJECT and selectIndirectObject:
                        if options.search:
                            if object.Contains(options.search):
                                PrintObject(object, options)
                        elif options.object:
                            if object.id == eval(options.object):
                                PrintObject(object, options)
                        elif options.reference:
                            if object.References(options.reference):
                                PrintObject(object, options)
                        elif options.type:
                            if EqualCanonical(object.GetType(), optionsType):
                                PrintObject(object, options)
                        elif options.hash:
                            print('obj %d %d' % (object.id, object.version))
                            rawContent = FormatOutput(object.content, True)
                            print(' len: %d md5: %s' % (len(rawContent), hashlib.md5(rawContent).hexdigest()))
                            print('')
                        elif options.searchstream:
                            if object.StreamContains(options.searchstream, not options.unfiltered, options.casesensitive, options.regex):
                                PrintObject(object, options)
                        elif options.yara != None:
                            results = object.StreamYARAMatch(rules, decoders, options.decoderoptions, not options.unfiltered)
                            if results != None and results != []:
                                for result in results:
                                    for yaraResult in result[1]:
                                        print('YARA rule%s: %s (%s)' % (IFF(result[0] == '', '', ' (stream decoder: %s)' % result[0]), yaraResult.rule, yaraResult.namespace))
                                        if options.yarastrings:
                                            for stringdata in yaraResult.strings:
                                                print('%06x %s:' % (stringdata[0], stringdata[1]))
                                                print(' %s' % binascii.hexlify(C2BIP3(stringdata[2])))
                                                print(' %s' % repr(stringdata[2]))
                                    PrintObject(object, options)
                        elif options.generateembedded != 0:
                            if object.id == options.generateembedded:
                                PrintGenerateObject(object, options, 8)
                        else:
                            PrintObject(object, options)
                    elif object.type == PDF_ELEMENT_MALFORMED:
                        try:
                            fExtract = open(options.extract, 'wb')
                            try:
                                fExtract.write(C2BIP3(object.content))
                            except:
                                print('Error writing file %s' % options.extract)
                            fExtract.close()
                        except:
                            print('Error writing file %s' % options.extract)
            else:
                break

        if options.stats:
            print('Comment: %s' % cntComment)
            print('XREF: %s' % cntXref)
            print('Trailer: %s' % cntTrailer)
            print('StartXref: %s' % cntStartXref)
            print('Indirect object: %s' % cntIndirectObject)
            names = dicObjectTypes.keys()
            names.sort()
            for key in names:
                print(' %s %d: %s' % (key, len(dicObjectTypes[key]), ', '.join(map(lambda x: '%d' % x, dicObjectTypes[key]))))

        if options.generate or options.generateembedded != 0:
            print("    oPDF.xrefAndTrailer('%s')" % ' '.join(savedRoot))
            print('')
            print("if __name__ == '__main__':")
            print('    Main()')


def TestPythonVersion(enforceMaximumVersion=False, enforceMinimumVersion=False):
    if sys.version_info[0:3] > __maximum_python_version__:
        if enforceMaximumVersion:
            print('This program does not work with this version of Python (%d.%d.%d)' % sys.version_info[0:3])
            print('Please use Python version %d.%d.%d' % __maximum_python_version__)
            sys.exit()
        else:
            print('This program has not been tested with this version of Python (%d.%d.%d)' % sys.version_info[0:3])
            print('Should you encounter problems, please use Python version %d.%d.%d' % __maximum_python_version__)
    if sys.version_info[0:3] < __minimum_python_version__:
        if enforceMinimumVersion:
            print('This program does not work with this version of Python (%d.%d.%d)' % sys.version_info[0:3])
            print('Please use Python version %d.%d.%d' % __maximum_python_version__)
            sys.exit()
        else:
            print('This program has not been tested with this version of Python (%d.%d.%d)' % sys.version_info[0:3])
            print('Should you encounter problems, please use Python version %d.%d.%d' % __maximum_python_version__)

if __name__ == '__main__':
    TestPythonVersion()
    Main()

用法命令

Usage: pdf-parser.py [options] pdf-file|zip-file|url
pdf-parser, use it to parse a PDF document

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -s SEARCH, --search=SEARCH
                        string to search in indirect objects (except streams)
  -f, --filter          pass stream object through filters (FlateDecode,
                        ASCIIHexDecode, ASCII85Decode, LZWDecode and
                        RunLengthDecode only)
  -o OBJECT, --object=OBJECT
                        id of indirect object to select (version independent)
  -r REFERENCE, --reference=REFERENCE
                        id of indirect object being referenced (version
                        independent)
  -e ELEMENTS, --elements=ELEMENTS
                        type of elements to select (cxtsi)
  -w, --raw             raw output for data and filters
  -a, --stats           display stats for pdf document
  -t TYPE, --type=TYPE  type of indirect object to select
  -v, --verbose         display malformed PDF elements
  -x EXTRACT, --extract=EXTRACT
                        filename to extract malformed content to
  -H, --hash            display hash of objects
  -n, --nocanonicalizedoutput
                        do not canonicalize the output
  -d DUMP, --dump=DUMP  filename to dump stream content to
  -D, --debug           display debug info
  -c, --content         display the content for objects without streams or
                        with streams without filters
  --searchstream=SEARCHSTREAM
                        string to search in streams
  --unfiltered          search in unfiltered streams
  --casesensitive       case sensitive search in streams
  --regex               use regex to search in streams
  -g, --generate        generate a Python program that creates the parsed PDF
                        file
  --generateembedded=GENERATEEMBEDDED
                        generate a Python program that embeds the selected
                        indirect object as a file
  -y YARA, --yara=YARA  YARA rule (or directory or @file) to check streams
                        (can be used with option --unfiltered)
  --yarastrings         Print YARA strings
  --decoders=DECODERS   decoders to load (separate decoders with a comma , ;
                        @file supported)
  --decoderoptions=DECODEROPTIONS
                        options for the decoder

  pdf-parser, use it to parse a PDF document
  Source code put in the public domain by Didier Stevens, no Copyright
  Use at your own risk
  https://DidierStevens.com

cmd

pdf-parser.py -v mist-tr.pdf

这是前面部分输出

这是结尾截取图

也可以重定向保存成文本日志，仔细分析。只需要加上>out.txt即可。

0x03 恶意pdf

JavaScript嵌入生成的pdf之中，在用户双击打开时触发执行恶意脚本。

python代码

#!/usr/bin/python

# V0.1 2008/05/23
# make-pdf-javascript, use it to create a PDF document with embedded JavaScript that will execute automatically when the document is opened
# requires module mPDF.py
# Source code put in public domain by Didier Stevens, no Copyright
# https://DidierStevens.com
# Use at your own risk
#
# History:
#  
#  2008/05/29: continue
#  2008/11/09: cleanup for release

import mPDF
import optparse

def Main():
    """make-pdf-javascript, use it to create a PDF document with embedded JavaScript that will execute automatically when the document is opened
    """

    parser = optparse.OptionParser(usage='usage: %prog [options] pdf-file', version='%prog 0.1')
    parser.add_option('-j', '--javascript', help='javascript to embed (default embedded JavaScript is app.alert messagebox)')
    parser.add_option('-f', '--javascriptfile', help='javascript file to embed (default embedded JavaScript is app.alert messagebox)')
    (options, args) = parser.parse_args()

    if len(args) != 1:
        parser.print_help()
        print ''
        print '  make-pdf-javascript, use it to create a PDF document with embedded JavaScript that will execute automatically when the document is opened'
        print '  Source code put in the public domain by Didier Stevens, no Copyright'
        print '  Use at your own risk'
        print '  https://DidierStevens.com'
    
    else:
        oPDF = mPDF.cPDF(args[0])
    
        oPDF.header()
    
        oPDF.indirectobject(1, 0, '<<\n /Type /Catalog\n /Outlines 2 0 R\n /Pages 3 0 R\n /OpenAction 7 0 R\n>>')
        oPDF.indirectobject(2, 0, '<<\n /Type /Outlines\n /Count 0\n>>')
        oPDF.indirectobject(3, 0, '<<\n /Type /Pages\n /Kids [4 0 R]\n /Count 1\n>>')
        oPDF.indirectobject(4, 0, '<<\n /Type /Page\n /Parent 3 0 R\n /MediaBox [0 0 612 792]\n /Contents 5 0 R\n /Resources <<\n             /ProcSet [/PDF /Text]\n             /Font << /F1 6 0 R >>\n            >>\n>>')
        oPDF.stream(5, 0, 'BT /F1 12 Tf 100 700 Td 15 TL (JavaScript example) Tj ET')
        oPDF.indirectobject(6, 0, '<<\n /Type /Font\n /Subtype /Type1\n /Name /F1\n /BaseFont /Helvetica\n /Encoding /MacRomanEncoding\n>>')
    
        if options.javascript == None and options.javascriptfile == None:
            javascript = """app.alert({cMsg: 'Hello from PDF JavaScript', cTitle: 'Testing PDF JavaScript', nIcon: 3});"""
        elif options.javascript != None:
            javascript = options.javascript
        else:
            try:
                fileJavasScript = open(options.javascriptfile, 'rb')
            except:
                print "error opening file %s" % options.javascriptfile
                return

            try:
                javascript = fileJavasScript.read()
            except:
                print "error reading file %s" % options.javascriptfile
                return
            finally:
                fileJavasScript.close()
        
        oPDF.indirectobject(7, 0, '<<\n /Type /Action\n /S /JavaScript\n /JS (%s)\n>>' % javascript)
    
        oPDF.xrefAndTrailer('1 0 R')

if __name__ == '__main__':
    Main()

用法及其命令

Usage: make-pdf-javascript.py [options] pdf-file

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -j JAVASCRIPT, --javascript=JAVASCRIPT
                        javascript to embed (default embedded JavaScript is
                        app.alert messagebox)
  -f JAVASCRIPTFILE, --javascriptfile=JAVASCRIPTFILE
                        javascript file to embed (default embedded JavaScript
                        is app.alert messagebox)

  make-pdf-javascript, use it to create a PDF document with embedded JavaScript that will execute automatically when the document is opened
  Source code put in the public domain by Didier Stevens, no Copyright
  Use at your own risk
  https://DidierStevens.com

只有两个主要参数一个是-j ，另一个是-f

-j=javascript 一个嵌入一段javascript代码

-f=javascriptfile 一个是嵌入整个JavaScript文件。

make-pdf-javascript.py out.pdf

0x04 参考

作者详情 : https://blog.didierstevens.com/professional/

pdf-tools : http://didierstevens.com/files/software/pdf-parser_V0_6_5.zip

make-pdf : http://didierstevens.com/files/software/make-pdf_V0_1_6.zip

你可能感兴趣的:(二进制,恶意代码,学习笔记,源码分享,技术文章)

Go开发技术路线全解析：从基础到资深的系统学习指南（2025年版） Mr.小海 golang 开发语言后端容器云原生 vim 中间件
Go开发技术路线全解析：从基础到资深的系统学习指南（2025年版）一、基础阶段：Go语言入门与核心语法环境搭建与工具链环境标准化是Go开发流程的基础，其核心目标是确保开发环境的一致性与可重复性。2025年主流的Go环境安装方式包括两种：一是通过Go官方网站下载对应操作系统的二进制安装包，二是使用系统包管理器（如Linux的apt/yum、macOS的Homebrew等）进行安装。安装完成后，需配置
设计模式学习笔记06-Decorator模式百恼神烦
本文主要是看了《设计模式》做的笔记和思考，在此分享仅代表个人观点，如有不对的地方欢迎批评和指正。基础当出现需要多个组件组成新的部件，同时不想增加类的数量（即不希望通过继承解决），可以考虑使用Decorator（装饰）模式。该模式下，通过不断地将部件放置到修饰物中，形成新的对象，并且修饰物可以负责将行为（职责）依次向内传递至部件，UML图如下：Decorator模式-UML.png使用时是将部件放入
mtk调试-camera
仅当做个人学习笔记使用，防丢失。原文链接：https://blog.csdn.net/qq_58703058/article/details/132994554Device：1、修改imgsensor相关（ProjectConfig.mk文件）device/mediateksample/{platform}/ProjectConfig.mk此文件用于将相关模块加入编译。2、在头文件中添加senso
关于XSS的一点理解「已注销」 XSS
什么是XSS攻击XSS，缩写自Cross-SiteScripting，即跨站脚本攻击，是一种注入型攻击方法，也就是攻击者把恶意脚本注入到良性和可信任的网站中。XSS攻击者通常会利用Web应用（通常在浏览器端脚本的form中）发送恶意代码给其他的Web应用用户。XSS的攻击原理就是攻击者使用XSS发送一些恶意的脚本代码给一些未防备的用户，这些用户的浏览器没办法分辨出这些脚本是否应该被信任，并且会完整
二维码：理解二维码 / 生成二维码 / 小程序支持哪种类型的二维码 / 小程序识别GS1码快雪时晴-初晴融雪前端前端
一、理解二维码1.1、概念二维码（2-dimensionalbarcode），又称二维条码，最早发源于日本，它是用某种特定的几何图形按一定规律在平面（二维方向上）分布的黑白相间的图形记录数据符号信息的；在代码编制上巧妙地利用构成计算机内部逻辑基础的“0”、“1”比特流的概念，使用若干个与二进制相对应的几何形体来表示文字数值信息，通过图象输入设备或光电扫描设备自动识读以实现信息自动处理。它具有条码技
C#学习笔记说笑谈古松 C#c#
这是我以前的学习笔记，使用word写的，缩进应该有问题。3.1变量usingsystem;在这里定义的变量就可以在整个程序中使用;inta;publicclassmain{在这里定义的变量就可以在整个类中使用;intb;publicvoidstaticMain(){在这里定义的变量就可以在整个方法中使用;intc;}}也可以用static实现!3.1常量静态常量:publicconstintMAX
《[系统底层攻坚] 张冬〈大话存储终极版〉精读计划启动——存储架构原理深度拆解之旅》-系统性学习笔记（适合小白与IT工作人员）谢郎Kobe 大活存储学习架构云计算硬件架构大数据
致所有存储技术探索者笔者近期将系统攻克存储领域经典巨作——张冬老师编著的《大话存储终极版》。这部近千页的存储系统圣经，以庖丁解牛的方式剖析了：存储硬件底层架构、分布式存储核心算法、超融合系统设计哲学等等。喜欢研究数据存储或者工作应用到存储的小伙伴，可以学习这本书。如果想利用碎片时间学习，也可以持续关注一下笔者不定期的章节解析。现在本人将此书的目录结构整理如下，未来笔者将按照顺序不定期更新【学习笔记
kvm虚拟机下的格式转换 teayear linux 运维服务器运维技术教程自动化监控
该指令使用qemu-img工具将原始磁盘镜像（raw格式）转换为QCOW2格式的虚拟磁盘镜像，具体参数解释如下：分步解析qemu-imgconvert调用QEMU的镜像转换工具，用于不同虚拟磁盘格式之间的转换。-p显示转换进度条（等同于--progress），实时反馈转换过程的状态。-fraw指定源文件的格式为raw（原始二进制格式）。raw格式无元数据头，直接存储磁盘扇区数据，常用于物理磁盘拷贝
学习笔记(66):Python入门教程-datetime模块时间运算顾子宇研发管理 python 编程语言 Python 小猿圈 Python入门教程
立即学习:https://edu.csdn.net/course/play/24459/296363?utm_source=blogtoedudatetime模块：datetime.date：表示日期的类，常用属性有year，month，daydatetime.time：表示时间的类，常用的属性有hour,minute,second,microseconddatetime.datetime：表示日
掌握C#文件操作与XML处理：学习资料完整指南竹石文化传播有限公司
本文还有配套的精品资源，点击获取简介：C#是一种广泛应用于Windows和跨平台开发的编程语言，它在.NET框架中包含强大的文件和XML操作能力。本文深入探讨了C#中的文件读写技术，包括使用System.IO命名空间中的File类进行文本和二进制文件处理，FileStream类的流操作，以及XML文档的解析、创建和修改方法。同时，文章也介绍了文件操作的扩展功能和在进行文件操作时应考虑的异常处理。通
gitlab-runner配置问题记录
引言笔者曾通过2种方式部署过gitlab-runner，在gitlab中使用这个runner拉起cijob的过程中或多或少遇到些问题，主要都是job中无法访问宿主机的docker等组件。本篇文档主要记录gitlab-runner安装及相关配置。二进制部署gitlab-runner部署以arm64架构的为例arch="arm64"curl-LJO"https://s3.dualstack.us-ea
什么是序列化？是二进制吗？一文解答你的疑惑！
一、序列化：数据转换的艺术1.1什么是序列化？序列化（Serialization）是指将数据结构或对象状态转换为可存储或可传输的格式的过程。简单来说，就是把内存中的对象变成可以保存到文件或通过网络发送的形式。//Java序列化示例publicclassPersonimplementsSerializable{privateStringname;privateintage;//gettersands
buildroot+qemu+arm64虚拟环境多种方式启动linux内核左家垅的牛 linux 运维服务器
Qemu：QEMU是一款开源的硬件虚拟化软件，可以在不同的主机平台上运行虚拟机。它通过动态的二进制转换，模拟CPU，并且提供一组设备模型，使它能够运行多种未修改的客户机OS。QEMU采用全系统仿真，可以模拟完整的计算机系统，包括处理器、内存、存储和外围设备。它提供硬件仿真，允许在一个虚拟环境中运行不同体系结构的操作系统和应用程序。QEMU可以与KVM一起使用，进而接近本地速度运行虚拟机。目前，QE
《随园诗话》学习笔记六飞鸿雪舞
卷一诗写性情，惟吾所适四、引用他言【原文】于耐圃相公构蔬香阁，种菜数畦，题一联云：“今日正宜知此味；当年曾自咬其根。”鄂西林相公亦有菜圃对联云：“此味易知，但须绿野秋来种；对他有愧，只恐苍生面色多。”两人都用真西山语；而胸襟气象，却迥不侔。【译文】于敏中在园子里构筑小楼一座，名称“蔬香阁”，种菜几畦。小楼题一联：“今日正宜知此味；当年曾自咬其根。”鄂尔泰家中也有菜圃，园子门口也有对联一副：“此味易
2022-03-23 成长_3a8a
2022年3月23日中原焦点团队刘永利分享923天。咨询伦理第1课学习笔记。第1课：绪论、价值观与多元文化。一、专业伦理的意义。专业伦理系指心理咨询师在执行业务时能够节制自己的专业特权和个人欲望，遵循伦理守则和执业标准，提供个案最好的专业服务，以增进个案的福祉。伦理可以分为个人伦理和专业伦理两种。专业伦理又可分为两大类，一类是强制性伦理，另一类是理想性伦理。强制性能力是最低标准，理想性伦理目前可能
C/C++---文件读取 MzKyle C/C++c语言 c++java
在C++中，文件读取操作主要是通过fstream类来完成的。fstream类提供了多种功能，用于从文件读取数据、写入数据以及对文件进行其他处理。文件读取操作可以通过两种主要方式实现：文本文件读取和二进制文件读取。文件读取在传参工作中，扮演十分重要的角色，方便客户端不接触代码的情况下对系统进行调试。1.文件输入流(ifstream)C++提供了ifstream（InputFileStream）类用于
Python从入门到弃坑学习笔记——第一章 Python入门 youweilong033 Python学习学习笔记 python pycharm
笔主趁着假期闲的蛋疼，打算开始学习一下Python，主要是之前就有很多朋友问我Python问题，甚至还有新闻学专业的，但我Python从没学过，还挺尴尬的。打算从现在开坑写一系列的Python学习笔记（flag立下了，乐。毕竟是从零开始学，在我的系列文章中，你将会看到包括但不限于：根据自己的想法命名东西，各种概念胡言乱语，shi一样的排版，某网课上的内容拿来主义。希望大佬们海涵，批评指正，有问题可
查看.bin二进制文件的方式（HxD十六进制编辑器的安装） Ac157ol 编辑器
文章目录Windows系统上安装HxD十六进制编辑器的步骤。**HxD是一款免费、轻量级的工具，适合查看和编辑.bin等二进制文件。****PS:实际安装过程中会发现找不到Windows11的版本，安装windows10的即可，并且没有区别setup版和portable版**安装HxD的步骤1.访问官方网站2.**下载安装程序**3.运行安装程序4.验证安装5.注意事项6.后续使用Windows系
Django学习笔记：（五）模板过滤器码农葫芦侠 Django django 学习笔记
模板过滤器1简介2语法3常见过滤器3.1add3.2addslashes3.3center3.4cut3.6date3.6default3.7default_if_none3.8dictsort3.9dictsortreversed3.10lower3.11filesizeformat3.12upper3.13first3.14last3.15floatformat3.16iriencode3.1
STM32+w5500+TcpClient学习笔记结城明日奈是我老婆嵌入式 stm32 学习笔记
文章目录参考文章本地和远程IP连接的配置(重点)TCP发送参考文章注意:SPI的CSRST脚这些都是通过cubeMX自定义的可以自行修改。用的是SPI1项目地址//MyTcpClient.h#ifndefMYTCPCLIENT_H#defineMYTCPCLIENT_H#include"main.h"#include"w5500.h"#include"socket.h"#include"wizch
pyQT学习笔记——Qt常用组件与绘图类的使用指南 tt555555555555 Qt pyqt 学习笔记
Qt常用组件与绘图类的使用指南一、大小策略（SizePolicy）1.1大小策略概述1.2具体参数1.3其他常见策略1.4伸展值的作用二、常用组件的使用2.1QSpinBox和QComboBox示例代码2.2QDialog示例代码2.3QTableView示例代码三、QPainter类介绍3.1QPainter的使用示例代码3.2QPainter的功能一、大小策略（SizePolicy）1.1大小
PyQt5学习笔记，带例子源码
一、很程序员，都喜欢开发windows桌面应用系统，基于python3开发，效率高二、PyQt5开发的桌面应用系统是可以跨平台的，可以在Mac上、Window上、Linux桌面系统上运行，以下为学习笔记及总级三、源码下载登录后复制1、QDateTimeEdit日期输入框setCalendarPopup弹出日期选择框setDisplayFormat("yyyy-MM-ddHH:mm:ss")设置展示
PyQt5学习笔记 Shane1111111 qt 学习笔记
来源：王铭东老师的B站教程链接：PyQt5快速入门_哔哩哔哩_bilibili基本控件QRadioButtonQLineedit#清空xxx.clear()#插入新内容到最右光标处xxx.insert("内容")布局1.水平布局创建组#hobby主要是保证他们是一个组。hobby_box=QGroupBox("爱好")设置hobby_box的布局将组中内容添加到该组的容器中将组hobby_box添
PyQT5 新手入门学习笔记 UncleShuShuShu python的坑 python pyqt5
一、PyQt5的起点第一个简单的pyqt程序#创建一个label程序（QLabel模块）importsysfromPyQt5.QtWidgetsimportQApplication,QLabelif__name__=='__main__':app=QApplication(sys.argv)label=QLabel('helloworld')#label的setText方法:label=Qlabe
《有关写书评文章的写作框架》千江雪_2932
11月5日书评比读后感难写，对于新手来说，要先掌握好写书评的套路和写作框架，然后先按着框架写，要不写着写着就写成读后感去了。因为想要写书评，所以，正在不断学习的过程中，今天发现有这么一篇文章，作者把书评的写作框架和过程说的非常的清楚。所以学习笔记了。写文章都要谋篇布局，写书评也是一样的，先列出主题和文章框架。以下是最简单也是最常见的书评文章框架。1、开篇破题2、引出书的内容梗概及作者简介3、用一个
2024年BCSP-X小高组基础知识题目（模拟题解析）天秀信奥编程培训 #BCXP-X模拟题北京BCSP-X试题讲解专栏 BCSP-X 信息学奥赛算法
一、单项选择计算机的核心部件是什么（）？A.显示器B.键盘C.中央处理器（CPU)D.鼠标正确答案：C.中央处理器（CPU）解析：计算机的核心部件是中央处理器（CPU）。CPU负责执行计算机程序中的指令，协调控制其他硬件设备工作。显示器、键盘和鼠标都是外部设备，主要用于人机交互，并不是计算机的核心部件。将十进制小数9.375转换为二进制小数，其正确的二进制表示是（）。A.1001.11B.1011
Go-Redis 入门与实践从连接到可观测，一站式掌握 go-redis v9**
1.环境准备组件版本建议说明Go≥1.22go-redis只支持最近两个Go版本Redis≥6.2保持与生产一致，建议开启protected-modego-redisgithub.com/redis/go-redis/v9本文以v9为例本地Redis：brewinstallredis(macOS)/apt-getinstallredis-server(Debian)/下载官方二进制。Docker：
你活着可能已经死了-《得到》“武志红的心理学课”学习笔记28 大庆思考笔记
人生由几百、几千乃至几万个大大小小的选择构成，等你老了，回顾一生的时候，你发现最亏待的，恰恰是你自己，那你这一生，就白活了。我们来做一个调查，很简单，然而也许很“致命”：你能不能想起五件事，你特别想做的，但却一直没有去做的，就按照自由联想的顺序，把这五件事写出来。现在，你可以做你自己的“父母”，试试带着点偏执劲，去追逐一些你特别想追逐的事物，以此来滋养你的本我。分享一段鲁米的诗给你：有一颗光的种子
ReactiveCocoa 学习笔记七（RACCommand）那夜的星空分外清澈 ReactiveCocoa ReactiveCocoa
RACCommandRACCommand关键的两个方法如下，理解了他们便能理解RACCommand的作用。-(instancetype)initWithEnabled:(nullableRACSignal*)enabledSignalsignalBlock:(RACSignal*(^)(InputType_Nullableinput))signalBlock;-(RACSignal*)execut
C语言学习笔记：do..while循环、goto语句女巫和她的乌鸦 C语言 c语言学习
do…while（）循环，do语句的语法：do循环语句；while（表达式）；例：intmain(){inti=1;do{printf("%d",i);i++;}while(i#include#includevoidmenu(){printf("1.play\n");printf("0.exit\n");}voidgame(){//猜数字游戏的实现:先生成随机数-->猜数字。rand函数返回了一个
jQuery 跨域访问的三种方式 No 'Access-Control-Allow-Origin' header is present on the reque qiaolevip 每天进步一点点学习永无止境跨域众观千象
XMLHttpRequest cannot load http://v.xxx.com. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://localhost:63342' is therefore not allowed access. test.html:1
mysql 分区查询优化 annan211 java 分区优化 mysql
分区查询优化引入分区可以给查询带来一定的优势，但同时也会引入一些bug. 分区最大的优点就是优化器可以根据分区函数来过滤掉一些分区，通过分区过滤可以让查询扫描更少的数据。所以，对于访问分区表来说，很重要的一点是要在where 条件中带入分区，让优化器过滤掉无需访问的分区。可以通过查看explain执行计划，是否携带 partitions
MYSQL存储过程中使用游标 chicony Mysql存储过程
DELIMITER $$ DROP PROCEDURE IF EXISTS getUserInfo $$ CREATE PROCEDURE getUserInfo(in date_day datetime)-- -- 实例-- 存储过程名为：getUserInfo-- 参数为：date_day日期格式:2008-03-08-- BEGINdecla
mysql 和 sqlite 区别 Array_06 sqlite
转载： http://www.cnblogs.com/ygm900/p/3460663.html mysql 和 sqlite 区别 SQLITE是单机数据库。功能简约，小型化，追求最大磁盘效率 MYSQL是完善的服务器数据库。功能全面，综合化，追求最大并发效率 MYSQL、Sybase、Oracle等这些都是试用于服务器数据量大功能多需要安装，例如网站访问量比较大的。而sq
pinyin4j使用 oloz pinyin4j
首先需要pinyin4j的jar包支持；jar包已上传至附件内方法一:把汉字转换为拼音；例如：编程转换后则为biancheng /** * 将汉字转换为全拼 * @param src 你的需要转换的汉字 * @param isUPPERCASE 是否转换为大写的拼音； true:转换为大写；fal
微博发送私信随意而生微博
在前面文章中说了如和获取登陆时候所需要的cookie，现在只要拿到最后登陆所需要的cookie，然后抓包分析一下微博私信发送界面 http://weibo.com/message/history?uid=****&name=**** 可以发现其发送提交的Post请求和其中的数据，让后用程序模拟发送POST请求中的数据，带着cookie发送到私信的接入口，就可以实现发私信的功能了。
jsp 香水浓 jsp
JSP初始化容器载入JSP文件后，它会在为请求提供任何服务前调用jspInit()方法。如果您需要执行自定义的JSP初始化任务，复写jspInit()方法就行了 JSP执行这一阶段描述了JSP生命周期中一切与请求相关的交互行为，直到被销毁。当JSP网页完成初始化后
在 Windows 上安装 SVN Subversion 服务端 AdyZhang SVN
在 Windows 上安装 SVN Subversion 服务端2009-09-16高宏伟哈尔滨市道里区通达街291号最佳阅读效果请访问原地址：http://blog.donews.com/dukejoe/archive/2009/09/16/1560917.aspx 现在的Subversion已经足够稳定，而且已经进入了它的黄金时段。我们看到大量的项目都在使
android开发中如何使用 alertDialog从listView中删除数据？ aijuans android
我现在使用listView展示了很多的配置信息，我现在想在点击其中一条的时候填出 alertDialog,点击确认后就删除该条数据，（ ArrayAdapter ，ArrayList，listView 全部删除），我知道在下面的onItemLongClick 方法中参数 arg2 是选中的序号，但是我不知道如何继续处理下去 1 2 3
jdk-6u26-linux-x64.bin 安装 baalwolf linux
1.上传安装文件(jdk-6u26-linux-x64.bin) 2.修改权限 [root@localhost ~]# ls -l /usr/local/jdk-6u26-linux-x64.bin 3.执行安装文件 [root@localhost ~]# cd /usr/local [root@localhost local]# ./jdk-6u26-linux-x64.bin&nbs
MongoDB经典面试题集锦 BigBird2012 mongodb
1.什么是NoSQL数据库？NoSQL和RDBMS有什么区别？在哪些情况下使用和不使用NoSQL数据库？ NoSQL是非关系型数据库，NoSQL = Not Only SQL。关系型数据库采用的结构化的数据，NoSQL采用的是键值对的方式存储数据。在处理非结构化/半结构化的大数据时；在水平方向上进行扩展时；随时应对动态增加的数据项时可以优先考虑使用NoSQL数据库。在考虑数据库的成熟
JavaScript异步编程Promise模式的6个特性 bijian1013 JavaScript Promise
Promise是一个非常有价值的构造器，能够帮助你避免使用镶套匿名方法，而使用更具有可读性的方式组装异步代码。这里我们将介绍6个最简单的特性。在我们开始正式介绍之前，我们想看看Javascript Promise的样子： var p = new Promise(function(r
[Zookeeper学习笔记之八]Zookeeper源代码分析之Zookeeper.ZKWatchManager bit1129 zookeeper
ClientWatchManager接口 //接口的唯一方法materialize用于确定那些Watcher需要被通知 //确定Watcher需要三方面的因素1.事件状态 2.事件类型 3.znode的path public interface ClientWatchManager { /** * Return a set of watchers that should
【Scala十五】Scala核心九：隐式转换之二 bit1129 scala
隐式转换存在的必要性，在Java Swing中，按钮点击事件的处理，转换为Scala的的写法如下： val button = new JButton button.addActionListener( new ActionListener { def actionPerformed(event: ActionEvent) {
Android JSON数据的解析与封装小Demo ronin47
转自：http://www.open-open.com/lib/view/open1420529336406.html package com.example.jsondemo; import org.json.JSONArray; import org.json.JSONException; import org.json.JSONObject; impor
[设计]字体创意设计方法谈 brotherlamp UI ui自学 ui视频 ui教程 ui资料
从古至今，文字在我们的生活中是必不可少的事物，我们不能想象没有文字的世界将会是怎样。在平面设计中，UI设计师在文字上所花的心思和功夫最多，因为文字能直观地表达UI设计师所的意念。在文字上的创造设计，直接反映出平面作品的主题。如设计一幅戴尔笔记本电脑的广告海报，假设海报上没有出现“戴尔”两个文字，即使放上所有戴尔笔记本电脑的图片都不能让人们得知这些电脑是什么品牌。只要写上“戴尔笔
单调队列-用一个长度为k的窗在整数数列上移动，求窗里面所包含的数的最大值 bylijinnan java 算法面试题
import java.util.LinkedList; /* 单调队列滑动窗口单调队列是这样的一个队列：队列里面的元素是有序的，是递增或者递减题目：给定一个长度为N的整数数列a(i),i=0,1,...,N-1和窗长度k. 要求：f(i) = max{a(i-k+1),a(i-k+2),..., a(i)},i = 0,1,...,N-1 问题的另一种描述就
struts2处理一个form多个submit chiangfai struts2
web应用中，为完成不同工作，一个jsp的form标签可能有多个submit。如下代码： <s:form action="submit" method="post" namespace="/my"> <s:textfield name="msg" label="叙述：">
shell查找上个月，陷阱及野路子 chenchao051 shell
date -d "-1 month" +%F 以上这段代码，假如在2012/10/31执行，结果并不会出现你预计的9月份，而是会出现八月份，原因是10月份有31天，9月份30天，所以-1 month在10月份看来要减去31天，所以直接到了8月31日这天，这不靠谱。野路子解决：假设当天日期大于15号
mysql导出数据中文乱码问题 daizj mysql 中文乱码导数据
解决mysql导入导出数据乱码问题方法：１、进入mysql，通过如下命令查看数据库编码方式： mysql> show variables like 'character_set_%'; +--------------------------+----------------------------------------+ | Variable_name&nbs
SAE部署Smarty出现：Uncaught exception 'SmartyException' with message 'unable to write dcj3sjt126com PHP smarty sae
对于SAE出现的问题：Uncaught exception 'SmartyException' with message 'unable to write file...。官方给出了详细的FAQ：http://sae.sina.com.cn/?m=faqs&catId=11#show_213 解决方案为： 01 $path
《教父》系列台词 dcj3sjt126com
Your love is also your weak point. 你的所爱同时也是你的弱点。 If anything in this life is certain, if history has taught us anything, it is that you can kill anyone. 不顾家的人永远不可能成为一个真正的男人。 &
mongodb安装与使用 dyy_gusi mongo
一.MongoDB安装和启动,widndows和linux基本相同 1.下载数据库, linux:mongodb-linux-x86_64-ubuntu1404-3.0.3.tgz 2.解压文件,并且放置到合适的位置 tar -vxf mongodb-linux-x86_64-ubun
Git排除目录 geeksun git
在Git的版本控制中，可能有些文件是不需要加入控制的，那我们在提交代码时就需要忽略这些文件，下面讲讲应该怎么给Git配置一些忽略规则。有三种方法可以忽略掉这些文件，这三种方法都能达到目的，只不过适用情景不一样。 1. 针对单一工程排除文件这种方式会让这个工程的所有修改者在克隆代码的同时，也能克隆到过滤规则，而不用自己再写一份，这就能保证所有修改者应用的都是同一
Ubuntu 创建开机自启动脚本的方法 hongtoushizi ubuntu
转载自： http://rongjih.blog.163.com/blog/static/33574461201111504843245/ Ubuntu 创建开机自启动脚本的步骤如下： 1) 将你的启动脚本复制到 /etc/init.d目录下以下假设你的脚本文件名为 test。 2) 设置脚本文件的权限 $ sudo chmod 755
第八章流量复制/AB测试/协程 jinnianshilongnian nginx lua coroutine
流量复制在实际开发中经常涉及到项目的升级，而该升级不能简单的上线就完事了，需要验证该升级是否兼容老的上线，因此可能需要并行运行两个项目一段时间进行数据比对和校验，待没问题后再进行上线。这其实就需要进行流量复制，把流量复制到其他服务器上，一种方式是使用如tcpcopy引流；另外我们还可以使用nginx的HttpLuaModule模块中的ngx.location.capture_multi进行并发
电商系统商品表设计 lkl
DROP TABLE IF EXISTS `category`; -- 类目表 /*!40101 SET @saved_cs_client = @@character_set_client */; /*!40101 SET character_set_client = utf8 */; CREATE TABLE `category` ( `id` int(11) NOT NUL
修改phpMyAdmin导入SQL文件的大小限制 pda158 sql mysql
　用phpMyAdmin导入mysql数据库时，我的10M的数据库不能导入，提示mysql数据库最大只能导入2M。　　 phpMyAdmin数据库导入出错：　　You probably tried to upload too large file. Please refer to documentation for ways to workaround this limit.
Tomcat性能调优方案 Sobfist apache jvm tomcat 应用服务器
一、操作系统调优对于操作系统优化来说，是尽可能的增大可使用的内存容量、提高CPU的频率，保证文件系统的读写速率等。经过压力测试验证，在并发连接很多的情况下，CPU的处理能力越强，系统运行速度越快。。【适用场景】任何项目。二、Java虚拟机调优应该选择SUN的JVM，在满足项目需要的前提下，尽量选用版本较高的JVM，一般来说高版本产品在速度和效率上比低版本会有改进。 J
SQLServer学习笔记 vipbooks 数据结构 xml
1、create database school 创建数据库school 2、drop database school 删除数据库school 3、use school 连接到school数据库，使其成为当前数据库 4、create table class(classID int primary key identity not null) 创建一个名为class的表，其有一