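Linear structure: an ordered collection of data items in which every item except the first has a unique predecessor and every item except the last has a unique successor.
A linear structure always has two ends, and what they are called depends on the context (for example, the top of a stack, or the front and rear of a queue).
This chapter begins the study of data structures with four of the simplest yet most powerful ones: the stack, the queue, the deque, and the list.
What these collections have in common is that their items are related only by a before/after ordering, which makes them all linear structures.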
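Stack: an ordered collection of items in which items are both added and removed at the same end, called the top.
The defining property of a stack is **"last in, first out"** (LIFO, Last In First Out).
Because of LIFO, a stack reverses order: items come out in the opposite order from the one in which they went in. We have already met this kind of order reversal in everyday computer operations.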
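We implement the ADT Stack as a Python class.
The operations of the ADT Stack become methods of that class.
Since a Stack is a collection, it can be built on one of Python's native collection types; we choose the most common one, the list.
One detail matters here: which end of the list plays the role of the stack top. The implementation below uses the end of the list (index -1) as the top, so push and pop map to list.append(item) (amortized O(1)) and list.pop() (O(1)).
Note: the head of the list (index=0) can serve as the top instead, but then push and pop must be implemented with list.insert(0, item) and list.pop(0), and both of those operations are O(n).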
# Python implementation of the ADT Stack
class Stack:
    def __init__(self):
        self.items = []  # the end of the list is the top of the stack
    def isEmpty(self):
        return self.items == []
    def push(self, item):
        self.items.append(item)
    def pop(self):
        return self.items.pop()
    def peek(self):
        return self.items[-1]
    def size(self):
        return len(self.items)
# Stack test code
s = Stack()
print(s.isEmpty())
s.push(4)
s.push('dog')
print(s.peek())
s.push(True)
print(s.size())
print(s.isEmpty())
s.push(8.4)
print(s.pop())
print(s.pop())
print(s.size())
True
dog
3
False
8.4
True
2
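For comparison, the alternative mentioned in the note (list head as stack top) might look like the sketch below. The class name HeadTopStack is hypothetical and not part of the original notes; the behavior is the same as Stack, only the cost of push and pop changes.

```python
class HeadTopStack:
    """Stack whose top is index 0 of the underlying list (illustrative only)."""

    def __init__(self):
        self.items = []

    def isEmpty(self):
        return not self.items

    def push(self, item):
        # O(n): every existing element shifts one slot to the right
        self.items.insert(0, item)

    def pop(self):
        # O(n): every remaining element shifts one slot to the left
        return self.items.pop(0)

    def peek(self):
        return self.items[0]

    def size(self):
        return len(self.items)


s = HeadTopStack()
for item in (4, "dog", True):
    s.push(item)
popped = [s.pop(), s.pop(), s.pop()]
print(popped)  # [True, 'dog', 4]
```

The items still come out in last-in, first-out order, which is why the choice of end is purely a performance decision.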
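Parentheses must obey a "balance" rule: every opening parenthesis needs a matching closing parenthesis, and the pairs must be properly nested.
Recognizing whether parentheses match correctly is a basic algorithm in the compilers of many languages.
Let us see how to construct a parenthesis-matching algorithm.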
def parChecker(symbolString):
    s = Stack()
    balanced = True
    index = 0
    while index < len(symbolString) and balanced:
        symbol = symbolString[index]
        # In the author's LeetCode tests, `in` ran faster than `==` here
        if symbol in "(":
            s.push(symbol)       # opener: remember it
        else:
            if s.isEmpty():
                balanced = False  # closer with no opener left to match
            else:
                s.pop()           # closer matches the most recent opener
        index += 1
    # Equivalent to:
    # if balanced and s.isEmpty():
    #     return True
    # else:
    #     return False
    return balanced and s.isEmpty()
print(parChecker('((()))'))
print(parChecker('(()'))
True
False
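Real applications involve more kinds of brackets, such as (), [] and {}.
These different brackets may be mixed together,
so each kind has to be matched against its own opener and closer.
For example, {{([][])}()} is balanced,
whereas [{()] is not.
A small change to the matching function above handles this: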
def parChecker2(symbolString):
    s = Stack()
    balanced = True
    index = 0
    while index < len(symbolString) and balanced:
        symbol = symbolString[index]
        if symbol in "([{":
            s.push(symbol)
        else:
            if s.isEmpty():
                balanced = False
            else:
                top = s.pop()
                # The closer must be of the same kind as the latest opener
                if not matches(top, symbol):
                    balanced = False
        index += 1
    return balanced and s.isEmpty()
def matches(opener, closer):
    # Brackets of the same kind sit at the same index in both strings
    opens = "([{"
    closers = ")]}"
    return opens.index(opener) == closers.index(closer)
print(parChecker2("{{([][])}()}"))
print(parChecker2("[{()]"))
True
False
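parChecker2 treats every character that is not an opener as a closer, so ordinary text between the brackets is not handled. As a hedged variation, the sketch below maps each closer to its opener with a dictionary and ignores all other characters. The name bracketsBalanced is hypothetical, and a plain Python list stands in for the Stack class so the block runs on its own.

```python
def bracketsBalanced(text):
    """Return True if (), [] and {} in text are balanced; other characters are ignored."""
    pairs = {")": "(", "]": "[", "}": "{"}  # closer -> matching opener
    stack = []
    for ch in text:
        if ch in pairs.values():
            stack.append(ch)                # opener: remember it
        elif ch in pairs:
            # closer: the most recent opener must be of the same kind
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                        # leftover openers mean imbalance


print(bracketsBalanced("{{([][])}()}"))  # True
print(bracketsBalanced("[{()]"))         # False
print(bracketsBalanced("print(a[0])"))   # True
```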
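The "base" of a number system is the number of distinct digit symbols used to write integers.
We often need to convert integers between binary and decimal.
Converting decimal to binary uses the "divide by 2 and keep the remainder" algorithm.
Repeated division by 2 produces the remainders from the least significant bit to the most significant, whereas the output is written from most significant to least, so a stack is used to reverse the order.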
def divideBy2(decNumber):
    remstack = Stack()
    while decNumber > 0:
        rem = decNumber % 2      # next bit, least significant first
        remstack.push(rem)
        decNumber //= 2
    binString = ""
    while not remstack.isEmpty():
        binString += str(remstack.pop())  # popping reverses the order
    return binString
print(divideBy2(42))
101010
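The decimal-to-binary algorithm extends easily to conversion into any base N.
Two other bases common in computing are octal and hexadecimal.
The main question is how to represent the digits: octal uses 0-7, while hexadecimal needs sixteen symbols, 0-9 followed by A-F.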
def baseConverter(decNumber, base):
    digits = "0123456789ABCDEF"  # digit symbols for bases up to 16
    remstack = Stack()
    while decNumber > 0:
        rem = decNumber % base
        remstack.push(rem)
        decNumber //= base
    newString = ""
    while not remstack.isEmpty():
        newString += digits[remstack.pop()]  # remainder -> digit symbol
    return newString
print(baseConverter(25,2))
print(baseConverter(25,16))
11001
19
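One way to sanity-check a converter like this is to feed its output back through Python's built-in int(text, base), which parses a string in a given base. The sketch below is self-contained: toBase is a hypothetical stand-in that uses a plain list as the stack, and it additionally returns "0" for an input of 0, which is an assumption rather than the behavior of the function above.

```python
def toBase(number, base):
    """Convert a non-negative integer to a string in the given base (2 to 16)."""
    digits = "0123456789ABCDEF"
    if number == 0:
        return "0"                     # assumption: zero is written as "0"
    stack = []
    while number > 0:
        stack.append(digits[number % base])
        number //= base
    return "".join(reversed(stack))    # same order as popping the stack


for base in (2, 8, 16):
    text = toBase(25, base)
    # int(text, base) parses the string back; it should give 25 every time
    print(base, text, int(text, base))
```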
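Infix expressions
Prefix and postfix expressions
Take the infix expression A+B: moving the operator in front of its operands gives +AB, and moving it behind them gives AB+.
This yields two other notations for expressions: "prefix" and "postfix".
Thus A+B*C becomes "+A*BC" in prefix form and "ABC*+" in postfix form.
In prefix and postfix expressions the order of the operators alone determines the order of evaluation, so there is no ambiguity.
Conversion procedure, explained
Procedure: scan the tokens from left to right; append operands to the output; push "(" onto the operator stack; on ")" pop operators to the output until the matching "("; for an operator, first pop every stacked operator of greater or equal precedence, then push it; at the end, pop whatever remains.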
def infixToPostfix(infixexpr):
    # Operator precedence; "(" is lowest so it is never popped by an operator
    prec = {"*": 3, "/": 3, "+": 2, "-": 2, "(": 1}
    opStack = Stack()
    postfixList = []
    tokenList = infixexpr.split()  # tokens must be separated by whitespace
    for token in tokenList:
        if token.isalnum():        # operand: a name such as A or a number such as 12
            postfixList.append(token)
        elif token == "(":
            opStack.push(token)
        elif token == ")":
            # Pop operators until the matching "(" is found
            topToken = opStack.pop()
            while topToken != "(":
                postfixList.append(topToken)
                topToken = opStack.pop()
        else:
            # Operators of greater or equal precedence go to the output first
            while (not opStack.isEmpty()) and (prec[opStack.peek()] >= prec[token]):
                postfixList.append(opStack.pop())
            opStack.push(token)
    while not opStack.isEmpty():
        postfixList.append(opStack.pop())
    return " ".join(postfixList)
print(infixToPostfix("A + B * C"))
print(infixToPostfix("A * B + C * D"))
A B C * +
A B * C D * +
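Procedure: scan the postfix tokens from left to right; push operands; on an operator pop two operands (the first one popped is the right-hand operand), apply the operator, and push the result; the value left on the stack at the end is the answer.
def postfixEval(postfixExpr):
    operandStack = Stack()
    tokenList = postfixExpr.split()
    for token in tokenList:
        if token.isdigit():                  # operand (non-negative integer)
            operandStack.push(int(token))
        else:
            operand2 = operandStack.pop()    # right-hand operand comes off first
            operand1 = operandStack.pop()
            result = doMath(token, operand1, operand2)
            operandStack.push(result)
    return operandStack.pop()
def doMath(op, op1, op2):
    if op == "*":
        return op1 * op2
    elif op == "/":
        return op1 / op2
    elif op == "+":
        return op1 + op2
    else:
        return op1 - op2
postfixExpr = infixToPostfix("1 * 3 + 2 * 6")
print(postfixExpr)
finalResult = postfixEval(postfixExpr)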
print(finalResult)
1 3 * 2 6 * +
15
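Examples of queues in computer science include the print queue simulated later in this section.
The abstract data type Queue is an ordered collection in which items are added at one end (the rear) and removed from the other (the front), giving first-in, first-out (FIFO) behavior.
The abstract data type Queue is defined by the following operations: Queue(), enqueue(item), dequeue(), isEmpty() and size().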
class Queue:
    def __init__(self):
        self.items = []  # index 0 is the rear, the end of the list is the front
    def isEmpty(self):
        return self.items == []
    def enqueue(self, item):
        self.items.insert(0, item)  # O(n): shifts every existing item
    def dequeue(self):
        return self.items.pop()     # O(1)
    def size(self):
        return len(self.items)
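With a plain list, one of the two queue operations is always O(n) (here enqueue, because of insert(0, item)). If that matters, the standard library's collections.deque supports O(1) appends and pops at both ends. The sketch below shows the same interface on top of it; the class name DequeBackedQueue is hypothetical.

```python
from collections import deque


class DequeBackedQueue:
    """Same interface as Queue, backed by collections.deque (illustrative only)."""

    def __init__(self):
        self.items = deque()

    def isEmpty(self):
        return not self.items

    def enqueue(self, item):
        self.items.append(item)       # rear is the right end, O(1)

    def dequeue(self):
        return self.items.popleft()   # front is the left end, O(1)

    def size(self):
        return len(self.items)


q = DequeBackedQueue()
for name in ("Bill", "David", "Susan"):
    q.enqueue(name)
first, second = q.dequeue(), q.dequeue()
print(first, second, q.size())  # Bill David 1
```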
Algorithm: the hot potato game, simulated with a queue
def hotPotato(namelist, num):
    simqueue = Queue()
    for name in namelist:
        simqueue.enqueue(name)
    while simqueue.size() > 1:
        # Pass the potato num times: the front person moves to the rear
        for i in range(num):
            simqueue.enqueue(simqueue.dequeue())
        simqueue.dequeue()  # whoever now holds the potato is eliminated
    return simqueue.dequeue()
print(hotPotato(["Bill", "David", "Susan", "Jane", "Kent", "Brad"], 7))
Susan
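The inner loop that moves the front person to the rear num times is a rotation, so the same game can be sketched with collections.deque and its rotate method. This is an alternative formulation, not the approach used above, and the name hotPotatoRotate is hypothetical.

```python
from collections import deque


def hotPotatoRotate(namelist, num):
    """Hot potato using deque.rotate; the front of the circle is the left end."""
    circle = deque(namelist)
    while len(circle) > 1:
        circle.rotate(-num)  # move the first num people to the back, in order
        circle.popleft()     # the person now at the front is eliminated
    return circle[0]


print(hotPotatoRotate(["Bill", "David", "Susan", "Jane", "Kent", "Brad"], 7))  # Susan
```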
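Several people share one printer, and print tasks are served from a queue on a first-come, first-served basis.
Under this setup, one primary question is how long tasks have to wait.
A concrete configuration, as encoded in the simulation: in any given second a new task arrives with probability 1/180 (about 20 tasks per hour), and each task has 1 to 20 pages.
Printer performance: the runs compare 5 and 10 pages per minute, with the slower mode giving the higher print quality.
The question is: how should the printer mode be set so that print quality is as high as possible without making anyone wait too long?
This is a typical decision-support problem, but it cannot be solved by direct calculation from rules.
Instead, we write a program that simulates the printing scenario and analyze its results to support the decision on the printer mode.
How do we model the problem?
First abstract the problem and identify the objects and processes involved.
Objects: print task, print queue, printer
Process: generating and submitting print tasks
Process: carrying out the printing
Simulated time: one loop iteration stands for one second, and a run covers numSeconds seconds.
Simulation procedure: in each simulated second, possibly create a task and enqueue it; if the printer is idle and the queue is not empty, dequeue the next task, record its waiting time, and start printing it; then let the printer work for one second.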
import random
class Printer:
    def __init__(self, ppm):
        self.pagerate = ppm       # printing speed, pages per minute
        self.currentTask = None   # task being printed
        self.timeRemaining = 0    # seconds left for the current task
    def tick(self):               # print for one second
        if self.currentTask is not None:
            self.timeRemaining -= 1
            if self.timeRemaining <= 0:
                self.currentTask = None
    def busy(self):               # is the printer busy?
        return self.currentTask is not None
    def startNext(self, newtask):  # start a new job
        self.currentTask = newtask
        self.timeRemaining = newtask.getPages() * 60 / self.pagerate
class Task:
    def __init__(self, time):
        self.timestamp = time                 # creation time stamp
        self.pages = random.randrange(1, 21)  # number of pages, 1 to 20
    def getStamp(self):
        return self.timestamp
    def getPages(self):
        return self.pages
    def waitTime(self, currenttime):
        return currenttime - self.timestamp   # time spent waiting
def newPrintTask():
    num = random.randrange(1, 181)  # a task is created with probability 1/180
    return num == 100
def simulation(numSeconds, pagesPerMinute):  # run one simulation
    labprinter = Printer(pagesPerMinute)
    printQueue = Queue()
    waitingtimes = []
    for currentSecond in range(numSeconds):  # time passes
        if newPrintTask():
            task = Task(currentSecond)
            printQueue.enqueue(task)
        if (not labprinter.busy()) and (not printQueue.isEmpty()):
            nexttask = printQueue.dequeue()
            waitingtimes.append(nexttask.waitTime(currentSecond))
            labprinter.startNext(nexttask)
        labprinter.tick()
    # Guard against a run in which no task was ever started
    averageWait = sum(waitingtimes) / len(waitingtimes) if waitingtimes else 0.0
    print("Average Wait %6.2f secs %3d tasks remaining." % (averageWait, printQueue.size()))
    return averageWait
Running and analysis
total_averageWait = []
for i in range(10):
    averageWait = simulation(3600, 5)
    total_averageWait.append(averageWait)
max_averageWait = max(total_averageWait)
min_averageWait = min(total_averageWait)
total_averageWait = sum(total_averageWait) / len(total_averageWait)
print("Total Average Wait %6.2f secs, Max Average Wait %6.2f secs, min %6.2f secs"
      % (total_averageWait, max_averageWait, min_averageWait))
Average Wait 63.35 secs 0 tasks remaining.
Average Wait 41.47 secs 0 tasks remaining.
Average Wait 161.46 secs 1 tasks remaining.
Average Wait 141.58 secs 0 tasks remaining.
Average Wait 61.50 secs 0 tasks remaining.
Average Wait 42.00 secs 0 tasks remaining.
Average Wait 66.76 secs 0 tasks remaining.
Average Wait 86.10 secs 1 tasks remaining.
Average Wait 411.35 secs 1 tasks remaining.
Average Wait 356.50 secs 5 tasks remaining.
Total Average Wait 143.21 secs, Max Average Wait 411.35 secs, min 41.47 secs
total_averageWait = []
for i in range(10):
    averageWait = simulation(3600, 10)
    total_averageWait.append(averageWait)
max_averageWait = max(total_averageWait)
min_averageWait = min(total_averageWait)
total_averageWait = sum(total_averageWait) / len(total_averageWait)
print("Total Average Wait %6.2f secs, Max Average Wait %6.2f secs, min %6.2f secs"
      % (total_averageWait, max_averageWait, min_averageWait))
Average Wait 7.38 secs 0 tasks remaining.
Average Wait 12.25 secs 0 tasks remaining.
Average Wait 16.88 secs 0 tasks remaining.
Average Wait 12.18 secs 0 tasks remaining.
Average Wait 16.05 secs 0 tasks remaining.
Average Wait 7.41 secs 0 tasks remaining.
Average Wait 10.36 secs 0 tasks remaining.
Average Wait 11.58 secs 0 tasks remaining.
Average Wait 13.33 secs 0 tasks remaining.
Average Wait 10.11 secs 0 tasks remaining.
Total Average Wait 11.75 secs, Max Average Wait 16.88 secs, min 7.38 secs
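Discussion
To decide on the printer mode, we used a simulation program to estimate how long tasks wait. In the runs above, ten one-hour simulations at 5 pages per minute averaged 143.21 seconds of waiting, ranging from 41.47 to 411.35 seconds; at 10 pages per minute the average was 11.75 seconds, ranging from 7.38 to 16.88 seconds.
A simulation system is an imitation of reality
The print-task simulation can be given other settings for richer experiments
A more realistic simulation comes from modeling the problem in finer detail and from configuring and running it with real data
The approach also extends to other, similar decision-support problems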
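A rough back-of-the-envelope check can help explain why the two modes behave so differently. The sketch below computes, from the simulation's own parameters (pages drawn uniformly from 1 to 20, one task per 180 seconds on average), the mean printing time per task and the fraction of time the printer is expected to be busy. The variable names are hypothetical, and the calculation is an added estimate rather than part of the simulation itself.

```python
# Mean number of pages for a uniform draw from 1..20
meanPages = sum(range(1, 21)) / 20   # 10.5

# A task arrives each second with probability 1/180,
# so the mean gap between arrivals is 180 seconds.
meanGap = 180

estimates = {}
for ppm in (5, 10):
    meanService = meanPages * 60 / ppm   # seconds needed to print an average task
    utilization = meanService / meanGap  # expected busy fraction of the printer
    estimates[ppm] = (meanService, utilization)
    print(f"{ppm} ppm: {meanService:.1f} s per task, utilization {utilization:.2f}")
```

At 5 pages per minute the printer is busy about 70% of the time, so random bursts of arrivals easily build a backlog; at 10 pages per minute it is busy about 35% of the time, which fits the much shorter and steadier waits seen in the runs.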
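A deque (double-ended queue) is an ordered collection in which items can be added and removed at both ends.
As a result, a deque has no inherent LIFO or FIFO behavior; it is up to the user to apply the operations consistently.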
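To make that concrete, the sketch below uses the standard library's collections.deque (not the Deque class from these notes) and shows that the same container yields LIFO or FIFO order depending only on which end items are removed from.

```python
from collections import deque

d = deque()

# Add at the right end, remove from the right end: stack-like (LIFO)
for item in (1, 2, 3):
    d.append(item)
lifo = [d.pop() for _ in range(3)]
print(lifo)  # [3, 2, 1]

# Add at the right end, remove from the left end: queue-like (FIFO)
for item in (1, 2, 3):
    d.append(item)
fifo = [d.popleft() for _ in range(3)]
print(fifo)  # [1, 2, 3]
```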
class Deque:
    def __init__(self):
        self.items = []  # the end of the list is the front, index 0 is the rear
    def isEmpty(self):
        return self.items == []
    def addFront(self, item):
        self.items.append(item)
    def addRear(self, item):
        self.items.insert(0, item)
    def removeFront(self):
        return self.items.pop()
    def removeRear(self):
        return self.items.pop(0)
    def size(self):
        return len(self.items)
def palchecker(aString):
    chardeque = Deque()
    for ch in aString:
        chardeque.addRear(ch)
    stillEqual = True
    # Compare the outermost pair of characters and work inwards
    while chardeque.size() > 1 and stillEqual:
        first = chardeque.removeFront()
        last = chardeque.removeRear()
        if first != last:
            stillEqual = False
    return stillEqual
print(palchecker("lsdkjfskf"))
print(palchecker("radar"))
False
True
print(palchecker("上海自来水来自海上"))
True
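In the discussion of basic data structures above, we used the Python list to implement several linear data structures.
The list is a simple yet powerful collection type that offers a rich set of operations.
List: a collection in which each item is stored at a position relative to the others. When the items are kept in no particular order, it is called an unordered list.
Consider a set of exam scores: 54, 26, 93, 17, 77 and 31.
Represented as an unordered list, this is [54, 26, 93, 17, 77, 31].
List(): creates an empty list
add(item): adds an item to the list, assuming item is not already in the list
remove(item): removes item from the list, modifying the list; item is assumed to be in the list
search(item): looks for item in the list and returns a Boolean
isEmpty(): returns whether the list is empty
size(): returns the number of items in the list
append(item): adds an item at the end of the list, assuming item is not already in the list
index(item): returns the position of the item in the list
insert(pos, item): inserts the item at position pos, assuming item is not already in the list and that the list has enough items for item to occupy position pos
pop(): removes the item at the end of the list, assuming the list has at least one item
pop(pos): removes the item at position pos, assuming position pos exists in the list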
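Implementing the unordered list with a linked list
To implement the unordered list data structure, we can use a linked list.
Although a list must preserve the relative order of its items, preserving that order does not require the items to be stored in contiguous memory.
As the figure shows, the items can be stored at arbitrary locations, yet as long as each item holds a link to the next one, their relative order is preserved.
The basic building block of a linked-list implementation is the node (Node): each node holds one data item and a reference to the next node.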
class Node:
    def __init__(self, initdata):
        self.data = initdata
        self.next = None  # reference to the next node; None marks the end
    def getData(self):
        return self.data
    def getNext(self):
        return self.next
    def setData(self, newData):
        self.data = newData
    def setNext(self, newnext):
        self.next = newnext
temp = Node(93)
temp.getData()
93
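Before wrapping nodes in a list class, it may help to link a few of them by hand and walk the chain. The sketch below uses a stripped-down, hypothetical MiniNode (attributes only, no getters or setters) so that it runs on its own and does not replace the Node class above.

```python
class MiniNode:
    """Bare-bones node: one data item plus a reference to the next node."""

    def __init__(self, data):
        self.data = data
        self.next = None


# Build the chain by inserting each new node in front of the current head
head = None
for score in (31, 77, 17):
    node = MiniNode(score)
    node.next = head
    head = node

# Walk the chain from the head until the None that marks the end
values = []
current = head
while current is not None:
    values.append(current.data)
    current = current.next
print(values)  # [17, 77, 31]
```

Because every insertion happens at the head, the most recently added item is reached first.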
Linked-list implementation: the unordered list UnorderedList
class UnorderedList:
    def __init__(self):
        self.head = None  # an empty list has no first node
mylist = UnorderedList()
print(mylist.head)
None
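Linked-list implementation: methods of UnorderedList
isEmpty(): the list is empty exactly when head is None
add(): the new node is linked in at the head, the cheapest position to reach
size(): walk from head to the end, counting nodes
search(): walk from head until the item is found or the end is reached
remove(item): walk the list keeping references to the current and previous nodes, then link the previous node past the one being removed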
class UnorderedList:
    def __init__(self):
        self.head = None
    def isEmpty(self):
        return self.head is None
    def add(self, item):
        temp = Node(item)
        temp.setNext(self.head)  # new node points at the old head
        self.head = temp
    def size(self):
        current = self.head
        count = 0
        while current is not None:
            count += 1
            current = current.getNext()
        return count
    def search(self, item):
        current = self.head
        found = False
        while current is not None and not found:
            if current.getData() == item:
                found = True
            else:
                current = current.getNext()
        return found
    def remove(self, item):
        current = self.head
        previous = None
        found = False
        # Stop at the end of the list so a missing item cannot crash the loop
        while current is not None and not found:
            if current.getData() == item:
                found = True
            else:
                previous = current
                current = current.getNext()
        if not found:
            return  # item absent: leave the list unchanged
        if previous is None:
            self.head = current.getNext()          # removing the first node
        else:
            previous.setNext(current.getNext())    # bypass the removed node
mylist = UnorderedList()
print(mylist.head)
print(mylist.isEmpty())
print(mylist.size())
for i in [31, 77, 17, 93, 26, 54]:
    mylist.add(i)
    print(mylist.head.getData())
print(mylist.isEmpty())
print(mylist.size())
print(mylist.search(17))
mylist.remove(17)
print(mylist.search(17))
None
True
0
31
77
17
93
26
54
False
6
True
False
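The ADT also lists append, index and pop, which the class above does not implement. One possible way to add them to a singly linked list is sketched below. SketchNode, SketchList and the helper toList are hypothetical names, and the error behavior (raising ValueError or IndexError, as Python's built-in list does) is an assumption rather than something specified in these notes.

```python
class SketchNode:
    def __init__(self, data):
        self.data = data
        self.next = None


class SketchList:
    def __init__(self):
        self.head = None

    def add(self, item):
        node = SketchNode(item)
        node.next = self.head
        self.head = node

    def append(self, item):
        """O(n): walk to the last node, then link the new node after it."""
        node = SketchNode(item)
        if self.head is None:
            self.head = node
            return
        current = self.head
        while current.next is not None:
            current = current.next
        current.next = node

    def index(self, item):
        """O(n): position of item counted from the head, starting at 0."""
        current, pos = self.head, 0
        while current is not None:
            if current.data == item:
                return pos
            current, pos = current.next, pos + 1
        raise ValueError(f"{item!r} is not in the list")

    def pop(self, pos=None):
        """O(n): remove and return the item at pos (the last item if pos is None)."""
        if self.head is None:
            raise IndexError("pop from empty list")
        previous, current, i = None, self.head, 0
        while current.next is not None and (pos is None or i < pos):
            previous, current, i = current, current.next, i + 1
        if pos is not None and i != pos:
            raise IndexError("pop position out of range")
        if previous is None:
            self.head = current.next
        else:
            previous.next = current.next
        return current.data

    def toList(self):
        out, current = [], self.head
        while current is not None:
            out.append(current.data)
            current = current.next
        return out


lst = SketchList()
for score in (31, 77, 17):
    lst.add(score)          # chain is now 17 -> 77 -> 31
lst.append(93)              # chain is now 17 -> 77 -> 31 -> 93
where = lst.index(31)
last = lst.pop()
first = lst.pop(0)
print(where, last, first, lst.toList())  # 2 93 17 [77, 31]
```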
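OrderedList defines the following operations: OrderedList(), add(item), remove(item), search(item), isEmpty() and size().
When implementing an ordered list, remember that the relative position of the items is determined by comparing their "size".
Linked-list implementation: the ordered list OrderedList
class OrderedList:
    def __init__(self):
        self.head = None
Linked-list implementation: methods of OrderedList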
class OrderedList:
    def __init__(self):
        self.head = None
    def isEmpty(self):
        return self.head is None
    def size(self):
        current = self.head
        count = 0
        while current is not None:
            count += 1
            current = current.getNext()
        return count
    def search(self, item):
        current = self.head
        found = False
        stop = False
        while current is not None and not found and not stop:
            if current.getData() == item:
                found = True
            else:
                if current.getData() > item:
                    stop = True  # everything further on is larger: give up early
                else:
                    current = current.getNext()
        return found
    def add(self, item):
        current = self.head
        previous = None
        stop = False
        while current is not None and not stop:
            if current.getData() > item:  # found the insertion point
                stop = True
            else:
                previous = current
                current = current.getNext()
        temp = Node(item)
        if previous is None:              # insert at the head
            temp.setNext(self.head)
            self.head = temp
        else:                             # insert inside the list
            temp.setNext(current)
            previous.setNext(temp)
    def remove(self, item):
        current = self.head
        previous = None
        found = False
        while current is not None and not found:
            if current.getData() == item:
                found = True
            else:
                previous = current
                current = current.getNext()
        if not found:
            return  # item absent: leave the list unchanged
        if previous is None:
            self.head = current.getNext()
        else:
            previous.setNext(current.getNext())
mylist = OrderedList()
print(mylist.head)
print(mylist.isEmpty())
print(mylist.size())
for i in [31, 77, 17, 93, 26, 54]:
    mylist.add(i)
    print(mylist.head.getData())
print(mylist.isEmpty())
print(mylist.size())
print(mylist.search(17))
mylist.remove(17)
print(mylist.search(17))
None
True
0
31
31
17
17
17
17
False
6
True
False
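isEmpty is O(1), because it only checks whether head is None
size is O(n), because there is no way to know the number of nodes other than traversing to the end of the list
search, remove and the ordered list's add are O(n), because they traverse the list; on average they visit about n/2 nodes
The unordered list's add is O(1), because it only inserts at the head
A linked-list List and Python's built-in list type differ in time complexity for some of the same methods,
mainly because Python's built-in list is implemented on contiguous (array-based) storage and has been optimized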
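The difference can be observed directly with the standard timeit module. The sketch below builds a built-in list of n items in two ways, by appending at the end and by inserting at the front, which is the operation a linked list performs in O(1) but an array-based list performs in O(n). The function names and the value of n are arbitrary choices, and the timings depend on the machine, so no expected numbers are given.

```python
import timeit


def buildWithAppend(n):
    items = []
    for i in range(n):
        items.append(i)      # amortized O(1) per call
    return items


def buildWithInsertFront(n):
    items = []
    for i in range(n):
        items.insert(0, i)   # O(len(items)) per call: everything shifts
    return items


n = 20000
appendTime = timeit.timeit(lambda: buildWithAppend(n), number=5)
insertTime = timeit.timeit(lambda: buildWithInsertFront(n), number=5)
print(f"append: {appendTime:.4f} s, insert(0): {insertTime:.4f} s")
```

On typical machines the front-insertion build is expected to be noticeably slower, and the gap widens as n grows.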
References:
1. 数据结构与算法Python版 (Data Structures and Algorithms, Python Edition)