6.087 Practical Programming in C, lec9

External libraries. B-trees, priority queues.

Symbols and libraries

• External libraries provide a wealth of functionality, for example the C standard library

• Programs access libraries’ functions and variables via identifiers known as symbols

• Header file declarations/prototypes mapped to symbols at compile time

• Symbols linked to definitions in external libraries during linking

• Our own program produces symbols, too

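I take a "symbol" to be essentially the name of a function (or global variable). Because the linker needs these names later, the binary object or library file still contains them, and they can be listed with a tool such as nm rather than recovered by decompiling. Note that in C a symbol is just the bare name; it is C++ that encodes the full signature into its (mangled) symbol names.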

Functions and variables as symbols

• Consider the simple hello world program written below:

#include <stdio.h>

const char msg[] = "Hello, world.";

int main(void) {
    puts(msg);
    return 0;
}

• What variables and functions are declared globally?

msg, main(), puts(), others in stdio.h

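I think of running a C program as operating on global variables, static variables, and functions (a function can itself be viewed as a kind of variable, since what it ultimately produces is a value). These are the top-level entities of a program, which is why they are represented by symbols.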

• Let’s compile, but not link, the file hello.c to create hello.o:

athena% gcc -Wall -c hello.c -o hello.o

• -c: compile, but do not link, hello.c; the code is compiled into machine instructions, but no executable program is produced

• addresses for lines of code and static and global variables not yet assigned

• need to perform link step on hello.o (using gcc or ld) to assign memory to each symbol

• linking resolves symbols defined elsewhere (like the C standard library) and makes the code executable

• Let’s look at the symbols in the compiled file hello.o:

athena% nm hello.o

• Output:

0000000000000000 T main

0000000000000000 R msg

U puts

• ’T’ – (text) code; ’R’ – read-only memory; ’U’ – undefined symbol

• Addresses all zero before linking; symbols not allocated memory yet

• Undefined symbols are defined externally, resolved during linking

• Why aren’t symbols listed for other declarations in stdio.h?

• Compiler doesn’t bother creating symbols for unused function prototypes (saves space)

• What happens when we link?

athena% gcc -Wall hello.o -o hello

• Memory allocated for defined symbols

• Undefined symbols located in external libraries (like libc for C standard library)

• Let’s look at the symbols now:

athena% nm hello

• Output:

(other default symbols)

...

0000000000400524 T main

000000000040062c R msg

U puts@@GLIBC_2.2.5

• Addresses now assigned to static symbols (allocated before the program runs, at link time)

• Symbol puts located in the shared GNU C standard library (glibc), symbol version GLIBC_2.2.5

• Shared symbol puts not assigned memory until run time

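The link stage mainly assigns addresses to symbols; I suspect that at the lowest level, variables are addressed by their relative positions.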

Static and dynamic linkage

• Functions, global variables must be allocated memory before use

• Can allocate at compile time (static) or at run time (shared)

• Advantages/disadvantages to both

• Symbols in same file, other .o files, or static libraries (archives, .a files) – static linkage

• Symbols in shared libraries (.so files) – dynamic linkage

• gcc links against shared libraries by default, can force static linkage using -static flag

Personally I prefer static linking, because I have been through countless disasters caused by incompatible library versions...

Loading shared libraries

• Shared library located during compile-time linkage, but needs to be located again during run-time loading

• Shared libraries located at run-time using the dynamic linker/loader ld.so

• Whenever shared libraries on system change, need to run ldconfig to update links seen by ld.so

• During loading, symbols in dynamic library are allocated memory and loaded from shared library file

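Now I know why I always had to run ldconfig back when I was fixing library version problems. My own fault for being lazy: one look at the man page would have explained everything.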

Loading shared libraries on demand

• In Linux, can load symbols from shared libraries on demand using functions in dlfcn.h

• Open a shared library for loading:

void *dlopen(const char *file, int mode);

values for mode: exactly one of RTLD_LAZY (resolve function symbols lazily, when first called) or RTLD_NOW (resolve all undefined symbols before dlopen returns), optionally OR'ed with RTLD_GLOBAL (make symbols in library available to other libraries yet to be loaded) or RTLD_LOCAL (symbols not made available to libraries loaded later; the Linux default)

• Get the address of a symbol loaded from the library:

void *dlsym(void *handle, const char *symbol_name);

handle from call to dlopen; returned address is pointer to variable or function identified by symbol_name

• Need to close shared library file handle after done with symbols in library:

int dlclose(void *handle);

• These functions are not part of C standard library; need to link against library libdl: -ldl compiler flag
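A minimal sketch of the dlopen/dlsym/dlclose sequence, assuming a Linux system with glibc where the math library is installed as "libm.so.6"; that file name and the helper call_from_library are illustrative assumptions, not part of the lecture. Errors are reported through dlerror().

```c
#include <dlfcn.h>
#include <stdio.h>

/* Function-pointer type matching the function we look up,
   e.g. double cos(double) from the math library. */
typedef double (*unary_fn)(double);

/* Open libfile, look up symbol, call it with x and store the result.
   Returns 0 on success, -1 if the library or symbol cannot be found. */
int call_from_library(const char *libfile, const char *symbol,
                      double x, double *result)
{
    /* RTLD_LAZY: resolve function references when first called;
       RTLD_LOCAL: do not expose this library's symbols to later loads. */
    void *handle = dlopen(libfile, RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror();                      /* clear any stale error first */
    /* POSIX requires this object-to-function pointer conversion to work. */
    unary_fn fn = (unary_fn)dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym: %s\n", err);
        dlclose(handle);
        return -1;
    }

    *result = fn(x);
    dlclose(handle);                /* done with the library's symbols */
    return 0;
}
```

For example, call_from_library("libm.so.6", "cos", 0.0, &r) should store 1.0 in r. Build with something like gcc -Wall demo.c -ldl; on glibc 2.34 and later these functions live in libc itself, so -ldl is still accepted but no longer required.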

Symbol resolution issues

• Symbols can be defined in multiple places

• Suppose we define our own puts() function

• But, puts() defined in C standard library

• When we call puts(), which one gets used?

• Our puts() gets used since ours is static, and puts() in C standard library not resolved until run-time

• If statically linked against C standard library, linker finds two puts() definitions and aborts (multiple definitions not allowed)
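A sketch of the dynamically linked case: the program defines its own puts(), and because that definition lives in the program's own object file, calls bind to it instead of to the shared C library's copy. The counter my_puts_calls and the "[my puts] " prefix are hypothetical additions to make the effect visible. Strictly speaking, ISO C reserves standard library names, so this is a linker demonstration rather than portable practice.

```c
#include <stdio.h>

/* Hypothetical counter so the effect of our definition can be observed. */
int my_puts_calls = 0;

/* Our own puts(). It is defined in this object file, so the linker binds
   calls to puts() here rather than to the shared C standard library's copy. */
int puts(const char *s)
{
    my_puts_calls++;
    if (fputs("[my puts] ", stdout) == EOF || fputs(s, stdout) == EOF)
        return EOF;
    /* Like the real puts(), append a newline; non-negative on success. */
    return putchar('\n') == EOF ? EOF : 1;
}
```

Placed in the same file as the hello.c program above, running it would print "[my puts] Hello, world." instead of the plain message.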

Symbol resolution issues

• How about if we define puts() in a shared library and attempt to use it within our programs?

• Symbols resolved in order they are loaded

• Suppose our library containing puts() is libhello.so, located in a standard library directory (like /usr/lib), and we compile our hello.c code against this library:

athena% gcc -g -Wall hello.c -lhello -o hello

• Libraries specified using -l flag are loaded in order specified, and before C standard library

• Which puts() gets used here?

athena% gcc -g -Wall hello.c -lc -lhello -o hello

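One problem with static linking in C is name collisions; dynamic linking can settle them through load order, which has a bit of a flavor of overriding.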

Creating libraries

• Libraries contain C code like any other program

• Static or shared libraries compiled from (un-linked) object files created using gcc

• Compiling a static library:

• compile, but do not link source files:

athena% gcc -g -Wall -c infile.c -o outfile.o

• collect compiled (unlinked) files into an archive:

athena% ar -rcs libname.a outfile1.o outfile2.o …

Creating shared libraries

• Compile and do not link files using gcc:

athena% gcc -g -Wall -fPIC -c infile.c -o outfile.o

• -fPIC option: create position-independent code, since code will be repositioned during loading

• Link files using ld to create a shared object (.so) file:

athena% ld -shared -soname libname.so -o libname.so.version -lc outfile1.o outfile2.o ...

• If necessary, add directory to LD_LIBRARY_PATH environment variable, so ld.so can find file when loading at run-time

• Configure ld.so for new (or changed) library:

athena% ldconfig -v

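I don't understand what PIC is for: at the compile stage symbols don't have addresses yet anyway, do they? (They don't, but for ordinary code the linker later fills in fixed absolute addresses. A shared library may be mapped at a different address in each process, so -fPIC makes the compiler reach functions and data through PC-relative addressing and the global offset table instead; the same code then works wherever it is loaded, and its pages can be shared between processes.)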

B-tree structure

• Generalization of a binary search tree with variable number of children (at least t, up to 2t)

• Tree is balanced – all leaves at same level

• Node contains list of “keys” that divide the range of elements among its children

Initializing a B-tree

• Initially, B-tree contains root node with no children (leaf node), no keys

• Note: root node exempt from minimum children requirement

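A B-tree requires every node to hold between t-1 and 2t-1 keys. Early in the tree's growth the root obviously cannot meet the minimum, so it is exempt from that rule.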

Inserting elements

• Insertion complicated due to maximum number of keys

• At high level:

1. traverse tree down to leaf node

2. if leaf already full, split into two leaves:

(a) move median key element into parent (splitting the parent first if it is already full)

(b) split remaining keys into two leaves (one with lower, one with higher elements)

3. add element to sorted list of keys

• Can accomplish in one pass, splitting full parent nodes during traversal in step 1

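A B-tree node holds between t-1 and 2t-1 keys, and these bounds are chosen so that every split and merge during insertion and deletion produces nodes that still satisfy them. A maximum of 2t-1 (an odd number) means a full node has exactly one median key; after a split both halves have t-1 keys, and adding the new key gives one node with t keys and one with t-1. For merging: if a key is deleted from a node that has t-1 keys, the node drops to t-2 keys and is combined with an adjacent sibling holding 2t-1 keys plus their separating key in the parent, (t-2) + (2t-1) + 1 = 3t-2 keys in all. That exceeds 2t-1, so it is split again: one key goes back up to the parent and the remaining 3t-3 keys are divided into two nodes of about 1.5t-1.5 keys each, which always lies between t-1 and 2t-1. If the sibling also has only t-1 keys, the merge gives (t-2) + (t-1) + 1 = 2t-2 keys, which also satisfies the bound.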
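The one-pass insertion described above (split full nodes on the way down) can be sketched in C as follows. It follows the common textbook formulation with minimum degree t, where each node holds at most 2t-1 keys and 2t children. The fixed T = 2, the int keys, and all function names are assumptions for illustration, and allocation failures are not handled.

```c
#include <stdlib.h>

#define T 2  /* minimum degree t: non-root nodes hold t-1..2t-1 keys (assumed) */

typedef struct btree_node {
    int nkeys;                        /* keys currently stored */
    int keys[2 * T - 1];              /* keys in sorted order */
    struct btree_node *child[2 * T];  /* children, used only if !leaf */
    int leaf;                         /* 1 if the node has no children */
} btree_node;

/* New empty node; a fresh tree is one empty leaf root (no failure handling). */
btree_node *btree_new_node(int leaf)
{
    btree_node *n = calloc(1, sizeof *n);
    if (n != NULL)
        n->leaf = leaf;
    return n;
}

/* Split full child parent->child[i]: its median key moves up into parent
   (which is not full) and the rest form two nodes of t-1 keys each. */
static void split_child(btree_node *parent, int i)
{
    btree_node *full = parent->child[i];
    btree_node *right = btree_new_node(full->leaf);
    right->nkeys = T - 1;
    for (int j = 0; j < T - 1; j++)            /* upper keys go right */
        right->keys[j] = full->keys[j + T];
    if (!full->leaf)
        for (int j = 0; j < T; j++)            /* with their children */
            right->child[j] = full->child[j + T];
    full->nkeys = T - 1;                       /* lower keys stay */
    for (int j = parent->nkeys; j > i; j--)    /* open a slot in parent */
        parent->child[j + 1] = parent->child[j];
    parent->child[i + 1] = right;
    for (int j = parent->nkeys - 1; j >= i; j--)
        parent->keys[j + 1] = parent->keys[j];
    parent->keys[i] = full->keys[T - 1];       /* median moves up */
    parent->nkeys++;
}

/* Insert into a non-full node, splitting any full child before descending. */
static void insert_nonfull(btree_node *n, int key)
{
    int i = n->nkeys - 1;
    if (n->leaf) {
        for (; i >= 0 && key < n->keys[i]; i--)  /* shift larger keys */
            n->keys[i + 1] = n->keys[i];
        n->keys[i + 1] = key;
        n->nkeys++;
        return;
    }
    while (i >= 0 && key < n->keys[i])
        i--;
    i++;                                       /* child[i] covers key */
    if (n->child[i]->nkeys == 2 * T - 1) {
        split_child(n, i);
        if (key > n->keys[i])                  /* pick the correct half */
            i++;
    }
    insert_nonfull(n->child[i], key);
}

/* Insert key; returns the root, which changes when a full root splits. */
btree_node *btree_insert(btree_node *root, int key)
{
    if (root->nkeys == 2 * T - 1) {
        btree_node *s = btree_new_node(0);     /* tree grows one level */
        s->child[0] = root;
        split_child(s, 0);
        root = s;
    }
    insert_nonfull(root, key);
    return root;
}

/* Copy keys in sorted order into out[]; returns how many were written. */
int btree_inorder(const btree_node *n, int *out)
{
    int count = 0;
    for (int i = 0; i < n->nkeys; i++) {
        if (!n->leaf)
            count += btree_inorder(n->child[i], out + count);
        out[count++] = n->keys[i];
    }
    if (!n->leaf)
        count += btree_inorder(n->child[n->nkeys], out + count);
    return count;
}

void btree_free(btree_node *n)
{
    if (!n->leaf)
        for (int i = 0; i <= n->nkeys; i++)
            btree_free(n->child[i]);
    free(n);
}
```

Usage: start with root = btree_new_node(1) and reassign root = btree_insert(root, key) for each key. Growing only at the root is what keeps every leaf at the same depth.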

Searching a B-tree

• Search like searching a binary search tree:

1. start at root.

2. if node empty, element not in tree

3. search list of keys for element (using linear or binary search)

4. if element in list, return element

5. otherwise, element between keys, and repeat search on child node for that range

• Tree is balanced – search takes O(log n) time

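Searching a B-tree is essentially no different from searching a binary tree: both work by comparisons. However, a B-tree node holds many keys stored contiguously, which can be accessed directly by index, whereas a binary tree has to follow a pointer for every comparison, so the B-tree is somewhat more efficient in this respect. In addition, when a node splits, the B-tree promotes the node's median key, which keeps the nodes evenly filled. Of course, what matters most for keeping a B-tree balanced is the merging and splitting of sibling nodes, together with the minimum and maximum key counts behind those merges and splits.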
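A self-contained sketch of the search steps, assuming a hypothetical node layout with T = 2, int keys, a sorted keys[] array and a child[] pointer array. It scans each node's keys linearly; a binary search over keys[] would work equally well.

```c
#include <stddef.h>

#define T 2  /* minimum degree t (assumed value for illustration) */

typedef struct btree_node {
    int nkeys;                        /* keys currently stored */
    int keys[2 * T - 1];              /* keys in sorted order */
    struct btree_node *child[2 * T];  /* children, used only if !leaf */
    int leaf;                         /* 1 if the node has no children */
} btree_node;

/* Return the node holding key and store its index in *pos, or NULL. */
const btree_node *btree_search(const btree_node *n, int key, int *pos)
{
    while (n != NULL) {                           /* 1. start at the root */
        int i = 0;
        while (i < n->nkeys && key > n->keys[i])  /* 3. scan sorted keys */
            i++;
        if (i < n->nkeys && key == n->keys[i]) {  /* 4. found it */
            *pos = i;
            return n;
        }
        if (n->leaf)                 /* 2. no child to search: not in tree */
            return NULL;
        n = n->child[i];             /* 5. child covering key's range */
    }
    return NULL;
}
```

Each loop iteration descends one level, so on a balanced tree the number of nodes visited is proportional to the tree's height, O(log n).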

Deletion

• Deletion complicated by minimum children restriction

• When traversing tree to find element, need to ensure child nodes to be traversed have enough keys

• if adjacent child node has at least t keys, move separating key from parent to child and closest key in adjacent child to parent

• if no adjacent child nodes have extra keys, merge child node with adjacent child

• When removing a key from a node with children, need to rearrange keys again

• if child before or after removed key has enough keys, move closest key from child to parent

• if neither child has enough keys, merge both children

• if child not a leaf, have to repeat this process

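Delete is the most complicated part. I remember that the last 6.087 problem set was to implement a B-tree, and B-tree deletion stressed me out a lot; then it turned out the assignment didn't require implementing delete at all...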

Priority queue

• Abstract data structure ordering elements by priority

• Elements enqueued with priority, dequeued in order of highest priority

• Common implementations: heap or binary search tree

• Operations: insertion, peek/extract max-priority element, increase element priority

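A linear queue is easy to implement; the simplest approach seems to be insertion sort, keeping the queue ordered as elements arrive. Since a priority queue usually sees many insertions and deletions, a linked list is a good fit (at the cost of O(n) per insertion, compared with O(log n) for a binary heap).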
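A sketch of the sorted-linked-list approach suggested above, assuming int priorities and hypothetical names pq_push/pq_pop. Each push is one insertion-sort step, O(n); popping the maximum is O(1) because it always sits at the head.

```c
#include <stdlib.h>

/* Sorted singly linked list: highest priority at the head. */
typedef struct pq_node {
    int priority;
    struct pq_node *next;
} pq_node;

/* Walk past every node with priority >= the new one (so equal priorities
   stay first-in, first-out) and link the new node in there. */
int pq_push(pq_node **head, int priority)
{
    pq_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->priority = priority;
    while (*head != NULL && (*head)->priority >= priority)
        head = &(*head)->next;
    n->next = *head;
    *head = n;
    return 0;
}

/* Remove and return the highest priority; the list must be non-empty. */
int pq_pop(pq_node **head)
{
    pq_node *n = *head;
    int p = n->priority;
    *head = n->next;
    free(n);
    return p;
}
```

Usage: start from pq_node *q = NULL and call pq_push(&q, p) / pq_pop(&q).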

Heaps

• Heap - tree with heap-ordering property: priority(child)≤priority(parent)

• More sophisticated heaps exist – e.g. binomial heap, Fibonacci heap

• We’ll focus on simple binary heaps

• Usually implemented as an array with top element at beginning

• Can sort data using a heap – O(n log n) worst case in-place sort!
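A sketch of heapsort on an int array, assuming a max-heap stored with the top at index 0 and the children of index i at 2i+1 and 2i+2 (a common array layout; the lecture does not fix one). Building the heap and then repeatedly swapping the top into its final slot sorts in place in O(n log n) worst-case time.

```c
#include <stddef.h>

/* Sift a[i] down within a[0..n-1] until the max-heap property holds. */
static void sift_down(int *a, size_t n, size_t i)
{
    for (;;) {
        size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && a[l] > a[largest]) largest = l;
        if (r < n && a[r] > a[largest]) largest = r;
        if (largest == i)
            return;
        int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
        i = largest;
    }
}

/* In-place heapsort into ascending order. */
void heap_sort(int *a, size_t n)
{
    if (n < 2)
        return;
    for (size_t i = n / 2; i-- > 0;)            /* build heap bottom-up */
        sift_down(a, n, i);
    for (size_t end = n - 1; end > 0; end--) {
        int tmp = a[0]; a[0] = a[end]; a[end] = tmp;  /* max to the end */
        sift_down(a, end, 0);                   /* re-heap a[0..end-1] */
    }
}
```

Only a constant amount of extra memory is used (the tmp variable), which is what makes it an in-place sort.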

Extracting data

• Heap-ordering property ⇒ maximum priority element at top of heap

• Can peek by looking at top element

• Can remove top element, move last element to top, and swap top element down with its children until it satisfies heap-ordering property:

1. start at top

2. find largest of element and left and right child; if element is largest, we are done

3. otherwise, swap element with largest child and repeat with element in new position
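The three extraction steps map onto a short sift-down loop. A minimal sketch, assuming an int max-heap in an array with the top at index 0, the children of index i at 2i+1 and 2i+2, and a hypothetical heap struct; the caller must ensure the heap is non-empty.

```c
#include <stddef.h>

/* Max-heap of int priorities stored in an array, top element at index 0. */
typedef struct {
    int *data;
    size_t size;
} heap;

/* Peek: the maximum-priority element is always at the top. */
int heap_peek(const heap *h) { return h->data[0]; }

/* Remove and return the top element (heap must be non-empty). */
int heap_extract_max(heap *h)
{
    int top = h->data[0];
    h->data[0] = h->data[--h->size];   /* move last element to the top */
    size_t i = 0;                      /* 1. start at top */
    for (;;) {
        size_t l = 2 * i + 1, r = 2 * i + 2, largest = i;
        /* 2. find the largest of the element and its children */
        if (l < h->size && h->data[l] > h->data[largest]) largest = l;
        if (r < h->size && h->data[r] > h->data[largest]) largest = r;
        if (largest == i)
            break;                     /* element is largest: done */
        /* 3. swap with the largest child, continue from its position */
        int tmp = h->data[i];
        h->data[i] = h->data[largest];
        h->data[largest] = tmp;
        i = largest;
    }
    return top;
}
```

Each swap moves the element one level down, so extraction takes O(log n) time.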

Inserting data/increasing priority

• Insert element at end of heap, set to lowest priority −∞

• Increase priority of element to real priority:

1. start at element

2. if new priority less than parent’s, we are done

3. otherwise, swap element with parent and repeat
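A sketch of insertion via increase-priority, assuming an int max-heap in a fixed-capacity array (top at index 0, parent of index i at (i-1)/2). INT_MIN stands in for the −∞ placeholder, and the heap struct and function names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

typedef struct {
    int *data;         /* caller-provided storage */
    size_t size;
    size_t capacity;
} heap;

/* Raise element i to new_priority, then swap it up past smaller parents. */
void heap_increase_priority(heap *h, size_t i, int new_priority)
{
    if (new_priority < h->data[i])
        return;                                   /* only increases allowed */
    h->data[i] = new_priority;                    /* 1. start at element */
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->data[i] <= h->data[parent])
            break;                                /* 2. heap order holds: done */
        int tmp = h->data[i];                     /* 3. swap with parent */
        h->data[i] = h->data[parent];
        h->data[parent] = tmp;
        i = parent;
    }
}

/* Append at the end with the lowest priority, then increase it.
   Returns 0 on success, -1 if the array is full. */
int heap_insert(heap *h, int priority)
{
    if (h->size == h->capacity)
        return -1;
    h->data[h->size] = INT_MIN;                   /* stands in for −∞ */
    h->size++;
    heap_increase_priority(h, h->size - 1, priority);
    return 0;
}
```

Because the new element only ever moves upward, an insertion touches at most one node per level, O(log n).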

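The biggest advantage of hierarchical structures such as trees and heaps over linear ones is that each step jumps a long distance, so access is efficient; the drawback is a somewhat more complex implementation. The two differ greatly in how an insertion finds its place: a search tree works top-down, locating the element's position level by level, whereas a max-heap puts the new element at the very bottom and then works bottom-up, which keeps the tree perfectly shaped. A max-heap can do this because element positions are not strictly fixed: the elements need not be kept in any particular order, as long as every child is no larger than its parent. The upward pass therefore stops at the top at the latest, with no top-down pass, so it is very efficient as well.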
