References

[1]       Opher Kahn and Bob Valentine [June 2010]. Intel Next Generation Microarchitecture Codename Sandy Bridge: New Processor Innovations. Intel IDF2010 San Francisco, CA. http://www.intel.com/idf/audio_sessions.htm

[2]       T. Kilburn, D.B.G. Edwards, M.J. Lanigan and F.H. Sumner [Apr. 1962]. One-Level Storage System. IRE Transactions on Electronic Computers, April 1962.

[3]       Freescale. E500 TLB Entries.
http://forums.freescale.com/freescale/attachments/freescale/CWCFCOMM/2355/1/e500_tlb.pdf

[4]       Freescale. EREF: A Programmer’s Reference Manual for Freescale Embedded Processors.
http://cache.freescale.com/files/32bit/doc/ref_manual/EREFRM.pdf

[5]       Intel [May 2011]. Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1, Section 4.10.1 Process-Context Identifiers (PCIDs).
http://www.intel.com/Assets/pdf/manual/253668.pdf

[6]       Hans de Vries [Sep. 2003]. Understanding the detailed Architecture of AMD's 64 bit Core. http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html.

[7]       John L. Hennessy and David A. Patterson [Jan. 2008]. Computer Architecture: A Quantitative Approach, Fourth Edition. ISBN-13: 978-0-12-370490-0. ISBN-10: 0-12-370490-1. Original English language edition copyright by Elsevier Inc. Published by China Machine Press.

[8]       J. Navarro [Apr. 2004]. Transparent operating system support for superpages. PhD thesis, Rice University, Houston, Texas.

[9]       Juan E. Navarro [Apr. 2004]. Transparent operating system support for superpages.

[10]   Adam G. Litke [Jun. 2007]. “Turning the Page” on Hugetlb Interfaces. Proceedings of the Linux Symposium Volume One. June 27th–30th, 2007.

[11]   Narayanan Ganapathy and Curt Schimmel [1998]. General purpose operating system support for multiple page sizes. Proceeding ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference.

[12]   Michael E. Thomadakis, Ph.D. [Jan. 2011]. The Architecture of the Nehalem Processor and Nehalem-EP SMP Platforms. http://sc.tamu.edu/systems/eos/nehalem.pdf.

[13]   Hughes; William Alexander, Ramagopal; Hebbalalu S., Meyer; Derrick R., Conor; Stephen M. Snoop resynchronization mechanism to preserve read ordering. US Patent 6,473,837.

[14]   G. Reinman and B. Calder [Dec. 1998]. Predictive techniques for aggressive load speculation. In 31st International Symposium on Microarchitecture.

[15]   G. Reinman and B. Calder [May 2000]. A Comparative Survey of Load Speculation Architectures. Journal of Instruction-Level Parallelism.

[16]   Digital Semiconductor [Jun. 1996]. Alpha 21064 and Alpha 21064A Microprocessors Hardware Reference Manual.

[17]   Compaq Computer Corporation [Dec. 1998]. Alpha 21164 Microprocessor Hardware Reference Manual.

[18]   Compaq Computer Corporation [Jul. 1999]. Alpha 21264 Microprocessor Hardware Reference Manual.

[19]   A. Moshovos, G.S. Sohi [Dec. 1997]. Streamlining inter-operation memory communication via data dependence prediction. In 30th International Symposium on Microarchitecture.

[20]   Norman P. Jouppi [Jun. 1990]. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, ACM SIGARCH Computer Architecture News, v.18 n.3a, p.364-373, June 1990.

[21]   G. Tyson, T. M. Austin [Dec. 1997]. Improving the accuracy and performance of memory communication through renaming. In 30th Annual International Symposium on Microarchitecture, pages 218–227.

[22]   Andrew Glew [Oct. 1998]. MLP yes! ILP no! ASPLOS Wild and Crazy Idea Session'98.

[23]   Alan Jay Smith [Sep. 1982]. Cache Memories. ACM Computing Surveys Volume 14 Issue 3.

[24]   JEDEC [Jul. 2010]. DDR3 SDRAM STANDARD.

[25]   Kostas Pagiamtzis and Ali Sheikholeslami [Mar. 2006]. Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey. IEEE Journal of Solid-State Circuits, vol. 41, no. 3.

[26]   André Seznec [May 1993]. A case for two-way skewed-associative caches. ISCA '93 Proceedings of the 20th annual international symposium on computer architecture.

[27]   Chenxi Zhang, Xiaodong Zhang and Yong Yan [Sep. 1997]. Two Fast and High-Associativity Cache Schemes, IEEE Micro, v.17 n.5, p.40-49.

[28]   H. Vandierendonck and K. D. Bosschere [2008]. Constructing optimal XOR-functions to minimize cache conflict misses. In Proc. Int. Conf. on Architecture of Computing Systems (ARCS). Springer, pp. 261-272.

[29]   Antonio González, Mateo Valero, Nigel Topham and Joan M. Parcerisa [Jul. 1997]. Eliminating cache conflict misses through XOR-based placement functions, Proceedings of the 11th international conference on Supercomputing, p.76-83, July 07-11, 1997, Vienna, Austria.

[30]   R. E. Kessler, Mark D. Hill [Nov. 1992]. Page placement algorithms for large real-indexed caches, ACM Transactions on Computer Systems (TOCS), v.10 n.4, p.338-359.

[31]   Yehuda Afek, Dave Dice and Adam Morrison [Jun. 2011]. Cache Index-Aware Memory Allocation. ISMM’11, San Jose, California, USA.

[32]   Mark S. Papamarcos and Janak H. Patel [Jun. 1984]. A low-overhead coherence solution for multiprocessors with private cache memories.

[33]   P. Sweazey and A. J. Smith [Jun. 1986]. A class of compatible cache consistency protocols and their support by the IEEE futurebus, Proceedings of the 13th annual international symposium on Computer architecture, p.414-423, Tokyo, Japan.

[34]   Herbert H.J. Hum et al [Jul. 2005]. Forward State for use in Cache Coherency in a Multiprocessor System. U.S. Patent 6,922,756.

[35]   Gururaj S. Rao [Jul. 1978]. Performance Analysis of Cache Memories. Journal of the ACM (JACM) Vol 25, No 3.

[36]   Hussein Al-Zoubi, Aleksandar Milenkovic, and Milena Milenkovic [Apr. 2004]. Performance Evaluation of Cache Replacement Policies for the SPEC CPU2000 Benchmark Suite. Proc. 42nd ACM Southeast Regional Conference, 2004.

[37]   Jan Reineke, Daniel Grund, Christoph Berg, and Reinhard Wilhelm [Nov. 2007]. Timing Predictability of Cache Replacement Policies. Real-Time Systems. Volume 37, Number 2.

[38]   Elizabeth J. O'Neil, Patrick E. O'Neil and Gerhard Weikum [Jun. 1993]. The LRU-K Page Replacement Algorithm for Database Disk Buffering. Proc. ACM SIGMOD, Washington, D.C., Volume 22 Issue 2.

[39]   Theodore Johnson and Dennis Shasha [Sep. 1994]. 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm. Proc. 20th VLDB Conf., Santiago, Chile, 1994.

[40]   Song Jiang and Xiaodong Zhang [Jun. 2002]. LIRS: An efficient low inter-reference recency set replacement policy to improve buffer cache performance. In Proc. ACM SIGMETRICS Conf., 2002.

[41]   Song Jiang and Xiaodong Zhang [Jun. 2002]. The PPT of LIRS: An efficient low inter-reference recency set replacement policy to improve buffer cache performance.
www.ece.eng.wayne.edu/~sjiang/Projects/LIRS/sig02.ppt.

[42]   Song Jiang, Feng Chen and Xiaodong Zhang [Apr. 2005]. CLOCK-Pro: an effective improvement of the CLOCK replacement. ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference. USENIX Association Berkeley, CA, USA.

[43]   Freescale [Apr. 2005]. PowerPC™ e500 Core Family Reference Manual Supports e500v1 and e500v2. Rev. 1, 4/2005. Pg. 384. 11-22.

[44]   Moinuddin K. Qureshi, Daniel N. Lynch, Onur Mutlu and Yale N. Patt. [May 2006]. A Case for MLP-Aware Cache Replacement, ACM SIGARCH Computer Architecture News, v.34 n.2, p.167-178.

[45]   Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely and Joel Emer [Jun. 2007]. Adaptive insertion policies for high performance caching, Proceedings of the 34th annual international symposium on Computer architecture, June 09-13, 2007, San Diego, California, USA.

[46]   Jayesh Gaur, Mainak Chaudhuri and Sreenivas Subramoney [Jun. 2011]. Bypass and Insertion Algorithms for Exclusive Last-level Caches. ISCA'11, June 4-8, 2011, San Jose, California, USA.

[47]   Tse-Yu Yeh and Yale N. Patt [May 1992]. Alternative implementations of two-level adaptive branch prediction. ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture.

[48]   R. M. Tomasulo [Jan. 1967]. An efficient algorithm for exploiting multiple arithmetic units. IBM J. Res. Dev. 11, 1 (Jan. 1967), 25-33.

[49]   Gurindar S. Sohi and Manoj Franklin [Sep. 1990]. High-bandwidth data memory systems for superscalar processors, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.53-62, Santa Clara, California, United States.

[50]   Toni Juan, Juan J. Navarro and Olivier Temam [Jul. 1997]. Data caches for superscalar processors, Proceedings of the 11th international conference on Supercomputing, p.60-67, July 07-11, 1997, Vienna, Austria.

[51]   David Kroft [May 1981]. Lockup-free instruction fetch/prefetch cache organization. Proceedings of the 8th annual symposium on Computer Architecture, p.81-87, May 12-14, 1981, Minneapolis, Minnesota, United States.

[52]   David Kroft [1998]. Retrospective: lockup-free instruction fetch/prefetch cache organization. Published in: Proceeding ISCA '98 25 years of the international symposia on Computer architecture.

[53]   Keith I. Farkas, Paul Chow, Norman P. Jouppi and Zvonko Vranesic [Jun. 1997]. Memory-system design considerations for dynamically-scheduled processors, Proceedings of the 24th annual international symposium on Computer architecture, p.133-143, June 01-04, 1997, Denver, Colorado, United States.

[54]   Keith I. Farkas and Norman P. Jouppi [Apr. 1994]. Complexity/performance tradeoffs with non-blocking loads, Proceedings of the 21st annual international symposium on Computer architecture, p.211-222, April 18-21, 1994, Chicago, Illinois, United States.

[55]   Sarita V. Adve and Kourosh Gharachorloo [Sep. 1995]. Shared Memory Consistency Models: A Tutorial. Western Research Laboratory Research Report 95/7, Digital Equipment Corporation Palo Alto, California 94301-1616.

[56]   Ajay D. Kshemkalyani and Mukesh Singhal [Mar. 2011]. Distributed Computing: Principles, Algorithms, and Systems, Section 12.2 Memory consistency models. Cambridge University Press; Reissue edition. ISBN-10: 0521189845. ISBN-13: 978-0521189842

[57]   Seth Gilbert and Nancy Lynch [Jun. 2002]. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services, ACM SIGACT News, v.33 n.2.

[58]   Werner Vogels [Jan. 2009]. Eventually consistent. Communications of the ACM, v.52 n.1.

[59]   Fredrik Dahlgren [May 1995]. Boosting the performance of hybrid snooping cache protocols. Proceeding ISCA '95 Proceedings of the 22nd annual international symposium on computer architecture.

[60]   James R. Goodman [Jun. 1983]. Using Cache Memory to Reduce Processor-Memory Traffic. Proceedings of the 10th Annual International Symposium on Computer Architecture, pp 124-131, 1983.

[61]   James R. Goodman and Philip J. Woest [May 1988]. The Wisconsin multicube: a new large-scale cache-coherent multiprocessor. Proceeding ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture.

[62]   AMD [May 2011]. AMD64 Technology--AMD64 Architecture Programmer’s Manual Volume 2: System Programming, Section 7.3 Memory Coherency and Protocol. Revision 3.18.

[63]   Jeffrey Kuskin, David Ofelt, Mark Heinrich, John Heinlein, Richard Simoni, Kourosh Gharachorloo, John Chapin, David Nakahira, Joel Baxter, Mark Horowitz, Anoop Gupta, Mendel Rosenblum, and John Hennessy [Apr. 1994]. The Stanford FLASH multiprocessor. Proceeding ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture.

[64]   Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta and John Hennessy [Jun. 1990]. The directory-based cache coherence protocol for the DASH multiprocessor. Proceeding ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture. ACM New York, NY, USA.

[65]   Mark S. Papamarcos and Janak H. Patel [Jun. 1984]. A Low-Overhead Coherence Solution for Multiprocessors with Private Cache Memories. Proceeding ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture.

[66]   Amin Firoozshahian [Dec 2008]. Smart Memories: A Reconfigurable Memory System Architecture. PhD thesis, Stanford University.

[67]   Eric Rotenberg, Steve Bennett and James E. Smith [Dec. 1996]. Trace cache: a low latency approach to high bandwidth instruction fetching, Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture, p.24-35, December, 1996, Paris, France.

[68]   Tom Shanley [Jul. 2004]. Chapter 38 Pentium 4 core description, pg 901, Chapter 40. The Pentium 4 Caches, the unabridged Pentium 4 IA32 processor genealogy. Mindshare. Addison-Wesley. ISBN: 0-321-24656-X.

[69]   David Kanter [Sep. 2010]. Intel's Sandy Bridge Microarchitecture.
http://www.realworldtech.com/page.cfm?ArticleID=RWT091810191937&p=1

[70]   Steven Przybylski, John Hennessy, and Mark Horowitz [May 1989]. Characteristics of Performance-Optimal Multi-level Cache Hierarchies. In Proc. of the 16th Annual International Symposium on Computer Architecture.

[71]   STREAM "standard" results. http://www.cs.virginia.edu/stream/standard/Bandwidth.html

[72]   Chetana N. Keltcher, Kevin J. McGrath, Ardsher Ahmed and Pat Conway [Mar. 2003]. The AMD Opteron Processor for Multiprocessor Servers. IEEE Micro, vol. 23, no. 2, pp. 66-76.

[73]   AMD [Sep. 2005]. Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™ Processors. Revision 3.06.

[74]   David Kanter [Aug. 2010]. AMD's Bulldozer Microarchitecture.
http://www.realworldtech.com/page.cfm?ArticleID=RWT082610181333&p=8

[75]   J. M. Tendler, S. Dodson, S. Fields, H. Le, and B. Sinharoy [Jan 2002]. Power4 System Microarchitecture. IBM Journal of Research and Development, 46(1), Jan 2002.

[76]   B. Sinharoy, R. Kalla, J. Tendler, R. Eickemeyer, and J. Joyner [Jul. 2005]. Power5 System Microarchitecture. IBM Journal of Research and Development, 49(4), July 2005.

[77]   H. Q. Le, W. J. Starke, J. S. Fields, F. P. O'Connell, D. Q. Nguyen, B. J. Ronchetti, W. M. Sauer, E. M. Schwarz and M. T. Vaden [Nov. 2007]. IBM POWER6 microarchitecture, IBM Journal of Research and Development, v.51 n.6, p.639-662, November 2007.

[78]   Ron Kalla, Balaram Sinharoy, William J. Starke and Michael Floyd [Mar. 2010]. Power7: IBM's Next-Generation Server Processor, IEEE Micro, v.30 n.2, p.7-15, March 2010.

[79]   J. Kahl, M. Day, H. Hofstee, C. Johns, T. Maeurer, and D. Shippy [2005]. Introduction to the Cell Multiprocessor. IBM Journal of Research and Development, 49(4), 2005.

[80]   Tom R. Halfhill [Jul. 2010]. NetLogic Broadens XLP Family Multithreading and Four-Way Issue with One to Eight CPU Cores. Microprocessor Report.

[81]   Kunle Olukotun, Lance Hammond and James Laudon [2007]. Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency, Morgan and Claypool Publishers, 2007.

[82]   RIKEN Fujitsu Limited [Jun. 2011]. Supercomputer "K computer" Takes First Place in World. http://www.fujitsu.com/global/news/pr/archives/month/2011/20110620-02.html

[83]   Jean-Loup Baer and Wen-Hann Wang [May 1988]. On the inclusion properties for multi-level cache hierarchies, Proceedings of the 15th Annual International Symposium on Computer architecture, p.73-80, May 30-June 02, 1988, Honolulu, Hawaii, United States.

[84]   Bradford M. Beckmann, Michael R. Marty and David A. Wood [Dec. 2006]. ASR: Adaptive Selective Replication for CMP Caches, Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, p.443-454, December 09-13, 2006.

[85]   Wen-Hann Wang [1989]. Multilevel Cache Hierarchies. Ph.D. Dissertation. University of Washington. AAI9013828.

[86]   AMD [Jun. 2000]. AMD Athlon™ Processor and AMD Duron™ Processor with full-speed on-die L2 cache. June 19, 20

[87]   Pat Conway, Nathan Kalyanasundharam, Gregg Donley, Kevin Lepak, and Bill Hughes [Apr. 2010]. Cache hierarchy and memory subsystem of the AMD opteron processor. IEEE Micro, vol. 30, no. 2, pp. 16-29, Apr. 2010.

[88]   Michael Zhang, Krste Asanovic [Jun. 2005]. Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors, Proceedings of the 32nd annual international symposium on Computer Architecture, p.336-345, June 04-08, 2005.

[89]   Ying Zheng, Brian T. Davis, Matthew Jordan [Mar. 2004]. Performance evaluation of exclusive cache hierarchies, in: ISPASS ’04: Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, IEEE Computer Society, Washington, DC, USA, 2004, pp. 89–96.

[90]   Aamer Jaleel, Eric Borch, Malini Bhandaru, Simon C. Steely Jr. and Joel Emer [Dec. 2010]. Achieving Non-Inclusive Cache Performance with Inclusive Caches: Temporal Locality Aware (TLA) Cache Management Policies. Proceeding MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[91]   Michael R. Marty, Jesse D. Bingham, Mark D. Hill, Alan J. Hu, Milo M. K. Martin and David A. Wood [Feb. 2005]. Improving Multiple-CMP Systems Using Token Coherence, Proceedings of the 11th International Symposium on High-Performance Computer Architecture, p.328-339, February 12-16, 2005.

[92]   Yuichiro Ajima, Shinji Sumimoto and Toshiyuki Shimizu [Nov. 2009]. Tofu: A 6D Mesh/Torus Interconnect for Exascale Computers, Computer, v.42 n.11, p.36-40, November 2009.

[93]   http://www.gem5.org/dist/tutorials/isca_pres_2011.pdf.

[94]   GEM5 source code ./src/mem/protocol/MOESI_CMP_directory-L1cache.sm

[95]   http://gem5.org/Cache_Coherence_Protocols.

[96]   Herbert H. J. Hum and James R. Goodman [Jul. 2005]. Forward State for use in Cache Coherency in a Multiprocessor System. US Patent No. 6,922,756 B2. Original Assignee: Intel Corporation. July 26, 2005.

[97]   Norman P. Jouppi [May 1993]. Cache write policies and performance, Proceedings of the 20th annual international symposium on Computer architecture, p.191-201, May 16-19, 1993, San Diego, California, United States.

[98]   Intel [June 2011]. Intel 64 and IA-32 Architectures Optimization Reference Manual. Section 2.1.1 Intel microarchitecture code name Sandy Bridge Pipeline Overview.

[99]   David Kanter [Jul. 2011]. Sandy Bridge for Servers.
http://realworldtech.com/page.cfm?ArticleID=RWT072811020122&p=1

[100] Steven P. Vanderwiel and David J. Lilja [Jun. 2000]. Data prefetch mechanisms, ACM Computing Surveys (CSUR), v.32 n.2, p.174-199.

[101] Doug Joseph and Dirk Grunwald [May 1997]. Prefetching using Markov predictors. ISCA '97 Proceedings of the 24th annual international symposium on Computer architecture.

[102] Tien-Fu Chen and Jean-Loup Baer [Apr. 1994]. A performance study of software and hardware data prefetching schemes, Proceedings of the 21st annual international symposium on Computer architecture, p.223-232, April 18-21, 1994, Chicago, Illinois, United States.

[103] Tien-Fu Chen and Jean-Loup Baer [May 1995]. Effective Hardware-Based Data Prefetching for High-Performance Processors, IEEE Transactions on Computers, v.44 n.5, p.609-623.

[104] G. S. Manku, M. R. Prasad, and D. A. Patterson [Dec. 1997]. A new voting based hardware data prefetch scheme. Proc. of IEEE Int. Conf. High-Performance Computing, pp.100 - 105, 1997.

[105] S. Palacharla and R. E. Kessler [Apr. 1994]. Evaluating stream buffers as a secondary cache replacement, Proceedings of the 21st annual international symposium on Computer architecture, p.24-33, April 18-21, 1994, Chicago, Illinois, United States.
