20 Common Bottlenecks in System Design (Translation)

 

Translated from: http://highscalability.com/blog/2012/5/16/big-list-of-20-common-bottlenecks.html

 

Database:

Working set size exceeds available RAM

 

Mixing long-running and short-running queries

 

Write-write conflicts

 

Large joins taking up memory
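
One common way to surface the write-write conflicts listed in this section, rather than letting the last writer silently win, is optimistic concurrency control with a version column. The sketch below uses Python's built-in sqlite3 module. The account table, its columns, and the helper names are hypothetical and chosen only for illustration; they are not from the original list.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: every row carries a version counter.
    conn.execute(
        "CREATE TABLE account ("
        "id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL, "
        "version INTEGER NOT NULL)"
    )
    conn.commit()


def read_account(conn: sqlite3.Connection, account_id: int):
    # Returns (balance, version); the caller keeps the version it saw.
    return conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()


def update_balance(conn, account_id, new_balance, expected_version) -> bool:
    # The UPDATE matches only if nobody bumped the version since our read.
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    # rowcount == 0 means another writer got there first.
    return cur.rowcount == 1
```

A caller whose update returns False would normally re-read the row and retry, or report the conflict to the user.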

 

Virtualisation:

Sharing an HDD, leading to disk seek death

 

Network I/O fluctuations in the cloud

 

Programming:

Threads: deadlocks, heavyweight compared to events, hard to debug, non-linear scalability, etc.

 

Event-driven programming: callback complexity, how to store state across function calls, etc.

 

Lack of profiling, lack of tracing, lack of logging

 

One piece can't scale, single point of failure (SPOF), not horizontally scalable, etc.

 

Stateful apps

 

Bad design: the developers create an app that runs fine on their own computers. The app goes into production and runs fine with a couple of users. Months or years later, the application can't cope with thousands of users and needs to be completely re-architected and rewritten.

 

Algorithm complexity

 

Dependent services such as DNS lookups, and anything else you may block on

 

Stack space
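
Stack space usually becomes a limit when recursion goes deep. A minimal sketch, assuming the work can be restructured around an explicit stack that lives on the heap: the functions nested_depth and make_nested are hypothetical names used only for this illustration.

```python
def nested_depth(value) -> int:
    """Depth of nested lists, computed with an explicit stack, not recursion."""
    max_depth = 0
    stack = [(value, 1)]  # pairs of (node, depth of that node)
    while stack:
        node, depth = stack.pop()
        if isinstance(node, list):
            max_depth = max(max_depth, depth)
            for child in node:
                stack.append((child, depth + 1))
    return max_depth


def make_nested(levels: int) -> list:
    """Build a list nested `levels` deep, e.g. levels=3 gives [[[]]]."""
    nested = []
    for _ in range(levels - 1):
        nested = [nested]
    return nested
```

With CPython's default recursion limit (usually 1000), a straightforward recursive version would raise RecursionError on an input 100,000 levels deep, while the loop above is bounded only by available heap memory.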

 

Disk:

Local disk access

 

Random disk I/O -> disk seeks

 

Disk fragmentation

 

SSD performance drops once the data written exceeds the SSD's size

 

OS:

Fsync flushing, Linux buffer cache filling up

 

TCP buffers too small

 

File descriptor limits

 

Power budget
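
A rough way to judge whether a TCP buffer is too small, as in the item in this section, is the bandwidth-delay product: a single connection cannot keep more data in flight than its buffer holds, so its throughput is capped at roughly buffer size divided by round-trip time. The sketch below is plain arithmetic. The function names and the example figures (1 Gbit/s, 50 ms, 64 KiB) are assumptions for illustration, not values from the original list.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float,
                                  rtt_seconds: float) -> int:
    """Bytes that must be in flight to keep the link full."""
    return round(bandwidth_bits_per_s * rtt_seconds / 8)


def max_throughput_bits_per_s(buffer_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one connection's throughput for a given buffer size."""
    return buffer_bytes * 8 / rtt_seconds


# Assumed example: a 1 Gbit/s path with a 50 ms round trip.
needed = bandwidth_delay_product_bytes(1e9, 0.05)

# Assumed example: a 64 KiB buffer on that same path.
ceiling = max_throughput_bits_per_s(64 * 1024, 0.05)
```

Under these assumed figures the link needs about 6.25 MB in flight, while a 64 KiB buffer caps the connection at roughly 10.5 Mbit/s, about 1% of the link.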

 

Caching:

Not using memcached (database pummeling)

 

In HTTP: headers, ETags, not gzipping, etc.

 

Not utilising the browser's cache enough

 

Byte code caches (e.g. PHP)

 

L1/L2 caches. This is a huge bottleneck. Keep important, hot data in L1/L2. This spans a lot: Snappy for network I/O, column DBs running algorithms directly on compressed data, and so on. Then there are techniques to avoid destroying your TLB. The most important idea is to have a firm grasp of computer architecture: multi-core CPUs, L1/L2, shared L3, NUMA RAM, data transfer bandwidth/latency from DRAM to chip, DRAM caching disk pages and dirty pages, and TCP packets travelling through CPU<->DRAM<->NIC.
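
The "database pummeling" in the memcached item of this section is usually addressed with a cache-aside read path: check the cache first and go to the database only on a miss. The sketch below uses an in-process dictionary with a time-to-live as a stand-in for memcached. TTLCache, get_user, and load_from_db are hypothetical names, not the API of any real memcached client.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for memcached: key -> (expiry time, value)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


def get_user(user_id: int, cache: TTLCache,
             load_from_db: Callable[[int], Any]) -> Any:
    """Cache-aside read: only a miss reaches the database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)  # the expensive call we want to avoid
    cache.set(key, value)
    return value
```

One simplification to be aware of: because None signals a miss here, a None result from the database is never cached and is looked up again on every call.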

 

CPU:

CPU overload

 

Context switches -> too many threads on a core, bad luck with the Linux scheduler, too many system calls, etc.

 

I/O waits -> all CPUs wait at the same speed

 

CPU caches: caching data is a fine-grained process (in Java, think of volatile, for instance) of finding the right balance between having multiple instances with different values for the same data and heavy synchronization to keep the cached data consistent.

 

Backplane throughput

 

Network:

NIC maxed out, IRQ saturation, soft interrupts taking up 100% CPU

 

DNS lookups

 

Dropped packets

 

Unexpected routes within the network

 

Network disk access

 

Shared SANs

 

Server failure -> the server no longer answers
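
A server that no longer answers turns into a bottleneck mainly when callers wait on it without a deadline. A minimal sketch, assuming a plain TCP connection check with an explicit timeout: is_reachable is a hypothetical helper, and the one-second default is an arbitrary choice for illustration.

```python
import socket


def is_reachable(host: str, port: int, timeout_seconds: float = 1.0) -> bool:
    """Try to open a TCP connection, giving up after timeout_seconds."""
    try:
        # create_connection applies the timeout to the connect attempt.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

A check like this only confirms that the port accepts connections. It says nothing about whether the application behind it is healthy, so it complements rather than replaces timeouts on the requests themselves.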

 

Process:

Testing time

 

Development time

 

Team size

 

Budget

 

Code debt

 

Memory:

Out of memory -> the process gets killed, or the system goes into swap and grinds to a halt

 

Out of memory causing disk thrashing (related to swap)

 

Memory library overhead

 

Memory fragmentation

  • In Java, requires GC pauses
  • In C, malloc calls start taking forever

 

 
