Some lessons learned on scaling Internet application services
http://blog.rebill.info/archives/wangdi-internet-service.html
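FreeWheel: an Internet MRM video ad serving and publishing platform. B2B: content owners -> content distributors
Ad application server: matching (matching user requests against the available ads)
Log processor: map-reduce (Hadoop), ETL ---> OLAP (data warehouse)
Pusher: OLTP DB cached in memory, pulls from the DB, does preprocessing and data preparation. mmap structured memory dump
Summary:
The business model and the business requirements determine the design
High service reliability: 99.99% uptime
Peak ratio: 5:1 peak to mean.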
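1. Scaling the application service
1.1 Stateless application servers:
Encode all required state into the URL, removing the dependency on network communication between servers
Application services can be restarted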
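A minimal sketch of the idea in 1.1, assuming the state is a small dictionary and that every application server holds the same secret key. The names encode_state, decode_state and SECRET, and the example host, are hypothetical and not from the talk. Any server can rebuild the state from the URL alone, so no server-to-server call is needed and a restarted server loses nothing.

```python
import base64
import hashlib
import hmac
import json
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical shared secret; every app server is assumed to hold the same value.
SECRET = b"example-secret"


def encode_state(base_url, state):
    """Return base_url with the state dict and its signature as query parameters."""
    payload = base64.urlsafe_b64encode(
        json.dumps(state, sort_keys=True).encode("utf-8")
    ).decode("ascii")
    sig = hmac.new(SECRET, payload.encode("ascii"), hashlib.sha256).hexdigest()
    return base_url + "?" + urlencode({"s": payload, "sig": sig})


def decode_state(url):
    """Rebuild the state from the URL alone; raise ValueError if it was altered."""
    query = parse_qs(urlparse(url).query)
    payload, sig = query["s"][0], query["sig"][0]
    expected = hmac.new(SECRET, payload.encode("ascii"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("state signature mismatch")
    return json.loads(base64.urlsafe_b64decode(payload.encode("ascii")))
```

The signature is there because state carried in a URL can be edited by the client; whether the original system signed its URLs is not stated in the notes.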
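1.2 Replication and multi-level cache.
Master->slave read/write splitting: keeps "long" write locks on the master from blocking reads, which go to the slaves
Cache expire times need careful thought
Dot server: a server with no logic, there to keep the external service from failing when every machine in the ad application server cluster is down.
For each request the dot server returns a cached standard response containing no ads.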
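The dot server fallback can be sketched as below. This models the routing decision as a single function, assuming each ad server is a callable that raises when it is down; handle_request and NO_AD_RESPONSE are hypothetical names, and the real response format is not given in the notes.

```python
# Hypothetical cached "no ad" response; the real format is not given in the notes.
NO_AD_RESPONSE = "<ads/>"


def handle_request(request, ad_servers, fallback=NO_AD_RESPONSE):
    """Try each ad server in turn; if all of them fail, answer like a dot server."""
    for server in ad_servers:
        try:
            return server(request)
        except Exception:
            # This server is down or misbehaving; try the next one.
            continue
    # No logic here on purpose: return the cached ad-free response.
    return fallback
```

The point of the design is that the caller always gets a well-formed answer, even when no ad can be matched.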
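Log processing:
Use Google Protocol Buffers to avoid defining a custom format and writing a parser; binary logs are also smaller, and adding fields is easy
Small companies should avoid reinventing the wheel where they can.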
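2. Scaling the data warehouse:
De-normalization: allow redundancy; SQL logic stays simple, queries perform well, and modeling with standard BI tools is easy
Pivot: merge multiple rows that share a key into a single row, improving ...
Long tail roll up (roll the long tail up into a single item)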
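Pivot and long tail roll up can be illustrated on plain Python data. This is only a sketch: in the warehouse these would be done in ETL or SQL, and the function names, the field names, and the "other" label are assumptions made for the example.

```python
from collections import defaultdict


def pivot(rows, key, column, value):
    """Merge rows that share `key` into one row, with one field per `column` value."""
    merged = defaultdict(dict)
    for row in rows:
        merged[row[key]][row[column]] = row[value]
    return {k: dict(v) for k, v in merged.items()}


def roll_up_long_tail(counts, top_n, other_label="other"):
    """Keep the top_n largest items and sum the remainder into a single item."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    result = dict(ranked[:top_n])
    tail = sum(v for _, v in ranked[top_n:])
    if tail:
        result[other_label] = tail
    return result
```

For example, three rows (ad, metric, n) for two ads become two rows, one per ad, and a report over thousands of small items shrinks to the top few plus one combined item.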
Benchmarking:
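Take measurements from the MySQL slow query log averaged over several runs, and pick the top slow queries to optimize every month
Set the InnoDB buffer pool to 70% of machine memory
Don't optimize for the sake of optimizing; consider these only when needed: table partitioning (splitting tables, vertical partitioning) and sharding (splitting databases by customer, horizontal partitioning)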
3. Operating principles
System capacity planning: reserve 50% of capacity for peaks; average system load above 50% is the signal to add capacity.
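The capacity rule can be written down as simple arithmetic. This is a sketch under the assumption that load and capacity are measured in the same unit (for example requests per second); the function names and the per-server capacity figure in the usage are hypothetical.

```python
import math


def needs_expansion(average_load, capacity, threshold=0.5):
    """Expansion signal from the notes: average load above 50% of capacity."""
    return average_load > threshold * capacity


def servers_required(average_load, per_server_capacity, target_utilization=0.5):
    """Servers needed so that average load stays at or below the target utilization."""
    return math.ceil(average_load / (per_server_capacity * target_utilization))
```

With an average load of 4000 requests per second and servers that each handle 1000, servers_required(4000, 1000) asks for 8 servers, which leaves half of the fleet's capacity in reserve.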
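N+1 data centers: data centers spread across different geographic locations, with backup ISP and CDN
Monitoring:
1. Application liveness checks
2. Service anomaly alerts: errors, latency, etc.
3. Database master-slave replication
4. Daily slow query report
5. Daily report on the day's business operations
Multi-stage deployment: build a Lab that is a proportionally scaled-down copy of production with the same topology, and run integration tests on real production data.
Deploy in stages, upgrading in batches at different times
Testing: DEV vs QA: 1:1
Built around automated regression testing.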
Netlog: What we learned about scalability & high availability
http://www.slideshare.net/folke/netlog-what-we-learned-about-scalability-high-availability-430211
Apache+PHP+eAccelerator+Keepalived(for HA)
Nginx+Lighttpd+CDN: static files(css/js/image/photo/video)
Search: Sphinx, mysql full-text search is very slow.
DB partitioning(sharding): Divide data on primary key,
How: Mysql partitioning since 5.1
Memcached for session/query result/processed data/generated html
Cache with TTL/Cache forever with invalidate/Cache forever with update
Global locking: use memcache as locking mechanism
Flooding detection using memcache [a very general and efficient way to detect flooding]
User can only redo action A after a timeout
a guestbook message can only be posted once every 2 minutes
User cannot do action A more than X times in T minutes
only 12 failed login attempts per hour are allowed
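Both flooding rules can be sketched with two cache operations, add (store only if absent) and incr, plus key expiry. The TTLCache class below is an in-memory stand-in written for this example, not a memcached client; the key naming and function names are assumptions. A real deployment would issue the equivalent add and incr commands against memcached.

```python
import time


class TTLCache:
    """In-memory stand-in for memcached: keys expire after a TTL (assumption)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def _live(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)
            return None
        return entry

    def add(self, key, value, ttl):
        """Store only if the key is absent; return True when stored."""
        if self._live(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True

    def incr(self, key):
        """Increment a live counter, keeping its expiry; None if the key is gone."""
        entry = self._live(key)
        if entry is None:
            return None
        self._data[key] = (entry[0] + 1, entry[1])
        return entry[0] + 1


def allow_after_timeout(cache, user, action, timeout):
    """Rule 1: the user can redo the action only after `timeout` seconds."""
    return cache.add("once:%s:%s" % (action, user), 1, timeout)


def allow_x_times(cache, user, action, max_times, window):
    """Rule 2: at most `max_times` actions per `window` seconds."""
    key = "count:%s:%s" % (action, user)
    if cache.add(key, 1, window):
        return True  # first action in this window
    count = cache.incr(key)
    if count is None:  # counter expired between add and incr; start a new window
        cache.add(key, 1, window)
        return True
    return count <= max_times
```

The guestbook rule would be allow_after_timeout(cache, user, "guestbook", 120), and the login rule allow_x_times(cache, user, "login_fail", 12, 3600). Because expiry does the cleanup, no background job is needed to reset the counters.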
Scalability, Availability & Stability Patterns
http://www.slideshare.net/jboner/scalability-availability-stability-patterns
Scalable Web Architectures: Common Patterns and Approaches
http://www.slideshare.net/techdude/scalable-web-architectures-common-patterns-and-approaches
Three goals of application architecture design: Scale, HA, Performance.
What is scalability for ?
1. Traffic growth
2. Dataset growth
3. Maintainability
Two kinds of scalability:
1. Vertical (get bigger): sometimes adding hardware (memory) costs less than redesigning the software or partitioning the data.
For example, when MySQL is not fast enough, try adding memory first.
2. Horizontal (get more)
Share-nothing servers are easy to scale.
Queuing: with a queue, it is easy to parallelize work asynchronously
Database is the toughest part to scale. A dual Intel64 system with 16GB+ of RAM can get you a long way.
Mysql: Master-Master+multi-slave(as hot/hot) is good for HA.
design schema/access to avoid collision(hashing users to servers)
No auto-inc columns for hot/hot
Data Federation:
Simple things first: Vertical partitioning + sharding+ central lookup
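Hashing users to servers and a central lookup can be combined as in the sketch below. It assumes shards are identified by name and that the lookup table fits in one place; shard_for_user, CentralLookup and the shard names are hypothetical, and a real lookup table would live in a database rather than in process memory.

```python
import hashlib


def shard_for_user(user_id, shard_count):
    """Hash a user id to a shard index that is stable across processes."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


class CentralLookup:
    """Central lookup table: user id -> shard name, assigned on first sight."""

    def __init__(self, shards):
        self._shards = list(shards)
        self._table = {}

    def shard_of(self, user_id):
        """Return the user's shard, assigning one by hash if not yet recorded."""
        if user_id not in self._table:
            index = shard_for_user(user_id, len(self._shards))
            self._table[user_id] = self._shards[index]
        return self._table[user_id]

    def move(self, user_id, shard):
        """Rebalance one user; callers only ever see the lookup entry change."""
        if shard not in self._shards:
            raise ValueError("unknown shard: %r" % (shard,))
        self._table[user_id] = shard
```

The hash gives an even initial spread, while the lookup table lets individual users be moved later without rehashing everyone.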
Multi-site HA:
GSLB: global server load balancing; the easiest approach is DNS