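#Background
Thrift is widely regarded as an excellent RPC framework. This year the company rolled it out across departments to make inter-department communication more efficient. My own work mainly involves calling other departments' interfaces as a client (my team is itself a service provider, but from the point of view of the departments whose services we consume, we are a client). Along the way I found a significant drawback in Thrift: the server side usually hands the client a set of service IPs, and everything else, including load balancing and checking whether connections are usable, is left to the client. Neither Apache Thrift itself nor GenericObjectPool provides a mechanism for switching over promptly when some of the server machines become unavailable. TSocket.isOpen() only tells you whether the connection has been opened; if the network drops or the service crashes after the connection was opened, it cannot detect that the connection is no longer usable. So all of this has to be maintained by ourselves. This post describes the problems I ran into during development and how I improved things. If you have a better approach, please share it.
The first few sections describe the strategies tried during the optimization; the last section shows how the latest strategy is implemented in the latest code.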
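#First version: rough design
First, the overall framework of the connection pool was settled, as shown in the figure below:
With the overall framework settled, the concrete strategy was designed as follows:
The reasoning behind this strategy: although ThriftConnectionPoolFactory cycles through the remote service IPs in round-robin order when creating connections, once some service IPs go down, newly created connections concentrate on the healthy machines, and machines that later recover receive no requests at all. So, with load balancing in mind, the pool uses maxActive=maxIdle=minIdle: at startup, connections are created evenly to every IP, and problematic connections are not removed, so that requests keep hitting the remote machines evenly. To keep a faulty remote service from blocking a large number of our threads, a "service switch-off" mechanism was added as a degradation measure.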
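The even-distribution idea can be illustrated with a minimal sketch. It assumes a fixed pool size and a plain round-robin over the address list; `EvenDistributionSketch`, `assign`, `countPerAddress` and the sample addresses are hypothetical and do not come from the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: shows how a fixed-size pool spreads its connections over the server list.
class EvenDistributionSketch {

    // Assign poolSize connections to the addresses in round-robin order.
    static List<String> assign(List<String> addresses, int poolSize) {
        List<String> targets = new ArrayList<>();
        for (int i = 0; i < poolSize; i++) {
            targets.add(addresses.get(i % addresses.size()));
        }
        return targets;
    }

    // Count how many connections each address received.
    static Map<String, Integer> countPerAddress(List<String> targets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String target : targets) {
            counts.merge(target, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample addresses are made up for the illustration.
        List<String> ips = Arrays.asList("10.0.0.1:9090", "10.0.0.2:9090", "10.0.0.3:9090");
        System.out.println(countPerAddress(assign(ips, 9)));
    }
}
```

With 9 connections and 3 addresses, each address ends up with 3 connections. Because the pool never shrinks or grows under maxActive=maxIdle=minIdle, that ratio stays fixed as long as no connection is removed.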
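With this strategy, load testing showed that a crashed backend service did not affect the client machines, and load balancing was achieved: the backends received requests evenly. It did, however, introduce a fairly big problem. Once some IPs entered the switched-off state, connections to those IPs handled requests extremely quickly (a direct return null is far faster than a real Thrift call), so the vast majority of requests got empty data, rather than only the fraction that corresponds to the share of switched-off connections in the pool.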
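A rough back-of-the-envelope model shows why the share of empty responses is so much larger than the share of switched-off connections. It assumes the pool is saturated, so each connection serves requests back to back and its throughput is the inverse of its latency. The class name and all numbers below are hypothetical, not measurements from the load test.

```java
// Simplified model (an assumption, not the author's measurement):
// every connection is always busy, so a connection's request rate is 1 / latency.
class SwitchedShareSketch {

    // Fraction of all requests that are served by switched-off connections.
    static double switchedShare(int totalConns, int switchedConns,
                                double normalMillis, double switchedMillis) {
        double switchedRate = switchedConns / switchedMillis;            // requests per ms
        double normalRate = (totalConns - switchedConns) / normalMillis; // requests per ms
        return switchedRate / (switchedRate + normalRate);
    }

    public static void main(String[] args) {
        // Hypothetical figures: 4 connections, 1 switched off,
        // 50 ms for a real call versus 0.1 ms for an immediate "return null".
        System.out.println(switchedShare(4, 1, 50.0, 0.1));
    }
}
```

Under these assumed figures only 25% of the connections are switched off, yet they absorb roughly 99% of the requests, which matches the behaviour described above.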
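#Second version: optimization
To fix the situation where a few switched-off IPs caused the vast majority of requests to get no data, the strategy was optimized as follows: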
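#Third version: optimization
The last version of the optimization so far additionally brings in a ZooKeeper monitoring service. The ZooKeeper cluster is provided to the client by the operations team; I am only responsible for calling it. The concrete strategy is as follows:
With ZooKeeper monitoring and connection statistics added, real-time load balancing can be achieved while keeping this service highly available. The connection statistics feature is not implemented yet, so the code below only includes ZooKeeper monitoring. The overall framework diagram is revised as follows:
ZookeeperFactory: creates a CuratorFramework client and maintains and updates the service list held by AddressProvider.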
#Code implementation
1. Connection class ThriftTSocket
package omitted
import omitted
public class ThriftTSocket extends TSocket {
private String hostThrift;// IP of the connection
private int portThrift;// port of the connection
private int timeoutThrift;// timeout configured for the connection
public ThriftTSocket(String host, int port, int timeout) throws TTransportException {
super(host, port, timeout);
this.hostThrift = host;
this.portThrift = port;
this.timeoutThrift = timeout;
}
public String getHostThrift() {
return this.hostThrift;
}
public int getPortThrift() {
return this.portThrift;
}
public int getTimeoutThrift() {
return this.timeoutThrift;
}
}
2. Third-party service status
package omitted
import omitted
public class ThriftServiceStatus {
private static Log serviceStatusLoger = LogFactory.getLog("serviceStatusLoger");
private static Log apiSwitchMonitor = LogFactory.getLog("apiSwitchMonitorLog");
private static final int INTERFACE_TOTAL_COUNT = 30;// number of recorded exceptions at which the service is switched off
private static final long INTERFACE_RECORD_COUNT_RESET_TIME = 1 * 60 * 1000;// interval after which the statistics are reset
private static final long INTERFACE_AUTO_NORMAL_TIME = 5 * 60 * 1000;// the interface recovers automatically after 5 minutes
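/**
* Number of exceptions from the third-party service
*/
private int count;
/**
* Time at which counting started for the third-party service
*/
private long recordStartTime;
/**
* Time at which the third-party service was most recently switched off
*/
private long closeTime;
/**
* Name of the third-party service
*/
private String serviceName;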
private Lock lock;
public ThriftServiceStatus(String serviceName) {
this.recordStartTime = System.currentTimeMillis();
this.count = 0;
this.closeTime = 0;
this.lock = new ReentrantLock();
this.serviceName = serviceName;
}
/**
* While the switch-off period lasts, the service is considered unavailable
* @return
*/
public boolean ifServiceUsable() {
return (System.currentTimeMillis() - this.closeTime) > INTERFACE_AUTO_NORMAL_TIME;
}
public void checkThriftServiceStatus() {
this.lock.lock();
try {
this.count++;
serviceStatusLoger.info("[this service " + this.serviceName + "]" + " has exceptions. count:["
+ this.count + "] recordStartTime:[" + TimeUtil.timestamp2date(recordStartTime) + "] nowTime:["
+ TimeUtil.timestamp2date(System.currentTimeMillis()) + "]");
// switch the service off once the exception count reaches the threshold
if (this.count >= INTERFACE_TOTAL_COUNT) {
this.closeTime = System.currentTimeMillis();// update the switch-off time
apiSwitchMonitor.info("[" + this.serviceName + "]" + " close time:["
+ TimeUtil.timestamp2date(this.closeTime) + "]" + "average response time:[10000]");
}
// reset the counter after 1 minute
if ((System.currentTimeMillis() - recordStartTime) > INTERFACE_RECORD_COUNT_RESET_TIME) {
this.count = 0;
this.recordStartTime = System.currentTimeMillis();
}
} finally {
this.lock.unlock();
}
}
}
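To make the switch-off logic easier to reason about and to test, here is a minimal, self-contained sketch of the same idea with an injectable clock. It is a variation, not the class above: it resets the counting window before counting a failure, so stale failures from an expired window never contribute to a switch-off. `SwitchOffSketch` and its parameter names are hypothetical.

```java
import java.util.function.LongSupplier;

// Hypothetical variation of the switch-off logic; the clock is injected so it can be tested.
class SwitchOffSketch {
    private final int threshold;        // failures within one window that trigger a switch-off
    private final long windowMillis;    // length of the counting window
    private final long recoverMillis;   // how long the service stays switched off
    private final LongSupplier clock;   // source of "now" in milliseconds
    private int failures;
    private long windowStart;
    private long closedAt = -1;         // -1 means the service has never been switched off

    SwitchOffSketch(int threshold, long windowMillis, long recoverMillis, LongSupplier clock) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
        this.recoverMillis = recoverMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    // The service is usable unless it was switched off less than recoverMillis ago.
    synchronized boolean isUsable() {
        return closedAt < 0 || clock.getAsLong() - closedAt > recoverMillis;
    }

    // Record one failure; expire the old window first, then count and maybe switch off.
    synchronized void recordFailure() {
        long now = clock.getAsLong();
        if (now - windowStart > windowMillis) {
            failures = 0;
            windowStart = now;
        }
        failures++;
        if (failures >= threshold) {
            closedAt = now;
            failures = 0;
            windowStart = now;
        }
    }

    public static void main(String[] args) {
        SwitchOffSketch status = new SwitchOffSketch(30, 60_000L, 300_000L, System::currentTimeMillis);
        for (int i = 0; i < 30; i++) {
            status.recordFailure();
        }
        System.out.println("usable after 30 failures: " + status.isUsable());
    }
}
```

The thresholds passed in `main` mirror the constants of ThriftServiceStatus (30 failures, a 1 minute window, 5 minutes of recovery); whether resetting the counter on switch-off is desirable is a design choice this sketch makes on its own.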
3. CuratorFramework client, connects to the zookeeper cluster
package omitted
import omitted
public class ZookeeperFactory implements FactoryBean<CuratorFramework>, Closeable, InitializingBean {
private static Logger LOGGER = LoggerFactory.getLogger(ZookeeperFactory.class);
/**
* zookeeper cluster addresses
*/
private String zookeeperHosts;
// session timeout
private int sessionTimeout = 3000;
private int connectionTimeout = 3000;
private CuratorFramework zkClient;
// not provided by the third party, so unused for now
private String namespace;
public void setZookeeperHosts(String zookeeperHosts) {
this.zookeeperHosts = zookeeperHosts;
}
public void setSessionTimeout(int sessionTimeout) {
this.sessionTimeout = sessionTimeout;
}
public void setConnectionTimeout(int connectionTimeout) {
this.connectionTimeout = connectionTimeout;
}
public void setNamespace(String namespace) {
this.namespace = namespace;
}
public void setZkClient(CuratorFramework zkClient) {
this.zkClient = zkClient;
}
@Override
public CuratorFramework getObject() throws Exception {
return this.zkClient;
}
@Override
public void afterPropertiesSet() throws Exception {
if (StringUtil.isBlank(this.zookeeperHosts)) {
return;
}
this.zkClient = this.create(zookeeperHosts, sessionTimeout, connectionTimeout, namespace);
this.zkClient.start();
}
private CuratorFramework create(String connectString, int sessionTimeout, int connectionTimeout, String namespace) {
try {
CuratorFrameworkFactory.Builder builder = CuratorFrameworkFactory.builder();
return builder.connectString(connectString).sessionTimeoutMs(sessionTimeout).connectionTimeoutMs(connectionTimeout)
.canBeReadOnly(true).namespace(namespace)
.retryPolicy(new ExponentialBackoffRetry(1000, Integer.MAX_VALUE)).defaultData(null).build();
} catch (Exception e) {
LOGGER.error("ZookeeperFactory create error", e);
throw e;
}
}
public void close() {
if (zkClient != null) {
zkClient.close();
}
}
@Override
public Class<?> getObjectType() {
return CuratorFramework.class;
}
@Override
public boolean isSingleton() {
return true;
}
}
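4. Class that holds the service IPs
package omitted
import omitted
public class AddressProvider {
private static Logger LOGGER = LoggerFactory.getLogger(AddressProvider.class);
/**
* Latest list of server IPs, maintained and updated through zookeeper
*/
private List<InetSocketAddress> serverAddresses = new CopyOnWriteArrayList<InetSocketAddress>();
/**
* IP list from the original configuration file, used when zookeeper is not configured
*/
private List<InetSocketAddress> backupAddresses = new LinkedList<InetSocketAddress>();
/**
* Round-robin queue, used when picking an IP
*/
private Queue<InetSocketAddress> loop = new LinkedList<InetSocketAddress>();
private Lock loopLock = new ReentrantLock();
/**
* zookeeper monitoring
*/
private PathChildrenCache cachedPath;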
public AddressProvider() {
}
public AddressProvider(String backupAddress, CuratorFramework zkClient, String zookeeperPath) throws Exception {
// by default, use the IP list from the configuration file
this.backupAddresses.addAll(this.transfer(backupAddress));
this.serverAddresses.addAll(this.backupAddresses);
Collections.shuffle(this.backupAddresses);
Collections.shuffle(this.serverAddresses);
// when zookeeper is configured, start the client
if (!StringUtil.isBlank(zookeeperPath) && zkClient != null) {
this.buildPathChildrenCache(zkClient, zookeeperPath, true);
cachedPath.start(StartMode.POST_INITIALIZED_EVENT);
}
}
public InetSocketAddress selectOne() {
loopLock.lock();
try {
if (this.loop.isEmpty()) {
this.loop.addAll(this.serverAddresses);
}
return this.loop.poll();
} finally {
loopLock.unlock();
}
}
public Iterator<InetSocketAddress> addressIterator() {
return this.serverAddresses.iterator();
}
/**
* Initialize cachedPath and add a listener; whenever the data of any node on zookeeper changes, update the local serverAddresses
* @param client
* @param path
* @param cacheData
* @throws Exception
*/
private void buildPathChildrenCache(final CuratorFramework client, String path, Boolean cacheData) throws Exception {
final String logPrefix = "buildPathChildrenCache_" + path + "_";
cachedPath = new PathChildrenCache(client, path, cacheData);
cachedPath.getListenable().addListener(new PathChildrenCacheListener() {
@Override
public void childEvent(CuratorFramework client, PathChildrenCacheEvent event) throws Exception {
PathChildrenCacheEvent.Type eventType = event.getType();
switch (eventType) {
case CONNECTION_RECONNECTED:
LOGGER.info(logPrefix + "Connection is reconnected.");
break;
case CONNECTION_SUSPENDED:
LOGGER.info(logPrefix + "Connection is suspended.");
break;
case CONNECTION_LOST:
LOGGER.warn(logPrefix + "Connection error,waiting...");
return;
case INITIALIZED:
LOGGER.warn(logPrefix + "Connection init ...");
default:
}
// any change to any node's data triggers a rebuild; this is a deliberately "simple" approach.
cachedPath.rebuild();
rebuild();
}
private void rebuild() throws Exception {
List<ChildData> children = cachedPath.getCurrentData();
if (CollectionUtils.isEmpty(children)) {
// it is possible that every thrift server has lost its connection to zookeeper
// while the network between the thrift client and the thrift servers is still fine,
// so whether serverAddresses should be cleared here has to be weighed from several angles.
// here we treat the zookeeper service as reliable, so an empty result is taken as correct
serverAddresses.clear();
LOGGER.error(logPrefix + "server ips in zookeeper is empty");
return;
}
List<InetSocketAddress> lastServerAddress = new LinkedList<InetSocketAddress>();
for (ChildData data : children) {
String address = new String(data.getData(), "utf-8");
lastServerAddress.add(transferSingle(address));
}
// update the local IP list
serverAddresses.clear();
serverAddresses.addAll(lastServerAddress);
Collections.shuffle(serverAddresses);
}
});
}
/**
* Convert a String address into an InetSocketAddress
* @param serverAddress
* 10.183.222.59:1070
* @return
*/
private InetSocketAddress transferSingle(String serverAddress) {
if (StringUtil.isBlank(serverAddress)) {
return null;
}
String[] address = serverAddress.split(":");
return new InetSocketAddress(address[0], Integer.valueOf(address[1]));
}
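/**
* Convert several String addresses into a list of InetSocketAddress
* @param serverAddresses
* ip:port;ip:port;ip:port;ip:port
* @return the parsed addresses; an empty list when the input is blank
*/
private List<InetSocketAddress> transfer(String serverAddresses) {
if (StringUtil.isBlank(serverAddresses)) {
// return an empty list rather than null so that callers can pass the result to addAll safely
return new LinkedList<InetSocketAddress>();
}
List<InetSocketAddress> tempServerAdress = new LinkedList<InetSocketAddress>();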
String[] hostnames = serverAddresses.split(";");
for (String hostname : hostnames) {
tempServerAdress.add(this.transferSingle(hostname));
}
return tempServerAdress;
}
}
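The rebuild() method above clears serverAddresses and then refills it, so a reader can briefly observe an empty list between the two calls. One possible alternative, sketched here under the assumption that readers only need a consistent snapshot, is to publish a whole new immutable list in a single step. `SnapshotAddressHolder` and its methods are hypothetical and are not part of the original AddressProvider.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder: readers always see either the old list or the new one, never a half-built list.
class SnapshotAddressHolder {
    private final AtomicReference<List<InetSocketAddress>> snapshot =
            new AtomicReference<>(Collections.<InetSocketAddress>emptyList());
    private final AtomicInteger cursor = new AtomicInteger();

    // Publish a new address list in one atomic step (for example from a zookeeper listener).
    void replaceAll(List<InetSocketAddress> latest) {
        snapshot.set(Collections.unmodifiableList(new ArrayList<>(latest)));
    }

    // Round-robin over the current snapshot; returns null when no address is known.
    InetSocketAddress selectOne() {
        List<InetSocketAddress> current = snapshot.get();
        if (current.isEmpty()) {
            return null;
        }
        // floorMod keeps the index non-negative even after the counter overflows
        int index = Math.floorMod(cursor.getAndIncrement(), current.size());
        return current.get(index);
    }

    public static void main(String[] args) {
        SnapshotAddressHolder holder = new SnapshotAddressHolder();
        // createUnresolved avoids a DNS lookup; the addresses are made up for the illustration
        holder.replaceAll(Arrays.asList(
                InetSocketAddress.createUnresolved("10.0.0.1", 9090),
                InetSocketAddress.createUnresolved("10.0.0.2", 9090)));
        System.out.println(holder.selectOne());
        System.out.println(holder.selectOne());
    }
}
```

This sketch also needs no lock for selection, at the cost of a slightly uneven rotation right after the list changes size.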
5. Connection factory
package omitted
import omitted
public class ThriftConnectionPoolFactory extends BasePoolableObjectFactory<ThriftTSocket> {
private static Logger LOGGER = LoggerFactory.getLogger(ThriftConnectionPoolFactory.class);
private final AddressProvider addressProvider;
private int timeout = 2000;
protected ThriftConnectionPoolFactory(AddressProvider addressProvider) throws Exception {
this.addressProvider = addressProvider;
}
@Override
public ThriftTSocket makeObject() throws Exception {
String logPrefix = "makeObject_";
ThriftTSocket thriftTSocket = null;
InetSocketAddress address = null;
Exception exception = null;
try {
address = this.addressProvider.selectOne();
thriftTSocket = new ThriftTSocket(address.getHostName(), address.getPort(), timeout);
thriftTSocket.open();
LOGGER.info(logPrefix + "connect server:[" + address.getHostName() + ":" + address.getPort() + "] success");
} catch (Exception e) {
// address is still null here when no server address was available
LOGGER.error(logPrefix + "connect server[" + address + "] error: ", e);
exception = e;
thriftTSocket = null;// reset so that the other servers are tried below
}
// go through all IPs in turn
if (thriftTSocket == null) {
String hostName = address == null ? null : address.getHostName();
int port = address == null ? -1 : address.getPort();
Iterator<InetSocketAddress> addressIterator = this.addressProvider.addressIterator();
while (addressIterator.hasNext()) {
try {
address = addressIterator.next();
// do not retry the host that has just failed to connect
if (address.getHostName().equals(hostName) && address.getPort() == port) {
continue;
}
thriftTSocket = new ThriftTSocket(address.getHostName(), address.getPort(), timeout);
thriftTSocket.open();
LOGGER.info(logPrefix + "connect server:[" + address.getHostName() + ":" + address.getPort()
+ "] success");
break;
} catch (Exception e) {
LOGGER.error(logPrefix + "connect server[" + address.getHostName() + ":" + address.getPort()
+ "] error: ", e);
exception = e;
thriftTSocket = null;
}
}
}
// throw when no connection could be established to any server
if (thriftTSocket == null) {
throw exception;
}
return thriftTSocket;
}
@Override
public void destroyObject(ThriftTSocket tsocket) throws Exception {
if (tsocket != null) {
try {
tsocket.close();
} catch (Exception e) {
}
}
}
@Override
public boolean validateObject(ThriftTSocket tsocket) {
if (tsocket == null) {
return false;
}
// if the network is cut after the connection was created successfully, this call still returns true
return tsocket.isOpen();
}
}
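6. Connection pool
package omitted
import omitted
public class ThriftGenericObjectPool extends GenericObjectPool<ThriftTSocket> {
public ThriftGenericObjectPool(AddressProvider addressProvider, int maxActive, int maxIdle, int minIdle,
long maxWait) throws Exception {
/**
* Pool policy: max connections, max wait time, max idle and min idle are configured manually.
* Keep max connections equal to max idle where possible, and min idle at 0 where possible, so that unused connections can be evicted.
* The other parameters are hard-coded:
* GenericObjectPool.WHEN_EXHAUSTED_BLOCK: block when borrowing a connection, and throw an exception once maxWait has elapsed
* GenericObjectPool.DEFAULT_TEST_ON_BORROW
* GenericObjectPool.DEFAULT_TEST_ON_RETURN
* 60*1000: the idle-connection evictor runs once every 60 seconds
* 5: each evictor run examines 5 connections
* 60*10*1000: minimum idle time of 10 minutes; connections idle for longer are removed by the evictor
* GenericObjectPool.DEFAULT_TEST_WHILE_IDLE
* GenericObjectPool.DEFAULT_SOFT_MIN_EVICTABLE_IDLE_TIME_MILLIS
* GenericObjectPool.DEFAULT_LIFO: last-in-first-out, so that not every connection stays in use and idle ones get evicted in time
*/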
super(new ThriftConnectionPoolFactory(addressProvider), maxActive, GenericObjectPool.WHEN_EXHAUSTED_BLOCK,
maxWait, maxIdle, minIdle, GenericObjectPool.DEFAULT_TEST_ON_BORROW,
GenericObjectPool.DEFAULT_TEST_ON_RETURN, 60 * 1000, 5, 60 * 10 * 1000,
GenericObjectPool.DEFAULT_TEST_WHILE_IDLE,
GenericObjectPool.DEFAULT_SOFT_MIN_EVICTABLE_IDLE_TIME_MILLIS, GenericObjectPool.DEFAULT_LIFO);
}
}
7. Invocation handler
package omitted
import omitted
public class ThriftInvocationHandler implements InvocationHandler {
private static Logger LOGGER = LoggerFactory.getLogger(ThriftInvocationHandler.class);
private static Log httpClientUtilLogger = LogFactory.getLog(HttpClientUtil.class);
private GenericObjectPool<ThriftTSocket> pool; // connection pool
private TServiceClientFactory<TServiceClient> tServiceClientFactory = null;
private Integer protocol;
private ThriftServiceStatus thriftServiceStatus;// service status
private AddressProvider addressProvider;
public ThriftInvocationHandler(GenericObjectPool<ThriftTSocket> pool,
TServiceClientFactory<TServiceClient> tServiceClientFactory, Integer protocol, String serviceName,
AddressProvider addressProvider) {
this.pool = pool;
this.tServiceClientFactory = tServiceClientFactory;
this.protocol = protocol;
this.thriftServiceStatus = new ThriftServiceStatus(serviceName);
this.addressProvider = addressProvider;
}
@Override
public Object invoke(Object proxy, Method method, Object[] args) throws Exception {
String logPrefix = "ThriftInvocationHandler_";
ThriftTSocket thriftTSocket = null;
boolean ifBorrowException = true;
try {
// while the service is switched off, return null directly
if (!this.thriftServiceStatus.ifServiceUsable()) {
return null;
}
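// when the third-party service is unavailable, this call blocks for a while and then throws; the failure is recorded in the service status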
thriftTSocket = this.pool.borrowObject();
ifBorrowException = false;
String interfaceWholeName = this.getInterfaceName(method) + "&ip=" + thriftTSocket.getHostThrift() + ":"
+ thriftTSocket.getPortThrift();
LOGGER.info(logPrefix + interfaceWholeName + " borrowed:" + this.pool.getNumActive() + " idle:"
+ this.pool.getNumIdle() + " total :" + (this.pool.getNumActive() + this.pool.getNumIdle()));
long startTime = System.currentTimeMillis();
long costTime;
Object o = null;
try {
o = method.invoke(this.tServiceClientFactory.getClient(this.getTProtocol(thriftTSocket)), args);
costTime = System.currentTimeMillis() - startTime;
httpClientUtilLogger.info(this.getUrl(interfaceWholeName, args) + "|200|0|" + costTime + "|0");
} catch (Exception e) {
costTime = System.currentTimeMillis() - startTime;
httpClientUtilLogger.error(this.getUrl(interfaceWholeName, args) + "|000|0|" + costTime + "|1");
// a connection that threw an exception is treated as unusable and removed from the pool
this.pool.invalidateObject(thriftTSocket);
thriftTSocket = null;
o = null;
}
return o;
} catch (Exception e) {
LOGGER.error("thrift invoke error", e);
if (ifBorrowException) {
this.thriftServiceStatus.checkThriftServiceStatus();
}
return null;
} finally {
if (thriftTSocket != null) {
this.pool.returnObject(thriftTSocket);
}
}
}
private String getInterfaceName(Method method) {
String interfaceName = method.getDeclaringClass().toString();
interfaceName = interfaceName.substring(10, interfaceName.length());
return interfaceName + "$" + method.getName();
}
private String getUrl(String service, Object[] args) {
StringBuilder wholeUrl = new StringBuilder("thrift://");
wholeUrl.append(service.substring(service.lastIndexOf("$") + 1, service.indexOf("&ip="))).append("/?")
.append("service=").append(service);
if (args != null) {
wholeUrl.append("&allParams=[ ");
for (Object object : args) {
wholeUrl.append(object);
}
wholeUrl.append(" ]");
}
return wholeUrl.toString();
}
private TProtocol getTProtocol(TSocket tSocket) {
// all servers are of the non-blocking type
TTransport transport = new TFramedTransport(tSocket);
TProtocol tProtocol = null;
switch (this.protocol) {
case 1:
tProtocol = new TBinaryProtocol(transport);
break;
case 2:
tProtocol = new TCompactProtocol(transport);
break;
case 3:
tProtocol = new TJSONProtocol(transport);
break;
case 4:
tProtocol = new TSimpleJSONProtocol(transport);
break;
default:
tProtocol = new TBinaryProtocol(transport);
}
return tProtocol;
}
}
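For readers unfamiliar with JDK dynamic proxies, the following minimal sketch shows how an InvocationHandler of this kind is typically wired to a service interface with java.lang.reflect.Proxy. It is not the author's proxy factory: `ProxyWiringSketch`, `Greeter` and `NullOnFailureHandler` are hypothetical stand-ins, and the handler simply delegates to a local target instead of borrowing a pooled Thrift connection.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

class ProxyWiringSketch {

    // Hypothetical service interface standing in for a Thrift-generated Iface.
    public interface Greeter {
        String greet(String name);
    }

    // Delegates every call to the target and degrades to null when the call fails,
    // similar in spirit to the handler above. Only safe for non-primitive return types.
    static class NullOnFailureHandler implements InvocationHandler {
        private final Object target;

        NullOnFailureHandler(Object target) {
            this.target = target;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException | IllegalAccessException e) {
                return null; // degrade instead of propagating the failure
            }
        }
    }

    // Create a proxy that implements iface and routes all calls through the handler.
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, new NullOnFailureHandler(target));
    }

    public static void main(String[] args) {
        Greeter healthy = wrap(Greeter.class, name -> "hello " + name);
        Greeter failing = wrap(Greeter.class, name -> {
            throw new IllegalStateException("service down");
        });
        System.out.println(healthy.greet("thrift"));
        System.out.println(failing.greet("thrift"));
    }
}
```

Callers only see the interface, so pooling, protocol selection and degradation stay hidden behind the proxy; the price is that a null result can mean either "no data" or "call failed".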
8. Proxy factory
package omitted
import omitted
public class ThriftServiceClientProxyFactory implements FactoryBean
9. Calling class
package omitted
import omitted
@Component
public class SearchTpDao {
private final static Logger log = LoggerFactory.getLogger(SearchTpDao.class);
@Resource
private GenericServing.Iface searchServing;
/**
* Search interface
*/
public GenericServingResponse search(String from, String dt, Integer page, Integer pageSize, String word,
Integer categoryId, Integer subCategoryId, String ph, String src, Integer mix, String order,
String searchContent, String splatid, String leIds, String eid, String repo_type, String ispay,
String albumFilter, String videoFilter, String jf, String sf, String stat, String countryArea,
String displayAppId, String displayPlatformId, CommonParam commonParam) {
String logPrefix = "search_" + from + "_" + dt + "_" + page + "_" + pageSize + "_" + word + "_" + categoryId
+ "_" + ph + "_" + src + "_" + mix + "_" + order + "_" + searchContent + "_" + splatid + "_" + leIds
+ "_" + eid + "_" + repo_type + "_" + ispay + "_" + albumFilter + "_" + videoFilter + "_" + jf + "_"
+ sf + "_" + stat + "_" + countryArea + "_" + displayAppId + "_" + displayPlatformId;
GenericServingResponse response = null;
try {
// getSearchRequest omitted
response = this.searchServing.Serve(this.getSearchRequest(from, dt, page, pageSize, word, categoryId,
subCategoryId, ph, src, mix, order, searchContent, splatid, leIds, eid, repo_type, ispay,
albumFilter, videoFilter, jf, sf, stat, countryArea, null, displayAppId, displayPlatformId,
commonParam));
if (response != null && response.getSearch_response() != null) {
log.info(logPrefix + " " + response.getSearch_response().getEid());
}
} catch (Exception e) {
response = null;
log.error(logPrefix, e);
}
return response;
}
}
10. Configuration file
11. Thrift configuration file:
search.search.server.zookeeper=2.2.2.2:2222
search.search.server=1.1.1.1:1111
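addressProvider is already passed to the Handler class; later, a statistics algorithm will be added after each call completes, to decide whether that IP needs to be removed. If you have experience with Thrift load balancing, service discovery, or high availability, recommendations of solid frameworks are welcome.
Today I came across some material on doing load balancing on the server side with HAProxy and nginx: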
http://blog.csdn.net/dengzhilong_cpp/article/details/51729918
http://blog.csdn.net/ceasadan/article/details/52369045
http://www.07net01.com/2015/04/819651.html
References:
http://blog.csdn.net/zhu_tianwei/article/details/44115667/