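In a distributed system built with Spring Cloud, service calls will inevitably fail from time to time, for example because of timeouts or exceptions. The question is how to keep the overall service from failing when a single dependency misbehaves. Hystrix provides fallback (service degradation), circuit breaking, thread isolation, request caching, request collapsing, and monitoring, so the system stays available when one or more dependencies have problems.
Hystrix fault tolerance is also applied on the service consumer side, so only small changes to the eureka-client-consumer from the earlier article are needed.
1. Using Hystrix directly
1.1 Add the Hystrix dependency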
compile('org.springframework.cloud:spring-cloud-starter-netflix-eureka-client:2.0.0.RELEASE')
compile('org.springframework.cloud:spring-cloud-starter-netflix-ribbon:2.0.0.RELEASE')
compile('org.springframework.cloud:spring-cloud-starter-openfeign:2.0.0.RELEASE')
compile('org.springframework.cloud:spring-cloud-starter-netflix-hystrix:2.0.0.RELEASE')
1.2 Modify the Spring Boot main class and add the @EnableHystrix or @EnableCircuitBreaker annotation
@EnableCircuitBreaker
@EnableFeignClients
@SpringBootApplication
class Application
fun main(args: Array<String>) {
    runApplication<Application>(*args)
}
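The annotations on the main class can in fact be replaced with @SpringCloudApplication, because it bundles three annotations: @SpringBootApplication, @EnableDiscoveryClient (which can be omitted), and @EnableCircuitBreaker. @EnableFeignClients is not part of it and still has to be declared. This also shows that a standard Spring Cloud application is expected to include Hystrix fault tolerance.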
@Target({ElementType.TYPE})
@Retention(RetentionPolicy.RUNTIME)
@Documented
@Inherited
@SpringBootApplication
@EnableDiscoveryClient
@EnableCircuitBreaker
public @interface SpringCloudApplication {
}
1.3 Modify TestController and use the @HystrixCommand annotation to name the fallback method
@RestController
class TestController {
    @Autowired
    private lateinit var testService: TestService
    // If test() fails or times out, Hystrix calls fallback() instead
    @HystrixCommand(fallbackMethod = "fallback")
    @GetMapping("/test")
    fun test(): String? {
        return testService.test()
    }
    fun fallback(): String {
        return "fallback..."
    }
}
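1.4 Test and verify
Start the eureka-server, eureka-client-provider, and eureka-client-consumer instances; a request to localhost:30001/test returns normally. Stop eureka-client-provider and send the request again: the fallback result is returned. Restart eureka-client-provider and the request returns the normal result again.
2. Integrating Hystrix with Feign
The steps above use Hystrix directly for fault tolerance. An earlier article introduced Feign declarative service clients, and Feign also supports Hystrix, so using Hystrix on top of Feign is even simpler.
2.1 Add the dependencies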
compile('org.springframework.cloud:spring-cloud-starter-netflix-eureka-client:2.0.0.RELEASE')
compile('org.springframework.cloud:spring-cloud-starter-netflix-ribbon:2.0.0.RELEASE')
compile('org.springframework.cloud:spring-cloud-starter-openfeign:2.0.0.RELEASE')
compile('org.springframework.cloud:spring-cloud-starter-netflix-hystrix:2.0.0.RELEASE')
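2.2 Add the @SpringCloudApplication and @EnableFeignClients annotations to the Spring Boot main class
@EnableFeignClients
@SpringCloudApplication
class Application
fun main(args: Array<String>) {
    runApplication<Application>(*args)
}
2.3 In the @FeignClient annotation, specify the implementation class that Hystrix uses for fallback, handle each method there, and remove the earlier @HystrixCommand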
TestController.kt
@RestController
class TestController {
    @Autowired
    private lateinit var testService: TestService
    @GetMapping("/test")
    fun test(): String? {
        return testService.test()
    }
}
TestService.kt
@FeignClient(value = "eureka-client-provider", fallback = TestServiceFallback::class, configuration = [(FeignLogConfiguration::class)])
interface TestService {
    @GetMapping("/test")
    fun test(): String
}
TestServiceFallback.kt
@Component
class TestServiceFallback : TestService {
    override fun test(): String {
        return "fallback"
    }
}
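2.4 Enable Feign's Hystrix support. This step is essential: the support is disabled by default, and without it the fallback class is never used.
feign:
  hystrix:
    enabled: true
2.5 The remaining steps are the same as in section 1, and so is the result; the difference is that Hystrix is integrated through Feign in a more uniform and convenient way.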
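As a rough mental model of what the fallback class does, the sketch below wraps a service in a JDK dynamic proxy that calls the fallback implementation whenever the primary call throws. It is a conceptual illustration only, not how Feign or Hystrix is implemented. withFallback is a hypothetical helper, and TestService is redeclared here without the Feign annotations so the snippet stands alone.

```kotlin
import java.lang.reflect.InvocationTargetException
import java.lang.reflect.Proxy

// Same shape as the article's TestService, minus the Feign annotations.
interface TestService {
    fun test(): String
}

// Hypothetical helper: returns a proxy that delegates every interface method
// to `primary` and, if that call throws, to the same method on `fallback`.
inline fun <reified T : Any> withFallback(primary: T, fallback: T): T =
    Proxy.newProxyInstance(T::class.java.classLoader, arrayOf(T::class.java)) { _, method, args ->
        val callArgs = args ?: emptyArray()
        try {
            method.invoke(primary, *callArgs)
        } catch (e: InvocationTargetException) {
            // The primary implementation threw; answer from the fallback instead.
            method.invoke(fallback, *callArgs)
        }
    } as T
```

Because the fallback implements the same interface, every method of the client gets its own fallback behavior, which is what "handle each method" in step 2.3 refers to.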
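3. More Hystrix features
3.1 Dependency isolation
When developers use annotations such as @HystrixCommand, they are using Hystrix's command pattern: each service call is wrapped in a command, and the command runs on a separate thread.
By default Hystrix gives each command group its own thread pool (with @HystrixCommand the group defaults to the class name, which is why the stream output in section 4 reports a thread pool named TestController). If one dependency fails or slows down, only the calls to that dependency are affected, not the other services.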
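The idea of thread-pool isolation can be sketched with plain java.util.concurrent, without Hystrix. In the sketch below each dependency gets its own bounded pool, and a call that times out, throws, or is rejected because the pool is full falls back immediately. IsolatedDependency is a hypothetical class written for this illustration, and the default sizes are illustrative values, not a statement about Hystrix internals.

```kotlin
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.Callable
import java.util.concurrent.ExecutionException
import java.util.concurrent.RejectedExecutionException
import java.util.concurrent.ThreadPoolExecutor
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException

// Hypothetical helper: one bounded pool per dependency, so a slow dependency
// can only exhaust its own threads, never those used for other dependencies.
class IsolatedDependency(
    poolSize: Int = 10,
    queueSize: Int = 5,
    private val timeoutMillis: Long = 1_000
) {
    private val pool = ThreadPoolExecutor(
        poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
        ArrayBlockingQueue<Runnable>(queueSize)
    )

    fun <T> execute(action: () -> T, fallback: () -> T): T {
        val future = try {
            pool.submit(Callable { action() })
        } catch (e: RejectedExecutionException) {
            return fallback() // pool and queue are full: fail fast
        }
        return try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS)
        } catch (e: TimeoutException) {
            future.cancel(true) // interrupt the worker thread
            fallback()
        } catch (e: ExecutionException) {
            fallback() // the call itself threw
        }
    }

    // Worker threads are non-daemon, so stop them when done.
    fun shutdown() {
        pool.shutdownNow()
    }
}
```

With one instance per remote service, a provider that hangs ties up only the threads of its own instance; calls to other services keep running on their own pools.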
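3.2 Circuit breaker
When the error rate of a service exceeds a threshold, Hystrix trips the circuit breaker and stops sending requests to that service for a period of time. Two conditions must both hold within the rolling statistics window (10 s by default): 1. the number of requests reaches a minimum volume (20 by default); 2. the percentage of failed requests reaches the error threshold (50% by default).
Once the circuit breaker for a service is open, Hystrix no longer sends requests to it and goes straight to the fallback, so a known failure is not retried for a while.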
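The trip decision can be written as a small predicate. The sketch below is an illustration, not Hystrix source code: WindowStats and shouldTrip are hypothetical names, and the defaults mirror the propertyValue_circuitBreakerRequestVolumeThreshold (20) and propertyValue_circuitBreakerErrorThresholdPercentage (50) values that appear in the stream output in section 4.

```kotlin
// Counters for one rolling window; a hypothetical illustration type.
data class WindowStats(val requests: Int, val errors: Int) {
    // Integer percentage of failed requests, 0 when there was no traffic.
    val errorPercentage: Int
        get() = if (requests == 0) 0 else errors * 100 / requests
}

// The circuit trips only when BOTH conditions hold:
// enough traffic in the window, and a high enough share of failures.
fun shouldTrip(
    stats: WindowStats,
    requestVolumeThreshold: Int = 20,
    errorThresholdPercentage: Int = 50
): Boolean =
    stats.requests >= requestVolumeThreshold &&
        stats.errorPercentage >= errorThresholdPercentage
```

For instance, 3 requests with 1 error give an error percentage of 33 and do not trip the circuit, both because 3 is below the volume threshold and because 33 is below 50. Even 19 failures out of 19 requests would not trip it, since the volume threshold has not been reached.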
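3.3 Automatic recovery
After the circuit breaker has been open for a while (5 s by default), it enters the "half-open" state and lets a single trial request through to the service. If that call succeeds, the circuit breaker closes; otherwise it stays open and the wait starts over, after which another recovery attempt is made.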
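The open, half-open, and closed transitions can be sketched as a small state machine. This is a simplified, single-threaded illustration and not Hystrix's implementation: SimpleBreaker and BreakerState are hypothetical names, the trip condition is left to the caller (trip()), and the 5000 ms default mirrors propertyValue_circuitBreakerSleepWindowInMilliseconds from the stream output in section 4. The clock is injectable so the behavior can be exercised without waiting.

```kotlin
enum class BreakerState { CLOSED, OPEN, HALF_OPEN }

// Hypothetical, non-thread-safe sketch of the recovery cycle.
class SimpleBreaker(
    private val sleepWindowMillis: Long = 5_000,
    private val clock: () -> Long = { System.currentTimeMillis() }
) {
    var state = BreakerState.CLOSED
        private set
    private var openedAt = 0L

    // Open the circuit and start (or restart) the sleep window.
    fun trip() {
        state = BreakerState.OPEN
        openedAt = clock()
    }

    // True if a request may be sent to the remote service right now.
    private fun allowRequest(): Boolean = when (state) {
        BreakerState.CLOSED -> true
        BreakerState.HALF_OPEN -> false // a trial request is already in flight
        BreakerState.OPEN ->
            if (clock() - openedAt >= sleepWindowMillis) {
                state = BreakerState.HALF_OPEN // let exactly one trial through
                true
            } else {
                false
            }
    }

    fun <T> call(action: () -> T, fallback: () -> T): T {
        if (!allowRequest()) return fallback() // short-circuit
        return try {
            val result = action()
            if (state == BreakerState.HALF_OPEN) state = BreakerState.CLOSED
            result
        } catch (e: Exception) {
            if (state == BreakerState.HALF_OPEN) trip() // trial failed: reopen
            fallback()
        }
    }
}
```

While the circuit is open, call returns the fallback without invoking the action at all. Once the sleep window has elapsed, the next call is the trial request that decides whether the circuit closes or reopens.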
4. Hystrix monitoring
Setting up Hystrix monitoring is straightforward. Add the spring-boot-starter-actuator dependency:
compile('org.springframework.boot:spring-boot-starter-actuator')
Then set management.endpoints.web.exposure.include to hystrix.stream:
management:
  endpoints:
    web:
      exposure:
        include: hystrix.stream
Visiting http://localhost:30001/actuator/hystrix.stream shows data like the following; this monitors a single application instance. The statistics only appear after the service has actually been called.
data: {"type":"HystrixCommand","name":"test","group":"TestController","currentTime":1513135304152,"isCircuitBreakerOpen":false,"errorPercentage":33,"errorCount":1,"requestCount":3,"rollingCountBadRequests":0,"rollingCountCollapsedRequests":0,"rollingCountEmit":0,"rollingCountExceptionsThrown":0,"rollingCountFailure":0,"rollingCountFallbackEmit":0,"rollingCountFallbackFailure":0,"rollingCountFallbackMissing":0,"rollingCountFallbackRejection":0,"rollingCountFallbackSuccess":1,"rollingCountResponsesFromCache":0,"rollingCountSemaphoreRejected":0,"rollingCountShortCircuited":0,"rollingCountSuccess":2,"rollingCountThreadPoolRejected":0,"rollingCountTimeout":1,"currentConcurrentExecutionCount":0,"rollingMaxConcurrentExecutionCount":1,"latencyExecute_mean":0,"latencyExecute":{"0":0,"25":0,"50":0,"75":0,"90":0,"95":0,"99":0,"99.5":0,"100":0},"latencyTotal_mean":0,"latencyTotal":{"0":0,"25":0,"50":0,"75":0,"90":0,"95":0,"99":0,"99.5":0,"100":0},"propertyValue_circuitBreakerRequestVolumeThreshold":20,"propertyValue_circuitBreakerSleepWindowInMilliseconds":5000,"propertyValue_circuitBreakerErrorThresholdPercentage":50,"propertyValue_circuitBreakerForceOpen":false,"propertyValue_circuitBreakerForceClosed":false,"propertyValue_circuitBreakerEnabled":true,"propertyValue_executionIsolationStrategy":"THREAD","propertyValue_executionIsolationThreadTimeoutInMilliseconds":1000,"propertyValue_executionTimeoutInMilliseconds":1000,"propertyValue_executionIsolationThreadInterruptOnTimeout":true,"propertyValue_executionIsolationThreadPoolKeyOverride":null,"propertyValue_executionIsolationSemaphoreMaxConcurrentRequests":10,"propertyValue_fallbackIsolationSemaphoreMaxConcurrentRequests":10,"propertyValue_metricsRollingStatisticalWindowInMilliseconds":10000,"propertyValue_requestCacheEnabled":true,"propertyValue_requestLogEnabled":true,"reportingHosts":1,"threadPool":"TestController"}
data: {"type":"HystrixThreadPool","name":"TestController","currentTime":1513135304152,"currentActiveCount":0,"currentCompletedTaskCount":3,"currentCorePoolSize":10,"currentLargestPoolSize":3,"currentMaximumPoolSize":10,"currentPoolSize":3,"currentQueueSize":0,"currentTaskCount":3,"rollingCountThreadsExecuted":2,"rollingMaxActiveThreads":1,"rollingCountCommandRejections":0,"propertyValue_queueSizeRejectionThreshold":5,"propertyValue_metricsRollingStatisticalWindowInMilliseconds":10000,"reportingHosts":1}
ping:
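Each data: line in the stream is a JSON object, so a few fields are enough for a quick health check. The sketch below pulls them out with a regular expression to stay dependency-free; it is not a general JSON parser, and a real consumer would more likely use a JSON library. CommandSnapshot, field, and parseCommandLine are hypothetical names introduced for this illustration.

```kotlin
// Subset of the fields of a HystrixCommand event; a hypothetical illustration type.
data class CommandSnapshot(
    val name: String,
    val circuitOpen: Boolean,
    val errorPercentage: Int,
    val requestCount: Int
)

// Returns the raw value of the first top-level occurrence of "key":value,
// with surrounding quotes removed. Only suitable for flat scalar fields.
fun field(json: String, key: String): String? =
    Regex("\"" + Regex.escape(key) + "\":\"?([^\",}]*)\"?")
        .find(json)?.groupValues?.get(1)

// Parses one line of the stream; returns null for "ping:" lines
// and for events that are not of type HystrixCommand.
fun parseCommandLine(line: String): CommandSnapshot? {
    if (!line.startsWith("data:")) return null
    val json = line.removePrefix("data:").trim()
    if (field(json, "type") != "HystrixCommand") return null
    return CommandSnapshot(
        name = field(json, "name") ?: return null,
        circuitOpen = field(json, "isCircuitBreakerOpen") == "true",
        errorPercentage = field(json, "errorPercentage")?.toIntOrNull() ?: 0,
        requestCount = field(json, "requestCount")?.toIntOrNull() ?: 0
    )
}
```

Applied to the HystrixCommand line shown above, this yields the command name test, a closed circuit, an error percentage of 33, and a request count of 3.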
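5. Hystrix Dashboard
The raw monitoring data produced by Hystrix is not very readable, so Hystrix Dashboard provides a visual interface for it.
Hystrix Dashboard supports three monitoring modes:
- Default cluster monitoring: http://turbine-hostname:port/turbine.stream
- Monitoring a specified cluster: http://turbine-hostname:port/turbine.stream?cluster=[clusterName]
- Monitoring a single application: http://hystrix-app:port/hystrix.stream (with the actuator setup from section 4 the path is /actuator/hystrix.stream)
5.1 Monitoring a single application
1. Create a new Spring Boot application named hystrix-dashboard and add the Hystrix Dashboard dependency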
compile('org.springframework.cloud:spring-cloud-starter-netflix-hystrix-dashboard:2.0.0.RELEASE')
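2. Add the @EnableHystrixDashboard annotation to the Spring Boot main class
@EnableHystrixDashboard
@SpringBootApplication
class Application
fun main(args: Array<String>) {
    runApplication<Application>(*args)
}
3. Add the following configuration to application.yml
spring:
  application:
    name: hystrix-dashboard
4. Start the application and visit http://localhost:30001/hystrix to open the Hystrix Dashboard page
5. In the first input field of the Hystrix Dashboard page, enter the address mentioned earlier, http://localhost:30001/actuator/hystrix.stream, and click the “Monitor Stream” button to open the detailed statistics page (see the figure below; the screenshot is from an earlier run, so the port number differs slightly). The page refreshes the monitoring data as the service is called.
5.2 Cluster monitoring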
To Be Continued...