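Nacos 1.2.0 added RBAC-based access control, which makes it practical to use in production. The official site has a blog post introducing it: https://nacos.io/zh-cn/blog/nacos%201.2.0%20guide.html
When integrating the latest release, 1.2.1, with Spring Cloud, a user other than the default one is denied access to the metrics endpoint /nacos/v1/ns/operator/metrics, and some other endpoints behave the same way. This post analyzes the problem step by step. Its main purpose is to show one way of tracking such a problem down. I may have misunderstood parts of Nacos along the way, so corrections are welcome.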
request: /nacos/v1/ns/operator/metrics failed, servers: [127.0.0.1:8848], code: 403, msg: Whitelabel Error Page
This application has no explicit mapping for /error, so you are seeing this as a fallback.
Sun Apr 19 17:38:06 CST 2020
There was an unexpected error (type=Forbidden, status=403).
authorization failed!
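Starting from this error, let's work out step by step why it happens.
Using access control in Nacos
Without access control, any user can maliciously call the Open API to deregister services or modify configuration, which is unacceptable in production. The following shows how to enable access control in Nacos; see the official blog post for details.
application.properties contains the access-control settings shown below. As of 1.2.0 the spring.security.enabled setting is deprecated in favor of the new auth settings: setting nacos.core.auth.enabled=true is enough to turn on Nacos access control. After that, configure the namespaces, users, roles, and permissions in the console, and everything is ready.
Note: the official blog also covers namespace best practices: https://nacos.io/zh-cn/blog/namespace-endpoint-best-practices.html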
#*************** Access Control Related Configurations ***************#
### If enable spring security, this option is deprecated in 1.2.0:
#spring.security.enabled=false
### The ignore urls of auth, is deprecated in 1.2.0:
nacos.security.ignore.urls=/,/error,/**/*.css,/**/*.js,/**/*.html,/**/*.map,/**/*.svg,/**/*.png,/**/*.ico,/console-fe/public/**,/v1/auth/**,/v1/console/health/**,/actuator/**,/v1/console/server/**
### The auth system to use, currently only 'nacos' is supported:
nacos.core.auth.system.type=nacos
### If turn on auth system:
nacos.core.auth.enabled=true
### The token expiration in seconds:
nacos.core.auth.default.token.expire.seconds=18000
### The default token:
nacos.core.auth.default.token.secret.key=SecretKey012345678901234567890123456789012345678901234567890123456789
### Turn on/off caching of auth information. By turning on this switch, the update of auth information would have a 15 seconds delay.
nacos.core.auth.caching.enabled=false
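Using Nacos service discovery in Spring Cloud
Add the dependency
dependencies {
    implementation 'com.alibaba.cloud:spring-cloud-starter-alibaba-nacos-discovery'
}
Enable service discovery
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;
@EnableDiscoveryClient
@SpringBootApplication
public class DemoApplication {
    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}
Configure the Nacos settings in bootstrap.properties, not application.properties (the difference between the two is easy to look up).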
spring.cloud.nacos.server-addr=127.0.0.1:8848
spring.cloud.nacos.username=test
spring.cloud.nacos.password=123456780
spring.cloud.nacos.discovery.namespace=local
spring.cloud.nacos.discovery.metadata.info.name=${spring.application.name}
spring.cloud.nacos.discovery.metadata.user.name=${spring.security.user.name}
spring.cloud.nacos.discovery.metadata.user.password=${spring.security.user.password}
spring.cloud.nacos.config.namespace=local
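Note: config.namespace has to be set for the application to start, even though only the discovery starter is referenced. I have not yet looked into why.
If the project also depends on actuator, startup produces the error shown at the beginning of this post: access to /nacos/v1/ns/operator/metrics is denied.
Analyzing the problem
When this first happened, I assumed the user really lacked permission, or that my username or password was wrong. But after granting the new role read/write permission on the public namespace, the 403 was still there, so I started debugging the source code.
Start the service and note that the actuator health status is abnormal.
Locate the health check implementation. The source shows that it calls namingService.getServerStatus() to decide whether the server is healthy.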
public class NacosDiscoveryHealthIndicator extends AbstractHealthIndicator {
    private final NamingService namingService;
    public NacosDiscoveryHealthIndicator(NamingService namingService) {
        this.namingService = namingService;
    }
    @Override
    protected void doHealthCheck(Health.Builder builder) throws Exception {
        // Just return "UP" or "DOWN"
        String status = namingService.getServerStatus();
        // Set the status to Builder
        builder.status(status);
        switch (status) {
            case "UP":
                builder.up();
                break;
            case "DOWN":
                builder.down();
                break;
            default:
                builder.unknown();
                break;
        }
    }
}
Note: NacosDiscoveryHealthIndicator is auto-configured in NacosDiscoveryEndpointAutoConfiguration under the name nacos-discovery. Actuator warns about names that contain "-": Endpoint ID 'nacos-discovery' contains invalid characters, please migrate to a valid format. Hopefully a later release renames it to follow the convention.
@Configuration(proxyBeanMethods = false)
@ConditionalOnClass(Endpoint.class)
@ConditionalOnNacosDiscoveryEnabled
public class NacosDiscoveryEndpointAutoConfiguration {
    @Bean
    @ConditionalOnEnabledHealthIndicator("nacos-discovery")
    public HealthIndicator nacosDiscoveryHealthIndicator(NacosDiscoveryProperties nacosDiscoveryProperties) {
        return new NacosDiscoveryHealthIndicator(nacosDiscoveryProperties.namingServiceInstance());
    }
}
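So the status is determined inside namingService.getServerStatus(). Stepping through it in the debugger leads to the following method in NamingProxy, which calls the /operator/metrics path. Stepping in further shows that it builds and sends an HTTP request, and that request is what comes back with the 403.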
public boolean serverHealthy() {
    try {
        String result = reqAPI(UtilAndComs.NACOS_URL_BASE + "/operator/metrics",
            new HashMap<String, String>(2), HttpMethod.GET);
        JSONObject json = JSON.parseObject(result);
        String serverStatus = json.getString("status");
        return "UP".equals(serverStatus);
    } catch (Exception e) {
        // Any failure, including a 403 from the server, is reported as "not healthy"
        return false;
    }
}
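To make the effect of that catch block concrete, here is a minimal, self-contained sketch. It is not the Nacos client code: ServerStatusSketch and its metricsCall parameter are hypothetical stand-ins for NamingProxy and reqAPI, and it assumes that getServerStatus() simply maps the boolean to "UP" or "DOWN", as the comment in the health indicator suggests.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the client-side health check; not the real Nacos classes.
class ServerStatusSketch {

    // Mirrors the shape of serverHealthy(): any exception is swallowed and treated as unhealthy.
    static boolean serverHealthy(Supplier<String> metricsCall) {
        try {
            String body = metricsCall.get();
            // Simplified check instead of real JSON parsing.
            return body != null && body.contains("\"status\":\"UP\"");
        } catch (Exception e) {
            return false;
        }
    }

    // Assumption: the status string is derived directly from the boolean above.
    static String getServerStatus(Supplier<String> metricsCall) {
        return serverHealthy(metricsCall) ? "UP" : "DOWN";
    }

    public static void main(String[] args) {
        // A successful metrics response.
        System.out.println(getServerStatus(() -> "{\"status\":\"UP\"}"));
        // A request that fails, for example because the server answered 403.
        System.out.println(getServerStatus(() -> {
            throw new IllegalStateException("code: 403, msg: authorization failed!");
        }));
    }
}
```

Under these assumptions a healthy server that rejects the request looks exactly the same to actuator as a server that is down, which would explain the abnormal health status.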
That still doesn't explain it: the request carries a token, so why the 403? Let's verify with the Nacos Open API.
First, log in to obtain a token
POST http://127.0.0.1:8848/nacos/v1/auth/users/login
Content-Type: application/x-www-form-urlencoded

username=test&password=123456780
Response
{
    "globalAdmin": false,
    "tokenTtl": 18000,
    "accessToken": "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ0ZXN0IiwiZXhwIjoxNTg3MzA5OTQwfQ.hqH9NOfKJIr8TcPAHFx0yqnPqYWSIFIjSkP3fklQP_w"
}
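As an aside, the accessToken has the three-part shape of a JWT, so its middle part can be decoded to see what it carries. The sketch below only Base64URL-decodes the payload and does not verify the signature; the class name JwtPayloadPeek is made up for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustration only: decodes the payload of a JWT without verifying its signature.
class JwtPayloadPeek {

    static String decodePayload(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected header.payload.signature");
        }
        // The payload is Base64URL-encoded JSON; Java's URL decoder accepts missing padding.
        byte[] raw = Base64.getUrlDecoder().decode(parts[1]);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String token = "eyJhbGciOiJIUzI1NiJ9"
            + ".eyJzdWIiOiJ0ZXN0IiwiZXhwIjoxNTg3MzA5OTQwfQ"
            + ".hqH9NOfKJIr8TcPAHFx0yqnPqYWSIFIjSkP3fklQP_w";
        System.out.println(decodePayload(token)); // prints {"sub":"test","exp":1587309940}
    }
}
```

The payload holds only the user name (sub) and an expiry time (exp). No role or permission is embedded in the token, so whether a request is allowed has to be decided on the server from the roles stored for that user.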
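With the token we can call other endpoints (Bearer is a standard HTTP authentication scheme). The response is config data not exist, which means the configuration does not exist, but token verification passed.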
GET http://127.0.0.1:8848/nacos/v1/cs/configs?dataId=demo-admin-server.properties&group=DEFAULT_GROUP&tenant=local
Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ0ZXN0IiwiZXhwIjoxNTg3MzA5OTQwfQ.hqH9NOfKJIr8TcPAHFx0yqnPqYWSIFIjSkP3fklQP_w
Now call the endpoint this post is about. It returns 403 Forbidden, authorization failed! Why does this one fail?
GET http://127.0.0.1:8848/nacos/v1/ns/operator/metrics
Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ0ZXN0IiwiZXhwIjoxNTg3MzA5OTQwfQ.hqH9NOfKJIr8TcPAHFx0yqnPqYWSIFIjSkP3fklQP_w
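Time to bring out the big guns: get the Nacos source from GitHub and look for the cause.
A global search finds the metrics endpoint defined in OperatorController. @GetMapping is the familiar Spring MVC mapping; @Secured is the access-control annotation.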
@Secured(resource = "naming/metrics", action = ActionTypes.READ)
@GetMapping("/metrics")
public JSONObject metrics(HttpServletRequest request) {
    JSONObject result = new JSONObject();
    int serviceCount = serviceManager.getServiceCount();
    int ipCount = serviceManager.getInstanceCount();
    int responsibleDomCount = serviceManager.getResponsibleServiceCount();
    int responsibleIPCount = serviceManager.getResponsibleInstanceCount();
    result.put("status", serverStatusManager.getServerStatus().name());
    result.put("serviceCount", serviceCount);
    result.put("instanceCount", ipCount);
    result.put("raftNotifyTaskCount", raftCore.getNotifyTaskCount());
    result.put("responsibleServiceCount", responsibleDomCount);
    result.put("responsibleInstanceCount", responsibleIPCount);
    result.put("cpu", SystemUtils.getCPU());
    result.put("load", SystemUtils.getLoad());
    result.put("mem", SystemUtils.getMem());
    return result;
}
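A global search for "authorization failed!" leads to NacosAuthManager, which asks the role service whether the user has permission. The permission argument describes the permission required for the requested resource, and it has to be matched against the permissions granted to the user.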
@Override
public void auth(Permission permission, User user) throws AccessException {
    if (Loggers.AUTH.isDebugEnabled()) {
        Loggers.AUTH.debug("auth permission: {}, user: {}", permission, user);
    }
    if (!roleService.hasPermission(user.getUserName(), permission)) {
        throw new AccessException("authorization failed!");
    }
}
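One level deeper, NacosRoleServiceImpl loads all roles of the user. If any of them is ROLE_ADMIN, the request passes immediately as global admin; for other roles, the permissions are checked one by one.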
public boolean hasPermission(String username, Permission permission) {
    List<RoleInfo> roleInfoList = getRoles(username);
    if (Collections.isEmpty(roleInfoList)) {
        return false;
    }
    // Global admin pass:
    for (RoleInfo roleInfo : roleInfoList) {
        if (GLOBAL_ADMIN_ROLE.equals(roleInfo.getRole())) {
            return true;
        }
    }
    // Old global admin can pass resource 'console/':
    if (permission.getResource().startsWith(NacosAuthConfig.CONSOLE_RESOURCE_NAME_PREFIX)) {
        return false;
    }
    // For other roles, use a pattern match to decide if pass or not.
    for (RoleInfo roleInfo : roleInfoList) {
        List<PermissionInfo> permissionInfoList = getPermissions(roleInfo.getRole());
        if (Collections.isEmpty(permissionInfoList)) {
            continue;
        }
        for (PermissionInfo permissionInfo : permissionInfoList) {
            String permissionResource = permissionInfo.getResource().replaceAll("\\*", ".*");
            String permissionAction = permissionInfo.getAction();
            if (permissionAction.contains(permission.getAction()) &&
                Pattern.matches(permissionResource, permission.getResource())) {
                return true;
            }
        }
    }
    return false;
}
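The code above checks the Permission, whose two main properties are action and resource. Looking at the database, action holds the read/write permission and is checked with a simple string contains. The resource, however, is matched as a regular expression, and that is where the problem lies.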
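The resource match can be tried in isolation. The following sketch copies only the two relevant lines from hasPermission (replacing * with .* and calling Pattern.matches); the class name ResourceMatchSketch and the sample strings are assumptions for illustration. In particular, "public:*:*" is assumed to be what the console stores when a role is granted a whole namespace.

```java
import java.util.regex.Pattern;

// Illustration of the wildcard-to-regex resource match; not the Nacos class itself.
class ResourceMatchSketch {

    // grantedResource: resource stored for a role; requestedResource: resource of the endpoint being called.
    static boolean matches(String grantedResource, String requestedResource) {
        String regex = grantedResource.replaceAll("\\*", ".*");
        return Pattern.matches(regex, requestedResource);
    }

    public static void main(String[] args) {
        // Assumed stored form of a namespace-wide grant.
        String granted = "public:*:*";
        // A namespace-scoped resource string matches the grant.
        System.out.println(matches(granted, "public:DEFAULT_GROUP:naming/demo")); // prints true
        // The fixed resource of the metrics endpoint does not.
        System.out.println(matches(granted, "naming/metrics")); // prints false
    }
}
```

Pattern.matches requires the whole string to match, so a grant that starts with a namespace prefix can never match a resource such as naming/metrics that has no such prefix.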
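Compare how endpoints are declared in InstanceController and in OperatorController. The difference is how resource is defined: one is a fixed string, the other is a string generated from the request by a parser.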
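// InstanceController: the resource is computed from the request by NamingResourceParser
@GetMapping("/list")
@Secured(parser = NamingResourceParser.class, action = ActionTypes.READ)
public JSONObject list(HttpServletRequest request) throws Exception {}
// OperatorController: the resource is the fixed string "naming/metrics"
@Secured(resource = "naming/metrics", action = ActionTypes.READ)
@GetMapping("/metrics")
public JSONObject metrics(HttpServletRequest request) {}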
public class NamingResourceParser implements ResourceParser {
    private static final String AUTH_NAMING_PREFIX = "naming/";
    @Override
    public String parseName(Object request) {
        HttpServletRequest req = (HttpServletRequest) request;
        String namespaceId = req.getParameter(CommonParams.NAMESPACE_ID);
        String serviceName = req.getParameter(CommonParams.SERVICE_NAME);
        String groupName = req.getParameter(CommonParams.GROUP_NAME);
        if (StringUtils.isBlank(groupName)) {
            groupName = NamingUtils.getGroupName(serviceName);
        }
        serviceName = NamingUtils.getServiceName(serviceName);
        StringBuilder sb = new StringBuilder();
        if (StringUtils.isNotBlank(namespaceId)) {
            sb.append(namespaceId);
        }
        sb.append(Resource.SPLITTER);
        if (StringUtils.isBlank(serviceName)) {
            sb.append("*")
                .append(Resource.SPLITTER)
                .append(AUTH_NAMING_PREFIX)
                .append("*");
        } else {
            sb.append(groupName)
                .append(Resource.SPLITTER)
                .append(AUTH_NAMING_PREFIX)
                .append(serviceName);
        }
        return sb.toString();
    }
}
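To see what strings this parser produces, here is a simplified standalone sketch of the same string-building logic. It is not the Nacos class: NamingResourceSketch is a made-up name, the request parameters are passed in as plain strings, the splitter is assumed to be ":", and the group name is taken as given rather than derived from the service name through NamingUtils.

```java
// Simplified, hypothetical re-creation of the resource string built by the parser.
class NamingResourceSketch {

    static final String SPLITTER = ":"; // assumed value of Resource.SPLITTER
    static final String AUTH_NAMING_PREFIX = "naming/";

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    static String parseName(String namespaceId, String groupName, String serviceName) {
        StringBuilder sb = new StringBuilder();
        if (!isBlank(namespaceId)) {
            sb.append(namespaceId);
        }
        sb.append(SPLITTER);
        if (isBlank(serviceName)) {
            // No service given: wildcard for both group and service.
            sb.append("*").append(SPLITTER).append(AUTH_NAMING_PREFIX).append("*");
        } else {
            sb.append(groupName).append(SPLITTER).append(AUTH_NAMING_PREFIX).append(serviceName);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(parseName("local", null, null));            // prints local:*:naming/*
        System.out.println(parseName("local", "DEFAULT_GROUP", "demo")); // prints local:DEFAULT_GROUP:naming/demo
        System.out.println(parseName(null, null, null));               // prints :*:naming/*
    }
}
```

Every string produced this way has three parts separated by the splitter, whereas the fixed resource naming/metrics has none.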
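So the resource of a Permission has the form namespaceId:groupName:serviceName, while the console does not yet offer a way to configure a specific groupName or serviceName. Perhaps that part is still under development. In short, the cause is that the resource format of permissions is inconsistent: the metrics endpoint uses the fixed resource naming/metrics, which does not follow that form. It may be that I have missed the point, or it may be that the Nacos team is still coding.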