Common problems and error logs when deploying to EC2 instances with AWS CodeDeploy

References

  • How do I troubleshoot CodeDeploy deployment failures on Amazon EC2 instances?

codedeploy-agent directory structure

The codedeploy-agent installation directory (/opt/codedeploy-agent)

$ ls -al
drwxr-xr-x 2 root root    69 Mar 27 16:28 bin
drwxr-xr-x 2 root root    84 Mar 27 16:28 certs
-rwxr--r-- 1 root root  1094 Feb 21 15:37 codedeploy_agent.gemspec
drwxr-xr-x 4 root root    76 Mar 27 16:30 deployment-root
-rwxr--r-- 1 root root   353 Feb 21 15:37 Gemfile
drwxr-xr-x 4 root root   176 Mar 27 16:28 lib
-rwxr--r-- 1 root root 10174 Feb 21 15:37 LICENSE
drwxr-xr-x 3 root root    18 Mar 27 16:28 state
drwxr-xr-x 4 root root    40 Mar 27 16:28 vendor
-r--r--r-- 1 root root    36 Feb 21 15:37 .version

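The deployment root directory. Each deployment group gets a directory named after its ID (here e2403155-2f1e-44f2-8da0-9a302e75d2d4). That directory holds one subdirectory per deployment ID (d-...), and each subdirectory contains the downloaded revision archive and the logs for that deployment.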

$ pwd
/opt/codedeploy-agent/deployment-root

$ tree -L 2
├── deployment-instructions
│   ├── e2403155-2f1e-44f2-8da0-9a302e75d2d4-cleanup
│   ├── e2403155-2f1e-44f2-8da0-9a302e75d2d4-install.json
│   ├── e2403155-2f1e-44f2-8da0-9a302e75d2d4_last_successful_install
│   └── e2403155-2f1e-44f2-8da0-9a302e75d2d4_most_recent_install
├── deployment-logs
│   └── codedeploy-agent-deployments.log
├── e2403155-2f1e-44f2-8da0-9a302e75d2d4
│   ├── d-3EBSQMKPL
│   ├── d-50C3A4EF9
│   ├── d-6QWGKLLPL
│   ├── d-76X0BFLPL
│   └── d-B9OF9AEF9
└── ongoing-deployment
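Every deployment gets its own d-... directory, so the most recent scripts log can be found by modification time. The sketch below assumes the `<deployment-group-ID>/<deployment-ID>/logs/scripts.log` layout shown above and in the log path later in this article. The function name `latest_scripts_log` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest scripts.log under a CodeDeploy deployment root.
# Assumes the layout <root>/<deployment-group-ID>/<deployment-ID>/logs/scripts.log.
latest_scripts_log() {
  local root="${1:-/opt/codedeploy-agent/deployment-root}"
  # Deployment IDs start with "d-"; ls -t lists the most recently modified file first.
  # If the glob matches nothing, ls fails quietly and the output is empty.
  ls -t "$root"/*/d-*/logs/scripts.log 2>/dev/null | head -n 1
}
```

You may need to run it as root, because the deployment directories are owned by root.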

Deployment triggered: the agent fetches the deployment specification

2023-03-27 16:30:28 INFO  [codedeploy-agent(2905)]: [Aws::CodeDeployCommand::Client 200 0.029365 0 retries] get_deployment_specification(deployment_execution_id:"CodeDeploy/cn-north-1/prod/orpheus:public003/xxxxxxxxx:d-50C3A4EF9",host_identifier:"arn:aws-cn:ec2:cn-north-1:xxxxxxxxx:instance/i-0c701a0ba6d48c7f6")

Deployment succeeded: command_status is "Succeeded" and error_code is 0

2023-03-27 16:30:28 INFO  [codedeploy-agent(2905)]: [Aws::CodeDeployCommand::Client 200 0.059 0 retries] put_host_command_complete(command_status:"Succeeded",diagnostics:{format:"JSON",payload:"{\"error_code\":0,\"script_name\":\"\",\"message\":\"Succeeded: \",\"log\":\"\"}"},host_command_identifier:"eCI6MX0=")
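Each lifecycle result is reported in a put_host_command_complete line. That line's payload carries error_code and message, and these are the first fields to read when a deployment fails. The sketch below is sed-based and extracts them. It assumes the exact line format shown above, and the function name is hypothetical. sed is not a JSON parser, so messages containing unusual escapes may not match.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print status, error_code and message from a
# put_host_command_complete agent log line.
parse_command_complete() {
  local line="$1" status code message
  status=$(printf '%s\n' "$line" | sed -nE 's/.*command_status:"([A-Za-z]+)".*/\1/p')
  # The payload is JSON embedded in a string, so its quotes appear as \" in the log
  code=$(printf '%s\n' "$line" | sed -nE 's/.*\\"error_code\\":([0-9]+).*/\1/p')
  message=$(printf '%s\n' "$line" | sed -nE 's/.*\\"message\\":\\"(.*)\\",\\"log\\".*/\1/p')
  printf 'status=%s error_code=%s message=%s\n' "$status" "$code" "$message"
}
```

For example, `grep put_host_command_complete /var/log/aws/codedeploy-agent/codedeploy-agent.log | tail -n 1` followed by this function summarizes the latest result.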

The CodeDeploy service role lacks permissions

https://docs.amazonaws.cn/codedeploy/latest/userguide/getting-started-create-service-role.html

[Screenshot: CodeDeploy console error]

The EC2 instance lacks permissions

The EC2 instance has no permission to access S3 (the console shows Access Denied)

[Screenshot 1: Access Denied in the CodeDeploy console]

Check the codedeploy-agent log

2023-03-27 16:33:42 INFO  [codedeploy-agent(2905)]: [Aws::CodeDeployCommand::Client 200 0.046698 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"Access Denied\",\"log\":\"\"}"},host_command_identifier:"eyJiYXRjaElkIjoiNmNhNWUyODA0MjxfQ==")
2023-03-27 16:33:42 ERROR [codedeploy-agent(2905)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: Aws::S3::Errors::AccessDenied - Access Denied
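The exception class after "Error during perform:" separates this S3 permission error (Aws::S3::Errors::AccessDenied) from the credential, network and script errors later in this article. Below is a small sketch that extracts it. The function name is hypothetical, and it assumes the line format shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the exception class from an agent
# "Error during perform: <Class> - <message>" log line.
perform_error_class() {
  printf '%s\n' "$1" | sed -nE 's/.*Error during perform: ([A-Za-z0-9_:]+) - .*/\1/p'
}
```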

No instance profile attached to the EC2 instance

The console reports missing credentials

[Screenshot 2: missing credentials error in the console]

Agent log

// At agent startup
2023-03-27 17:53:54 ERROR [codedeploy-agent(7587)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Missing credentials - please check if this instance was started with an IAM instance profile
// During a deployment
2023-03-27 17:38:21 INFO  [codedeploy-agent(18867)]: [Aws::CodeDeployCommand::Client 200 0.036674 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"missing credentials, provide credentials with one of the following options:\\n  - :access_key_id and :secret_access_key\\n  - :credentials\\n  - :credentials_provider\",\"log\":\"\"}"},host_command_identifier:"eyJiYXRjaEV4IjoxfQ==")
2023-03-27 17:38:21 ERROR [codedeploy-agent(18867)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: Aws::Sigv4::Errors::MissingCredentialsError - missing credentials, provide credentials with one of the following options:
  - :access_key_id and :secret_access_key
  - :credentials
  - :credentials_provider - /opt/codedeploy-agent/vendor/gems/aws-sigv4-1.5.1/lib/aws-sigv4/signer.rb:654:in `extract_credentials_provider'
...
2023-03-27 17:38:21 INFO  [codedeploy-agent(18867)]: [Aws::CodeDeployCommand::Client 200 0.015171 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"missing credentials, provide credentials with one of the following options:\\n  - :access_key_id and :secret_access_key\\n  - :credentials\\n  - :credentials_provider\",\"log\":\"\"}"},host_command_identifier:"eyJiYXRjaElkI=")
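The two messages above have the same root cause and appear at different times. At startup the poller logs "Missing credentials", and during a deployment the SDK raises MissingCredentialsError. The sketch below classifies the two variants and checks, through IMDSv2, whether the instance exposes an instance profile role. Both function names are hypothetical, and `instance_profile_role` only works when run on the EC2 instance itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tell which variant of the missing-credentials error a log line shows.
credential_error_phase() {
  case "$1" in
    *"Missing credentials - please check if this instance was started with an IAM instance profile"*)
      echo startup ;;
    *MissingCredentialsError*|*"missing credentials, provide credentials"*)
      echo deployment ;;
    *) return 1 ;;
  esac
}

# Print the role name attached through the instance profile (IMDSv2).
# Empty output and a non-zero exit mean no profile is attached or IMDS is unreachable.
instance_profile_role() {
  local token
  token=$(curl -sf -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
  curl -sf -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
}
```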

The agent is not running

https://docs.amazonaws.cn/codedeploy/latest/userguide/codedeploy-agent-operations-verify.html

Stop the agent

$ sudo service codedeploy-agent status
The AWS CodeDeploy agent is running as PID 2901
$ sudo service codedeploy-agent stop
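The status command prints the agent PID when the agent is running, so a script can use it to restart a stopped agent. The sketch below relies only on the "is running as PID" wording shown above. Both function names are hypothetical, and the exact text printed for a stopped agent is not assumed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the PID from `service codedeploy-agent status` output,
# e.g. "The AWS CodeDeploy agent is running as PID 2901". Prints nothing otherwise.
agent_pid_from_status() {
  printf '%s\n' "$1" | sed -nE 's/.*is running as PID ([0-9]+).*/\1/p'
}

# Start the agent unless the status output reports a running PID.
ensure_agent_running() {
  local status
  status=$(sudo service codedeploy-agent status 2>&1)
  if [ -z "$(agent_pid_from_status "$status")" ]; then
    sudo service codedeploy-agent start
  fi
}
```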

Agent log

2023-03-27 16:57:05 INFO  [codedeploy-agent(11110)]: Stopping master 2901
2023-03-27 16:57:05 INFO  [codedeploy-agent(2901)]: master 2901: Received TERM - stopping children and shutting down
2023-03-27 16:57:05 INFO  [codedeploy-agent(2905)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller of master 2901: Received TERM - setting internal shutting down flag and possibly finishing last run
2023-03-27 16:57:05 INFO  [codedeploy-agent(2905)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Gracefully shutting down agent child threads now, will wait up to 7200 seconds
2023-03-27 16:57:05 INFO  [codedeploy-agent(2905)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: All agent child threads have been shut down
2023-03-27 16:57:05 INFO  [codedeploy-agent(2905)]: agent exiting now

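The deployment is stuck in Pending in the console

[Screenshot 3: deployment pending]

After the timeout, it fails with the same error as when the instance cannot reach the internet

[Screenshot 4: error after the timeout]

If the agent cannot reach the internet when it starts, SSL validation against the CodeDeploy commands endpoint fails and the agent exits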

2023-03-27 17:08:48 ERROR [codedeploy-agent(16307)]: InstanceAgent::Plugins::CodeDeployPlugin::CodeDeployControl: Error during certificate verification on codedeploy endpoint https://codedeploy-commands.cn-north-1.amazonaws.com.cn
2023-03-27 17:08:48 ERROR [codedeploy-agent(16307)]: Error validating the SSL configuration: Invalid server certificate
2023-03-27 17:08:48 ERROR [codedeploy-agent(16307)]: booting child: error during start or run: SystemExit - Stopping CodeDeploy agent due to SSL validation error. - /opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/command_poller.rb:65:in `abort'
...
2023-03-27 17:08:48 ERROR [codedeploy-agent(16307)]: booting child: error during start or run: SystemExit - exit - /opt/codedeploy-agent/lib/instance_agent/runner/child.rb:98:in `exit'
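Before restarting the agent, it can help to confirm that the instance reaches the endpoints it needs. The sketch below builds regional hostnames. It assumes the `<service>.<region>.amazonaws.com` pattern, with the `.com.cn` domain for China regions, as in the codedeploy-commands.cn-north-1.amazonaws.com.cn endpoint above. It then tests plain TCP reachability on port 443 using bash's /dev/tcp and coreutils `timeout`. The function names are hypothetical, and the check does not validate TLS.

```bash
#!/usr/bin/env bash
# Hypothetical helper: hostname of a regional endpoint, e.g. codedeploy-commands or s3.
# China regions (cn-*) use the amazonaws.com.cn domain.
endpoint_host() {
  local service="$1" region="$2" suffix="amazonaws.com"
  case "$region" in cn-*) suffix="amazonaws.com.cn" ;; esac
  printf '%s.%s.%s\n' "$service" "$region" "$suffix"
}

# Succeeds if a TCP connection to <host>:443 opens within 5 seconds.
can_reach_443() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/443"' "$1" 2>/dev/null
}
```

Usage on the instance: `can_reach_443 "$(endpoint_host codedeploy-commands cn-north-1)" && echo reachable`.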

The S3 endpoint is unreachable

Add an S3 gateway endpoint whose policy denies all access: the deployment fails immediately with Access Denied

[Screenshot 5: Access Denied]

Agent log

2023-03-27 17:15:15 INFO  [codedeploy-agent(18867)]: [Aws::CodeDeployCommand::Client 200 0.021775 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"Access Denied\",\"log\":\"\"}"},host_command_identifier:"eyJiYXRjaElkIjoiYWfQ==")

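No internet access at all

All lifecycle events stay in Pending and the console shows no error

[Screenshot 6: all lifecycle events pending]

After a while (about 5 minutes), an error appears asking you to check the agent log

[Screenshot 7: error after about 5 minutes]

Agent log when networking fails during execution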

2023-03-27 18:15:57 INFO  [codedeploy-agent(17611)]: [Aws::CodeDeployCommand::Client 0 113.264982 3 retries] poll_host_command(host_identifier:"arn:aws-cn:ec2:cn-north-1:xxxxxxxxx:instance/i-0c701a0ba6d48c7f6") Seahorse::Client::NetworkingError execution expired
2023-03-27 16:47:36 ERROR [codedeploy-agent(2905)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error polling for host commands: Seahorse::Client::NetworkingError - execution expired - /usr/share/ruby/net/http.rb:878:in `initialize'
/usr/share/ruby/net/http.rb:878:in `open'
/usr/share/ruby/net/http.rb:878:in `block in connect'
/usr/share/ruby/net/http.rb:877:in `connect'
/usr/share/ruby/net/http.rb:862:in `do_start'
/usr/share/ruby/net/http.rb:857:in `start'

...
2023-03-27 16:47:36 ERROR [codedeploy-agent(2905)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Network error: #
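The SDK client lines carry the operation, HTTP status, elapsed seconds and retry count. In the log above, an HTTP status of 0 with "execution expired" indicates that no response arrived at all. The sketch below splits those fields out so a network outage is easy to spot. It assumes the `[Aws::<Service>::Client <status> <seconds> <n> retries] <operation>(` format seen in this article, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize an AWS SDK client log line from the agent log.
parse_client_call() {
  printf '%s\n' "$1" | sed -nE \
    's/.*\[Aws::[A-Za-z0-9]+::Client ([0-9]+) ([0-9.]+) ([0-9]+) retries] ([a-z_]+)\(.*/op=\4 http=\1 seconds=\2 retries=\3/p'
}
```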

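Even after the network recovers, the deployment stays stuck. As when the agent or instance is terminated during a deployment (see below), it can stay that way for up to an hour.

[Screenshot 8: deployment still stuck]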

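Only S3 is reachable

The deployment hangs, and the agent log shows no deployment activity because the agent cannot communicate with the CodeDeploy service
[Screenshot 9: deployment hanging]
It times out after 5 minutes
[Screenshot 10: timeout error]
The agent log still shows network errors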

2023-03-28 03:58:58 ERROR [codedeploy-agent(24144)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Network error: #<Seahorse::Client::NetworkingError: execution expired>
2023-03-28 03:58:59 INFO  [codedeploy-agent(24144)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.5.0-57_rpm.
2023-03-28 04:00:02 INFO  [codedeploy-agent(24144)]: [Aws::CodeDeployCommand::Client 0 62.105124 3 retries] poll_host_command(host_identifier:"arn:aws-cn:ec2:cn-north-1:xxxxxxxxx:instance/i-0c701a0ba6d48c7f6") Seahorse::Client::NetworkingError execution expired

2023-03-28 04:00:02 ERROR [codedeploy-agent(24144)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error polling for host commands: Seahorse::Client::NetworkingError - execution expired - /usr/share/ruby/net/http.rb:878:in `initialize'

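Only the CodeDeploy service is reachable

Create a CodeDeploy interface endpoint in the VPC, but do not add a route to S3

It still fails
[Screenshot 11: deployment failure]

After configuring the endpoint, set :enable_auth_policy: true in the agent configuration file and grant the instance role additional permissions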

$ sudo vim /etc/codedeploy-agent/conf/codedeployagent.yml
---
:log_aws_wire: false
:log_dir: '/var/log/aws/codedeploy-agent/'
:pid_dir: '/opt/codedeploy-agent/state/.pid/'
:program_name: codedeploy-agent
:root_dir: '/opt/codedeploy-agent/deployment-root'
:verbose: false
:wait_between_runs: 1
:proxy_uri:
:max_revisions: 5
:enable_auth_policy: true
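When several instances need this change, an idempotent edit is safer than editing by hand in vim. The sketch below either replaces an existing :enable_auth_policy: line or appends one. The function name is hypothetical. After running it as root, restart the agent (stop, then start, as shown in this article) so it rereads the file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ensure ":enable_auth_policy: true" in codedeployagent.yml.
set_enable_auth_policy() {
  local conf="${1:-/etc/codedeploy-agent/conf/codedeployagent.yml}" tmp
  tmp=$(mktemp) || return 1
  if grep -q '^:enable_auth_policy:' "$conf"; then
    # Replace whatever value is currently set
    sed 's/^:enable_auth_policy:.*/:enable_auth_policy: true/' "$conf" > "$tmp"
  else
    cat "$conf" > "$tmp"
    # Add a newline first if the file does not end with one
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then echo >> "$tmp"; fi
    printf ':enable_auth_policy: true\n' >> "$tmp"
  fi
  # Write back with cat so the original file keeps its owner and mode
  cat "$tmp" > "$conf" && rm -f "$tmp"
}
```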

The additional permissions, attached to the instance profile role

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "codedeploy-commands-secure:GetDeploymentSpecification",
        "codedeploy-commands-secure:PollHostCommand",
        "codedeploy-commands-secure:PutHostCommandAcknowledgement",
        "codedeploy-commands-secure:PutHostCommandComplete"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

Error when the permissions are missing

2023-03-28 04:05:30 ERROR [codedeploy-agent(20997)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Cannot reach InstanceService: Aws::CodeDeployCommand::Errors::AccessDeniedException - User: arn:aws-cn:sts::xxxxxxxxx:assumed-role/CodeDeploy-EC2-Instance-Profile/i-0c701a0ba6d48c7f6 is not authorized to perform: codedeploy-commands-secure:PollHostCommand because no identity-based policy allows the codedeploy-commands-secure:PollHostCommand action
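The AccessDeniedException names the exact action that is missing, so the log tells you which of the four permissions above still has to be granted. The sketch below extracts that action. The function name is hypothetical, and it assumes the "is not authorized to perform: <service>:<Action>" wording shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the IAM action named in an AccessDeniedException log line.
denied_action() {
  printf '%s\n' "$1" | sed -nE 's/.*is not authorized to perform: ([A-Za-z0-9-]+:[A-Za-z0-9]+).*/\1/p'
}
```

To list every distinct missing action, feed each matching log line through it and pipe the result to `sort -u`.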

With the permissions added, the agent tries to download the bundle but times out
[Screenshot 12: download timeout]
Agent log

2023-03-28 04:09:18 INFO  [codedeploy-agent(20997)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandExecutor: Downloading artifact bundle from bucket 'codebuild-bjs-output-bucket' and key 'web-project.zip', version '', etag ''
...
2023-03-28 04:10:20 INFO  [codedeploy-agent(20997)]: [Aws::CodeDeployCommand::Client 200 0.174302 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"execution expired\",\"log\":\"\"}"},host_command_identifier:"eyJV4IjoxfQ==")

2023-03-28 04:10:20 ERROR [codedeploy-agent(20997)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: Seahorse::Client::NetworkingError - execution expired - /usr/share/ruby/net/http.rb:878:in `initialize'

After a route to S3 is added, the deployment succeeds. Inside a private network, the agent therefore only needs access to S3 and the CodeDeploy service.

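The agent or instance is terminated during a deployment

If a script in a lifecycle event does not run to completion (for example, because the instance is terminated or the CodeDeploy agent is shut down while the event is in progress), it can take up to an hour for the deployment status to show Failed. This happens even if the timeout specified for the script is shorter than one hour.

If the interruption happens right after the deployment starts, the deployment stays in Pending

[Screenshot 13: deployment pending after interruption]

Or it stalls midway because of network problems

[Screenshot 14: deployment stalled midway]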

Wrong revision location specified

The error is explicit

[Screenshot 15: wrong revision location error]

Agent log

2023-03-27 17:56:39 INFO  [codedeploy-agent(9395)]: [Aws::CodeDeployCommand::Client 200 0.02019 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"The specified key does not exist.\",\"log\":\"\"}"},host_command_identifier:"eyJiYXfQ==")

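The bucket policy of the S3 bucket holding the revision denies access

Pulling the S3 object manually fails with a permission error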

$ aws s3 cp s3://codebuild-bjs-output-bucket/web-project.zip .
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
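The manual check above reproduces the agent's own download of the revision. The sketch below derives the S3 URI from the agent's "Downloading artifact bundle" log line quoted earlier, so the exact same object is fetched with the instance's credentials. The function name is hypothetical, and it assumes the bucket and key are single-quoted as in that line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn "Downloading artifact bundle from bucket 'B' and key 'K'"
# into s3://B/K.
bundle_s3_uri() {
  printf '%s\n' "$1" | sed -nE "s|.*from bucket '([^']+)' and key '([^']+)'.*|s3://\1/\2|p"
}
```

Usage: `aws s3 cp "$(bundle_s3_uri "$line")" .`, where `$line` holds the agent log line.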

The error is still obvious

[Screenshot 16: Access Denied on the revision bucket]

Agent log

2023-03-27 18:06:42 INFO  [codedeploy-agent(9395)]: [Aws::CodeDeployCommand::Client 200 0.022363 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"Access Denied\",\"log\":\"\"}"},host_command_identifier:"eyJiYXjoxfQ==")

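Script errors during deployment

Make a script referenced by the ApplicationStart hook in appspec.yml (scripts/start_server.sh) fail on purpose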

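#!/bin/bash
# scripts/start_server.sh, run by the ApplicationStart hook in appspec.yml
sudo systemctl start tomcat.service
sudo systemctl enable tomcat.service
sudo systemctl start httpd.service
sudo systemctl enable httpd.service
# Deliberately exit non-zero so that CodeDeploy marks the lifecycle event as failed
exit 1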

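Check the console
[Screenshot: failed lifecycle event in the CodeDeploy console]

The agent log shows which script failed, in which lifecycle event, and with which exit code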

2023-03-28 06:10:58 INFO  [codedeploy-agent(20997)]: [Aws::CodeDeployCommand::Client 200 0.064576 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":4,\"script_name\":\"scripts/start_server.sh\",\"message\":\"Script at specified location: scripts/start_server.sh run as user root failed with exit code 1\",\"log\":\"LifecycleEvent - ApplicationStart\\nScript - scripts/start_server.sh\\n\"}"},host_command_identifier:"eyJiYXRjaOjF9")

2023-03-28 06:10:58 ERROR [codedeploy-agent(20997)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: InstanceAgent::Plugins::CodeDeployPlugin::ScriptError - Script at specified location: scripts/start_server.sh run as user root failed with exit code 1 - /opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/hook_executor.rb:196:in `execute_script'
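Script failures always carry the same "Script at specified location" message. The sketch below pulls out the script path, the user it ran as, and the exit code. The function name is hypothetical, and it assumes the wording shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<script> <user> <exit code>" from a ScriptError log line.
script_failure() {
  printf '%s\n' "$1" | sed -nE \
    's/.*Script at specified location: ([^ ]+) run as user ([^ ]+) failed with exit code ([0-9]+).*/\1 \2 \3/p'
}
```

The reported script path is relative to the revision. Its output is in that deployment's scripts.log.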

Deleting the agent log

After codedeploy-agent.log is deleted manually, the agent does not create a new file; restart the agent to get one

sudo service codedeploy-agent stop
sudo service codedeploy-agent start
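If the goal is only to clear the log, emptying the file in place avoids the restart. The file and its inode stay, so the running agent keeps writing to it. This is a sketch that assumes the agent appends to its open log file. If new entries do not appear afterwards, restart the agent as above. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: empty the agent log in place instead of deleting it.
truncate_agent_log() {
  local log="${1:-/var/log/aws/codedeploy-agent/codedeploy-agent.log}"
  [ -f "$log" ] || return 1
  # Redirecting nothing into the file truncates it to zero bytes
  : > "$log"
}
```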

Analyzing deployment logs

https://docs.amazonaws.cn/codedeploy/latest/userguide/deployments-view-logs.html

Open the CodeDeploy agent log file:

less /var/log/aws/codedeploy-agent/codedeploy-agent.log

Open the CodeDeploy scripts log file:

less /opt/codedeploy-agent/deployment-root/deployment-group-ID/deployment-ID/logs/scripts.log
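The agent log mixes routine INFO polling lines with the few ERROR lines that matter. The sketch below shows only the last N ERROR lines. It relies on the "<date> <time> ERROR [codedeploy-agent(...)]" prefix seen throughout this article, and the function name is hypothetical. Stack-trace continuation lines have no such prefix and are therefore not shown.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the last N (default 20) ERROR lines of the agent log.
recent_agent_errors() {
  local log="${1:-/var/log/aws/codedeploy-agent/codedeploy-agent.log}" n="${2:-20}"
  grep ' ERROR \[' "$log" | tail -n "$n"
}
```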

Troubleshooting with SSM (AWS Systems Manager)

[Screenshot 17: SSM troubleshooting]

It is of little use here

[Screenshot 18: SSM troubleshooting result]
