Reference material
The codedeploy-agent directory
$ ls -al
drwxr-xr-x 2 root root 69 Mar 27 16:28 bin
drwxr-xr-x 2 root root 84 Mar 27 16:28 certs
-rwxr--r-- 1 root root 1094 Feb 21 15:37 codedeploy_agent.gemspec
drwxr-xr-x 4 root root 76 Mar 27 16:30 deployment-root
-rwxr--r-- 1 root root 353 Feb 21 15:37 Gemfile
drwxr-xr-x 4 root root 176 Mar 27 16:28 lib
-rwxr--r-- 1 root root 10174 Feb 21 15:37 LICENSE
drwxr-xr-x 3 root root 18 Mar 27 16:28 state
drwxr-xr-x 4 root root 40 Mar 27 16:28 vendor
-r--r--r-- 1 root root 36 Feb 21 15:37 .version
The deployment root directory, which stores the bundle (archive) for each deployment
$ pwd
/opt/codedeploy-agent/deployment-root
$ tree -L 2
├── deployment-instructions
│ ├── e2403155-2f1e-44f2-8da0-9a302e75d2d4-cleanup
│ ├── e2403155-2f1e-44f2-8da0-9a302e75d2d4-install.json
│ ├── e2403155-2f1e-44f2-8da0-9a302e75d2d4_last_successful_install
│ └── e2403155-2f1e-44f2-8da0-9a302e75d2d4_most_recent_install
├── deployment-logs
│ └── codedeploy-agent-deployments.log
├── e2403155-2f1e-44f2-8da0-9a302e75d2d4
│ ├── d-3EBSQMKPL
│ ├── d-50C3A4EF9
│ ├── d-6QWGKLLPL
│ ├── d-76X0BFLPL
│ └── d-B9OF9AEF9
└── ongoing-deployment
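The layout above (one directory per deployment group ID, containing one `d-...` directory per deployment) can be walked with a few lines of Python. This is only a sketch: the helper name `list_deployments` and the set `NON_GROUP_DIRS` are made up for illustration, and it assumes every other directory under the root is a deployment group.

```python
from pathlib import Path

# Bookkeeping directories seen in the tree listing; everything else is
# assumed (not verified) to be a deployment-group directory.
NON_GROUP_DIRS = {"deployment-instructions", "deployment-logs", "ongoing-deployment"}


def list_deployments(root):
    """Return (deployment_group_id, deployment_id) pairs found under root."""
    result = []
    for group_dir in sorted(Path(root).iterdir()):
        if not group_dir.is_dir() or group_dir.name in NON_GROUP_DIRS:
            continue
        # Deployment directories are named like d-50C3A4EF9.
        for dep_dir in sorted(group_dir.glob("d-*")):
            if dep_dir.is_dir():
                result.append((group_dir.name, dep_dir.name))
    return result
```

On the instance this would be called as `list_deployments("/opt/codedeploy-agent/deployment-root")`, typically with sudo since the directory is owned by root.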
Triggering a deployment
2023-03-27 16:30:28 INFO [codedeploy-agent(2905)]: [Aws::CodeDeployCommand::Client 200 0.029365 0 retries] get_deployment_specification(deployment_execution_id:"CodeDeploy/cn-north-1/prod/orpheus:public003/xxxxxxxxx:d-50C3A4EF9",host_identifier:"arn:aws-cn:ec2:cn-north-1:xxxxxxxxx:instance/i-0c701a0ba6d48c7f6")
Deployment succeeded
2023-03-27 16:30:28 INFO [codedeploy-agent(2905)]: [Aws::CodeDeployCommand::Client 200 0.059 0 retries] put_host_command_complete(command_status:"Succeeded",diagnostics:{format:"JSON",payload:"{\"error_code\":0,\"script_name\":\"\",\"message\":\"Succeeded: \",\"log\":\"\"}"},host_command_identifier:"eCI6MX0=")
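Most of the diagnosis in these notes comes down to reading the `command_status` and the JSON `payload` of `put_host_command_complete` lines. The sketch below pulls both out of one log line. It assumes the payload is escaped the way it appears in these logs (backslash-escaped quotes); the function name `parse_command_complete` is made up for illustration and is not part of the agent.

```python
import json
import re

STATUS_RE = re.compile(r'command_status:"([^"]*)"')
# Capture the escaped payload string, allowing \" and \\ inside it.
PAYLOAD_RE = re.compile(r'payload:"((?:[^"\\]|\\.)*)"')


def parse_command_complete(line):
    """Return status plus the decoded diagnostics, or None if not such a line."""
    status = STATUS_RE.search(line)
    payload = PAYLOAD_RE.search(line)
    if not status or not payload:
        return None
    # First decode the escaped string literal, then the JSON document inside it.
    inner_json = json.loads('"' + payload.group(1) + '"')
    diagnostics = json.loads(inner_json)
    return {"status": status.group(1), **diagnostics}


# A failed-command line copied from the S3 access test in these notes.
SAMPLE = r'2023-03-27 16:33:42 INFO [codedeploy-agent(2905)]: [Aws::CodeDeployCommand::Client 200 0.046698 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"Access Denied\",\"log\":\"\"}"},host_command_identifier:"eyJiYXRjaElkIjoiNmNhNWUyODA0MjxfQ==")'

if __name__ == "__main__":
    print(parse_command_complete(SAMPLE))
```

In the logs collected here, `error_code` 5 accompanies S3, credential and network failures, while a failing hook script is reported with `error_code` 4 and a non-empty `script_name`.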
https://docs.amazonaws.cn/codedeploy/latest/userguide/getting-started-create-service-role.html
The EC2 instance has no permission to access S3 (the console shows Access Denied)
Check the codedeploy-agent log
2023-03-27 16:33:42 INFO [codedeploy-agent(2905)]: [Aws::CodeDeployCommand::Client 200 0.046698 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"Access Denied\",\"log\":\"\"}"},host_command_identifier:"eyJiYXRjaElkIjoiNmNhNWUyODA0MjxfQ==")
2023-03-27 16:33:42 ERROR [codedeploy-agent(2905)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: Aws::S3::Errors::AccessDenied - Access Denied
The console reports a missing-credentials error
Agent log
// At startup
2023-03-27 17:53:54 ERROR [codedeploy-agent(7587)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Missing credentials - please check if this instance was started with an IAM instance profile
// At runtime
2023-03-27 17:38:21 INFO [codedeploy-agent(18867)]: [Aws::CodeDeployCommand::Client 200 0.036674 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"missing credentials, provide credentials with one of the following options:\\n - :access_key_id and :secret_access_key\\n - :credentials\\n - :credentials_provider\",\"log\":\"\"}"},host_command_identifier:"eyJiYXRjaEV4IjoxfQ==")
2023-03-27 17:38:21 ERROR [codedeploy-agent(18867)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: Aws::Sigv4::Errors::MissingCredentialsError - missing credentials, provide credentials with one of the following options:
- :access_key_id and :secret_access_key
- :credentials
- :credentials_provider - /opt/codedeploy-agent/vendor/gems/aws-sigv4-1.5.1/lib/aws-sigv4/signer.rb:654:in `extract_credentials_provider'
...
2023-03-27 17:38:21 INFO [codedeploy-agent(18867)]: [Aws::CodeDeployCommand::Client 200 0.015171 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"missing credentials, provide credentials with one of the following options:\\n - :access_key_id and :secret_access_key\\n - :credentials\\n - :credentials_provider\",\"log\":\"\"}"},host_command_identifier:"eyJiYXRjaElkI=")
https://docs.amazonaws.cn/codedeploy/latest/userguide/codedeploy-agent-operations-verify.html
Stopping the agent
$ sudo service codedeploy-agent status
The AWS CodeDeploy agent is running as PID 2901
$ sudo service codedeploy-agent stop
Agent log
2023-03-27 16:57:05 INFO [codedeploy-agent(11110)]: Stopping master 2901
2023-03-27 16:57:05 INFO [codedeploy-agent(2901)]: master 2901: Received TERM - stopping children and shutting down
2023-03-27 16:57:05 INFO [codedeploy-agent(2905)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller of master 2901: Received TERM - setting internal shutting down flag and possibly finishing last run
2023-03-27 16:57:05 INFO [codedeploy-agent(2905)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Gracefully shutting down agent child threads now, will wait up to 7200 seconds
2023-03-27 16:57:05 INFO [codedeploy-agent(2905)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: All agent child threads have been shut down
2023-03-27 16:57:05 INFO [codedeploy-agent(2905)]: agent exiting now
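The deployment in the console is stuck in Pending
After the timeout, the error is the same as when the instance cannot reach the public internet
If the agent cannot reach the public internet at startup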
2023-03-27 17:08:48 ERROR [codedeploy-agent(16307)]: InstanceAgent::Plugins::CodeDeployPlugin::CodeDeployControl: Error during certificate verification on codedeploy endpoint https://codedeploy-commands.cn-north-1.amazonaws.com.cn
2023-03-27 17:08:48 ERROR [codedeploy-agent(16307)]: Error validating the SSL configuration: Invalid server certificate
2023-03-27 17:08:48 ERROR [codedeploy-agent(16307)]: booting child: error during start or run: SystemExit - Stopping CodeDeploy agent due to SSL validation error. - /opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/command_poller.rb:65:in `abort'
...
2023-03-27 17:08:48 ERROR [codedeploy-agent(16307)]: booting child: error during start or run: SystemExit - exit - /opt/codedeploy-agent/lib/instance_agent/runner/child.rb:98:in `exit'
Add a gateway endpoint whose policy denies all access: the deployment fails immediately with Access Denied
Agent log
2023-03-27 17:15:15 INFO [codedeploy-agent(18867)]: [Aws::CodeDeployCommand::Client 200 0.021775 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"Access Denied\",\"log\":\"\"}"},host_command_identifier:"eyJiYXRjaElkIjoiYWfQ==")
All lifecycle events are stuck in Pending, and the console shows no error
After a while (5 minutes) an error appears, prompting you to check the agent log
Agent log when a network problem occurs during execution
2023-03-27 18:15:57 INFO [codedeploy-agent(17611)]: [Aws::CodeDeployCommand::Client 0 113.264982 3 retries] poll_host_command(host_identifier:"arn:aws-cn:ec2:cn-north-1:xxxxxxxxx:instance/i-0c701a0ba6d48c7f6") Seahorse::Client::NetworkingError execution expired
2023-03-27 16:47:36 ERROR [codedeploy-agent(2905)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error polling for host commands: Seahorse::Client::NetworkingError - execution expired - /usr/share/ruby/net/http.rb:878:in `initialize'
/usr/share/ruby/net/http.rb:878:in `open'
/usr/share/ruby/net/http.rb:878:in `block in connect'
/usr/share/ruby/net/http.rb:877:in `connect'
/usr/share/ruby/net/http.rb:862:in `do_start'
/usr/share/ruby/net/http.rb:857:in `start'
...
2023-03-27 16:47:36 ERROR [codedeploy-agent(2905)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Network error: #
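Even after the network recovers, the deployment stays stuck; as with an instance terminated mid-deployment, this can last up to 1 hour
While it is stuck the agent logs nothing about the deployment, because it cannot communicate with the CodeDeploy service
It times out after 5 minutes
The agent still reports network errors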
2023-03-28 03:58:58 ERROR [codedeploy-agent(24144)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Network error: #<Seahorse::Client::NetworkingError: execution expired>
2023-03-28 03:58:59 INFO [codedeploy-agent(24144)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.5.0-57_rpm.
2023-03-28 04:00:02 INFO [codedeploy-agent(24144)]: [Aws::CodeDeployCommand::Client 0 62.105124 3 retries] poll_host_command(host_identifier:"arn:aws-cn:ec2:cn-north-1:xxxxxxxxx:instance/i-0c701a0ba6d48c7f6") Seahorse::Client::NetworkingError execution expired
2023-03-28 04:00:02 ERROR [codedeploy-agent(24144)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error polling for host commands: Seahorse::Client::NetworkingError - execution expired - /usr/share/ruby/net/http.rb:878:in `initialize'
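The bracketed prefix of each SDK call line carries enough to tell a network failure from a service-side failure. Judging from the lines in these notes, the fields appear to be HTTP status, elapsed seconds and retry count, with status 0 meaning no response was received at all. The sketch below extracts them; treat that field interpretation as an assumption, and `parse_client_call` as a made-up helper name.

```python
import re

CLIENT_RE = re.compile(
    r"\[Aws::CodeDeployCommand::Client (\d+) ([\d.]+) (\d+) retries\] (\w+)\("
)


def parse_client_call(line):
    """Extract operation, status, timing and retries from an SDK call line."""
    match = CLIENT_RE.search(line)
    if match is None:
        return None
    http_status, elapsed, retries, operation = match.groups()
    return {
        "operation": operation,
        "http_status": int(http_status),
        "elapsed_seconds": float(elapsed),
        "retries": int(retries),
        # Assumption: status 0 means the request never got an HTTP response.
        "network_failure": int(http_status) == 0,
    }


# A polling line copied from the network-outage test in these notes.
SAMPLE_POLL = (
    "2023-03-28 04:00:02 INFO [codedeploy-agent(24144)]: "
    "[Aws::CodeDeployCommand::Client 0 62.105124 3 retries] "
    'poll_host_command(host_identifier:"arn:aws-cn:ec2:cn-north-1:xxxxxxxxx:'
    'instance/i-0c701a0ba6d48c7f6") Seahorse::Client::NetworkingError execution expired'
)
```

A line with status 200 and `command_status:"Failed"` means the agent reached CodeDeploy and reported a failure, whereas status 0 with several retries points at connectivity between the instance and the endpoint.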
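Create a CodeDeploy interface endpoint in the VPC, but do not add an S3 route
After configuring the endpoint, you need to modify the agent configuration file and grant additional permissions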
$ sudo vim /etc/codedeploy-agent/conf/codedeployagent.yml
---
:log_aws_wire: false
:log_dir: '/var/log/aws/codedeploy-agent/'
:pid_dir: '/opt/codedeploy-agent/state/.pid/'
:program_name: codedeploy-agent
:root_dir: '/opt/codedeploy-agent/deployment-root'
:verbose: false
:wait_between_runs: 1
:proxy_uri:
:max_revisions: 5
:enable_auth_policy: true
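Of the settings shown above, `:enable_auth_policy: true` is the one that goes together with the `codedeploy-commands-secure` permissions. A quick way to confirm it is set without opening an editor is sketched below. It is a deliberately naive line scan rather than a real YAML parser, and the helper names `read_agent_flag` and `auth_policy_enabled` are invented for this example.

```python
def read_agent_flag(config_text, key):
    """Return the raw value of a ':key:' line, or None if absent or empty."""
    prefix = ":" + key + ":"
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line.startswith(prefix):
            value = line[len(prefix):].strip()
            return value or None
    return None


def auth_policy_enabled(config_text):
    """True only when :enable_auth_policy: is literally 'true'."""
    return read_agent_flag(config_text, "enable_auth_policy") == "true"


# The configuration file contents as recorded in these notes.
SAMPLE_CONFIG = """---
:log_aws_wire: false
:log_dir: '/var/log/aws/codedeploy-agent/'
:pid_dir: '/opt/codedeploy-agent/state/.pid/'
:program_name: codedeploy-agent
:root_dir: '/opt/codedeploy-agent/deployment-root'
:verbose: false
:wait_between_runs: 1
:proxy_uri:
:max_revisions: 5
:enable_auth_policy: true
"""
```

On an instance the text would come from `/etc/codedeploy-agent/conf/codedeployagent.yml`. Quoted values such as `:log_dir:` are returned with their quotes, which is acceptable for a boolean check but not for general use.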
The added permissions are as follows
{
"Statement": [
{
"Action": [
"codedeploy-commands-secure:GetDeploymentSpecification",
"codedeploy-commands-secure:PollHostCommand",
"codedeploy-commands-secure:PutHostCommandAcknowledgement",
"codedeploy-commands-secure:PutHostCommandComplete"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
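To check an instance-profile policy document against the four actions listed above, a small comparison like the following can help. It is a rough sketch only: it ignores `Deny` statements, conditions and resource scoping, and recognizes just two wildcard forms. The function name `missing_agent_actions` is hypothetical.

```python
REQUIRED_ACTIONS = {
    "codedeploy-commands-secure:GetDeploymentSpecification",
    "codedeploy-commands-secure:PollHostCommand",
    "codedeploy-commands-secure:PutHostCommandAcknowledgement",
    "codedeploy-commands-secure:PutHostCommandComplete",
}


def missing_agent_actions(policy):
    """Return the required actions that no Allow statement in policy lists."""
    allowed = set()
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a bare string
            actions = [actions]
        allowed.update(actions)
    # Only these two wildcard forms are handled; others count as missing.
    if "*" in allowed or "codedeploy-commands-secure:*" in allowed:
        return set()
    return REQUIRED_ACTIONS - allowed
```

An empty result means the document lists all four actions; it does not prove the role can actually call them, since other policies or endpoint policies may still deny the request.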
Without these permissions, the error is as follows
2023-03-28 04:05:30 ERROR [codedeploy-agent(20997)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Cannot reach InstanceService: Aws::CodeDeployCommand::Errors::AccessDeniedException - User: arn:aws-cn:sts::xxxxxxxxx:assumed-role/CodeDeploy-EC2-Instance-Profile/i-0c701a0ba6d48c7f6 is not authorized to perform: codedeploy-commands-secure:PollHostCommand because no identity-based policy allows the codedeploy-commands-secure:PollHostCommand action
After the permissions are added, the agent tries to download the bundle but times out
The agent log is as follows
2023-03-28 04:09:18 INFO [codedeploy-agent(20997)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandExecutor: Downloading artifact bundle from bucket 'codebuild-bjs-output-bucket' and key 'web-project.zip', version '', etag ''
...
2023-03-28 04:10:20 INFO [codedeploy-agent(20997)]: [Aws::CodeDeployCommand::Client 200 0.174302 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"execution expired\",\"log\":\"\"}"},host_command_identifier:"eyJV4IjoxfQ==")
2023-03-28 04:10:20 ERROR [codedeploy-agent(20997)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: Seahorse::Client::NetworkingError - execution expired - /usr/share/ruby/net/http.rb:878:in `initialize'
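After the S3 route is added, the deployment succeeds, which means that inside a private network the agent only needs to reach S3 and the CodeDeploy service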
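Since the agent needs both S3 and the CodeDeploy commands service, a plain TCP check against each on port 443 is a quick way to see which one is unreachable. The sketch below assumes the cn-north-1 hostnames: the CodeDeploy one is taken from the certificate error earlier in these notes, while the S3 hostname is an assumption, and when a VPC interface endpoint is used the agent may talk to a different hostname. The names `can_connect` and `check_endpoints` are invented for this example.

```python
import socket

# Hostnames to probe. The S3 name is an assumption, not taken from the logs.
ENDPOINTS = {
    "codedeploy-commands": "codedeploy-commands.cn-north-1.amazonaws.com.cn",
    "s3": "s3.cn-north-1.amazonaws.com.cn",
}


def can_connect(host, port=443, timeout=3.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        conn = connect((host, port), timeout)
    except OSError:  # covers DNS failures, refusals and timeouts
        return False
    conn.close()
    return True


def check_endpoints(endpoints=ENDPOINTS, connect=socket.create_connection):
    """Map each endpoint label to whether it is reachable over TCP 443."""
    return {
        label: can_connect(host, connect=connect)
        for label, host in endpoints.items()
    }
```

Calling `check_endpoints()` on the instance returns a dict of booleans. A successful TCP connection only shows that routing and security groups allow the traffic; IAM and endpoint-policy denials still surface as Access Denied in the agent log.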
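If a script does not run to completion during a lifecycle event (for example, because the instance is terminated or the CodeDeploy agent is shut down), it can take up to an hour for the deployment status to show as Failed. This happens even if the timeout specified for the script is shorter than 1 hour
The deployment is interrupted right after it starts and stays in Pending
Or it stalls midway because of a network problem
Explicit error: the specified S3 key does not exist
Agent log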
2023-03-27 17:56:39 INFO [codedeploy-agent(9395)]: [Aws::CodeDeployCommand::Client 200 0.02019 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"The specified key does not exist.\",\"log\":\"\"}"},host_command_identifier:"eyJiYXfQ==")
Manually pulling the S3 object fails for lack of permission
$ aws s3 cp s3://codebuild-bjs-output-bucket/web-project.zip .
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
The error is again explicit
Agent log
2023-03-27 18:06:42 INFO [codedeploy-agent(9395)]: [Aws::CodeDeployCommand::Client 200 0.022363 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"Access Denied\",\"log\":\"\"}"},host_command_identifier:"eyJiYXjoxfQ==")
Return a failing exit code from a hook script referenced in appspec.yml (scripts/start_server.sh)
#!/bin/bash
sudo systemctl start tomcat.service
sudo systemctl enable tomcat.service
sudo systemctl start httpd.service
sudo systemctl enable httpd.service
exit 1  # deliberately fail the hook to see how the agent reports it
The agent log is as follows; it shows exactly which script failed and where
2023-03-28 06:10:58 INFO [codedeploy-agent(20997)]: [Aws::CodeDeployCommand::Client 200 0.064576 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":4,\"script_name\":\"scripts/start_server.sh\",\"message\":\"Script at specified location: scripts/start_server.sh run as user root failed with exit code 1\",\"log\":\"LifecycleEvent - ApplicationStart\\nScript - scripts/start_server.sh\\n\"}"},host_command_identifier:"eyJiYXRjaOjF9")
2023-03-28 06:10:58 ERROR [codedeploy-agent(20997)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: InstanceAgent::Plugins::CodeDeployPlugin::ScriptError - Script at specified location: scripts/start_server.sh run as user root failed with exit code 1 - /opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/hook_executor.rb:196:in `execute_script'
Manually deleting codedeploy-agent.log
A new file is not created automatically afterwards; the agent has to be restarted
sudo service codedeploy-agent stop
sudo service codedeploy-agent start
https://docs.amazonaws.cn/codedeploy/latest/userguide/deployments-view-logs.html
Open the CodeDeploy agent log file:
less /var/log/aws/codedeploy-agent/codedeploy-agent.log
Open the CodeDeploy script log file:
less /opt/codedeploy-agent/deployment-root/deployment-group-ID/deployment-ID/logs/scripts.log
In practice this log is of little use