Use: localhost:2020/api/v1/metrics
The endpoint mentioned at 3:18 in this video is no longer available in the Ops Agent. For other options, see Agent is running but data is not ingested.
This document helps you diagnose problems installing or running the Ops Agent.
Agent Diagnostic Tool for Virtual Machines
The Agent Diagnostic Tool collects critical local debug information from your virtual machines for all of the following agents: Ops Agent, legacy Logging agent, and legacy Monitoring agent. Debug information includes project information, VM information, agent configuration, agent logs, agent service status, and other information that often requires manual work to collect. The tool also checks the local virtual machine environment to ensure that it meets certain requirements for the agents to function properly, such as network connectivity and required permissions.
When filing a support case for an agent running on a virtual machine, run the agent diagnostic tool and attach the collected information to the case. Providing this information reduces the time it takes to resolve your support case. Before attaching the information to the support case, remove all sensitive information, such as passwords.
The agent diagnostic tool must be run from within the virtual machine, so you will usually need to SSH into the virtual machine first. The following command retrieves the agent diagnostic tool and runs it:
Linux
curl -sSO https://dl.google.com/cloudagents/diagnose-agents.sh
sudo bash diagnose-agents.sh
Windows
(New-Object Net.WebClient).DownloadFile("https://dl.google.com/cloudagents/diagnose-agents.ps1", "${env:UserProfile}\diagnose-agents.ps1")
Invoke-Expression "${env:UserProfile}\diagnose-agents.ps1"
Follow the output of the script to locate the files that contain the collected information. You can usually find them in the /var/tmp/google-agents directory on Linux and in the $env:LOCALAPPDATA/Temp directory on Windows, unless you customized the output directory when running the script.
For detailed information, see the diagnose-agents.sh script on Linux or the diagnose-agents.ps1 script on Windows.
Agent fails to install
You may encounter the following errors when running the install script.
The operating system is not supported. The error message may be similar to the following:
Linux
https://packages.cloud.google.com/yum/repos/google-cloud-ops-agent-el6-x86_64-all/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
Trying other mirror.
To address this issue please refer to the below wiki article
https://wiki.centos.org/yum-errors
If above article doesn't help to resolve this issue please use https://bugs.centos.org/.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: google-cloud-ops-agent. Please verify its path and try again
The VM already has the Cloud Logging agent or the Cloud Monitoring agent installed, and it conflicts with the new agent. The error message may be similar to the following:
Linux
Error:
 Problem: problem with installed package stackdriver-agent-6.0.5-1.el8.x86_64
  - package google-cloud-ops-agent-0.1.0-1.el8.x86_64 conflicts with stackdriver-agent provided by stackdriver-agent-6.0.5-1.el8.x86_64
The Ops Agent uses new configuration files that are not supported by the older agents. For more information, see the Configure the Ops Agent guide.
To fix this error, do the following:
Save custom configuration files for the Cloud Monitoring agent and the Cloud Logging agent.
Uninstall the old Cloud Monitoring agent and Cloud Logging agent.
After uninstalling the agent, the Google Cloud Console may take up to an hour to report this change.
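Before rerunning the install script, it can help to confirm whether a legacy agent package is still present on the VM. The following is a minimal sketch rather than part of the official tooling: the function name legacy_agent_packages is hypothetical, stackdriver-agent is the package name shown in the conflict error above, and google-fluentd is assumed to be the package name of the legacy Logging agent.

```shell
# Hypothetical helper: read one package name per line on stdin and print
# any that belong to the legacy agents.
#   stackdriver-agent : legacy Monitoring agent (name taken from the error above)
#   google-fluentd    : legacy Logging agent (assumed package name)
legacy_agent_packages() {
  grep -x -e 'stackdriver-agent' -e 'google-fluentd'
}

# Example usage on the VM (commented out so that nothing runs on paste):
#   dpkg-query -W -f='${Package}\n' | legacy_agent_packages   # Debian, Ubuntu
#   rpm -qa --qf '%{NAME}\n' | legacy_agent_packages          # CentOS, RHEL
```

The function prints nothing and returns a non-zero status when no legacy package is listed, so it can also be used as a condition in a script.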
Agent is installed but not running
Agent services are not running
When the agent service is running as expected, you might see the following status:
Linux
computer@debian9:~$ sudo systemctl status google-cloud-ops-agent"*"
● google-cloud-ops-agent.service - Google Cloud Ops Agent
   Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent.service; enabled; vendor preset: enabled)
   Active: active (exited) since Thu 2021-08-05 20:33:44 UTC; 7s ago
  Process: 2240 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
  Process: 2214 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -in /etc/google-cloud-ops-agent/config.yaml (code=exited, status=0/SUCCESS)
 Main PID: 2240 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 4915)
   CGroup: /system.slice/google-cloud-ops-agent.service

Aug 05 20:33:44 debian9 systemd[1]: Starting Google Cloud Ops Agent...
Aug 05 20:33:44 debian9 systemd[1]: Started Google Cloud Ops Agent.

● google-cloud-ops-agent-fluent-bit.service - Google Cloud Ops Agent - Logging Agent
   Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent-fluent-bit.service; static; vendor preset: enabled)
  Drop-In: /lib/systemd/system/google-cloud-ops-agent-fluent-bit.service.d
           └─directories.conf
   Active: active (running) since Thu 2021-08-05 20:33:44 UTC; 7s ago
  Process: 2234 ExecStartPre=/bin/mkdir -p ${RUNTIME_DIRECTORY} ${STATE_DIRECTORY} ${LOGS_DIRECTORY} (code=exited, status=0/SUCCESS)
  Process: 2216 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=fluentbit -in /etc/google-cloud-ops-agent/config.yaml -logs ${LOGS_DIRECTORY} -state ${STATE_DIRECTORY} (code=exited, status=0/SUCCESS)
 Main PID: 2247 (fluent-bit)
    Tasks: 22 (limit: 4915)
   CGroup: /system.slice/google-cloud-ops-agent-fluent-bit.service
           └─2247 /opt/google-cloud-ops-agent/subagents/fluent-bit/bin/fluent-bit --config /run/google-cloud-ops-agent-fluent-bit/fluent_bit_main.conf --parser /run/google-cloud-ops-agent-fluent-bit/fluent_bit_parser.conf --log_file /var/log/google-cloud-ops-agent/subagents/logging-module.log --storage_path /var/lib/google-cloud-ops-agent/fluent-bit/buffers

Aug 05 20:33:44 debian9 systemd[1]: Starting Google Cloud Ops Agent - Logging Agent...
Aug 05 20:33:44 debian9 systemd[1]: Started Google Cloud Ops Agent - Logging Agent.
Aug 05 20:33:44 debian9 fluent-bit[2247]: Fluent Bit v1.7.8
Aug 05 20:33:44 debian9 fluent-bit[2247]: * Copyright (C) 2019-2021 The Fluent Bit Authors
Aug 05 20:33:44 debian9 fluent-bit[2247]: * Copyright (C) 2015-2018 Treasure Data
Aug 05 20:33:44 debian9 fluent-bit[2247]: * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
Aug 05 20:33:44 debian9 fluent-bit[2247]: * https://fluentbit.io

● google-cloud-ops-agent-opentelemetry-collector.service - Google Cloud Ops Agent - Metrics Agent
   Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service; static; vendor preset: enabled)
  Drop-In: /lib/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service.d
           └─directories.conf
   Active: active (running) since Thu 2021-08-05 20:33:44 UTC; 7s ago
  Process: 2237 ExecStartPre=/bin/mkdir -p ${RUNTIME_DIRECTORY} ${STATE_DIRECTORY} ${LOGS_DIRECTORY} (code=exited, status=0/SUCCESS)
  Process: 2215 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=otel -in /etc/google-cloud-ops-agent/config.yaml -logs ${LOGS_DIRECTORY} (code=exited, status=0/SUCCESS)
 Main PID: 2251 (otelopscol)
    Tasks: 6 (limit: 4915)
   CGroup: /system.slice/google-cloud-ops-agent-opentelemetry-collector.service
           └─2251 /opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --add-instance-id=false --config=/run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml

Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.234Z info builder/pipelines_builder.go:51 Pipeline is starting... {"pipeline_name": "metrics/system", "pipeline_datatype": "metrics"}
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.234Z info builder/pipelines_builder.go:62 Pipeline started. {"pipeline_name": "metrics/system", "pipeline_datatype": "metrics"}
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.234Z info service/service.go:192 Starting receivers...
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.235Z info builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "hostmetrics/hostmetrics"}
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.235Z info builder/receivers_builder.go:75 Receiver started. {"kind": "receiver", "name": "hostmetrics/hostmetrics"}
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.236Z info builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "prometheus/agent"}
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.236Z info discovery/manager.go:195 Starting provider {"kind": "receiver", "name": "prometheus/agent", "level": "debug", "provider": "static/0", "subs": "[otel-collector]"}
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.236Z info builder/receivers_builder.go:75 Receiver started. {"kind": "receiver", "name": "prometheus/agent"}
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.236Z info service/collector.go:182 Everything is ready. Begin running and processing data.
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.256Z info discovery/manager.go:213 Discoverer channel closed {"kind": "receiver", "name": "prometheus/agent", "level": "debug", "provider": "static/0"}
Windows
Get-Service google-cloud-ops-agent*

Status   Name               DisplayName
------   ----               -----------
Running  google-cloud-op... Google Cloud Ops Agent
Running  google-cloud-op... Google Cloud Ops Agent - Logging Agent
Running  google-cloud-op... Google Cloud Ops Agent - Metrics Agent
If the agent service is not running, you might see the following status:
Linux
$ sudo service google-cloud-ops-agent status
● google-cloud-ops-agent.service - Google Cloud Ops Agent
   Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Wed 2021-06-30 21:20:43 UTC; 6s ago
Windows
Get-Service google-cloud-ops-agent

Status   Name                    DisplayName
------   ----                    -----------
Stopped  google-cloud-ops-agent  Google Cloud Ops Agent
To fix this error, run the following command to start the service:
Linux
sudo service google-cloud-ops-agent start
Windows
Start-Service google-cloud-ops-agent
If the service does not start, the configuration may be invalid.
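When checking many VMs, the only part of the Linux status output that matters for this check is the Active: line of each service. The following is a minimal sketch under that assumption; the function name active_state is hypothetical and the function only parses text, it does not call systemctl itself.

```shell
# Hypothetical helper: read `systemctl status` output on stdin and print the
# state word from every "Active:" line (for example: active, inactive, failed).
active_state() {
  sed -n 's/^[[:space:]]*Active:[[:space:]]*\([a-z]*\).*/\1/p'
}

# Example usage on the VM (commented out so that nothing runs on paste):
#   sudo systemctl status "google-cloud-ops-agent*" | active_state
```

A healthy agent prints active three times, once per service. Any inactive or failed line points to the service that needs attention.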
Conflict with currently installed agents
The VM already has the Cloud Logging agent or the Cloud Monitoring agent installed, and its configuration conflicts with the configuration of the Ops Agent. The error message may be similar to the following:
Windows
We detected an existing Windows service for the StackdriverLogging agent, which is not supported by the Ops Agent when the Ops Agent configuration has a non-empty logging section. Either remove the logging section from the Ops Agent configuration or disable the StackdriverLogging agent, and then try again to enable the Ops Agent.
To fix this error, you have two options:
Disable the conflicting section of the Ops Agent configuration file. For more information, see the Configure the Ops Agent guide.
Disable the conflicting Cloud Logging agent or Cloud Monitoring agent:
- Save custom configuration files for the Cloud Logging agent.
- Uninstall the old Cloud Monitoring agent and Cloud Logging agent.
After uninstalling the agent, the Google Cloud Console may take up to an hour to report this change.
Invalid configuration
If the configuration is invalid, you may see the following error when trying to restart the agent service:
Linux
$ sudo service google-cloud-ops-agent restart \
    && sudo service google-cloud-ops-agent status
● google-cloud-ops-agent-fluent-bit.service - Google Cloud Ops Agent - Logging Agent
   Loaded: loaded (/usr/lib/systemd/system/google-cloud-ops-agent-fluent-bit.service; static; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/google-cloud-ops-agent-fluent-bit.service.d
           └─directories.conf
   Active: failed (Result: exit-code) since Wed 2021-06-30 22:21:08 UTC; 2s ago
  Process: 1141421 ExecStart=/opt/google-cloud-ops-agent/subagents/fluent-bit/bin/fluent-bit --config ${RUNTIME_DIRECTORY}/fluent_bit_main.conf --parser ${RUNTIME_DIRECTORY}/fluent_bit_parser.conf --log_>
  Process: 1141847 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=fluentbit -in /etc/google-cloud-ops-agent/config.yaml -logs ${LOGS_DIRECTORY} -state ${STATE_DIR>
 Main PID: 1141421 (code=exited, status=0/SUCCESS)

Jun 30 22:21:08 centos8-2 systemd[1]: google-cloud-ops-agent-fluent-bit.service: Control process exited, code=exited status=1
Jun 30 22:21:08 centos8-2 systemd[1]: google-cloud-ops-agent-fluent-bit.service: Failed with result 'exit-code'.
Jun 30 22:21:08 centos8-2 systemd[1]: Failed to start Google Cloud Ops Agent - Logging Agent.
Jun 30 22:21:08 centos8-2 systemd[1]: google-cloud-ops-agent-fluent-bit.service: Service RestartSec=100ms expired, scheduling restart.
Jun 30 22:21:08 centos8-2 systemd[1]: google-cloud-ops-agent-fluent-bit.service: Scheduled restart job, restart counter is at 5.
Jun 30 22:21:08 centos8-2 systemd[1]: Stopped Google Cloud Ops Agent - Logging Agent.
Jun 30 22:21:08 centos8-2 systemd[1]: google-cloud-ops-agent-fluent-bit.service: Start request repeated too quickly.
Jun 30 22:21:08 centos8-2 systemd[1]: google-cloud-ops-agent-fluent-bit.service: Failed with result 'exit-code'.
Jun 30 22:21:08 centos8-2 systemd[1]: Failed to start Google Cloud Ops Agent - Logging Agent.
Use journalctl to get the exact error message:
sudo journalctl -xe | grep "google_cloud_ops_agent_engine"
You may see a message similar to the following:
Jun 30 22:00:26 centos8-2 google_cloud_ops_agent_engine[1141491]: 2021-06-30 22:00:26 agent config file is invalid YAML. verbose error: yaml: line 21: did not find expected key
Windows
could not generate config files: could not parse config: yaml: line 20: could not find expected ':'
To fix this error, correct the invalid configuration and restart the agent. For reference, see the Configure the Ops Agent guide.
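Both error messages above include a "yaml: line N" fragment, which is the quickest pointer into the configuration file. The following is a minimal sketch for pulling out that number and showing the surrounding lines; the function names yaml_error_line and show_config_line are hypothetical. The reported line is where the parser stopped, so the actual mistake may sit a line or two earlier, which is why the sketch prints some context.

```shell
# Hypothetical helper: read error text on stdin and print the first line
# number that follows "yaml: line".
yaml_error_line() {
  sed -n 's/.*yaml: line \([0-9][0-9]*\):.*/\1/p' | head -n 1
}

# Hypothetical helper: print line N of a file with two lines of context on
# each side. The reported line is marked with ">".
show_config_line() {
  awk -v n="$1" 'NR >= n - 2 && NR <= n + 2 {
    printf "%s%4d  %s\n", (NR == n ? ">" : " "), NR, $0
  }' "$2"
}

# Example usage on a Linux VM (commented out so that nothing runs on paste):
#   n="$(sudo journalctl -xe | grep google_cloud_ops_agent_engine | yaml_error_line)"
#   show_config_line "$n" /etc/google-cloud-ops-agent/config.yaml
```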
Agent is running but data is not ingested
Use Metrics Explorer to query the agent uptime metric and verify that the agent component, google-cloud-ops-agent-metrics or google-cloud-ops-agent-logging, is writing to the metric.
In the Google Cloud console, select Monitoring or click the following button:
Go to Monitoring
In the navigation pane, select Metrics explorer.
Select the MQL tab.
Enter the following query and click Run:
fetch gce_instance
| metric 'agent.googleapis.com/agent/uptime'
| align rate(1m)
| every 1m
Does the agent send logs to Cloud Logging?
Check local metrics
These steps require you to log into the virtual machine using SSH.
- Is the logging module running? Use the following commands to verify:
Linux
sudo systemctl status google-cloud-ops-agent"*"
Windows
Open Windows PowerShell as an administrator and run:
Get-Service google-cloud-ops-agent
You can also check the status of the service in the Services app and inspect running processes in the Task Manager app.
Check the logging module log
This step requires you to log into the virtual machine via SSH.
You can find the logging module logs at /var/log/google-cloud-ops-agent/subagents/*.log for Linux and C:\ProgramData\Google\Cloud Operations\Ops Agent\log\logging-module.log for Windows. If there are no logs, then the agent service is not running correctly. Go to the Agent is installed but not running section first to correct this condition.
You may see 403 permission errors when writing to the Logging API. For example:
[2020/10/13 18:55:09] [ warn] [output:stackdriver:stackdriver.0] error
{
  "error": {
    "code": 403,
    "message": "Cloud Logging API has not been used in project 147627806769 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/logging.googleapis.com/overview?project=147627806769 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.",
    "status": "PERMISSION_DENIED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.Help",
        "links": [
          {
            "description": "Google developers console API activation",
            "url": "https://console.developers.google.com/apis/api/logging.googleapis.com/overview?project=147627806769"
          }
        ]
      }
    ]
  }
}
To fix this error, enable the Logging API and set the Logs Writer role.
You may see a quota issue for the Logging API. For example:
error="8:Insufficient tokens for quota 'logging.googleapis.com/write_requests' and limit 'WriteRequestsPerMinutePerProject' for service 'logging.googleapis.com' for consumer 'project_number:648320274015'." error_code="8"
To fix this error, increase the quota or reduce the logging throughput.
You may see the following errors in the module log:
{"error":"invalid_request","error_description":"Service account not enabled on this instance"}
or
unable to get token from metadata server
These errors might indicate that you deployed the agent without a specified service account or credentials. For information about how to resolve this issue, see Authorize the Ops Agent.
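The three failure modes above (permission, quota, and credentials) can be told apart by a few fixed substrings in the log. The following is a minimal sketch of such a triage step; the function name classify_logging_error is hypothetical, and the substrings are taken from the example messages above, so real log lines that are worded differently will not match.

```shell
# Hypothetical helper: read logging module log lines on stdin and print one
# hint for each line that matches an error described in this section.
classify_logging_error() {
  while IFS= read -r line; do
    case "$line" in
      *'"code": 403'*|*PERMISSION_DENIED*)
        echo "permission: enable the Logging API and grant the Logs Writer role" ;;
      *"Insufficient tokens for quota"*)
        echo "quota: increase the quota or reduce logging throughput" ;;
      *"Service account not enabled on this instance"*|*"metadata server"*)
        echo "credentials: see Authorize the Ops Agent" ;;
    esac
  done
}

# Example usage on a Linux VM (commented out so that nothing runs on paste):
#   classify_logging_error < /var/log/google-cloud-ops-agent/subagents/logging-module.log | sort | uniq -c
```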
Does the agent send metrics to Cloud Monitoring?
Check metrics module log
This step requires you to log into the virtual machine via SSH.
You can find the metrics module logs in syslog. If there are no logs, then the agent service is not running correctly. Go to the Agent is installed but not running section first to correct this condition.
You might see PermissionDenied errors when writing to the Monitoring API. This error occurs if the permissions for the Ops Agent are not set correctly. For example:
Nov 2 14:51:27 test-ops-agent-error otelopscol[412]: 2021-11-02T14:51:27.343Z#011info#011exporterhelper/queued_retry.go:231#011Exporting failed. Will retry the request after interval.#011{"kind": "exporter", "name": "googlecloud", "error": "[rpc error: code = PermissionDenied desc = Permission monitoring.timeSeries.create denied (or the resource may not exist).; rpc error: code = PermissionDenied desc = Permission monitoring.timeSeries.create denied (or the resource may not exist).]", "interval": "6.934781228s"}
To fix this error, enable the Monitoring API and set the Monitoring Metric Writer role.
You might see ResourceExhausted errors when writing to the Monitoring API. This error occurs if the project is reaching the Monitoring API quota limit. For example:
Nov 2 18:48:32 test-ops-agent-error otelopscol[441]: 2021-11-02T18:48:32.175Z#011info#011exporterhelper/queued_retry.go:231#011Exporting failed. Will retry the request after interval.#011{"kind": "exporter", "name": "googlecloud", "error": "rpc error: code = ResourceExhausted desc = Quota exceeded for quota metric 'Total requests' and limit 'Total requests per minute per user' for service 'monitoring.googleapis.com' for consumer 'project_number:8563942476'.\nerror details: name = ErrorInfo reason = RATE_LIMIT_EXCEEDED domain = googleapis.com metadata = map[consumer:projects/8563942476 quota_limit:DefaultRequestsPerMinutePerUser quota_metric:monitoring.googleapis.com/default_requests service:monitoring.googleapis.com]", "interval": "2.641515416s"}
To fix this error, increase the quota or reduce the metrics throughput.
You may see the following errors in the module log:
{"error":"invalid_request","error_description":"Service account not enabled on this instance"}
or
unable to get token from metadata server
These errors might indicate that you deployed the agent without a specified service account or credentials. For information about how to resolve this issue, see Authorize the Ops Agent.
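The metrics module reports write failures as gRPC errors, so counting the status names that follow "rpc error: code =" gives a quick picture of which problem dominates. The following is a minimal sketch; the function name rpc_error_codes is hypothetical, and it assumes the log carries gRPC status names such as PermissionDenied and ResourceExhausted in that position.

```shell
# Hypothetical helper: read syslog lines on stdin and print each gRPC status
# name found after "rpc error: code =", together with how often it occurs.
rpc_error_codes() {
  grep -o 'rpc error: code = [A-Za-z]*' \
    | sed 's/.*code = //' \
    | sort \
    | uniq -c \
    | awk '{ print $2 ": " $1 }'
}

# Example usage (commented out so that nothing runs on paste). The path
# /var/log/syslog is an assumption that holds on Debian and Ubuntu; other
# distributions may use a different file.
#   sudo grep otelopscol /var/log/syslog | rpc_error_codes
```

A line can contain more than one error, as in the PermissionDenied example above, and each occurrence is counted separately.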
Inspect agent self logs
If the agent is unable to ingest logs into Cloud Logging, you may need to inspect the agent's self logs locally on the virtual machine to troubleshoot the issue.
Linux
To inspect the self logs that are written to journald, run the following command:
journalctl -u google-cloud-ops-agent*
To inspect the self logs that the logging module writes to disk, run the following command:
vim /var/log/google-cloud-ops-agent/subagents/logging-module.log
Windows
To inspect the self logs that are written to Windows event logs, run the following command:
Get-WinEvent -FilterHashtable @{ Logname='Application'; ProviderName='google-cloud-ops-agent*' } | Format-Table -AutoSize -Wrap
To inspect the self logs that the logging module writes to disk, run the following command:
notepad "C:\ProgramData\Google\Cloud Operations\Ops Agent\log\logging-module.log"
To inspect the logs of the Windows Service Control Manager for Ops Agent services, run the following command:
Get-WinEvent -FilterHashtable @{ Logname='System'; ProviderName='Service Control Manager' } | Where-Object -Property Message -Match 'Google Cloud Ops Agent' | Format-Table -AutoSize -Wrap
Configure automatic rotation of log files on Linux virtual machines
To limit the size of the logging sub-agent log at /var/log/google-cloud-ops-agent/subagents/logging-module.log, install and configure the logrotate utility.
Install the logrotate utility by running the following command:
On Debian and Ubuntu
sudo apt install logrotate
On CentOS, RHEL and Fedora
sudo yum install logrotate
Create a logrotate config file at /etc/logrotate.d/google-cloud-ops-agent.conf.
sudo tee /etc/logrotate.d/google-cloud-ops-agent.conf > /dev/null << EOF
# logrotate config to rotate Google Cloud Ops Agent self log file.
# See https://manpages.debian.org/jessie/logrotate/logrotate.8.en.html for
# the full options.
/var/log/google-cloud-ops-agent/subagents/logging-module.log
{
    # Log files are rotated every day.
    daily
    # Log files are rotated this many times before being removed. This
    # effectively limits the disk space used by the Ops Agent self log files.
    rotate 30
    # Log files are rotated when they grow bigger than maxsize even before the
    # additionally specified time interval
    maxsize 256M
    # Skip rotation if the log file is missing.
    missingok
    # Do not rotate the log if it is empty.
    notifempty
    # Old versions of log files are compressed with gzip by default.
    compress
    # Postpone compression of the previous log file to the next rotation
    # cycle.
    delaycompress
}
EOF
Set up crontab or systemd timer to trigger the logrotate utility periodically.
After log rotation takes effect, you see the rotated files in the /var/log/google-cloud-ops-agent/subagents/ directory. The results are similar to the following output:
/var/log/google-cloud-ops-agent/subagents$ ls -lh
total 24K
-rw-r--r-- 1 root root  717 Sep  3 19:54 logging-module.log
-rw-r--r-- 1 root root 6.8K Sep  3 19:51 logging-module.log.1
-rw-r--r-- 1 root root  874 Sep  3 19:50 logging-module.log.2.gz
-rw-r--r-- 1 root root  873 Sep  3 19:50 logging-module.log.3.gz
-rw-r--r-- 1 root root 3.2K Sep  3 19:34 logging-module.log.4.gz
To test log rotation, do the following:
Temporarily reduce the file size at which rotation is triggered by setting the maxsize value to 1k in the /etc/logrotate.d/google-cloud-ops-agent.conf file.
Make the agent's self log file grow larger than 1K by restarting the agent multiple times:
sudo service google-cloud-ops-agent restart
Wait for the crontab or systemd timer to take effect and trigger the logrotate utility, or trigger the logrotate utility manually by running this command:
sudo logrotate /etc/logrotate.d/google-cloud-ops-agent.conf
Verify that you see the rotated log files in the /var/log/google-cloud-ops-agent/subagents/ directory.
Reset the log rotation settings by restoring the original maxsize value.
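The verification step above can be scripted by counting the rotated copies of the self log before and after running logrotate. The following is a minimal sketch; the function name count_rotated_logs is hypothetical, and the default directory is the one used throughout this section.

```shell
# Hypothetical helper: count rotated copies of the logging module self log
# (logging-module.log.1, logging-module.log.2.gz, and so on) in a directory.
# The live file logging-module.log itself is not counted.
count_rotated_logs() {
  local dir="${1:-/var/log/google-cloud-ops-agent/subagents}"
  find "$dir" -maxdepth 1 -type f -name 'logging-module.log.*' | wc -l | tr -d ' '
}

# Example usage on a Linux VM (commented out so that nothing runs on paste):
#   before="$(count_rotated_logs)"
#   sudo logrotate /etc/logrotate.d/google-cloud-ops-agent.conf
#   after="$(count_rotated_logs)"
#   echo "rotated files: ${before} -> ${after}"
```

To preview what logrotate would do without changing any files, the utility's -d (debug) flag can be added to the logrotate command.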
Fully reset agent state
If the agent goes into an unrecoverable state, follow these steps to restore the agent to a new state.
Linux
Stop the agent service:
sudo service google-cloud-ops-agent stop
Remove the agent package:
curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sudo bash add-google-cloud-ops-agent-repo.sh --uninstall --remove-repo
Delete the agent self logs on disk:
sudo rm -rf /var/log/google-cloud-ops-agent
Delete agent local buffers on disk:
sudo rm -rf /var/lib/google-cloud-ops-agent/fluent-bit/buffers/*/
Reinstall and restart the agent:
curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sudo bash add-google-cloud-ops-agent-repo.sh --also-install
sudo service google-cloud-ops-agent restart
Windows
Stop the agent service:
Stop-Service google-cloud-ops-agent -Force;Get-Service google-cloud-ops-agent* | %{sc.exe delete $_};taskkill /f /fi "SERVICES eq google-cloud-ops-agent*";
Remove the agent package:
(New-Object Net.WebClient).DownloadFile("https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.ps1", "${env:UserProfile}\add-google-cloud-ops-agent-repo.ps1");
$env:REPO_SUFFIX="";
Invoke-Expression "${env:UserProfile}\add-google-cloud-ops-agent-repo.ps1 -Uninstall -RemoveRepo"
Delete the agent self logs on disk:
rmdir -R -ErrorAction SilentlyContinue "C:\ProgramData\Google\Cloud Operations\Ops Agent\log";
Delete agent local buffers on disk:
Get-ChildItem -Path "C:\ProgramData\Google\Cloud Operations\Ops Agent\run\buffers\" -Directory -ErrorAction SilentlyContinue | %{rm -r -Path $_.FullName}
Reinstall and restart the agent:
(New-Object Net.WebClient).DownloadFile("https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.ps1", "${env:UserProfile}\add-google-cloud-ops-agent-repo.ps1");
$env:REPO_SUFFIX="";
Invoke-Expression "${env:UserProfile}\add-google-cloud-ops-agent-repo.ps1 -AlsoInstall"
Reset but save buffer files
If the virtual machine does not have corrupted buffer chunks (that is, there are no format check failed messages in the Ops Agent self log file), you can skip the preceding commands that delete local buffers when resetting the agent state.
If the virtual machine has corrupted buffer chunks, you will need to remove them. The following options describe different ways of handling buffers. The other steps described in Fully reset agent state still apply.
Option 1: Delete the entire buffers directory. This is the easiest option, but it may result in the loss of uncorrupted buffered logs, or in duplicate logs because the position files are lost.
Linux
sudo rm -rf /var/lib/google-cloud-ops-agent/fluent-bit/buffers
Windows
rmdir -R -ErrorAction SilentlyContinue "C:\ProgramData\Google\Cloud Operations\Ops Agent\run\buffers";
Option 2: Delete the buffer subdirectories from the buffers directory, but leave the position files. This approach is described in Fully reset agent state.
Option 3: If you do not want to delete all buffer files, you can extract the names of the corrupted buffer files from the agent self logs and delete only the corrupted buffer files.
Linux
grep "format check failed" /var/log/google-cloud-ops-agent/subagents/logging-module.log | sed 's|.*format check failed: |/var/lib/google-cloud-ops-agent/fluent-bit/buffers/|' | xargs sudo rm -f
Windows
$oalogspath="C:\ProgramData\Google\Cloud Operations\Ops Agent\log\logging-module.log";if (Test-Path $oalogspath) { Select-String "format check failed" $oalogspath | %{$_ -replace '.*format check failed: (.*)/(.*)', '$1\$2'} | %{rm -ErrorAction SilentlyContinue -Path ('C:\ProgramData\Google\Cloud Operations\Ops Agent\run\buffers\' + $_)}};
Option 4: If there are many corrupted buffers and you want to reprocess all the log files, you can use the commands from option 3 and also delete the position files (which store the Ops Agent progress per log file). Deleting the position files can result in duplicate logs for any logs that have already been successfully ingested. This option only reprocesses the current log files; it does not reprocess files that have already been rotated out or logs from other sources such as a TCP port. Position files are stored in the buffers directory as files. Local buffers are stored as subdirectories in the buffers directory.
Linux
grep "format check failed" /var/log/google-cloud-ops-agent/subagents/logging-module.log | sed 's|.*format check failed: |/var/lib/google-cloud-ops-agent/fluent-bit/buffers/|' | xargs sudo rm -f
sudo find /var/lib/google-cloud-ops-agent/fluent-bit/buffers -maxdepth 1 -type f -delete
Windows
$oalogspath="C:\ProgramData\Google\Cloud Operations\Ops Agent\log\logging-module.log";if (Test-Path $oalogspath) { Select-String "format check failed" $oalogspath | %{$_ -replace '.*format check failed: (.*)/(.*)', '$1\$2'} | %{rm -ErrorAction SilentlyContinue -Path ('C:\ProgramData\Google\Cloud Operations\Ops Agent\run\buffers\' + $_)}};Get-ChildItem -Path "C:\ProgramData\Google\Cloud Operations\Ops Agent\run\buffers\" -File -ErrorAction SilentlyContinue | %{$_.Delete()}
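Because the Linux commands in options 3 and 4 pipe straight into rm, it can be worth previewing which files they would delete. The following is a minimal sketch of a dry run that performs the same grep and sed steps but only prints the resulting paths. The function name corrupted_buffer_paths is hypothetical, and the sketch assumes that each format check failed message ends with the chunk path relative to the buffers directory.

```shell
# Hypothetical helper: read the logging module self log on stdin and print
# the absolute path of every buffer chunk reported as "format check failed".
# Nothing is deleted. An alternative buffers directory can be passed as $1.
corrupted_buffer_paths() {
  local buffers="${1:-/var/lib/google-cloud-ops-agent/fluent-bit/buffers}"
  grep "format check failed" \
    | sed "s|.*format check failed: |${buffers}/|" \
    | sort -u
}

# Example usage on a Linux VM (commented out so that nothing runs on paste):
#   corrupted_buffer_paths < /var/log/google-cloud-ops-agent/subagents/logging-module.log
```

If the printed list looks right, the same output can be piped to xargs sudo rm -f, which matches the deletion step in option 3.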
Known issues
The following section contains known common issues. For those that have already been fixed or mitigated, follow the specific instructions to get the fix.
Non-harmful logs
Errors when scraping metrics from pseudo-processes or restricted processes
The following logs are not harmful and can be safely ignored. To remove them, update the Ops Agent to version 2.10.0 or higher.
Jul 13 17:28:55 debian9-trouble otelopscol[2134]: 2021-07-13T17:28:55.848Z error scraperhelper/scrapercontroller.go:205 Error scraping metrics {"kind": "receiver", "name": "hostmetrics/hostmetrics", "error": "[error reading process name for pid 2: readlink /proc/2/exe: no such file or directory; error reading process name for pid 3: readlink /proc/3/exe: no such file or directory; error reading process name for pid 4: readlink /proc/4/exe: no such file or directory; error reading process name for pid 5: readlink /proc/5/exe: no such file or directory; error reading process name for pid 6: readlink /proc/6/exe: no such file or directory; error reading process name for pid 7: readlink /proc/7/exe: no such file or directory; error reading process name for pid 8: readlink /proc/8/exe: no such file or directory; error reading process name for pid 9: readlink /proc/9/exe: no such file or directory; error reading process name for pid 10: readlink /proc/10/exe: no such file or directory; error reading process name for pid 11: readlink /proc/11/exe: no such file or directory; error reading process name for pid 12: readlink /proc/12/exe: no such file or directory; error reading process name for pid 13: readlink /proc/13/exe: no such file or directory; error reading process name for pid 14: readlink /proc/14/exe: no such file or directory; error reading process name for pid 15: readlink /proc/15/exe: no such file or directory; error reading process name for pid 16: readlink /proc/16/exe: no such file or directory; error reading process name for pid 17: readlink /proc/17/exe: no such file or directory; error reading process name for pid 18: readlink /proc/18/exe: no such file or directory; error reading process name for pid 19: readlink /proc/19/exe: no such file or directory; error reading process name for pid 20: readlink /proc/20/exe: no such file or directory; error reading process name for pid 21: readlink /proc/21/exe: no such file or directory; error reading process name for pid 22: readlink /proc/22/exe: no such file or directory; error reading process name for pid
Jul 13 17:28:55 debian9-trouble otelopscol[2134]: 23: readlink /proc/23/exe: no such file or directory; error reading process name for pid 24: readlink /proc/24/exe: no such file or directory; error reading process name for pid 25: readlink /proc/25/exe: no such file or directory; error reading process name for pid 26: readlink /proc/26/exe: no such file or directory; error reading process name for pid 27: readlink /proc/27/exe: no such file or directory; error reading process name for pid 28: readlink /proc/28/exe: no such file or directory; error reading process name for pid 30: readlink /proc/30/exe: no such file or directory; error reading process name for pid 31: readlink /proc/31/exe: no such file or directory; error reading process name for pid 43: readlink /proc/43/exe: no such file or directory; error reading process name for pid 44: readlink /proc/44/exe: no such file or directory; error reading process name for pid 45: readlink /proc/45/exe: no such file or directory; error reading process name for pid 90: readlink /proc/90/exe: no such file or directory; error reading process name for pid 92: readlink /proc/92/exe: no such file or directory; error reading process name for pid 106: readlink /proc/106/exe: no such file or directory; error reading process name for pid 360: readlink /proc/360/exe: no such file or directory; error reading process name for pid 375: readlink /proc/375/exe: no such file or directory; error reading process name for pid 384: readlink /proc/384/exe: no such file or directory; error reading process name for pid 386: readlink /proc/386/exe: no such file or directory; error reading process name for pid 387: readlink /proc/387/exe: no such file or directory; error reading process name for pid 422: readlink /proc/422/exe: no such file or directory; error reading process name for pid 491: readlink /proc/491/exe: no such file or directory; error reading process name for pid 500: readlink /proc/500/exe: no such file or directory; error reading process name for pid 2121: readlink /proc/2121/exe: no such file or directory; error reading
Jul 13 17:28:55 debian9-trouble otelopscol[2134]: process name for pid 2127: readlink /proc/2127/exe: no such file or directory]"}
Jul 13 17:28:55 debian9-trouble otelopscol[2134]: go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
Jul 13 17:28:55 debian9-trouble otelopscol[2134]:         /root/go/pkg/mod/go.opentelemetry.io/collector@v0.29.0/receiver/scraperhelper/scrapercontroller.go:205
Jul 13 17:28:55 debian9-trouble otelopscol[2134]: go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
Jul 13 17:28:55 debian9-trouble otelopscol[2134]:         /root/go/pkg/mod/go.opentelemetry.io/collector@v0.29.0/receiver/scraperhelper/scrapercontroller.go:186
Errors about the dropped first data point of cumulative metrics:
The following logs are not harmful and can be safely ignored.
Jul 13 17:28:03 debian9-trouble otelopscol[2134]: 2021-07-13T17:28:03.092Z info exporterhelper/queued_retry.go:316 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "googlecloud/agent", "error": "rpc error: code = InvalidArgument desc = Field timeSeries[1].points[0].interval.start_time had an invalid value of \"2021-07-13T10:25:18.061-07:00\": The start time must be before the end time (2021-07-13T10:25:18.061-07:00) for the non-gauge metric 'agent.googleapis.com/agent/uptime'.", "interval": "23.491024535s"}
Jul 13 17:28:41 debian9-trouble otelopscol[2134]: 2021-07-13T17:28:41.269Z info exporterhelper/queued_retry.go:316 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "googlecloud/agent", "error": "rpc error: code = InvalidArgument desc = Field timeSeries[0].points[0].interval.start_time had an invalid value of \"2021-07-13T10:26:18.061-07:00\": The start time must be before the end time (2021-07-13T10:26:18.061-07:00) for the non-gauge metric 'agent.googleapis.com/agent/monitoring/point_count'.", "interval": "21.556591578s"}
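When scanning agent logs for real problems, it can help to set the harmless messages aside first. The following Python sketch shows one way to do that. The patterns are assumptions derived only from the sample logs above, not an official list, and the function names are hypothetical.

```python
import re

# Assumed signatures of the two harmless message families shown above.
# They are derived from the sample logs, not from an official list.
BENIGN_PATTERNS = [
    # Process-name scrape errors for kernel threads and short-lived processes.
    re.compile(r"error reading process name for pid"),
    # Rejected first data point of a cumulative metric.
    re.compile(r"interval\.start_time had an invalid value"),
]


def is_benign(line: str) -> bool:
    """Return True if the log line matches one of the known harmless patterns."""
    return any(pattern.search(line) for pattern in BENIGN_PATTERNS)


def filter_actionable(lines):
    """Keep only the log lines that do not match a known harmless pattern."""
    return [line for line in lines if not is_benign(line)]
```

For example, the lines could come from the output of journalctl for the agent's services, passed to filter_actionable as a list of strings. Anything that remains after filtering still needs to be reviewed manually.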
Some of the metrics are missing or inconsistent
There are a small number of metrics that Ops Agent version 2.0.0 and higher handle differently than the "preview" versions of the Ops Agent (versions prior to 2.0.0) or the monitoring agent.
The following table describes the differences in the data ingested by the Ops Agent and the monitoring agent.
Metric type, omitting agent.googleapis.com | Ops Agent (GA)† | Ops Agent (Preview)† | Monitoring agent |
---|---|---|---|
disk/used_bytes and disk/percent_used | Ingested with the full path in the device label; for example, /dev/sda15. Not ingested for virtual devices. | Ingested without /dev in the path in the device label; for example, sda15. Ingested for virtual devices. | Ingested without /dev in the path in the device label; for example, sda15. Ingested for virtual devices. |
† The GA column refers to Ops Agent versions 2.0.0 and later. The Preview column refers to Ops Agent versions prior to 2.0.0.
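If you compare disk metrics collected by different agents, the device labels need to be brought to a common form first. The following Python sketch strips the leading /dev/ so that both label styles from the table compare equal. The function name is hypothetical, and the sketch assumes the only difference between the labels is that prefix.

```python
DEV_PREFIX = "/dev/"


def normalize_device_label(device: str) -> str:
    """Return the device label without a leading /dev/ prefix.

    Assumption: labels differ between agents only by this prefix,
    as in the /dev/sda15 and sda15 examples from the table.
    """
    if device.startswith(DEV_PREFIX):
        return device[len(DEV_PREFIX):]
    return device
```

With this helper, normalize_device_label("/dev/sda15") and normalize_device_label("sda15") both return "sda15". Virtual devices still appear only in the data from the Preview versions and the monitoring agent, so those series have no counterpart to match against.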
Removed agent reported by Google Cloud console as installed
After you uninstall the agent, the Google Cloud console might take up to an hour to report this change.
Agent self logs consume a lot of CPU, memory, and disk space
Older versions of the Ops Agent can consume large amounts of CPU, memory, and disk space by writing to the /var/log/google-cloud-ops-agent/subagents/logging-module.log file on Linux virtual machines or the C:\ProgramData\Google\Cloud Operations\Ops Agent\log\logging-module.log file on Windows virtual machines, because of corrupted buffer chunks. When this happens, you see a large number of messages like the following in the logging-module.log file.
[2022/04/30 05:23:38] [error] [input chunk] error writing data from tail.2 instance
[2022/04/30 05:23:38] [error] [storage] format check failed: tail.2/2004860-1650614856.691268293.flb
[2022/04/30 05:23:38] [error] [storage] format check failed: tail.2/2004860-1650614856.691268293.flb
[2022/04/30 05:23:38] [error] [storage] [cio file] file is not mmap()ed: tail.2:2004860-1650614856.691268293.flb
To resolve this problem, update the Ops Agent to version 2.17 or higher, and fully reset the agent state.
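To check whether a virtual machine is affected, you can count how often the same buffer chunk file appears in error lines of logging-module.log. The following Python sketch is one way to do that. The regular expression is an assumption based on the sample messages above, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed shape of a chunk reference in an error line, based on the sample
# messages above: tail.<N>/<name>.flb or tail.<N>:<name>.flb
CHUNK_RE = re.compile(r"\[error\].*?(tail\.\d+)[/:](\d+-\d+\.\d+\.flb)")


def count_chunk_errors(lines):
    """Count error lines per buffer chunk file."""
    counts = Counter()
    for line in lines:
        match = CHUNK_RE.search(line)
        if match:
            counts[f"{match.group(1)}/{match.group(2)}"] += 1
    return counts


def scan_log(path):
    """Read a logging-module.log file and count chunk errors in it."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_chunk_errors(handle)
```

On Linux, scan_log could be called with the logging-module.log path given above. A chunk file that shows up again and again in the result suggests the corrupted-buffer situation described in this section.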
Corrupted performance counters in Windows
If the metrics sub-agent fails to start, you might see one of the following errors in Cloud Logging:
Error retrieving performance counter object 'LogicalDisk'
Error retrieving performance counter object 'Memory'
Error retrieving performance counter object 'System'
These errors can occur if system performance counters become corrupted. You can resolve the errors by rebuilding the performance counters. In PowerShell as administrator, run:
cd C:\Windows\system32
lodctr /R
The above command may occasionally fail; in that case, reload PowerShell and try again until successful.
After the command succeeds, restart the Ops Agent:
Restart-Service -Name google-cloud-ops-agent -Force
Event log timestamps are incorrect on Windows
Timestamps associated with Windows event logs in Cloud Logging may be incorrect depending on your system's time zone settings. If you notice this happening, try one of the following solutions.
Use a UTC timezone
In PowerShell, run the following commands as an administrator:
Set-TimeZone -Id "UTC"
Restart-Service -Name "google-cloud-ops-agent-fluent-bit" -Force
Override the time zone setting for the logging sub-agent service only
In PowerShell, run the following commands as an administrator:
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\google-cloud-ops-agent-fluent-bit" -Name "Environment" -Type "MultiString" -Value "TZ=UTC0"
Restart-Service -Name "google-cloud-ops-agent-fluent-bit" -Force