Troubleshoot Container insights

2025-06-19

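This article describes common issues, and the steps to troubleshoot them, when you use Container insights to monitor your Kubernetes cluster.

Duplicate alerts are being created

You might have enabled Prometheus alert rules without disabling the Container insights recommended alerts. See Migrate from Container insights recommended alerts to Prometheus recommended alert rules (preview).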

Cluster permissions

If you don't have the required permissions for the cluster, you might see the following error message: You do not have the right cluster permissions which will restrict your access to Container Insights features. Please reach out to your cluster admin to get the right permission.

Container insights previously allowed users to access the Azure portal experience based on their access permission for the Log Analytics workspace. It now checks cluster-level permission to provide access to the Azure portal experience. You might need your cluster admin to assign this permission.

For basic read-only cluster-level access, assign the Monitoring Reader role for the following types of clusters:

AKS without Kubernetes role-based access control (RBAC) authorization enabled
AKS enabled with Microsoft Entra SAML-based single sign-on
AKS enabled with Kubernetes RBAC authorization
AKS configured with the cluster role binding clusterMonitoringUser
Azure Arc-enabled Kubernetes clusters

For details on how to assign these roles for AKS, see Assign role permissions to a user or group. For more information about role assignments, see Access and identity options for Azure Kubernetes Service (AKS).

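Onboarding and update issues

The following sections describe issues that you might encounter when you onboard or update Container insights on your cluster.

Missing subscription registration

If you see the error Missing Subscription registration, register the resource provider Microsoft.OperationsManagement in the subscription of your Log Analytics workspace. See Resolve errors for resource provider registration.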

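Authorization error

When you enable Container insights or update a cluster, you might receive an error like the following: The client <user's identity> with object id <user's objectId> does not have authorization to perform action Microsoft.Authorization/roleAssignments/write over scope.

During the onboarding or update process, an attempt is made to assign the Monitoring Metrics Publisher role to the cluster resource. The user who starts the process must have the Microsoft.Authorization/roleAssignments/write permission on the AKS cluster resource scope. Only members of the Owner and User Access Administrator built-in roles are granted this permission. If your security policies require granular permissions, see Azure custom roles and assign the permission to the users who need it. To assign the Monitoring Metrics Publisher role from the Azure portal, follow the guidance in Assign Azure roles by using the Azure portal.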

Can't upgrade a cluster

If you can't upgrade Container insights on an AKS cluster after it's installed, the Log Analytics workspace where the cluster was sending its data might have been deleted. Disable monitoring for the cluster and enable Container insights again by using another workspace.

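Installation of Azure Monitor Containers extension fails

The error manifests contain a resource that already exists indicates that resources of the Container insights agent already exist on the Azure Arc-enabled Kubernetes cluster, which means that the Container insights agent is already installed. To resolve this issue, clean up the existing resources of the Container insights agent and then enable the Azure Monitor Containers extension.

AKS clusters

Run the following commands and look for the Azure Monitor Agent add-on profile to verify whether the AKS Monitoring add-on is enabled.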

az account set -s <clusterSubscriptionId>
az aks show -g <clusterResourceGroup> -n <clusterName>

If the output includes an Azure Monitor Agent add-on profile configuration with a Log Analytics workspace resource ID, the AKS Monitoring add-on is enabled and must be disabled with the following command.

az aks disable-addons -a monitoring -g <clusterResourceGroup> -n <clusterName>

Non-AKS clusters

Run the following command against the cluster to verify whether the azmon-containers-release-1 Helm chart release exists.

helm list -A

If the output includes azmon-containers-release-1, delete the Helm chart release with the following command.

helm del azmon-containers-release-1

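Missing data

It can take up to 15 minutes for data to appear after you enable Container insights on a cluster. If you don't see data after 15 minutes, see the following sections for potential issues and solutions.

Error retrieving data message

The error message Error retrieving data might appear if the Log Analytics workspace where the cluster was sending its data has been deleted. In that case, disable monitoring for the cluster and enable Container insights again by using another workspace.

Local authentication disabled

Use the following CLI command to check whether the Log Analytics workspace is configured for local authentication.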

az resource show --ids "/subscriptions/[Your subscription ID]/resourcegroups/[Your resource group]/providers/microsoft.operationalinsights/workspaces/[Your workspace name]"

If disableLocalAuth = true, run the following command.

az resource update --ids "/subscriptions/[Your subscription ID]/resourcegroups/[Your resource group]/providers/microsoft.operationalinsights/workspaces/[Your workspace name]" --api-version "2021-06-01" --set properties.features.disableLocalAuth=False

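Daily cap met

When the daily cap is met for a Log Analytics workspace, the workspace stops collecting data until the reset time. See Log Analytics daily cap.

DCR not deployed with Terraform

If Container insights is enabled by using Terraform and msi_auth_for_monitoring_enabled is set to true, make sure that the DCR and DCRA resources are also deployed so that logs are collected. See Enable Container insights.

Container insights not reporting any information

Use the following steps if you can't view status information or if a log query returns no results.

Check the status of the agent with the following command.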

kubectl get ds ama-logs --namespace=kube-system

The number of pods should equal the number of Linux nodes in the cluster. The output should resemble the following example, which indicates a proper deployment.
```
User@aksuser:~$ kubectl get ds ama-logs --namespace=kube-system
NAME       DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR    AGE
ama-logs   2         2         2         2            2           <none>           1d
```
If you have Windows Server nodes, check the status of the agent by running the following command.

kubectl get ds ama-logs-windows --namespace=kube-system

The number of pods should equal the number of Windows nodes in the cluster. The output should resemble the following example, which indicates a proper deployment.
```
User@aksuser:~$ kubectl get ds ama-logs-windows --namespace=kube-system
NAME               DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR    AGE
ama-logs-windows   2         2         2         2            2           <none>           1d
```
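The comparison described above (desired pods versus ready pods) can also be scripted. The following is a minimal sketch rather than official guidance: the function name check_daemonset_ready is hypothetical, and it assumes the default column layout of kubectl get ds shown in the examples (DESIRED in column 2, READY in column 4).

```shell
# Hypothetical helper: reads `kubectl get ds` output on stdin and reports any
# daemonset whose READY count (column 4) differs from DESIRED (column 2).
# Exits with status 1 if at least one mismatch is found, otherwise 0.
check_daemonset_ready() {
  awk 'NR > 1 && $2 != $4 { printf "%s: desired=%s ready=%s\n", $1, $2, $4; bad = 1 }
       END { exit bad + 0 }'
}

# Demo on sample text. Against a live cluster, pipe in the output of
#   kubectl get ds ama-logs --namespace=kube-system
printf '%s\n' \
  'NAME       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE' \
  'ama-logs   2         2         1       2            1           <none>          1d' \
  | check_daemonset_ready || echo "ama-logs is not fully ready"
```

A nonzero exit status makes the helper usable as a gate in a larger health-check script.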
Check the status of the deployment with the following command.

kubectl get deployment ama-logs-rs --namespace=kube-system

The output should resemble the following example, which indicates a proper deployment.
```
User@aksuser:~$ kubectl get deployment ama-logs-rs --namespace=kube-system
NAME          READY   UP-TO-DATE   AVAILABLE   AGE
ama-logs-rs   1/1     1            1           24d
```

Check the status of the pods with the command kubectl get pods --namespace=kube-system to verify that they're running.

The output should resemble the following example, with a status of Running for ama-logs.

User@aksuser:~$ kubectl get pods --namespace=kube-system
NAME                                READY     STATUS    RESTARTS   AGE
aks-ssh-139866255-5n7k5             1/1       Running   0          8d
azure-vote-back-4149398501-7skz0    1/1       Running   0          22d
azure-vote-front-3826909965-30n62   1/1       Running   0          22d
ama-logs-484hw                      1/1       Running   0          1d
ama-logs-fkq7g                      1/1       Running   0          1d
ama-logs-windows-6drwq              1/1       Running   0          1d
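
On a busy cluster, the same check can be narrowed to the agent pods only. The following sketch assumes the default kubectl get pods columns for a single namespace (NAME, READY, STATUS); the function name list_unhealthy_ama_pods is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints the
# name and STATUS of every ama-logs pod whose STATUS (column 3) is not Running.
list_unhealthy_ama_pods() {
  awk 'NR > 1 && $1 ~ /^ama-logs/ && $3 != "Running" { print $1, $3 }'
}

# Demo on sample text. Against a live cluster, pipe in the output of
#   kubectl get pods --namespace=kube-system
printf '%s\n' \
  'NAME             READY   STATUS             RESTARTS   AGE' \
  'ama-logs-484hw   1/1     Running            0          1d' \
  'ama-logs-fkq7g   0/1     CrashLoopBackOff   12         1d' \
  | list_unhealthy_ama_pods
```

Empty output means every agent pod in the input reports Running.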

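If the pods are in a running state but there's no data in Log Analytics, or data appears to be sent only during part of the day, the daily cap might have been met. When this limit is met each day, data stops being ingested into the Log Analytics workspace and resumes at the reset time. For more information, see Log Analytics daily cap.

Metrics aren't being collected

Verify that the Monitoring Metrics Publisher role assignment exists by using the following CLI command.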
```
az role assignment list --assignee "SP/UserassignedMSI for Azure Monitor Agent" --scope "/subscriptions/<subid>/resourcegroups/<RG>/providers/Microsoft.ContainerService/managedClusters/<clustername>" --role "Monitoring Metrics Publisher"
```
For clusters with MSI, the user-assigned client ID for Azure Monitor Agent changes every time monitoring is enabled or disabled, so the role assignment must exist on the current MSI client ID.
For clusters with Microsoft Entra pod identity enabled and using MSI:
- Verify that the required label kubernetes.azure.com/managedby: aks is present on the Azure Monitor Agent pods by using the following command.
  
  kubectl get pods --show-labels -n kube-system | grep ama-logs
- If pod identity was enabled by using one of the supported methods at https://github.com/Azure/aad-pod-identity#1-deploy-aad-pod-identity, verify that exceptions are enabled.
  
  Run the following command to verify.
  
  kubectl get AzurePodIdentityException -A -o yaml
  
  You should receive output similar to the following example.
```
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzurePodIdentityException
metadata:
  name: mic-exception
  namespace: default
spec:
  podLabels:
    app: mic
    component: mic
---
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzurePodIdentityException
metadata:
  name: aks-addon-exception
  namespace: kube-system
spec:
  podLabels:
    kubernetes.azure.com/managedby: aks
```

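Performance charts don't show CPU or memory of nodes and containers on a non-Azure cluster

Container insights agent pods use the cAdvisor endpoint on the node agent to gather performance metrics. Verify that the containerized agent on the node is configured to allow cAdvisor secure port: 10250 or cAdvisor unsecure port: 10255 to be opened on all nodes in the cluster so that performance metrics can be collected. See Prerequisites for hybrid Kubernetes clusters.

Image and Name values not populated in ContainerLog table

For agent version ciprod12042019 and later, these two properties aren't populated by default for every log line, to minimize the cost incurred on collected log data. You can either enable collection of these properties or modify your queries to include these properties from other tables.

Modify your queries to include the Image and ImageTag properties from the ContainerInventory table by joining on the ContainerID property. You can include the Name property (as it previously appeared in the ContainerLog table) from the ContainerName field of the KubePodInventory table by joining on the ContainerID property.

The following sample query shows how to use joins to retrieve these values.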

//Set the time window for the query
let startTime = ago(1h);
let endTime = now();
//
//Get the latest Image & ImageTag for every containerID
let ContainerInv = ContainerInventory | where TimeGenerated >= startTime and TimeGenerated < endTime | summarize arg_max(TimeGenerated, *)  by ContainerID, Image, ImageTag | project-away TimeGenerated | project ContainerID1=ContainerID, Image1=Image ,ImageTag1=ImageTag;
//
//Get the latest Name for every containerID
let KubePodInv  = KubePodInventory | where ContainerID != "" | where TimeGenerated >= startTime | where TimeGenerated < endTime | summarize arg_max(TimeGenerated, *)  by ContainerID2 = ContainerID, Name1=ContainerName | project ContainerID2 , Name1;
//
//Join the above to get a jointed table that has name, image & imagetag. Outer left is used in case there are no kubepod records or if they're latent
let ContainerData = ContainerInv | join kind=leftouter (KubePodInv) on $left.ContainerID1 == $right.ContainerID2;
//
//Join ContainerLog table with the jointed table above, project-away redundant fields/columns, and rename columns that were rewritten. Outer left is used so logs aren't lost even if no container metadata for loglines is found.
ContainerLog
| where TimeGenerated >= startTime and TimeGenerated < endTime
| join kind= leftouter (
  ContainerData
) on $left.ContainerID == $right.ContainerID2 | project-away ContainerID1, ContainerID2, Name, Image, ImageTag | project-rename Name = Name1, Image=Image1, ImageTag=ImageTag1

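Warning

Enabling the properties isn't recommended for large clusters that have more than 50 nodes. It generates API server calls from every node in the cluster and also increases the data size of every log line collected.

To enable collection of these fields so that you don't have to modify your queries, enable the setting log_collection_settings.enrich_container_logs in the agent config map as described in the data collection configuration settings.

Logs not collected on Azure Local cluster

If you registered your cluster or configured Insights for Azure Local before November 2023, features that use the Azure Monitor Agent on Azure Local, such as Arc for Servers Insights, VM Insights, Container Insights, Defender for Cloud, or Microsoft Sentinel, might not collect logs and event data properly. See Repair AMA agent for Azure Local for steps to reconfigure the agent and Insights for Azure Local.

Missing data on large clusters

If data is missing from any of the following tables, the likely cause is a problem parsing large payloads because of a large number of pods or nodes. This is a known issue in the Ruby plugin when it parses large JSON payloads, and it occurs because the default PODS_CHUNK_SIZE is 1000.

There are plans to change the default PODS_CHUNK_SIZE to a smaller value to address this issue.

KubePodInventory
KubeNodeInventory
KubeEvents
KubePVInventory
KubeServices

Use the following commands to verify whether a smaller PODS_CHUNK_SIZE value is already configured on your cluster.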

# verify if kube context being set for right cluster
kubectl cluster-info

# check if the configmap configured with smaller PODS_CHUNK_SIZE chunksize already
kubectl logs <ama-logs-rs pod name> -n kube-system -c ama-logs | grep PODS_CHUNK_SIZE

# If it's configured, the output will be similar to "Using config map value: PODS_CHUNK_SIZE = 10"
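
To use the chunk size in a script rather than reading the log by eye, the value can be extracted from the matching log line. This is a minimal sketch: the function name extract_pods_chunk_size is hypothetical, and it assumes the log line has the form shown in the comment above.

```shell
# Hypothetical helper: reads agent log text on stdin and prints the first
# PODS_CHUNK_SIZE value found. Prints nothing if no such line is present,
# in which case the agent is presumably using the default.
extract_pods_chunk_size() {
  sed -n 's/.*PODS_CHUNK_SIZE = \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Demo on the sample line from the comment above. Against a live cluster, pipe in
#   kubectl logs <ama-logs-rs pod name> -n kube-system -c ama-logs
echo 'Using config map value: PODS_CHUNK_SIZE = 10' | extract_pods_chunk_size
```

An empty result is the signal to continue with the default-value checks that follow.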

If the cluster is already configured with a smaller PODS_CHUNK_SIZE value, you need to enable the cluster for large clusters.
If the cluster uses the default PODS_CHUNK_SIZE=1000, check whether the cluster has a large number of pods or nodes.
```
# check the total number of PODS
kubectl get pods -A -o wide | wc -l

# check the total number of NODES
kubectl get nodes -o wide | wc -l
```
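Note that wc -l counts every output line, including the header row that kubectl prints, so each total above is one higher than the real count. As a small sketch (the helper name count_rows is hypothetical), the header can be skipped before counting. Passing --no-headers to kubectl get achieves the same result at the source.

```shell
# Hypothetical helper: counts the data rows on stdin, skipping the header line.
count_rows() {
  awk 'NR > 1 { n++ } END { print n + 0 }'
}

# Demo on sample text (prints 2). Against a live cluster, pipe in the output of
#   kubectl get pods -A -o wide
printf '%s\n' \
  'NAMESPACE     NAME             READY   STATUS    RESTARTS   AGE' \
  'kube-system   ama-logs-484hw   1/1     Running   0          1d' \
  'kube-system   ama-logs-fkq7g   1/1     Running   0          1d' \
  | count_rows
```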

After you confirm that the number of pods and nodes is reasonably high and that the cluster uses the default PODS_CHUNK_SIZE=1000, use the following commands to configure the configmap.

# Check if the cluster has container-azm-ms-agentconfig configmap in kube-system namespace
kubectl get cm -n kube-system | grep container-azm-ms-agentconfig

# If there is no existing container-azm-ms-agentconfig configmap, then the configmap needs to be downloaded and applied
curl -L https://raw.githubusercontent.com/microsoft/Docker-Provider/refs/heads/ci_prod/kubernetes/container-azm-ms-agentconfig.yaml -o container-azm-ms-agentconfig
kubectl apply -f container-azm-ms-agentconfig

# Edit the configmap and uncomment agent_settings.chunk_config and PODS_CHUNK_SIZE lines under agent-settings: |- in the configmap
kubectl edit cm -n kube-system container-azm-ms-agentconfig -o yaml

Agent OOM killed

Daemonset container getting OOM killed

Start by identifying which container is getting OOM killed by using the following commands. This identifies ama-logs, ama-logs-prometheus, or both.

# verify if kube context being set for right cluster
kubectl cluster-info

# get the ama-logs pods and status
kubectl get pods -n kube-system -o custom-columns=NAME:.metadata.name | grep -E 'ama-logs-[a-z0-9]{5}'

# from the result of above command, find out which ama-logs pod instance getting OOM killed
kubectl describe pod <ama-logs-pod> -n kube-system

# review the output of the above command to find out which ama-logs container is getting OOM killed
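
As an alternative to reading the describe output by eye, the last termination reason of each container can be printed with a kubectl JSONPath template and then filtered. This is only a sketch: the function name filter_oom_killed is hypothetical, and the pod name in the comment is a placeholder.

```shell
# Hypothetical helper: reads "<container>\t<last termination reason>" lines on
# stdin and prints the containers whose last termination reason was OOMKilled.
filter_oom_killed() {
  awk -F '\t' '$2 == "OOMKilled" { print $1 }'
}

# Against a live cluster, the input could come from:
#   kubectl get pod <ama-logs-pod> -n kube-system \
#     -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}' \
#     | filter_oom_killed

# Demo on sample text (prints ama-logs).
printf 'ama-logs\tOOMKilled\nama-logs-prometheus\t\n' | filter_oom_killed
```

The container name printed here is the value to pass to -c in the kubectl cp command that follows.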

Use the following commands to check whether the mdsd.err log file contains network errors.

mkdir log
# for ama-logs-prometheus container use -c ama-logs-prometheus instead of -c ama-logs
kubectl cp -c ama-logs kube-system/<ama-logs pod name>:/var/opt/microsoft/linuxmonagent/log log
cd log
cat mdsd.err

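If errors occur because outbound endpoints are blocked, see Network firewall requirements for monitoring Kubernetes cluster for the endpoint requirements.
If errors occur because a data collection endpoint (DCE) or data collection rule (DCR) is missing, reenable Container insights by using the guidance in Enable monitoring for Kubernetes clusters.
If there are no errors, the issue might be related to log scale. See High scale logs collection in Container Insights (Preview).

Replicaset container getting OOM killed

Use the following commands to identify how often the ama-logs-rs pod is getting OOM killed.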

# verify if kube context being set for right cluster
kubectl cluster-info

# get the ama-logs pods and status
kubectl get pods -n kube-system -o wide | grep ama-logs-rs

# from the result of above command, find out which ama-logs pod instance getting OOM killed
kubectl describe pod <ama-logs-rs-pod> -n kube-system

# review the output of the above command to confirm the OOM kill

If ama-logs-rs is getting OOM killed, use the following commands to check whether there are network errors.

mkdir log
kubectl cp -c ama-logs kube-system/<ama-logs-rs pod name>:/var/opt/microsoft/linuxmonagent/log log
cd log
cat mdsd.err

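If errors occur because outbound endpoints are blocked, see Network firewall requirements for monitoring Kubernetes cluster for the endpoint requirements.
If errors occur because a data collection endpoint (DCE) or data collection rule (DCR) is missing, reenable Container insights by using the guidance in Enable monitoring for Kubernetes clusters.

If there are no network errors, review the [prometheus_data_collection_settings.cluster] settings in the configmap to check whether cluster-level Prometheus scraping is enabled.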

# Check if the cluster has container-azm-ms-agentconfig configmap in kube-system namespace
kubectl get cm -n kube-system | grep container-azm-ms-agentconfig
# If there is no existing container-azm-ms-agentconfig configmap, cluster-level Prometheus data collection is not enabled

Check the size of the cluster in terms of node and pod counts.

# Count the nodes and pods in the cluster (--no-headers keeps the header row out of the count)
NodeCount=$(kubectl get nodes --no-headers | wc -l)
echo "Total number of nodes: ${NodeCount}"
PodCount=$(kubectl get pods -A -o wide --no-headers | wc -l)
echo "Total number of pods: ${PodCount}"

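If you determine that the issue is related to the scale of the cluster, the ama-logs-rs memory limit needs to be increased. Open a support case with Microsoft to make this request.

Latency issues

By default, Container insights collects monitoring data every 60 seconds unless you configure data collection settings or add a transformation. For details on latency and expected ingestion times in a Log Analytics workspace, see Log data ingestion time in Azure Monitor.

Use the following query to check the latencies for the reported table and time window in the Log Analytics workspace associated with the cluster.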

let clusterResourceId = "/subscriptions/<subscriptionId>/resourceGroups/<rgName>/providers/Microsoft.ContainerService/managedClusters/<clusterName>";
let startTime = todatetime('2024-11-20T20:34:11.9117523Z');
let endTime = todatetime('2024-11-21T20:34:11.9117523Z');
KubePodInventory // Update this table name to the one you want to check
| where _ResourceId =~ clusterResourceId
| where TimeGenerated >= startTime and TimeGenerated <= endTime
| extend E2EIngestionLatency = ingestion_time() - TimeGenerated
| extend AgentLatency = _TimeReceived - TimeGenerated
| summarize max(E2EIngestionLatency), max(AgentLatency) by Computer
| project Computer, max_AgentLatency, max_ingestionLatency = (max_E2EIngestionLatency -  max_AgentLatency),max_E2EIngestionLatency

If agent latencies are high, check whether a log collection interval other than the default of 60 seconds is configured in the Container insights DCR.

# set the subscriptionId of the cluster
az account set -s "<subscriptionId>"
# check if ContainerInsightsExtension  data collection rule association exists
az monitor data-collection rule association list --resource <clusterResourceId>
# get the data collection rule resource id associated to ContainerInsightsExtension from above step
az monitor data-collection rule show --ids <dataCollectionRuleResourceIdFromAboveStep>
# check if there are any data collection settings related to interval from the output of the above step

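Multiline logging issues

The multiline log feature can be enabled with the configmap and supports the following scenarios.

Supports log messages up to 64 KB instead of the default limit of 16 KB.
Stitches exception call stack traces for the supported languages .NET, Go, Python, and Java.

Use the following commands to verify that the multiline feature and the ContainerLogV2 schema are enabled.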

    # get the list of ama-logs and these pods should be in Running state
    # If these are not in Running state, then this needs to be investigated
    kubectl get po -n kube-system | grep ama-logs

    # exec into any one of the ama-logs daemonset pod and check for the environment variables
    kubectl exec -it  ama-logs-xxxxx -n kube-system -c ama-logs -- bash

    # after exec into the container run this command
    env | grep AZMON_MULTILINE

    # result should have environment variables which indicates the multiline and languages enabled
    AZMON_MULTILINE_LANGUAGES=java,go
    AZMON_MULTILINE_ENABLED=true

    # check if the containerlog v2 schema enabled or not
    env | grep AZMON_CONTAINER_LOG_SCHEMA_VERSION

    # output should be v2. If not v2, then check whether this is being enabled through DCR
    AZMON_CONTAINER_LOG_SCHEMA_VERSION=v2