Fix Kubernetes Pods Stuck in Init and MySQL in ContainerCreating with Longhorn (Complete Troubleshooting Guide)

When running stateful applications like Laravel + MySQL inside a Kubernetes cluster (K3s, Rancher, OKD, etc.), you may encounter a frustrating situation:

  • Application pod stuck in Init:0/X
  • MySQL stuck in ContainerCreating
  • Deployment shows Updating forever
  • FailedAttachVolume errors in events
  • Private registry images not pulling

This article provides a comprehensive, production-grade troubleshooting guide for resolving:

  • Image pull secret issues
  • Longhorn volume attachment failures
  • StatefulSet rescheduling strategies
  • Control-plane node storage pitfalls

All names are anonymized. The steps avoid deleting PVCs or data, but review each command against your own cluster before running it in production.


🔍 Symptoms

You may see:

kubectl get pods -n <namespace>

Output:

app-xxxxx        0/2   Init:0/5            0   10m
mysql-0          0/1   ContainerCreating   0   2d

And from:

kubectl describe pod mysql-0 -n <namespace>

You may see:

FailedAttachVolume
rpc error: DeadlineExceeded
failed to attach volume

Or for the app pod:

FailedToRetrieveImagePullSecret
Unable to retrieve image pull secrets

These are two separate root causes that often occur together.
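
To tell the two causes apart quickly, namespace events can be filtered by reason. This is a minimal sketch: events_for_reason is a hypothetical helper name, and app-namespace is the placeholder namespace used throughout this article. The --field-selector and --sort-by flags are standard kubectl options.

```shell
# events_for_reason is a hypothetical helper name, not a kubectl command.
# $1 = namespace, $2 = event reason (for example FailedAttachVolume)
events_for_reason() {
  kubectl get events -n "$1" \
    --field-selector "reason=$2" \
    --sort-by=.lastTimestamp
}

# Usage (run manually against your cluster):
#   events_for_reason app-namespace FailedAttachVolume
#   events_for_reason app-namespace FailedToRetrieveImagePullSecret
```

If the first call returns events for mysql-0 and the second returns events for the app pod, both root causes described below are in play.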


🚨 Root Cause #1: Image Pull Secret Not Available in Namespace

Kubernetes secrets are namespace-scoped.

Even if your Docker registry secret exists, it must:

  • Exist in the same namespace as the pod
  • Be referenced via imagePullSecrets, either directly in the pod spec or on the ServiceAccount the pod runs as

✅ Fix: Create Registry Secret in Correct Namespace

kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username="username" \
  --docker-password="token_or_password" \
  --docker-email="dev@example.com" \
  -n app-namespace

Verify:

kubectl get secret regcred -n app-namespace

✅ Attach to Default ServiceAccount

If kubectl describe pod reports:

Service Account: default

Attach the secret to that ServiceAccount:

kubectl patch serviceaccount default -n app-namespace \
  -p '{"imagePullSecrets":[{"name":"regcred"}]}'

Verify:

kubectl get sa default -n app-namespace -o yaml

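Now recreate the stuck pod. Pull secrets listed on a ServiceAccount are injected only when a pod is created, so an existing pod does not pick up the change:

kubectl delete pod <app-pod> -n app-namespace

The replacement pod should now pull your private images correctly.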
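
Before deleting pods repeatedly, it can help to confirm both conditions in one pass. The sketch below assumes a hypothetical helper named check_pull_secret. It only checks the ServiceAccount route, so a secret referenced directly in the pod spec will be reported as not attached even though it works.

```shell
# check_pull_secret is a hypothetical helper name.
# $1 = namespace, $2 = secret name, $3 = ServiceAccount (defaults to "default")
check_pull_secret() {
  cps_ns="$1"; cps_secret="$2"; cps_sa="${3:-default}"

  # 1. The secret must exist in the pod's own namespace.
  if ! kubectl get secret "$cps_secret" -n "$cps_ns" >/dev/null 2>&1; then
    echo "missing: secret $cps_secret not found in namespace $cps_ns"
    return 1
  fi

  # 2. The ServiceAccount must list it under imagePullSecrets.
  cps_attached=$(kubectl get serviceaccount "$cps_sa" -n "$cps_ns" \
    -o jsonpath='{.imagePullSecrets[*].name}')
  for cps_name in $cps_attached; do
    if [ "$cps_name" = "$cps_secret" ]; then
      echo "ok: $cps_secret is attached to ServiceAccount $cps_sa in $cps_ns"
      return 0
    fi
  done

  echo "detached: $cps_secret exists but is not listed on ServiceAccount $cps_sa"
  return 1
}

# Usage (run manually):
#   check_pull_secret app-namespace regcred default
```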


🚨 Root Cause #2: Longhorn Volume Fails to Attach

If MySQL (or any StatefulSet) is stuck in:

ContainerCreating

And events show:

FailedAttachVolume
rpc error: DeadlineExceeded

This indicates a Longhorn volume attachment failure, not a MySQL issue.

Common causes:

  • Node reboot
  • Kernel upgrade
  • CSI plugin instability
  • Control-plane node mount issues
  • Stale attachment metadata

🔧 Step-by-Step Fix for Longhorn Attach Failures

1️⃣ Identify the Problem Volume

From:

kubectl describe pod mysql-0 -n app-namespace

Locate:

AttachVolume.Attach failed for volume "pvc-xxxxx"

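Copy the volume name (pvc-xxxxx). Despite the prefix, this is the PersistentVolume name, and Longhorn uses the same name for its own volume object.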
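
If the event text has scrolled away, the same name can be looked up from the claim. A small sketch, assuming hypothetical helper names pvcs_for_pod and pv_for_pvc; the claim name passed in is whatever your StatefulSet actually created.

```shell
# Both helper names are hypothetical.

# List the PVCs a pod mounts. Volumes that are not PVC-backed
# (ConfigMaps, Secrets, emptyDir) print as blank lines.
# $1 = namespace, $2 = pod name
pvcs_for_pod() {
  kubectl get pod "$2" -n "$1" \
    -o jsonpath='{range .spec.volumes[*]}{.persistentVolumeClaim.claimName}{"\n"}{end}'
}

# Print the PersistentVolume bound to a claim. With Longhorn this is
# also the name of the Longhorn volume (pvc-xxxxx).
# $1 = namespace, $2 = PVC name
pv_for_pvc() {
  kubectl get pvc "$2" -n "$1" -o jsonpath='{.spec.volumeName}'
}

# Usage (run manually):
#   pvcs_for_pod app-namespace mysql-0
#   pv_for_pvc app-namespace <claim-name-from-previous-command>
```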


2️⃣ Check Longhorn Nodes

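kubectl -n longhorn-system get nodes.longhorn.io

Ensure every Longhorn node shows READY as True. The fully qualified nodes.longhorn.io matters here: plain "get nodes" returns the core Kubernetes Node objects, not Longhorn's view of them.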


3️⃣ Force-Detach the Volume

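kubectl -n longhorn-system patch volumes.longhorn.io pvc-xxxxx \
  --type merge \
  -p '{"spec":{"nodeID":""}}'

Do this only while no running pod is using the volume, then wait 15–30 seconds. On Longhorn releases that track attachments through their own VolumeAttachment objects (1.5 and later), the controller may revert this patch; in that case, detach the volume from the Longhorn UI instead.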
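
Rather than waiting a fixed interval, the volume state can be polled. This sketch uses a hypothetical helper named wait_for_detached and assumes the Longhorn Volume resource reports its state in .status.state with the value "detached"; verify that against your Longhorn version.

```shell
# wait_for_detached is a hypothetical helper name.
# $1 = Longhorn volume name, $2 = timeout in seconds (default 60),
# $3 = poll interval in seconds (default 5)
wait_for_detached() {
  wfd_volume="$1"
  wfd_timeout="${2:-60}"
  wfd_interval="${3:-5}"
  wfd_waited=0
  wfd_state=""
  while [ "$wfd_waited" -lt "$wfd_timeout" ]; do
    # Assumption: .status.state holds the attachment state of the volume.
    wfd_state=$(kubectl -n longhorn-system get volumes.longhorn.io "$wfd_volume" \
      -o jsonpath='{.status.state}' 2>/dev/null)
    if [ "$wfd_state" = "detached" ]; then
      echo "volume $wfd_volume is detached"
      return 0
    fi
    sleep "$wfd_interval"
    wfd_waited=$((wfd_waited + wfd_interval))
  done
  echo "volume $wfd_volume still in state '${wfd_state:-unknown}' after ${wfd_timeout}s" >&2
  return 1
}

# Usage (run manually):
#   wait_for_detached pvc-xxxxx 60 || echo "detach did not complete"
```

A non-zero exit here is a signal to stop and inspect the Longhorn manager logs before moving on to the next step.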


4️⃣ Reschedule StatefulSet Away from Control Plane (Recommended)

Stateful workloads should not run on control-plane nodes when possible.

Temporarily cordon the problematic node:

kubectl cordon problematic-node-name

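Delete the MySQL pod:

kubectl delete pod mysql-0 -n app-namespace

The StatefulSet controller recreates mysql-0, and the scheduler places it on another schedulable node. Once the original node is healthy again, run kubectl uncordon problematic-node-name to return it to service.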
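
To confirm the pod really moved, compare its assigned node with the one that was cordoned. A minimal sketch with a hypothetical helper named pod_left_node:

```shell
# pod_left_node is a hypothetical helper name.
# $1 = namespace, $2 = pod name, $3 = node that was cordoned
pod_left_node() {
  pln_node=$(kubectl get pod "$2" -n "$1" -o jsonpath='{.spec.nodeName}')
  if [ -z "$pln_node" ]; then
    # Empty output means the pod is Pending or was not found.
    echo "$2 is not scheduled yet"
    return 1
  fi
  if [ "$pln_node" = "$3" ]; then
    echo "$2 is still on $3"
    return 1
  fi
  echo "$2 is now on $pln_node"
}

# Usage (run manually):
#   pod_left_node app-namespace mysql-0 problematic-node-name
```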


5️⃣ Watch Recovery

kubectl get pods -n app-namespace -w

Expected result:

mysql-0     1/1   Running
app-xxxxx   2/2   Running

Once MySQL starts, application init containers waiting on the database will complete automatically.


⚠️ Why This Happens Frequently on Control-Plane Nodes

Longhorn on control-plane nodes can be fragile because of:

  • Taints
  • CSI mount path permissions
  • Engine process crashes
  • Networking instability
  • Reboot timing issues

Best practice:

  • Keep stateful workloads on worker nodes
  • Avoid attaching Longhorn volumes to control-plane nodes
  • Use node affinity rules

🛡️ Preventive Best Practices

1️⃣ Keep MySQL Off Control Plane

Add node affinity to the StatefulSet pod template (under spec.template.spec):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-role.kubernetes.io/control-plane
              operator: DoesNotExist

2️⃣ Harden Init Containers

Avoid infinite wait loops. Instead of:

until nc -z db 3306; do sleep 2; done

bound the wait with a timeout, so the init container fails visibly instead of blocking forever.
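
One way to bound the wait is a counter around the same nc check. This is a sketch, not a drop-in script: wait_for_db is a hypothetical function name, the 120-second default is an assumption, and it relies on the same nc -z support as the loop above.

```shell
# wait_for_db is a hypothetical helper name.
# $1 = host, $2 = port, $3 = maximum seconds to wait (assumed default: 120)
wait_for_db() {
  wdb_host="$1"; wdb_port="$2"
  wdb_max="${3:-120}"
  wdb_waited=0
  until nc -z "$wdb_host" "$wdb_port"; do
    if [ "$wdb_waited" -ge "$wdb_max" ]; then
      # Exiting non-zero surfaces the failure in pod status instead of
      # leaving the pod in Init:0/X with no error.
      echo "timed out after ${wdb_max}s waiting for $wdb_host:$wdb_port" >&2
      return 1
    fi
    sleep 2
    wdb_waited=$((wdb_waited + 2))
  done
  echo "$wdb_host:$wdb_port is reachable"
}

# Usage inside an init container command:
#   wait_for_db db 3306 120 || exit 1
```

When the function times out, the init container exits with an error, the kubelet restarts it with backoff, and the pod shows Init:Error or Init:CrashLoopBackOff. That is far easier to spot than a silent wait.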


3️⃣ Regular Longhorn Health Checks

kubectl -n longhorn-system get volumes.longhorn.io
kubectl -n longhorn-system get nodes.longhorn.io

Monitor:

  • Volume robustness
  • Replica health
  • Attachment status
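
These checks can be condensed into one command that prints only the volumes needing attention. The sketch below uses a hypothetical helper named unhealthy_volumes and assumes the Longhorn Volume resource exposes .status.state, .status.robustness, and .status.currentNodeID, with robustness values "degraded" and "faulted" marking problems; confirm the field names on your Longhorn version.

```shell
# unhealthy_volumes is a hypothetical helper name.
# Prints: name, state, robustness, node for degraded or faulted volumes.
unhealthy_volumes() {
  kubectl -n longhorn-system get volumes.longhorn.io --no-headers \
    -o custom-columns=NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness,NODE:.status.currentNodeID \
  | awk '$3 == "degraded" || $3 == "faulted" { print $1, $2, $3, $4 }'
}

# Usage (run manually, or from a cron job that alerts on any output):
#   unhealthy_volumes
```

Detached volumes are deliberately not flagged, since an unattached volume is normal for a workload that is scaled down.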

🧠 Understanding the Full Failure Chain

Symptom                            Root Cause
Init:0/5 stuck                     Waiting on DB or image pull failure
FailedToRetrieveImagePullSecret    Secret missing in namespace
ContainerCreating                  Volume mount not complete
FailedAttachVolume                 Longhorn CSI issue
Deployment Updating forever        Minimum availability not reached

🏁 Final Outcome

Once:

  • Registry secrets are properly configured
  • Volume is detached and reattached correctly
  • MySQL reschedules successfully

The entire stack recovers automatically.

No data loss. No PVC deletion required.


📌 Key Takeaways

  • Kubernetes secrets are namespace-scoped
  • Longhorn attachment failures often require manual detachment
  • Control-plane nodes are poor candidates for stateful workloads
  • Init container chains amplify underlying storage issues
  • Many “application failures” are actually infrastructure problems