NgoziN

@EngeeJon

I’m a DevOps Engineer with a strong foundation in cloud infrastructure, CI/CD pipelines, containerization, GitOps, and Infrastructure as Code.

Joined May 2022

83 Following

23 Followers

17 Posts

NgoziN

@EngeeJon

about 15 hours ago

Builds Not Running? How to Troubleshoot Broken Jenkins Agents

Your pipeline is queued, but nothing happens. The Jenkins controller is healthy, yet builds refuse to start. Often, the real issue is a broken or disconnected agent.

Common causes:
- Agent disconnected from the Jenkins controller
- SSH, authentication, or credential failures
- Insufficient disk space or system resources
- Missing tools required by the pipeline (Docker, Java, Git, etc.)

How to troubleshoot:
- Check the agent status in Jenkins and review connection logs
- Verify CPU, memory, and disk utilization on the agent
- Test agent authentication and network connectivity
- Confirm required tools and dependencies are installed

A failing pipeline isn’t always a pipeline problem; sometimes the worker responsible for running it needs attention.

Follow me for more Jenkins troubleshooting tips and share a Jenkins agent issue you've had to debug!
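Two of the checks above (free disk space and required tools) are easy to script. Here is a minimal sketch meant to be run on the agent itself; the function name `check_agent_health`, the 5 GiB default threshold, and the tool list are assumptions for illustration, not Jenkins settings.

```python
import shutil


def check_agent_health(required_tools, workspace=".", min_free_gb=5.0):
    """Return a list of human-readable problems found on this machine.

    required_tools: executable names the pipeline expects on PATH.
    workspace: a directory on the disk the builds write to.
    min_free_gb: hypothetical threshold; pick one that fits your builds.
    """
    problems = []

    # Free space on the filesystem that holds the workspace.
    free_gb = shutil.disk_usage(workspace).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(
            f"low disk space: {free_gb:.1f} GiB free, want {min_free_gb} GiB"
        )

    # shutil.which returns None when the executable is not on PATH.
    for tool in required_tools:
        if shutil.which(tool) is None:
            problems.append(f"missing tool: {tool}")

    return problems
```

Calling `check_agent_health(["docker", "java", "git"])` returns an empty list when everything looks fine, so it could serve as an early sanity stage in a pipeline.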

NgoziN

@EngeeJon

1 day ago

ImagePullBackOff? Why Your Kubernetes Containers Won’t Start

Your Pod is created, but the container never starts. Instead, Kubernetes shows ImagePullBackOff. This means it can't pull the container image needed to launch the workload.

Common causes:
- Incorrect image name or tag
- Missing or invalid registry credentials
- Image doesn't exist in the registry
- Network or connectivity issues reaching the registry

How to troubleshoot:
- Run kubectl describe pod and check Events
- Verify the image name and tag actually exist
- Check image pull secrets and registry permissions
- Confirm nodes can reach the container registry

ImagePullBackOff isn't an application problem; it's Kubernetes telling you the container image isn't available.

Follow me for more Kubernetes troubleshooting tips and share an image pull issue you've had to debug!

NgoziN

@EngeeJon

2 days ago

DevOps Quiz

Your Kubernetes deployment shows:
✅ Pods Running
✅ Service Created
✅ Ingress Created

But the application is still unreachable from the browser.

What's the MOST likely next troubleshooting step?
A. Check pod logs
B. Verify ingress controller is running
C. Restart the deployment
D. Scale the deployment to 5 replicas

What's your answer and why? 👇

#DevOps #Kubernetes #SRE #PlatformEngineering

NgoziN

@EngeeJon

3 days ago

When Bash Scripts Stop Scaling: The Case for Infrastructure as Code

Bash scripts are great for quick automation. But as environments grow, what started as a few scripts can become difficult to maintain, troubleshoot, and share.

Signs you may have outgrown scripts:
- Multiple scripts managing the same infrastructure
- No clear record of what changed and when
- Inconsistent deployments across environments
- Manual fixes required after every execution

Why IaC helps:
- Infrastructure becomes version-controlled and reviewable
- Changes are visible before they're applied
- Deployments become repeatable and predictable
- Teams can collaborate using a common source of truth

Scripts automate tasks. Infrastructure as Code manages systems.

Follow me for more Terraform and IaC insights and share the moment you realized scripts were no longer enough!

NgoziN

@EngeeJon

3 days ago

Hey @X algorithm

I'm looking to connect with:
- Cloud Engineers
- DevOps Engineers
- Kubernetes Enthusiasts
- Platform Engineers
- SREs

Drop 👋 and follow. Let's connect and grow together

#DevOps #AWS #Kubernetes #Terraform #SRE #CloudEngineering #PlatformEngineering

NgoziN

@EngeeJon

6 days ago

Rollback Successful, But the Problem Still Exists?

You rolled back the deployment, but users are still experiencing issues. That usually means the deployment wasn’t the real root cause.

Common reasons rollbacks fail to fix incidents:
- Database or schema changes weren’t reversible
- Cached data or stale configurations still active
- External dependencies continued failing
- Infrastructure or environment drift existed before deployment

How to troubleshoot properly:
- Confirm what actually changed during the release
- Check logs, metrics, and dependency health, not just deployment status
- Separate application issues from infrastructure problems
- Test rollback procedures before production incidents happen

A rollback only works if the deployment caused the problem in the first place.

Follow me for more DevOps tips and share a time a rollback didn’t solve the incident!

NgoziN

@EngeeJon

7 days ago

Ansible Using the “Wrong” Variable? It’s Probably Variable Precedence

Your playbook runs, but Ansible keeps using a value you didn’t expect. Most of the time, the issue isn’t the variable itself; it’s where that variable was defined.

Common causes:
- Variables overridden by extra-vars or inventory vars
- Conflicts between group_vars and host_vars
- Role defaults being replaced unexpectedly
- Cached facts or old variable definitions still in use

How to troubleshoot:
- Use the debug module to print actual variable values
- Check Ansible’s variable precedence hierarchy carefully
- Keep variable definitions organized and predictable
- Test with minimal inventories to isolate conflicts

Follow me for more DevOps troubleshooting tips and share a variable precedence issue you’ve had to debug!
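To make the precedence idea concrete, here is a small model in Python of how a later, higher-precedence source overrides an earlier one. It is only a sketch: the `PRECEDENCE` list is a simplified subset of Ansible's documented ordering (the real list has many more levels), and `resolve` is a hypothetical helper, not part of Ansible.

```python
# Lowest to highest precedence. A simplified subset of Ansible's full
# ordering, kept short for illustration.
PRECEDENCE = [
    "role defaults",
    "group_vars",
    "host_vars",
    "play vars",
    "role vars",
    "extra vars",
]


def resolve(name, definitions):
    """Return (value, winning_source) for a variable.

    definitions maps a source name from PRECEDENCE to a dict of the
    variables defined at that level.
    """
    winner = None
    # Walk from lowest to highest so later sources overwrite earlier ones.
    for source in PRECEDENCE:
        if name in definitions.get(source, {}):
            winner = (definitions[source][name], source)
    if winner is None:
        raise KeyError(name)
    return winner
```

For example, with `port` defined as 80 in role defaults, 8080 in group_vars, and 9090 in host_vars, `resolve("port", ...)` picks the host_vars value; passing the variable with extra vars on the command line would override all three.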

NgoziN

@EngeeJon

8 days ago

Jenkins Build Keeps Failing? Here’s How to Find the Real Problem

A Jenkins build failure is usually just the symptom, not the actual root cause. The key is learning how to trace failures systematically instead of restarting builds blindly.

Common causes:
- Dependency or package version mismatches
- Expired credentials or permission issues
- External services timing out or unreachable
- Environment differences between local and CI

How to troubleshoot effectively:
- Start with the first meaningful error in the logs
- Compare successful builds against failed ones
- Re-run individual stages to isolate the failure point
- Check recent changes in code, plugins, or infrastructure

The faster you trace the real issue, the more reliable your CI/CD pipeline becomes.

Follow me for more Jenkins troubleshooting tips, and share a build failure that took longer than expected to debug!
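"Start with the first meaningful error" can be partly automated on a saved console log. The sketch below assumes a handful of error markers; the pattern list and the `first_error` helper are illustrative guesses, so adjust them to whatever your build tools actually print.

```python
import re

# Hypothetical markers; tune them to the tools your pipeline runs.
ERROR_PATTERN = re.compile(
    r"ERROR|FATAL|Exception|Permission denied|command not found"
)


def first_error(log_text, context=2):
    """Find the first line that looks like an error.

    Returns (line_number, lines) where line_number is 1-based and lines
    holds up to `context` preceding lines plus the matching line, or None
    when nothing matches.
    """
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            start = max(0, i - context)
            return i + 1, lines[start : i + 1]
    return None
```

Later errors in a log are often consequences of the first one, which is why the function stops at the earliest match instead of collecting all of them.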

NgoziN

@EngeeJon

8 days ago

https://t.co/t6BMORos3T

NgoziN

@EngeeJon

9 days ago

Kubernetes Service Not Reachable? Start Debugging the Network Path

Your Pods are running, but the Service still isn’t reachable. In Kubernetes, networking issues can happen at multiple layers, not just the application itself.

Common causes:
- Service selectors not matching any Pods
- NetworkPolicies blocking traffic
- Incorrect Service type or exposed ports
- DNS resolution failures inside the cluster

How to troubleshoot effectively:
- Check if the Service has active endpoints
- Verify Pod labels match the Service selector
- Test connectivity from inside the cluster using temporary debug Pods
- Review NetworkPolicies, Ingress rules, and DNS resolution

A running Pod doesn’t guarantee reachable traffic; the network path still has to be correct.

Follow me for more Kubernetes troubleshooting tips, and share a networking issue you’ve had to debug!
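The selector check is simple to reason about: a Service selector is a set of key/value pairs, and a Pod is selected only if its labels contain every one of them. Here is a minimal sketch of that rule in Python; the helper names and the sample labels are made up for illustration, and the label data would come from your own cluster.

```python
def selector_matches(selector, pod_labels):
    """True when every key/value pair in the selector appears in the labels.

    An empty selector is treated as matching nothing in this sketch.
    """
    if not selector:
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


def matching_pods(selector, pods):
    """pods maps a Pod name to its labels dict. Returns sorted matching names."""
    return sorted(
        name for name, labels in pods.items() if selector_matches(selector, labels)
    )
```

An empty result from `matching_pods` corresponds to a Service with no endpoints, and a one-character typo in a label value is enough to cause it.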

NgoziN

@EngeeJon

10 days ago

Infrastructure Drift: When Manual Changes Break Your Terraform Reality

Everything was working fine until terraform plan suddenly showed unexpected changes. That’s usually a sign of infrastructure drift: someone modified resources manually, outside Terraform.

Common causes of drift:
- Emergency fixes made directly in the cloud console
- Manual scaling or security group changes
- Resources updated outside the IaC workflow
- Multiple teams changing infrastructure independently

How to troubleshoot and recover:
- Run terraform plan regularly to detect drift early
- Compare Terraform state with actual cloud resources
- Reconcile changes back into code whenever possible
- Limit direct production access to reduce unmanaged changes

Drift doesn’t just create configuration problems; it creates uncertainty about what your infrastructure actually looks like.

Follow me for more Terraform troubleshooting tips, and share a drift issue you’ve had to investigate!
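Running terraform plan on a schedule is easier when a script can read the result. With the `-detailed-exitcode` flag, terraform plan exits with 0 for no changes, 1 for an error, and 2 when changes are present. The sketch below wraps that in Python; `check_drift` is a hypothetical helper that assumes terraform is on PATH and the directory is already initialized.

```python
import subprocess

# Exit codes of `terraform plan -detailed-exitcode`.
PLAN_EXIT_CODES = {0: "no changes", 1: "error", 2: "changes present"}


def interpret_plan_exit_code(code):
    """Translate a plan exit code into a short status string."""
    return PLAN_EXIT_CODES.get(code, "unknown")


def check_drift(workdir):
    """Run terraform plan in workdir and return its status string.

    Assumes terraform is installed and `terraform init` has been run.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

Note that "changes present" does not distinguish drift from code changes that have not been applied yet, so this works best on a branch that matches what was last applied.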

NgoziN

@EngeeJon

12 days ago

Why Terraform Keeps Recreating Resources

You run terraform plan expecting a small update, but Terraform wants to destroy and recreate resources again.

Common causes:
- Changes to attributes that require replacement
- Dynamic values causing constant diffs
- Manual infrastructure changes outside Terraform
- Incorrect use of count, for_each, or resource naming

How to troubleshoot:
- Review the exact attribute triggering replacement
- Check for drift between state and real infrastructure
- Use lifecycle rules carefully (ignore_changes, create_before_destroy)
- Keep resource identifiers stable across deployments

Follow me for more DevOps troubleshooting tips and share a time Terraform tried to recreate something unexpectedly!
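The count versus for_each point is easier to see with a toy model. With count, instances are addressed by position, so removing an item from the middle of a list shifts every later index; with for_each, instances are addressed by key. The Python below only simulates the addressing; it is not how Terraform computes plans, and `aws_instance.web` is a made-up resource name.

```python
def addresses_by_count(names):
    """Model count: each instance is addressed by its list position."""
    return {f"aws_instance.web[{i}]": name for i, name in enumerate(names)}


def addresses_by_for_each(names):
    """Model for_each: each instance is addressed by its key."""
    return {f'aws_instance.web["{name}"]': name for name in names}


def plan(before, after):
    """Compare two address -> value maps.

    Returns (changed, removed, added) as sorted lists of addresses.
    """
    changed = sorted(a for a in before if a in after and before[a] != after[a])
    removed = sorted(a for a in before if a not in after)
    added = sorted(a for a in after if a not in before)
    return changed, removed, added
```

Removing "b" from ["a", "b", "c"] under the count model touches two addresses (index 1 now holds "c" and index 2 disappears), while the for_each model touches only the "b" instance.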

NgoziN

@EngeeJon

13 days ago

HPA Not Scaling? Here’s What Kubernetes Might Be Telling You

Traffic increases, but pods stay the same, and performance starts dropping. When the Horizontal Pod Autoscaler (HPA) doesn’t scale, the issue is often deeper than “autoscaling is broken.”

Common causes:
- Missing or incorrect resource requests on Pods
- Metrics Server not working or unavailable
- Scaling thresholds set too high
- Cluster lacks enough node capacity to schedule new Pods

How to troubleshoot:
- Check HPA events with kubectl describe hpa
- Verify CPU/memory metrics are actually being collected
- Confirm Pods have proper resource requests defined
- Review node capacity and Pending Pods

HPA can only scale based on the signals and capacity available to it.

Follow me for more DevOps troubleshooting tips and share an autoscaling issue you’ve had to debug!
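It helps to know the arithmetic behind the decision. The Kubernetes documentation gives the core formula as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with a default tolerance of 0.1 around a ratio of 1.0 where no scaling happens. A minimal sketch in Python follows; it leaves out min/max replica bounds, stabilization windows, and per-Pod readiness handling.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Core HPA scaling formula, without min/max clamping.

    current_metric and target_metric use the same unit, for example
    average CPU utilization as a percentage of the Pods' requests.
    """
    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

Because utilization is measured against resource requests, Pods without requests give the formula nothing to divide by, and a target set far above real usage keeps the ratio below the point where scaling up would trigger.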

NgoziN

@EngeeJon

14 days ago

Works locally, but fails in Jenkins?

Your code works perfectly on your machine, but the Jenkins pipeline keeps failing. Most of the time, the problem is environment differences between local and CI.

Common causes:
- Missing environment variables or secrets in CI
- Different package, dependency, or tool versions
- Permission differences between local and Jenkins agents
- External services unavailable from the CI environment

How to troubleshoot:
- Compare local vs CI environment configurations
- Print tool versions and runtime details in the pipeline
- Reproduce the issue inside the same Docker image or agent
- Avoid relying on local machine assumptions

Follow me for more DevOps troubleshooting tips and share a pipeline issue that only failed in CI!
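Once tool versions and variable names have been printed on both sides, comparing them is a small dictionary diff. The sketch below assumes you have already collected each environment into a name-to-value mapping; `diff_environments` and the sample entries are hypothetical.

```python
def diff_environments(local, ci):
    """Return the entries that differ between two environment snapshots.

    Each argument maps a name (tool or variable) to a version or value
    string. The result maps each differing name to (local_value, ci_value),
    with None standing for "not present".
    """
    diffs = {}
    for key in sorted(set(local) | set(ci)):
        local_value, ci_value = local.get(key), ci.get(key)
        if local_value != ci_value:
            diffs[key] = (local_value, ci_value)
    return diffs
```

For secrets, record only whether the variable is set (for example the string "set") so the values themselves never land in build logs.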

NgoziN

@EngeeJon

14 days ago

When Half Your Ansible Hosts Succeed, and Half Fail

Your playbook starts successfully, but only some hosts complete the changes. Now your environment is inconsistent, and troubleshooting becomes much harder.

Common causes of partial failures:
- Different package versions or OS configurations
- Network instability or intermittent SSH connectivity
- Permission differences across hosts
- Tasks depending on services not available everywhere

How to troubleshoot safely:
- Identify patterns among failed hosts
- Use --limit to isolate and retest affected systems
- Design playbooks to be idempotent and restart-safe
- Use rolling updates (serial) to reduce blast radius

Follow me for more DevOps troubleshooting tips and share a partial failure issue you’ve had to debug!

NgoziN

@EngeeJon

10 months ago

@Techboy150 Thank you for sharing this @Techboy150

NgoziN

@EngeeJon

about 1 year ago

@hackSultan volunteer
