"We use Prometheus for monitoring."
I hear this in almost every interview. Then I ask one question and the whole thing falls apart.
"Why do logs and metrics need different pipelines?"
Silence.
Most people jump into Prometheus and Grafana without understanding what they're actually solving. They know the tools. They can't explain the problem.
With observability, you're solving two completely different problems.
Logs tell you what happened. An error occurred. A request came in. A database query failed. These are events. Stories your application tells.
Metrics tell you how things are performing right now. Latency is 200ms. CPU is at 75%. You processed 500 requests per minute. These are measurements.
Different data types. Different collection methods. Different storage. That's where people get confused.
Last month in my DevOps bootcamp, we built a complete observability system for microservices on Kubernetes.
For logs, we used Fluentd sidecars that share a volume with the application container.
The app writes logs to the volume.
Fluentd reads and forwards them.
Clean separation of concerns.
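Here is a rough sketch of the application side of that pattern: the app only appends one JSON line per event to a file on the shared volume, and shipping is left to the sidecar. The APP_LOG_DIR variable, the app.log file name, and the JSON field names are assumptions for illustration, not details from the bootcamp setup.

```python
import json
import logging
import os
import tempfile

# Hypothetical mount point of the volume shared with the Fluentd sidecar.
# Falls back to the system temp directory so the sketch runs anywhere.
LOG_DIR = os.environ.get("APP_LOG_DIR", tempfile.gettempdir())
LOG_PATH = os.path.join(LOG_DIR, "app.log")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        })


def build_logger(path=LOG_PATH):
    """Return a logger that appends JSON lines to the shared file."""
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.FileHandler(path)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


logger = build_logger()
logger.error("database query failed")
```

The app never talks to the log backend directly. It writes a file, and whatever tails that file decides where the lines go.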
At a small scale, you send logs straight to CloudWatch.
But when you're generating thousands of log lines per second, you add layers.
Lambda for formatting.
Kinesis for buffering.
OpenSearch for fast queries across petabytes of data.
S3 for long-term backup.
We kept 7 days in OpenSearch for active investigation. 30 days in CloudWatch. Years in S3 for compliance. Each layer has different cost and performance characteristics.
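One way to picture the tiers is as a lookup from log age to the cheapest store that still holds the data. This is only a sketch: the 7 and 30 day windows come from the setup above, while the store labels and the store_for helper are made up for illustration.

```python
from datetime import timedelta

# Retention windows from the text; the store names are just labels.
TIERS = [
    (timedelta(days=7), "opensearch"),   # hot: active investigation
    (timedelta(days=30), "cloudwatch"),  # warm: recent history
]
ARCHIVE = "s3"  # cold: long-term retention for compliance


def store_for(age):
    """Return the first tier whose retention window still covers this age."""
    for limit, store in TIERS:
        if age <= limit:
            return store
    return ARCHIVE
```

For example, store_for(timedelta(days=20)) points at CloudWatch, and anything older than 30 days falls through to S3.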
For metrics, Prometheus scrapes application endpoints every 30 seconds.
Developers instrument their code with Prometheus client libraries.
They expose a /metrics endpoint.
Prometheus pulls the data automatically.
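To make the pull model concrete, here is a standard-library-only sketch of what a /metrics endpoint serves. It is not the Prometheus client library, which is what you would normally use; the Counter class, the app_requests_total metric name, and port 8000 are assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """A tiny stand-in for a client-library counter."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self.value = 0

    def inc(self, amount=1):
        self.value += amount

    def render(self):
        # Prometheus text exposition format: HELP, TYPE, then the sample.
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {self.value}\n"
        )


# Hypothetical metric; application code would call REQUESTS.inc() per request.
REQUESTS = Counter("app_requests_total", "Total requests handled.")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = REQUESTS.render().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on /metrics."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

The app never pushes anything. It keeps counters in memory and answers when the scraper asks.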
We created ServiceMonitors, a Prometheus Operator resource, that tell Prometheus which Services, and the pods behind them, to scrape based on labels.
As soon as new pods come up, Prometheus discovers and scrapes them.
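Label-based selection is simple at its core: a target matches when it carries every key/value pair the selector asks for. A minimal sketch of that rule, with made-up target names and labels:

```python
def matches(selector, labels):
    """True when every selector key/value pair is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical selector and targets, for illustration only.
selector = {"app": "checkout"}
targets = {
    "checkout-7d9f": {"app": "checkout", "tier": "backend"},
    "cart-5c2a": {"app": "cart", "tier": "backend"},
}

scraped = [name for name, labels in targets.items() if matches(selector, labels)]
```

That is why new pods get picked up without any config change: they arrive with the right labels, so they match.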
Then Grafana visualizes everything.
We imported pre-built dashboards from https://t.co/5wE21Lb4Q8 for Kubernetes monitoring.
And built custom panels for application-specific metrics.
Logs and metrics run in parallel.
When something breaks, metrics show you the spike. The error rate jumped. Latency went from 100ms to 2 seconds.
Then you check the logs. Filter for that time window. Find the stack traces. See exactly what failed.
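The "filter for that time window" step can be sketched in a few lines. The record shape (a dict with time, level, and message) and the sample data are assumptions; in practice you would run the equivalent query in your log backend.

```python
from datetime import datetime, timedelta


def in_window(records, start, end, level=None):
    """Keep records whose timestamp falls inside [start, end]."""
    return [
        r for r in records
        if start <= r["time"] <= end and (level is None or r["level"] == level)
    ]


# Hypothetical incident: metrics showed the spike at 14:05.
spike = datetime(2024, 1, 1, 14, 5)
records = [
    {"time": datetime(2024, 1, 1, 13, 50), "level": "INFO", "message": "request ok"},
    {"time": datetime(2024, 1, 1, 14, 4), "level": "ERROR", "message": "db timeout"},
    {"time": datetime(2024, 1, 1, 14, 6), "level": "INFO", "message": "retrying"},
    {"time": datetime(2024, 1, 1, 14, 30), "level": "ERROR", "message": "unrelated"},
]

# Look two minutes either side of the spike, errors only.
suspects = in_window(records, spike - timedelta(minutes=2),
                     spike + timedelta(minutes=2), level="ERROR")
```

The metric gives you the timestamp. The window turns thousands of log lines into a handful.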
You can't troubleshoot with just one. You need both perspectives.
We implemented it, troubleshot everything in a live call, generated real metrics and logs, and built dashboards in Grafana.
That's the difference between watching tutorials and actually understanding how systems work in production.
Pumba lets you kill, pause, and stress containers while injecting network delays, packet loss, and corruption.
You can deploy it as a DaemonSet for cluster-wide chaos engineering.
➜ https://t.co/op3WCIuAWP
Quick Linux Tip #10
Need to run a command that keeps running even after you close your SSH session?
Use:
$ nohup <command> &
It makes the command ignore the hangup signal (SIGHUP) sent when your terminal closes, so it keeps running after logout. If output would have gone to the terminal, it is written to nohup.out instead.
The & sends it to the background so your shell stays usable while the job runs.
This saves you every time a long-running script dies because your VPN dropped mid-session.
Follow @tecmint for more #Linux tips
Quick Linux Tip #9
Need to test if a remote port is open without installing telnet or nmap?
Use:
$ nc -zv <host> <port>
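It attempts to connect to the target host and port, then tells you whether the connection succeeded or was refused. If a firewall silently drops the packets, it waits for a timeout instead; add -w <seconds> to cap the wait.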
The -z flag scans without sending data, and -v gives you a readable result instead of silence.
Handy for checking whether firewalls, VPNs, or cloud security groups are letting traffic through, wherever nc is available, which includes many Linux systems by default.
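If nc is not installed either, the same check is a few lines of Python using only the standard library. This is a sketch of the equivalent TCP connect test, not a replacement for nc; the port_open name and the 3 second default timeout are my own choices.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Try a TCP connect; True if it succeeds, False if refused or timed out."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Like -z, it opens the connection and closes it again without sending any data.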
@KobresTchele The idea isn't necessarily to consume it, but to add more value so it can be resold at a higher price for a much larger margin abroad.