@gumnos @lordbaco @awkhub Just as a semi-related aside: did you know you can have as many BEGIN/END blocks as you like? They are executed in the order they appear.
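A quick way to see this for yourself, as a minimal sketch (the two-line input and the messages are made up for illustration):

```shell
# Two BEGIN blocks and two END blocks; each group runs in the order written.
out=$(printf 'a\nb\n' | awk '
BEGIN { print "begin 1" }
BEGIN { print "begin 2" }
      { n++ }                          # main rule: count input lines
END   { print "end 1: " n " lines" }
END   { print "end 2" }
')
printf '%s\n' "$out"
```

Both BEGIN lines print before any input is read, and both END lines print after the last record.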
@tangming2005 @awkhub In the past, when resources were limited, we tried to use as few commands as possible. The line below does the same job, and you can measure the elapsed time with the time or timex command.
awk '!/^#/ {print $1 "\t" $2 "\t" $5}' *.vcf | sort | uniq -d | wc -l
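If you want to go one step further and drop the extra processes, the counting can probably be done inside awk too. This is only a sketch: `count_dups` is a made-up name, and it assumes the set of distinct keys (fields 1, 2 and 5, which in a VCF are CHROM, POS and ALT) fits in memory, whereas sort can spill to disk.

```shell
# Count distinct (field 1, field 2, field 5) keys that occur more than once,
# skipping header lines that start with "#". Same count as: sort | uniq -d | wc -l
count_dups() {
    awk '!/^#/ && ++seen[$1 "\t" $2 "\t" $5] == 2 { n++ } END { print n + 0 }' "$@"
}

# Usage: count_dups *.vcf
```

Each key bumps `n` only at the moment its count reaches 2, so a key seen five times is still counted once, which is what `uniq -d` does.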
awk sticks around for a few very practical reasons, even if it looks like someone compressed a programming language into a crossword puzzle.
First, it fills a gap nothing else quite hits. It lives right between quick-and-dirty shell tricks and dragging in something heavier like Python. If you just need to grab a few columns, total some numbers, or reshape a text file, awk does it instantly. No imports, no setup, no extra files sitting around afterward. One command, done, move on with your life.
Second, once it clicks, it actually makes a lot of sense. The whole idea is simple: “when you see this, do that.” That’s not some exotic paradigm, that’s just how people think. The trouble is the syntax looks like it lost a fight with punctuation. Push through that phase and it starts to feel pretty clean.
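Here is roughly what "when you see this, do that" looks like in practice. A small sketch with made-up input (names and scores are hypothetical): each rule is a pattern followed by an action in braces.

```shell
out=$(printf 'alice 10\nbob 25\n# comment\ncarol 7\n' | awk '
/^#/    { next }                     # when a line starts with "#", skip it
$2 > 8  { print $1; total += $2 }    # when field 2 exceeds 8, print the name and add it up
END     { print "total", total }     # when the input ends, report the sum
')
printf '%s\n' "$out"
```

This prints `alice`, `bob`, and then `total 35`.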
Third, it’s everywhere. Any halfway normal Unix-like system already has it. Locked-down servers, stripped-down containers, random ancient boxes in a corner rack: awk is just sitting there like an old wrench that never got thrown out. You don’t have to ask permission or install anything, which is half the battle in real environments.
And then there’s the accumulation effect. awk has been around forever, so it’s baked into scripts, Makefiles, and decades of forum answers. You run into some cryptic one-liner, stare at it for a minute, figure it out, and now that trick is yours. Repeat that enough times and suddenly you know awk whether you meant to or not.
It’s not pretty. It’s not friendly. But it keeps getting the job done, which is more than you can say for most “modern” replacements that show up, trend for a year, and quietly disappear.
@igor_os777 I like #AWK or #GAWK
You can write scripts with a shebang:
#!/usr/bin/awk -f
your script here...
Very C-like, but looser and interpreted.
Handles floating-point calculations and regex, though it only has rudimentary higher math.
😼
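For anyone who wants to try the shebang approach, a minimal sketch: the file name `mean.awk` is hypothetical, and the shebang assumes awk lives at /usr/bin/awk, which varies between systems.

```shell
script="${TMPDIR:-/tmp}/mean.awk"
cat > "$script" <<'EOF'
#!/usr/bin/awk -f
# Average the numeric lines of the input; the regex skips anything that is not a plain number.
/^-?[0-9]+(\.[0-9]+)?$/ { sum += $1; n++ }
END { if (n) printf "n=%d mean=%.2f sqrt(sum)=%.3f\n", n, sum / n, sqrt(sum) }
EOF
chmod +x "$script"
```

After that, running the file directly on a data file should behave the same as `awk -f mean.awk data.txt`. Fed the lines 1.5, 2.5, x and 5, it prints `n=3 mean=3.00 sqrt(sum)=3.000`.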
@igor_os777 Awk is amazing. I implemented some tools to quickly handle huge datasets from the Gaia mission (billions of records in CSV files). I even managed to run a rudimentary clustering algorithm over that huge set on an ordinary PC. There's even a SQL-like tool for CSVs:
https://t.co/bHVzWseIQ5
1/ Just hit 60K followers on LinkedIn. I used to Google basic awk syntax; now I ask AI to do it for me.
I used to read other people's code to understand it; now I use AI to teach me.
The only difference is now 60,000 people watch me do it.
@Jason_Ewton @DThompsonDev I recently reviewed an AI _audit_: it had 100% hallucinated bugs plus zero QA by a human :-|
For replacing strings, complex or simple, I've found `awk`, `sed`, or `vim` to be trustworthy and testable.
I'd be willing, even ecstatic, to see evidence that AI productivity enhancements are real.
🧵 Mastering AWK for Bioinformatics: Data Wrangling Made Simple
AWK is a powerhouse for genomics data manipulation. Here’s a quick guide with practical examples 👇
Have to do some analysis on files but only have terminal access?
awk is your best friend!
Here is a one-liner that counts the frequency of each unique line:
awk '{count[$0]++} END {for (item in count) print item, count[item]}' file.txt
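One thing to keep in mind: awk's `for (item in count)` visits keys in no particular order. If you want the most frequent lines first, a possible variation (just a sketch; `top_lines` is a made-up name) puts the count in front and pipes it through sort:

```shell
# Print "count line" pairs, highest count first, ties broken by the line text.
top_lines() {
    awk '{ count[$0]++ } END { for (item in count) print count[item], item }' "$@" |
        sort -k1,1nr -k2
}

# Usage: top_lines file.txt
```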
Just wrote a simple bash loop with curl streamed right into a simple awk script's stdin to parse and format a summary of ~15 Jenkins jobs (250 MB of logs in total) into a neat little table. What a joy. This is what vibe coding takes away from you.