Ex-academic data coach helping researchers and teams build confident R skills, using AI to assist you, not replace you. Founder @DataSharpAcadem. Here to help.
Hi everyone, I'm dusting off this account to embrace my new role as a data teacher and founder of @DataSharpAcadem
With DSA, I aim to help people who work with data move from messy scripts and guesswork to clear, repeatable analyses in R.
More info at https://t.co/NjJHbgsvWO
Stop blaming your tools. Start by checking your data.
I’ve seen people spend hours rewriting code, changing packages, or asking AI for help…
for problems that were quietly sitting inside the dataset.
Messy data create fake complexity.
=> Fix the data, not the pipeline.
The vast majority of coding problems come from:
* inconsistent formatting
* hidden missing values
* broken variable types
* messy real-world data
You probably know how to code.
But are you equipped to deal with the real complexity of data analysis?
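A quick audit makes the list above concrete. This is only a minimal sketch in plain Python: the records, the column names, and the list of "missing" codes are all made up for illustration, so adapt them to your own dataset.

```python
# Hypothetical raw records, as they might come out of a CSV export.
rows = [
    {"site": "A", "depth_cm": "10", "species": "Pinus"},
    {"site": "A ", "depth_cm": "n/a", "species": "pinus"},
    {"site": "B", "depth_cm": "12.5", "species": ""},
    {"site": "B", "depth_cm": "15", "species": "Pinus"},
]

# Strings that often stand in for a missing value (an assumed, incomplete list).
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none", "-", "?"}


def is_missing(value):
    """Return True for None or for a string that looks like a missing-value code."""
    return value is None or str(value).strip().lower() in MISSING_TOKENS


def is_number(value):
    """Return True if the value can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def audit(rows):
    """Summarise each column: hidden missing values, non-numeric entries, padded strings."""
    report = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        present = [v for v in values if not is_missing(v)]
        report[column] = {
            "missing": len(values) - len(present),
            "non_numeric": sum(not is_number(v) for v in present),
            "padded": sum(str(v) != str(v).strip() for v in present),
        }
    return report


report = audit(rows)
for column, summary in report.items():
    print(column, summary)
```

Here the "n/a" hiding in depth_cm and the trailing space in "A " are the kind of detail that breaks code later, long before any method is at fault.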
One of the biggest shocks in data analysis:
Real datasets are NOTHING like tutorial datasets.
Inconsistent categories.
Broken dates.
Missing metadata.
Mixed formats.
Unexpected NAs.
Many people think they are bad at coding.
In reality, nobody prepared them for messy data.
Suddenly, everything you try seems to fail, even though it worked before.
So you look for another method, another package.
But in most cases, the issue stems from your data.
There is something somewhere that is not what you think.
Sometimes, we create the complexity ourselves.
Messy data create fake complexity.
If adding 1 and 1 gives you 11, don't blame your tool.
Your data probably need some love.
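Here is one way 1 and 1 can become 11, sketched in Python under the assumption that the values were read from a text file and never converted. The variable names are illustrative only.

```python
# Values read from a text file arrive as strings unless you convert them.
a, b = "1", "1"

as_text = a + b               # string concatenation
as_numbers = int(a) + int(b)  # numeric addition after explicit conversion

print(repr(as_text))   # '11'
print(as_numbers)      # 2
```

The tool did exactly what it was asked to do. The type of the data was the surprise.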
A dataset with inconsistent categories, mixed formats, or duplicated labels can completely distort your understanding of what is happening.
Most “coding problems” are actually data problems.
Not the algorithm.
Not the package.
Not R/Python.
Your data!
Wrong types.
Inconsistent strings.
Hidden spaces.
Duplicate categories.
Unexpected NAs.
"1" ≠ 1
"Blue" ≠ "blue"
The code is often just reacting to the chaos.
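A small sketch of the two inequalities above, using Python's standard library. The labels are invented, and stripping plus lower-casing is only one possible cleaning rule, not a universal fix.

```python
from collections import Counter

# A text "1" and a number 1 are different values.
print("1" == 1)  # False

# Hypothetical colour labels as typed by different people.
labels = ["Blue", "blue", " blue", "BLUE ", "red", "Red"]


def normalise(label):
    """Strip surrounding whitespace and lower-case the label."""
    return label.strip().lower()


raw_counts = Counter(labels)
clean_counts = Counter(normalise(label) for label in labels)

print(len(raw_counts))    # 6 apparent categories
print(len(clean_counts))  # 2 real categories
```

Six "categories" in the raw column, two after cleaning. Any grouping or plot built on the raw labels would be reacting to typing habits, not to the data.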
AI can already help you write code.
The real challenge is learning how to look at your data:
- spotting suspicious patterns
- questioning assumptions
- understanding what the dataset can actually support
That skill is built through experience, not magic.
The @PAGES_ECN has launched a new mentorship program!
Applications are now open for:
🔹 Mentees seeking guidance and support
🔹 Mentors willing to share experience and advice
📅 Application deadline: 14 June 2026
Read more 👉 https://t.co/fOoLC7DjSI
Good data analysis is less about knowing methods and more about knowing where to direct your attention.
That skill is mostly pattern recognition built through experience.
Start looking at your data differently.
Two people can look at the same dataset and draw completely different conclusions.
Not because one is smarter.
And not because one knows more advanced methods.
Often, it simply comes down to what they notice.
3/
Before running models:
* compare variables
* inspect missing values
* question strange patterns
* understand what the dataset can realistically answer
Good exploratory work is not wasted time.
It is where the real analysis begins.
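The checks above can be sketched in a few lines of plain Python. The variables and values are hypothetical, with None marking a missing value; treat it as a starting point rather than a complete exploratory workflow.

```python
import statistics

# Hypothetical measurements; None marks a missing value.
data = {
    "temperature": [12.1, 13.4, None, 15.0, 14.2, 55.0],
    "growth": [1.0, 1.2, 1.1, None, 1.4, 1.3],
}


def describe(values):
    """Summarise one variable: size, missing count, and a few order statistics."""
    present = [v for v in values if v is not None]
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "median": statistics.median(present),
        "max": max(present),
    }


summary = {name: describe(values) for name, values in data.items()}
for name, stats in summary.items():
    print(name, stats)

# How many rows are complete for both variables? A model can only use these.
complete_rows = sum(
    t is not None and g is not None
    for t, g in zip(data["temperature"], data["growth"])
)
print("complete rows:", complete_rows)
```

A maximum far above the median (55.0 against 14.2 here) is a strange pattern worth questioning before any model sees it, and the number of complete rows tells you how much data the model would really be fitted on.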
1/
One thing that took me years to accept:
Good data analysis is often surprisingly slow at the beginning.
Not because strong analysts are inefficient.
But because they spend time understanding the dataset before deciding what to do with it.
2/
When I rush early, I usually pay for it later:
* weird outputs
* meaningless models
* analyses built on poor assumptions
Slowing down early often makes everything faster later.
Most people approach data analysis backwards.
They first ask:
“What method should I use?”
But strong analysis starts earlier:
- understanding the dataset
- spotting patterns
- questioning assumptions
Exploratory data analysis (EDA) is the foundation of the analysis itself.
Don't skip it.
📢 Applications are now open for the #PAGESECN Mentorship Programme!
🌍 One-on-one mentorship for paleoscience ECRs
🤝 Career guidance, networking & skill development
🧭 Connect with the global PAGES community
📅 Deadline: 14 June 2026
Apply here https://t.co/Omz5ph43LX
3/ Sometimes, a linear model or a PCA already tells you enough to guide the next step.
Simpler approaches often make patterns easier to see.
Complexity has its place.
But using it too early often masks problems instead of solving them.
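As an illustration of starting simple, here is a minimal least-squares line fit written out by hand in Python. The x and y values are invented, and fit_line is a hypothetical helper, not a function from any library.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data with a roughly linear trend.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)

# Residuals show what the straight line does not explain.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print([round(r, 2) for r in residuals])
```

If the residuals look like noise, the simple model may already be enough to guide the next step. If they show structure, that is a concrete reason to add complexity.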
1/ One of the most common mistakes in data analysis is jumping to complex solutions too early.
AI has made this even easier. Sophisticated tools are now only one prompt away.
2/ But a fancy model won’t magically compensate for a poor understanding of your dataset.
People spend hours tuning models before understanding:
- the structure of the data
- the variables
- the assumptions
- or what the dataset can realistically answer