Gerardo Fabián

@gedof3

Genética Médica

CDMX

Joined July 2009

3.6K Following

292 Followers

502 Posts

Pinned Tweet

Gerardo Fabián @gedof3

over 1 year ago

Identification of Pathogenic Copy Number Variants in Mexican Patients With Inherited Retinal Dystrophies Applying an Exome Sequencing Data‐Based Read‐Depth Approach - Fabian‐Morales - 2024 - Molecular Genetics & Genomic Medicine - Wiley Online Library https://t.co/6BkiaUV37k

275

gedof3 retweeted

Human Pangenome Reference Consortium @HumanPangenome

6 days ago

What is a reference genome? 🧬 In the context of HPRC, a reference genome is a low-error genome and associated annotation coordinate system that serves as the backbone for genome alignment. Explore more HPRC definitions: https://t.co/YneaCJZy3n

HumanPangenome's tweet photo. What is a reference genome? 🧬

In the context of HPRC, a reference genome is a low-error genome and associated annotation coordinate system that serves as the backbone for genome alignment.

Explore more HPRC definitions: https://t.co/YneaCJZy3n https://t.co/D7rwOY1C1d

gedof3 retweeted

Konrad Platzer

@platzer_k

17 days ago

Mwenda Rintari from Exeter presents his data on spliceogenic missense. 2.7% of missense variants in UK Biobank had a spliceAI > 0.2 and 72% of these also have low missense pathogenicity predictions with REVEL < 0.5 and AlphaMissense < 0.34. These variants did have an effect of protein levels suggesting NMD (when correcting for misbinding of antibodies to misfolded proteins w/o NMD). So yeah, check splicing score on your missense variants. #eshg2026

platzer_k's tweet photo. Mwenda Rintari from Exeter presents his data on spliceogenic missense. 2.7% of missense variants in UK Biobank had a spliceAI > 0.2 and 72% of these also have low missense pathogenicity predictions with REVEL < 0.5 and AlphaMissense < 0.34. These variants did have an effect of protein levels suggesting NMD (when correcting for misbinding of antibodies to misfolded proteins w/o NMD).

So yeah, check splicing score on your missense variants. #eshg2026

718

gedof3 retweeted

Gary Marcus

@GaryMarcus

21 days ago

🚨 Breaking neuroscience/evolution news! New Nature paper provides further evidence for a central hypothesis (with a long lineage) that I defended in my 2004 book, The Birth of the Mind: the evolutionary process of duplication and divergence played a key role in brain evolution.

GaryMarcus's tweet photo. 🚨 Breaking neuroscience/evolution news!

New Nature paper provides further evidence for a central hypothesis (with a long lineage) that I defended in my 2004 book, The Birth of the Mind: the evolutionary process of duplication and divergence played a key role in brain evolution.

160

129

14K

Who to follow

Jeremy Wilkinson

@jewilki

Molecular bio, microbiome, metagenomics, runner, sports, food & drink, camper, @okstate & @texastech alum, formerly @hutlab @harvard, now @PacBio. Views my own

Anna Murray

@AnnaM1966

Professor of Human Genetics, University of Exeter, researching the genetics of female reproductive ageing.

gedof3 retweeted

Ming "Tommy" Tang @tangming2005

23 days ago

You can not live without Rmarkdown if you use R for genomics data analysis. Cheatsheet from Posit: https://t.co/VouDECa9lU

tangming2005's tweet photo. You can not live without Rmarkdown if you use R for genomics data analysis. Cheatsheet from Posit:

https://t.co/VouDECa9lU https://t.co/inQ8HCFMQM

gedof3 retweeted

Konrad Platzer

@platzer_k

about 2 months ago

I watched the recording of the recent @gnomad_project @GA4GH webinar on v4 to address common misconceptions about the gnomAD by @HeidiRehm @SamBaxterCGC and Katherine Chao. 👀 📺 Much of that was said is likely known to most who use gnomAD on a daily basis such as myself. Still, some interesting nuggets were in there. So, here are my unsorted notes, in case you find them helpful: gnomAD is not a control group in the normal sense, but represents a reasonable reference set for severe diseases such as child-onset muscular dystrophy. It is best thought of as a general population dataset. It is NOT a healthy population dataset. Included are individuals from case-control studies and biobank participants. Not included are cohorts specifically recruited for detection of early onset mendelian diseases to avoid any over-enrichment of causal variants. Under US regulatory framework, there is no chance of any HPO terms that can be made available by a resource such as gnomAD. Because its freely accessible. No login required. Only sex, age and original study that the person was recruited in, is available to the gnomAD team. Only aggregated data can be shared. Not details on any specific participant. The only description they have is, if a person is a "control" or "case" in the original study and that only refers to the small phenotypic window thats addressed in the original study (say schizophrenia or diabetes). This is in contrast to @uk_biobank or @AllofUsResearch data. These datasets are not freely accessible - you only get to access them through their data portal where you need to login. But you get individual level data. @HeidiRehm said it will be "a while" until gnomAD data is available on another genome build other than the current hg38. Conversations on e.g. T2T or pangenome graph type of future gnomAD versions have just begun. Roughly half of the exome data made available in gnomAD v4 is from @uk_biobank. As it is a biobank that has a healthy volunteer bias, because volunteers tend to be slightly wealthier and healthier than the general population (see PMID: 28641372). Which actually means that the non-UK-biobank subset in gnomAD has a higher disease burden because it is more broadly recruited. UK Biobank participants do have rare diseases, see PMID: 28641372 Individuals in gnomAD are also from case-control studies, such as schizophrenia, myocardial infarction etc. Thus the gnomAD team created subsets, such as non-neuro to filter out certain groups/study participants. See contributing projects here https://t.co/Rll5B28ZBt At the current size of gnomAD in v4, any phenotype enrichment is less of a concern than in smaller previous versions. Simply because the dataset is 800k people. Question - should a subset be used to apply e.g. the ACMG BA1 or BS1 criteria? - @HeidiRehm says v4 is best, when used as a whole dataset, because the more people, the better. If you want a max allele freq for a certain variant, then check ancestry subgroups. All of gnomAD v4 can be considered as roughly "non-cancer" because all of the data from TCGA (The Cancer Genome Atlas Program) was removed due to QC. Which means, people in gnomAD can of course have cancer, but there is no enrichment of cancer phenotypes in v4. Question - there seems to be higher freq of cancer variants in the non-UK-biobank subset. Why? - KC: Could be due to recruiting, since UK Biobank only recruited people aged 40-69 years old. They did find far lower rates of cancer in UKB than in the general UK population. Again, biobank participants are on average healthier. @AllofUsResearch data will be inclued in v5 (+ another 800k individuals), which is "coming soon". Again, no phenotype data. Question - what about CHIP variants? - @HeidiRehm: gnomAD has a full spectrum of age so likely no enrichment, but of course older people are in there with CHIP variants. There are flags on genes, to signal high rate of CHIP variants. gnomAD does have age on many individuals included, but not all. This was information given by the original study. It is unclear to the gnomAD team actually what is meant by age in all of the different studies - e.g. time of inclusion into the study? Age when data was transferred to gnomAD? Other? The gnomAD team did not tinker with the age data, just used it as is, where possible and shares it in buckets in individual variant pages. Question - are you planning to include the million variant program dataset? - @HeidiRehm - not yet included because of no easy access to it. Since external sequencing data has to be reprocessed by the gnomAD team (quite expensive), only highly "valuable" datasets, meaning ancestrally diverse external datasets will be re-processed in future gnomAD. Simply due to cost-benefit. gnomAD v4 is mostly saturated for common variants from european ancestries, so non-european ancestries are more "valuable". Here is the preprint for gnomAD v4 by @konradjk and team if you want to dive into that PMID: 41929314. Question - why is #CFTR delta-F508 58x homozygous in gnomAD? @HeidiRehm - the team did a little research, as this is likely a bit too high for a general population. Turns out, a cohort from a clinic (important: not from UK Biobank) was included with a pulmonary focus. https://t.co/v89wQnylOi My former colleague @tomaeusTo also stumbled upon this #CFTR variant back in 2023 right after the release. See here for more details with a link to the gnomAD FAQ forum: https://t.co/YJIZ1bOUVw There are actually more than a few #CFTR over-represented variants in gnomAD, again likely due to the inclusion of that one pulmo clinic's cohort. Question - what about LongReads? HR: @TalkowskiLab is working on a LongRead gnomAD dataset. It will happen. They are working on it. No date for release just yet. There is a Canadian gnomAD dataset, not yet included in the primary gnomAD dataset. For now, it is available for download — with a browser interface coming soon. They actually want to provide phenotypes, but that is still "quite a few months out" (Jordan Lerner-Ellis). https://t.co/ScPFH5Ug4b https://t.co/M808TqEh3Q https://t.co/SF8iN3NcKV There is also a Singapore gnomAD dataset - probably not possible to access the data right now. So different groups can add their own wrinkles to these separate gnomad type datasets. Recruiting more samples for gnomAD is ongoing. Federated gnomAD explained here: https://t.co/Xa9vLjU75e @HeidiRehm points out there is also a gnomAD forum / discussion board. Please ask questions or feature requests. Link: https://t.co/bNUU98wOZ4 Thanks to the gnomAD on all their work on this invaluable resource. Couldn't do without it. Hope this summary helps. 🙂

platzer_k's tweet photo. I watched the recording of the recent @gnomad_project @GA4GH webinar on v4 to address common misconceptions about the gnomAD by @HeidiRehm @SamBaxterCGC and Katherine Chao. 👀 📺

Much of that was said is likely known to most who use gnomAD on a daily basis such as myself. Still, some interesting nuggets were in there. So, here are my unsorted notes, in case you find them helpful:

gnomAD is not a control group in the normal sense, but represents a reasonable reference set for severe diseases such as child-onset muscular dystrophy. It is best thought of as a general population dataset. It is NOT a healthy population dataset.

Included are individuals from case-control studies and biobank participants. Not included are cohorts specifically recruited for detection of early onset mendelian diseases to avoid any over-enrichment of causal variants.

Under US regulatory framework, there is no chance of any HPO terms that can be made available by a resource such as gnomAD. Because its freely accessible. No login required. Only sex, age and original study that the person was recruited in, is available to the gnomAD team. Only aggregated data can be shared. Not details on any specific participant.

The only description they have is, if a person is a "control" or "case" in the original study and that only refers to the small phenotypic window thats addressed in the original study (say schizophrenia or diabetes).

This is in contrast to @uk_biobank or @AllofUsResearch data. These datasets are not freely accessible - you only get to access them through their data portal where you need to login. But you get individual level data.

@HeidiRehm said it will be "a while" until gnomAD data is available on another genome build other than the current hg38. Conversations on e.g. T2T or pangenome graph type of future gnomAD versions have just begun.

Roughly half of the exome data made available in gnomAD v4 is from @uk_biobank. As it is a biobank that has a healthy volunteer bias, because volunteers tend to be slightly wealthier and healthier than the general population (see PMID: 28641372). Which actually means that the non-UK-biobank subset in gnomAD has a higher disease burden because it is more broadly recruited.

UK Biobank participants do have rare diseases, see PMID: 28641372

Individuals in gnomAD are also from case-control studies, such as schizophrenia, myocardial infarction etc. Thus the gnomAD team created subsets, such as non-neuro to filter out certain groups/study participants. See contributing projects here https://t.co/Rll5B28ZBt

At the current size of gnomAD in v4, any phenotype enrichment is less of a concern than in smaller previous versions. Simply because the dataset is 800k people.

Question - should a subset be used to apply e.g. the ACMG BA1 or BS1 criteria? - @HeidiRehm says v4 is best, when used as a whole dataset, because the more people, the better. If you want a max allele freq for a certain variant, then check ancestry subgroups.

All of gnomAD v4 can be considered as roughly "non-cancer" because all of the data from TCGA (The Cancer Genome Atlas Program) was removed due to QC. Which means, people in gnomAD can of course have cancer, but there is no enrichment of cancer phenotypes in v4.

Question - there seems to be higher freq of cancer variants in the non-UK-biobank subset. Why? - KC: Could be due to recruiting, since UK Biobank only recruited people aged 40-69 years old. They did find far lower rates of cancer in UKB than in the general UK population. Again, biobank participants are on average healthier.

@AllofUsResearch data will be inclued in v5 (+ another 800k individuals), which is "coming soon". Again, no phenotype data.

Question - what about CHIP variants? - @HeidiRehm: gnomAD has a full spectrum of age so likely no enrichment, but of course older people are in there with CHIP variants. There are flags on genes, to signal high rate of CHIP variants.

gnomAD does have age on many individuals included, but not all. This was information given by the original study. It is unclear to the gnomAD team actually what is meant by age in all of the different studies - e.g. time of inclusion into the study? Age when data was transferred to gnomAD? Other? The gnomAD team did not tinker with the age data, just used it as is, where possible and shares it in buckets in individual variant pages.

Question - are you planning to include the million variant program dataset? - @HeidiRehm - not yet included because of no easy access to it.

Since external sequencing data has to be reprocessed by the gnomAD team (quite expensive), only highly "valuable" datasets, meaning ancestrally diverse external datasets will be re-processed in future gnomAD. Simply due to cost-benefit.

gnomAD v4 is mostly saturated for common variants from european ancestries, so non-european ancestries are more "valuable". Here is the preprint for gnomAD v4 by @konradjk and team if you want to dive into that PMID: 41929314.

Question - why is #CFTR delta-F508 58x homozygous in gnomAD? @HeidiRehm - the team did a little research, as this is likely a bit too high for a general population. Turns out, a cohort from a clinic (important: not from UK Biobank) was included with a pulmonary focus.

https://t.co/v89wQnylOi

My former colleague @tomaeusTo also stumbled upon this #CFTR variant back in 2023 right after the release. See here for more details with a link to the gnomAD FAQ forum:

https://t.co/YJIZ1bOUVw

There are actually more than a few #CFTR over-represented variants in gnomAD, again likely due to the inclusion of that one pulmo clinic's cohort.

Question - what about LongReads? HR: @TalkowskiLab is working on a LongRead gnomAD dataset. It will happen. They are working on it. No date for release just yet.

There is a Canadian gnomAD dataset, not yet included in the primary gnomAD dataset. For now, it is available for download — with a browser interface coming soon. They actually want to provide phenotypes, but that is still "quite a few months out" (Jordan Lerner-Ellis).

https://t.co/ScPFH5Ug4b

https://t.co/M808TqEh3Q
https://t.co/SF8iN3NcKV

There is also a Singapore gnomAD dataset - probably not possible to access the data right now. So different groups can add their own wrinkles to these separate gnomad type datasets.

Recruiting more samples for gnomAD is ongoing. Federated gnomAD explained here: https://t.co/Xa9vLjU75e

@HeidiRehm points out there is also a gnomAD forum / discussion board. Please ask questions or feature requests. Link: https://t.co/bNUU98wOZ4

Thanks to the gnomAD on all their work on this invaluable resource. Couldn't do without it.

Hope this summary helps. 🙂

Gerardo Fabián @gedof3

about 2 months ago

@LaSEMOVI ¡Hola!, ¿Por qué ya no hay ciclovía sobre avenida Chapultepec?

gedof3 retweeted

Google for Health @GoogleForHealth

2 months ago

How can we help scientists get better data out of their instruments? Using @GoogleDeepMind AlphaEvolve, @GoogleResearch identified new ways to train DeepConsensus, improving DNA sequencing accuracy and output on PacBio instruments. Read more: https://t.co/BadpMAUmAu

gedof3 retweeted

Science Advances @ScienceAdvances

2 months ago

Despite widespread clinical testing, many patients with rare genetic diseases do not receive a definitive molecular diagnosis. Researchers report a new strategy using long-read RNA sequencing that improves the genetic diagnosis of rare diseases. https://t.co/f3GW0iE1af

ScienceAdvances's tweet photo. Despite widespread clinical testing, many patients with rare genetic diseases do not receive a definitive molecular diagnosis.

Researchers report a new strategy using long-read RNA sequencing that improves the genetic diagnosis of rare diseases. https://t.co/f3GW0iE1af https://t.co/2Hv4TVZjbo

gedof3 retweeted

David Pfau @pfau

2 months ago

I know nothing.

501

172K

Gerardo Fabián @gedof3

4 months ago

Marzo, 2026. CDMX

gedof3 retweeted

NEJM

@NEJM

4 months ago

𝐀𝐥𝐭𝐞𝐫𝐧𝐚𝐭𝐢𝐯𝐞 𝐬𝐩𝐥𝐢𝐜𝐢𝐧𝐠 is the use of different exons in the formation of messenger RNA from initially identical transcripts, which can result in the generation of related proteins from one gene, often in a manner specific to a type of tissue or a developmental stage. To learn more about this NEJM Illustrated Glossary term, read the editorial “Toward a Disease-Modifying Therapy for Dravet Syndrome” by Gemma L. Carvill, PhD, and Heather C. Mefford, MD, PhD, from @NUFeinbergMed and @StJudeResearch: https://t.co/78EEAj0GY8 Explore more terms: https://t.co/9j3iLbbGVo

NEJM's tweet photo. 𝐀𝐥𝐭𝐞𝐫𝐧𝐚𝐭𝐢𝐯𝐞 𝐬𝐩𝐥𝐢𝐜𝐢𝐧𝐠 is the use of different exons in the formation of messenger RNA from initially identical transcripts, which can result in the generation of related proteins from one gene, often in a manner specific to a type of tissue or a developmental stage.

To learn more about this NEJM Illustrated Glossary term, read the editorial “Toward a Disease-Modifying Therapy for Dravet Syndrome” by Gemma L. Carvill, PhD, and Heather C. Mefford, MD, PhD, from @NUFeinbergMed and @StJudeResearch: https://t.co/78EEAj0GY8

Explore more terms: https://t.co/9j3iLbbGVo

176

19K

Gerardo Fabián @gedof3

4 months ago

Genomics Applied to One Health. Buenos Aires, Argentina. Febrero 2026

gedof3 retweeted

Andres Moreno-Estrada @morestrada

4 months ago

Second time the MX biobank project (https://t.co/qaLnfPzcz2) gets the cover in a @Nature journal! This time in @NatureMedicine highlighting the work by @BarjonCar on Clinical genetic variation in Mexico. Another amazing cover by @mauguz33! https://t.co/k3eGLX4809

morestrada's tweet photo. Second time the MX biobank project (https://t.co/qaLnfPzcz2) gets the cover in a @Nature journal! This time in @NatureMedicine highlighting the work by @BarjonCar on Clinical genetic variation in Mexico. Another amazing cover by @mauguz33! https://t.co/k3eGLX4809 https://t.co/jtzk63X0dD

187

12K

gedof3 retweeted

L O P E @lopedetoledo

5 months ago

Amigos, nunca os fieis de las proyecciones ortogonales.

297

743

gedof3 retweeted

Leon Palafox

@leonpalafox

5 months ago

Veo a menudo que hay una confusión muy grande entre lo que es un embedding y un token. Un token es simplemente una unidad de texto. Puede ser una palabra, una sub-palabra o incluso un carácter, dependiendo del modelo. Es la forma en la que el texto se fragmenta para poder ser procesado. Ejemplo: “inteligencia artificial” → ["intel", "igencia", " artificial"] El token no “entiende” nada por sí mismo. Solo es una pieza de texto. Un embedding, en cambio, es una representación numérica de ese token (o de un conjunto de tokens) en un espacio vectorial. Es ahí donde aparece el significado: tokens con contextos o usos similares terminan cerca entre sí en ese espacio. Dicho de forma simple: •Token = qué entra al modelo •Embedding = cómo el modelo lo representa internamente Confundirlos es como confundir una letra con el significado de una palabra completa. Uno es la materia prima; el otro es la geometría que permite razonar, comparar y generalizar. Si quieres entender cómo funcionan realmente los modelos de lenguaje, esta distinción es básica. Sin embeddings, los tokens son solo texto partido. Sin tokens, no hay nada que representar.

leonpalafox's tweet photo. Veo a menudo que hay una confusión muy grande entre lo que es un embedding y un token.

Un token es simplemente una unidad de texto.
Puede ser una palabra, una sub-palabra o incluso un carácter, dependiendo del modelo. Es la forma en la que el texto se fragmenta para poder ser procesado.
Ejemplo:

“inteligencia artificial” → ["intel", "igencia", " artificial"]

El token no “entiende” nada por sí mismo. Solo es una pieza de texto.

Un embedding, en cambio, es una representación numérica de ese token (o de un conjunto de tokens) en un espacio vectorial.
Es ahí donde aparece el significado: tokens con contextos o usos similares terminan cerca entre sí en ese espacio.

Dicho de forma simple:
•Token = qué entra al modelo
•Embedding = cómo el modelo lo representa internamente

Confundirlos es como confundir una letra con el significado de una palabra completa.
Uno es la materia prima; el otro es la geometría que permite razonar, comparar y generalizar.

Si quieres entender cómo funcionan realmente los modelos de lenguaje, esta distinción es básica. Sin embeddings, los tokens son solo texto partido. Sin tokens, no hay nada que representar.

239

121

gedof3 retweeted

OpenAI

@OpenAI

5 months ago

Introducing Prism, a free workspace for scientists to write and collaborate on research, powered by GPT-5.2. Available today to anyone with a ChatGPT personal account: https://t.co/9mTLAbxPdH

16K

gedof3 retweeted

Espíritu Curioso

@EspirituCurioso

5 months ago

PAREN TODO. Estás ante "el hallazgo de la década" en México. 🇲🇽😱 En Oaxaca, acaba de emerger una tumba zapoteca del año 600 d.C. que ha dejado a los expertos en shock por su conservación. Guarda este hilo de datos para dimensionar esta joya de nuestra historia: 👇 📜 El Guardián: La figura central no es humana. Es un BÚHO (símbolo de la noche y el inframundo) cuyo pico "devora" o protege el rostro de un antiguo señor zapoteca. 🎨 Color Intacto: Después de 1,400 años, aún se conservan murales con pigmentos originales azules, verdes y ocres que narran procesiones rituales. 📍 No es Monte Albán: El hallazgo es en San Pablo Huitzo. Esto reescribe la historia, demostrando la complejidad y riqueza de los Valles Centrales fuera de la gran capital. Una muestra brutal de la grandeza milenaria de la cultura Zapoteca. Piel de gallina. 📸: @INAHmx

EspirituCurioso's tweet photo. PAREN TODO. Estás ante "el hallazgo de la década" en México. 🇲🇽😱

En Oaxaca, acaba de emerger una tumba zapoteca del año 600 d.C. que ha dejado a los expertos en shock por su conservación.

Guarda este hilo de datos para dimensionar esta joya de nuestra historia: 👇

📜 El Guardián: La figura central no es humana. Es un BÚHO (símbolo de la noche y el inframundo) cuyo pico "devora" o protege el rostro de un antiguo señor zapoteca.

🎨 Color Intacto: Después de 1,400 años, aún se conservan murales con pigmentos originales azules, verdes y ocres que narran procesiones rituales.

📍 No es Monte Albán: El hallazgo es en San Pablo Huitzo. Esto reescribe la historia, demostrando la complejidad y riqueza de los Valles Centrales fuera de la gran capital.

Una muestra brutal de la grandeza milenaria de la cultura Zapoteca. Piel de gallina.

📸: @INAHmx

860

174K

Gerardo Fabián @gedof3

6 months ago

El servicio de soporte solo se limitó a decir que el viaje no es elegible para un ajuste. Espero se pueda arreglar esta situación. El chofer es Miguel Ángel y el viaje temrina en Iztapalapa, un lugar totalmente diferente a mi destino. Espero me puedan dar una mejor atención.

gedof3's tweet photo. El servicio de soporte solo se limitó a decir que el viaje no es elegible para un ajuste. Espero se pueda arreglar esta situación. El chofer es Miguel Ángel y el viaje temrina en Iztapalapa, un lugar totalmente diferente a mi destino. Espero me puedan dar una mejor atención. https://t.co/nxxC9mnb5Z

Gerardo Fabián @gedof3

6 months ago

Cobro de un viaje que nunca hice: El día de ayer (14 de enero) solicité un viaje por la noche, sin embargo el chofer nunca pasó por mi y tampoco canceló el viaje. El día de hoy me llegó el cargo $2863. El viaje quedo activo por horas @Uber_MEX @Uber_Support @Profeco

Gerardo Fabián

@gedof3

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users