Bell Eapen @beapen - Twitter Profile

4 months ago

When #GenAI Ideas translate to practice with #DHTI https://t.co/1ssKGCiEjd Thank you @Hanson1954 for making this happen!

0

1

0

21

Bell Eapen @beapen

5 months ago

Bringing #GenAI Into the #EHR: Why #DHTI Matters (Part I) https://t.co/rrEKlFEuhH via @beapen

0

26

Bell Eapen @beapen

7 months ago

Are we trapped in a #Matrix? https://t.co/WpBW90UNil

0

15

Bell Eapen @beapen

7 months ago

DHTI: a reference architecture for Gen AI in healthcare https://t.co/QSecktdAIT via @YouTube

0

29

Bell Eapen @beapen

7 months ago

Imagine setting up a #ChatEHR using #FHIR and #CDSHooks in just 15 minutes. Give it a try today, and while you’re there, explore the broader concept of #DHTI https://t.co/Jf95FAqgNh

0

2

0

39

Bell Eapen @beapen

7 months ago

Design Science Research in #healthcare: Bridging the Gap Between Ideas and Impact https://t.co/ZuO8urpBWC

0

16

beapen retweeted

AdaWeinstock @AWeinstock

9 months ago

Elated to share that the Weinstock lab just got its 1st R01!! We are SO grateful to the hardworking NIH staff for tirelessly pushing to get those last NOAs out https://t.co/JcTrfsasFK

52

522

11

21

48K

Bell Eapen @beapen

about 1 year ago

#LLM-in-the-Loop #CQL execution with unstructured data and #FHIR terminology support https://t.co/ozBTJbzH59 via @beapen

0

32

Bell Eapen @beapen

about 1 year ago

Great to make some new connections at #AMIA #CIC25

0

32

beapen retweeted

Stephen Turner 🦋 @stephenturner.us @strnr

about 1 year ago

Genomic Tokenizer: Toward a biology-driven tokenization in transformer models for DNA sequences https://t.co/yN90IYcZA7 🧬🖥️🧪 https://t.co/uBG9ryWF8a

0

30

10

3K

beapen retweeted

Biology+AI Daily @BiologyAIDaily

about 1 year ago

Genomic Tokenizer: Toward a biology-driven tokenization in transformer models for DNA sequences

1. Genomic Tokenizer (GT) introduces a biologically grounded approach to DNA sequence tokenization, aligning with the central dogma of molecular biology by using codons—three-letter nucleotide sequences—as the core unit of tokenization.

2. Unlike traditional character or k-mer tokenizers, GT recognizes start and stop codons, assigns identical tokens to synonymous codons, and treats introns and out-of-frame regions as UNK tokens, reducing vocabulary size while preserving biological relevance.

3. GT is implemented within the HuggingFace tokenizer framework, enabling seamless integration into existing transformer-based DNA analysis pipelines and support for tasks like masked language modeling and sequence classification.

4. The tokenizer supports customizable start/stop codons and intron treatment, making it adaptable for different organisms, including prokaryotes and mitochondrial genomes.

5. In classification experiments using a lung cancer-related variant dataset, GT showed greater robustness to long sequence lengths compared to character tokenization, and outperformed it in longer sequence tasks.

6. While byte-pair encoding (BPE) achieved the highest overall performance, its large vocabulary comes with high computational cost. GT balances biological insight with computational efficiency and a compact vocabulary.

7. GT tokenization avoids issues of redundancy and information leakage in masked language modeling that are common in overlapping k-mer tokenizers, leading to cleaner training signals and potentially better generalization.

8. The biological foundation of GT allows it to better model frame-shift mutations, synonymous substitutions, and stop-gain variations—key features in predicting phenotypic impact from genetic data.

9. Preliminary comparisons highlight GT’s strength in biological modeling and suggest potential advantages for foundational model training across genomics tasks when compared with purely data-driven tokenizers.

10. GT is open-source, installable via PyPI, and encourages broader exploration across genomic datasets and transformer architectures, including long-context models such as HyenaDNA.

💻Code: https://t.co/XNazAyGLlK
📜Paper: https://t.co/oQ5wmVe0Vg

#Genomics #Tokenization #Transformers #Bioinformatics #DNASequence #LLM #Codon #MaskedLanguageModel #DeepLearning #HuggingFace
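
The codon-based idea in points 1 and 2 can be illustrated with a minimal Python sketch. This is not the Genomic Tokenizer package or its API; the `tokenize` function, the `[START]`, `[STOP]`, and `[UNK]` token names, and the choice of one-letter amino acid codes as tokens are all assumptions made for illustration. It reads codons in frame from the first start codon, gives synonymous codons the same token, and collapses everything outside the reading frame to an unknown token.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def tokenize(seq, start="ATG"):
    """Hypothetical codon tokenizer: one token per in-frame codon."""
    seq = seq.upper()
    begin = seq.find(start)
    if begin == -1:
        return ["[UNK]"]  # no start codon, so the whole sequence is unknown
    tokens = []
    if begin > 0:
        tokens.append("[UNK]")  # upstream region collapsed to a single token
    tokens.append("[START]")
    pos = begin + 3
    while pos + 3 <= len(seq):
        amino = CODON_TABLE.get(seq[pos:pos + 3])
        pos += 3
        if amino is None:
            tokens.append("[UNK]")  # codon contains a non-ACGT character
        elif amino == "*":
            tokens.append("[STOP]")
            break
        else:
            tokens.append(amino)  # synonymous codons share one token
    if pos < len(seq):
        tokens.append("[UNK]")  # downstream or leftover nucleotides
    return tokens


print(tokenize("ccATGGCTGCATAAgg"))
```

Here GCT and GCA both encode alanine, so they map to the same token, which is how a scheme like this keeps the vocabulary small compared with k-mer tokenizers.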

0

18

5

7

1K

Bell Eapen @beapen

about 1 year ago

Try this #DNA #tokenizer in your transformer model training pipelines. Feedback will be highly appreciated.

bioRxiv Bioinfo @biorxiv_bioinfo

about 1 year ago

Genomic Tokenizer: Toward a biology-driven tokenization in transformer models for DNA sequences https://t.co/IB0Oo7YWXN #biorxiv_bioinfo

0

30

6

18

2K

0

33

Bell Eapen @beapen

over 1 year ago

Complement or substitute? How AI increases the demand for human skills https://t.co/A9HVNgBiPC

0

1

0

42

Bell Eapen @beapen

over 1 year ago

The new #FHIR bulk $import is awesome! Importing #MIMIC was easy! https://t.co/vnQjw9ti0r

0

1

0

59

Bell Eapen @beapen

about 2 years ago

A "lang"chain is only as strong as its weakest link!

0

1

0

77

Bell Eapen @beapen

over 2 years ago

Navigating the Complexities of #GenAI in #Medicine: 5 Software development Blunders to Avoid https://t.co/zVBLLjqjYI

0

82

Bell Eapen @beapen

over 2 years ago

Cornell Researchers Introduce Graph Mamba Networks (GMNs): A General Framework for a New Class of Graph Neural Networks Based on Selective State Space Models https://t.co/hFYfmxHEdt via @Marktechpost

0

103

Bell Eapen @beapen

over 2 years ago

Counterfactual formulation of patient-specific root causes of disease https://t.co/sX6fVYCz8I

0

47

Bell Eapen @beapen

over 2 years ago

#Medprompt: How to architect #LLM solutions for #healthcare. https://t.co/ZPp3ANTIO7

1

0

98

Bell Eapen @beapen

almost 3 years ago

#Distilling #LLMs step by step to small task-specific models https://t.co/iu93sXr6NE via @beapen

0

1

0

117
