Sergio Perez

@sergiopprz

AI Solutions Architect at @nvidia. Working on LLMs, accelerators and AI. Before @graphcoreai, @amazon, PhD @imperialcollege. All views are my own. He/him.

London

Joined November 2011

1.2K Following

819 Followers

885 Posts

Pinned Tweet

Sergio Perez @sergiopprz

over 2 years ago

Time for a new challenge: I've joined @nvidia as Solutions Architect for Conversational AI 🗣️🤖🧠! I look forward to helping the AI community to use NVIDIA's GPUs for language applications. I'll be based in London, do reach out if you come over!

2

24

0

0

1K

Sergio Perez @sergiopprz

24 days ago

We are live with the @bielikllm team, discussing their Bielik models and how they compressed Bielik 11B to 7B. Join us to ask questions live about pruning, distillation, post-training alignment, etc. https://t.co/GdvQ5kzn2H

0

0

0

0

45

sergiopprz retweeted

@DotCSV

over 1 year ago

🟢 SIGN UP FOR GTC 2025! NVIDIA's big event returns on 17-21 MARCH. Lots of talks on AI, robotics and much more! And, as every year, whoever registers with me will have the chance to win a great prize! All the details in this thread 👇🧵

30

666

722

111

182K

Sergio Perez @sergiopprz

over 1 year ago

@omarespejel @igeniusai The data center may be in Italy, but the chips will probably come from Taiwan since TSMC is there and they produce the majority of NVIDIA chips.

1

3

0

0

53

Sergio Perez @sergiopprz

over 1 year ago · Reading

It's really inspiring to be working with @igeniusai. I look forward to all the training and inference that will come out of their new Colosseum supercomputer with Grace Blackwell Superchips! Great news for Italy and Europe.

@domynai

over 1 year ago

Unveiling Colosseum, one of the world’s largest #NVIDIADGX AI supercomputers, built in collaboration with @NVIDIA, and powered by NVIDIA Grace Blackwell Superchips, for the training and deployment of advanced models in highly regulated industries. Read more in the press release: https://t.co/JR1uwxGwva

1

42

11

3

59K

1

5

0

0

297

Sergio Perez @sergiopprz

over 1 year ago

@katjasrz Get well soon, Katja!

1

0

0

0

91

Sergio Perez @sergiopprz

over 1 year ago · London

@dctanner @gusthema Thanks for organising! It's a great community of AI engineers. Will come again to the next event!

0

2

0

0

19

Sergio Perez @sergiopprz

over 1 year ago

The motivation is to help AI engineers to estimate the number of GPUs needed to run an AI model under certain requests/second or latency requirements. The course is highly practical, with several notebooks using python, bash and deployment of @nvidia Inference Microservices.

0

1

0

0

85

Sergio Perez @sergiopprz

over 1 year ago

Are you interested in AI Inference and want to dive deeper? Dima Mironov and I have created a practical course about "Sizing LLM Inference Systems". Check it out and let us know what you think: https://t.co/YpsJFy8iKY

1

1

0

1

157

Sergio Perez @sergiopprz

over 1 year ago · London

@harari_yuval in London this evening presenting #Nexus: a brief history of information networks. He's on the side of existential risk with AI, but acknowledges that "we've just seen AI amoebas, who knows how the AI T-Rex will be?" For him AI is Alien Intelligence.

0

0

0

0

41

Sergio Perez @sergiopprz

over 1 year ago

@dctanner Looking forward to connecting with the AI Engineer community in the next meetup!

0

1

0

0

20

Sergio Perez @sergiopprz

over 1 year ago

Join us at the AI Engineer meetup in London on the 17th of October! I'll be presenting on AI inference and how companies leverage cloud AI services or do-it-yourself strategies. As a bridge between the two, I'll talk about @nvidia NIMs and their benefits. https://t.co/P62qPr2B4s

Damien C. Tanner

almost 2 years ago

AI Engineer London meetup is back again on October 17th. Big thanks to NVIDIA and again Cloudflare for supporting us. Apply for a spot here: https://t.co/MBVeJu0ij6

2

21

3

2

2K

0

3

0

0

380

Sergio Perez @sergiopprz

over 1 year ago

Even after 4 years working in the semiconductor industry, I've learned a lot from reading this overview by The Economist. The end of Moore's law, Dennard scaling, new ideas to reach a trillion transistors, specialized chips for AI... This overview is superb.

almost 2 years ago

AI has returned chipmaking to the heart of computer technology. But it is time for some new ideas. Read our latest Technology Quarterly to learn how advances in chipmaking, both incremental and radical, can keep the exponential engine humming https://t.co/NgBVGmKMQW 👇

10

12

7

5

46K

0

1

1

0

350

Sergio Perez @sergiopprz

over 1 year ago

Join us on the 3rd of October at JADE Day at the University of Oxford! I'll give a keynote on GPUs for scientific computing and AI.

JADE2 HPC Service @JADE2_HPC

almost 2 years ago

📢Keynote Speaker for JADE Day 2024: Sergio Perez, NVIDIA Sergio will give a talk on how NVIDIA enables scientific breakthroughs covering #GenAI, #GPUs, software libraries #NVIDIA has developed for scientific researchers & more 🔗https://t.co/8sAE5k4l9i

0

4

1

0

467

0

4

0

0

280

sergiopprz retweeted

@_weiping

almost 2 years ago

Introducing ChatQA 2, a Llama3-based model with a 128K context window, designed to close the gap between open LLMs and leading proprietary models like GPT-4-Turbo in both long-context and RAG capabilities.

The long-context capability of LLMs is sometimes viewed as a rival to RAG, but from a pragmatic perspective, they complement each other. RAG efficiently retrieves relevant contexts for query-based tasks from millions or billions of tokens, a feat long-context LLMs cannot achieve. Meanwhile, long-context LLMs excel at summarizing entire documents, where RAG may fall short. Thus, a state-of-the-art LLM should excel in both capabilities.

Highlights:
- The Llama3-ChatQA-2-70B model achieves accuracy comparable to GPT-4-Turbo-2024-0409 on real-world long-context tasks, and surpasses it on the ChatRAG benchmark.
- We find the long-context retriever can alleviate the top-k context fragmentation issue in RAG, further improving RAG-based results for long-context understanding tasks.
- We provide extensive comparisons between RAG and long-context solutions using state-of-the-art long-context LLMs.

Further Information:
- Paper: https://t.co/jEPQOsihpt
- Model weights & training blend: To be released soon!

2

151

40

75

20K

Sergio Perez @sergiopprz

almost 2 years ago

Eager to try Llama 3.1 405B? Start calling it now with our inference endpoints: https://t.co/lA4keH47WS With @NVIDIA AI Foundry, you can customise Llama 3.1 with your data: https://t.co/HoWAcGqK9p

0

1

0

0

126

Sergio Perez @sergiopprz

almost 2 years ago

Hi, any recommendations about VSCode extensions for OpenAI-compatible LLMs? I've been trying llm-vscode from @huggingface but I get errors https://t.co/RFoTG2Pvo3

0

0

0

0

102

Sergio Perez @sergiopprz

almost 2 years ago

"Graphcore today announced that the company has been acquired by SoftBank Group Corp" Best wishes to my former colleagues in this new era for @graphcoreai

Graphcore @graphcoreai

almost 2 years ago

Exciting news: Graphcore joins @SoftBank_Group to build next generation of AI compute. https://t.co/DcEMoa1xSh

4

80

32

10

50K

0

5

1

2

852

Sergio Perez @sergiopprz

almost 2 years ago

@thecharlieblake Looking forward to your next AI papers, this is not the end of the journey for unit scaling 😉

0

1

0

0

164

Sergio Perez @sergiopprz

about 2 years ago

Are you in the Benelux area this week? 🇧🇪🇳🇱🇱🇺 Join us in Brussels at the @RedHat Tech Day to discuss AI inference with NIMs and how to deploy them in production! https://t.co/FpvFV5bX5g

0

0

0

0

145

Sergio Perez @sergiopprz

over 2 years ago · London

excited 4 today #GTC24 https://t.co/0dAVeeZ3lg

0

8

0

0

349
