Fabian Peña

@fabiancpl

AI/ML Engineer & PhD Researcher | LLMs for Software Engineering: coding agents, evals & agentic systems | Turning frontier AI research into production software

Germany

Joined March 2011

651 Following

125 Followers

174 Posts

Fabian Peña @fabiancpl

3 days ago

@KaiXCreator Extended context windows aren't the holy grail anymore. It's the harness!

134

Fabian Peña @fabiancpl

9 months ago

While it is important to promote initiatives focused on creating our own foundation models, we have a good starting point for building products that truly address the needs of our region🚀. https://t.co/ljPyUHBHzV

Fabian Peña @fabiancpl

9 months ago

Could ChatGPT replace a (junior) lawyer in Colombia? I'm not sure it could today, but I came across something that suggests it soon might🤔. In Colombia there is something called the "Examen de idoneidad profesional" (professional competency exam)🎓 (...).

115

Fabian Peña @fabiancpl

9 months ago

I even tried Llama 3.2 and got a far from negligible 88.8%. This is an open-source alternative at just a fraction of the size/cost of the previous models. (...)

Fabian Peña @fabiancpl

about 1 year ago

@freddier How about an agent competition? With a prize, hehe

152

Fabian Peña @fabiancpl

about 1 year ago

@freddier I can say I'm fortunate because I have a few years of experience in the field and a good job too. It's 100% in English, and after months I still find it overwhelming. I can write well and make myself understood. My biggest problem is understanding others when they speak to me.

fabiancpl retweeted

John Rush

@johnrushx

over 1 year ago

The Era of AI Agents began 🫥 I don't know if I'm terrified or excited. see for yourself 🧵:

164

685

10K

Fabian Peña @fabiancpl

over 1 year ago

This is the second part of the article where I share the results of an experiment on using OpenAI's o1 to design software and then write code following that design, a common practice in the industry but mostly neglected by current LLM-powered tools. https://t.co/Ep5TfsR90v

141

fabiancpl retweeted

Sumanth

@Sumanth_077

over 1 year ago

How LLMs work, clearly explained with visuals:

266

204K

Fabian Peña @fabiancpl

over 1 year ago

I'm a first-year PhD student working on LLMs for Software Engineering. I’m just starting my journey, but I want to invite you to read some of my current work: https://t.co/1ZXnFLnAPQ I'd be happy to get some feedback or ideas for future collaboration.

fabiancpl retweeted

Sumanth

@Sumanth_077

almost 2 years ago

Pandas is a powerful data analysis and manipulation library for Python! NVIDIA just made Pandas 150x faster with zero code changes🔥 All you have to add is just a couple of lines of code:
%load_ext cudf.pandas
import pandas as pd

176

90K

fabiancpl retweeted

Tom Yeh

@ProfTomYeh

about 2 years ago

Transformer by Hand✍️
Studying the transformer architecture is like opening up the hood of a car and seeing all sorts of engine parts: embeddings, positional encoding, feed-forward network, attention weighting, self-attention, cross-attention, multi-head attention, layer norm, skip connections, softmax, linear, Nx, shifted right, query, key, value, masking. This list of jargon feels overwhelming!
What are the key parts that really make the transformer (🚗) run? In my opinion, the 🔑 key is the combination of [attention weighting] and [feed-forward network]. All the other parts are enhancements to make the transformer (🚗) run faster and longer, which is still important because those enhancements are what lead us to "large" language models. 🚗 -> 🚚
Walkthrough
[1] Given
↳ Input features from the previous block (5 positions)
[2] Attention
↳ Feed all 5 features to a query-key attention module (QK) to obtain an attention weight matrix (A). I will skip the details of this module. In a follow-up post I will unpack this module.
[3] Attention Weighting
↳ Multiply the input features with the attention weight matrix to obtain attention weighted features (Z). Note that there are still 5 positions.
↳ The effect is to combine features across positions (horizontally), in this case, X1 := X1 + X2, X2 := X2 + X3, etc.
[4] FFN: First Layer
↳ Feed all 5 attention weighted features into the first layer.
↳ Multiply these features with the weights and biases.
↳ The effect is to combine features across feature dimensions (vertically).
↳ The dimensionality of each feature is increased from 3 to 4.
↳ Note that each position is processed by the same weight matrix. This is what the term "position-wise" is referring to.
↳ Note that the FFN is essentially a multi-layer perceptron.
[5] ReLU
↳ Negative values are set to zero by ReLU.
[6] FFN: Second Layer
↳ Feed all 5 features (d=4) into the second layer.
↳ The dimensionality of each feature is decreased from 4 back to 3.
↳ The output is fed to the next block to repeat this process.
↳ Note that the next block would have a completely separate set of parameters.
Together, the two key parts, attention and FFN, transform features both across positions and across feature dimensions. This is what makes the transformer (🚗) run!
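The walkthrough above can be reproduced with a few lines of plain Python. This is only a minimal sketch: the values of X, A, W1, b1, W2 and b2 are made up for illustration and are not taken from the original post, and the attention weight matrix A is assumed to be given (the QK module is skipped here too). Features are stored as columns, so X has 3 feature dimensions (rows) and 5 positions (columns).

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_bias(m, bias):
    """Add bias[i] to every entry of row i, i.e. the same bias at every position."""
    return [[v + b for v in row] for row, b in zip(m, bias)]

def relu(m):
    """Set negative values to zero."""
    return [[max(0, v) for v in row] for row in m]

# [1] Given: input features, 3 dimensions x 5 positions (illustrative values).
X = [
    [1, 0, 2, 1, 0],
    [0, 1, 1, 0, 2],
    [1, 1, 0, 2, 1],
]

# [2] Attention weight matrix A (5 x 5), assumed given.
# Column j lists which input positions are mixed into output position j:
# here each position adds its right-hand neighbour (X1 := X1 + X2, ...).
A = [
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
]

# [3] Attention weighting: combine features across positions (horizontally).
Z = matmul(X, A)

# [4] FFN first layer: combine across feature dimensions, 3 -> 4.
# The same W1 and b1 are applied at every position ("position-wise").
W1 = [
    [1, 0, -1],
    [0, 1, 0],
    [-1, 1, 0],
    [1, -1, 1],
]
b1 = [0, -1, 0, 0]

# [5] ReLU on the hidden features.
H = relu(add_bias(matmul(W1, Z), b1))

# [6] FFN second layer: 4 -> 3, ready to be fed to the next block.
W2 = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, -1, 0, 0],
]
b2 = [0, 0, 1]
Y = add_bias(matmul(W2, H), b2)

print(Z)
print(H)
print(Y)
```

Note that Z and Y both keep 5 columns: attention weighting mixes information across positions, while the FFN only changes the per-position feature vector.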

460

245K

fabiancpl retweeted

LangChain

@LangChain

about 2 years ago

GPT Researcher is the best open source repo for using LLMs to generate research reports 📁Their recent update used LangChain document loaders to connect it to local files📁 Check out the release AND the awesome walkthrough video (touches on both LangGraph and LangSmith)

143

145

21K

Fabian Peña @fabiancpl

about 2 years ago

Of course, I don't expect software to solve this for me instantly, but do you know of any documentation on good practices for doing it at scale? (4/4)

Fabian Peña @fabiancpl

about 2 years ago

For many days, I've been building ETLs to pre-process tons of data from different data sources, such as GitHub and Stack Overflow, in order to train LLMs of different sizes. (1/4)

Fabian Peña @fabiancpl

about 2 years ago

When should I stop doing this and start manually cleaning the data for edge cases? I am surprised I didn't find a practical guide for doing this. Everyone talks about the importance of data quality, and some practitioners implement a few common data pre-processing steps. (3/4)

Fabian Peña @fabiancpl

about 2 years ago

I've followed a simple iterative methodology consisting of:
- Run the ETL for each dataset
- Manually review a sample of the data
- Extend the ETL to include more cleaning cases
(2/4)
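The three steps above can be sketched roughly as follows. This is a hypothetical illustration, not the actual pipeline from the thread: the rule functions (`strip_whitespace`, `drop_empty`, `remove_html_tags`), the helper names `run_etl` and `review_sample`, and the sample records are all invented for the example.

```python
import random
import re

# Hypothetical cleaning rules. Each takes a text and returns the cleaned
# text, or None to drop the record entirely.
def strip_whitespace(text):
    return " ".join(text.split())

def drop_empty(text):
    return text or None

def remove_html_tags(text):
    return re.sub(r"<[^>]+>", "", text)

def run_etl(records, rules):
    """Step 1: apply every rule in order; drop records that a rule rejects."""
    cleaned = []
    for text in records:
        for rule in rules:
            text = rule(text)
            if text is None:
                break
        else:
            cleaned.append(text)
    return cleaned

def review_sample(records, k, seed=0):
    """Step 2: draw a reproducible random sample for manual inspection."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))

# Invented records standing in for scraped posts.
raw = ["  How do I sort a list?  ", "<p>Use sorted(items).</p>", "   ", "<br>"]

# Iteration 1: run the ETL with the initial rules, then review a sample.
rules = [strip_whitespace, drop_empty]
first_pass = run_etl(raw, rules)
for record in review_sample(first_pass, k=3):
    print(repr(record))

# Step 3: the review shows leftover HTML tags, so extend the ETL and rerun.
rules = [remove_html_tags, strip_whitespace, drop_empty]
second_pass = run_etl(raw, rules)
print(second_pass)
```

Keeping the rules as an ordered list of small functions makes each iteration a one-line change, and the fixed seed lets the same sample be reviewed again after a rule is added.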

Fabian Peña @fabiancpl

about 2 years ago

@ylecun How can we start doing that? Could you recommend a few references?
