AI/ML Engineer & PhD Researcher | LLMs for Software Engineering: coding agents, evals & agentic systems | Turning frontier AI research into production software
While it is important to push initiatives focused on building our own foundation models, we already have a good starting point for building products that truly address our region's needs🚀.
https://t.co/ljPyUHBHzV
Could ChatGPT replace a (junior) lawyer in Colombia? I'm not sure it could today, but I came across something that suggests it may soon be able to🤔.
In Colombia there is something called the "Examen de idoneidad profesional" (professional competency exam)🎓 (...).
I even tried Llama 3.2 and got a far from negligible 88.8%. It is an open-source alternative at just a fraction of the size/cost of the previous models. (...)
@freddier I can say I'm lucky because I have a few years of experience in the field and a good job too. It is 100% in English, and after months I still find it overwhelming. I can write well and make myself understood. My biggest problem is understanding others when they speak to me.
This is the second part of the article where I share the results of an experiment on using OpenAI's o1 to design software and then write code following that design, a common practice in the industry but one mostly neglected by current LLM-powered tools.
https://t.co/Ep5TfsR90v
I'm a first-year PhD student working on LLMs for Software Engineering. I’m just starting my journey, but I want to invite you to read some of my current work: https://t.co/1ZXnFLnAPQ
I'd be happy to get some feedback or ideas for future collaboration.
Pandas is a powerful data analysis and manipulation library for Python!
NVIDIA just made Pandas up to 150x faster with zero changes to your existing Pandas code🔥
All you have to add is one line before the import:
%load_ext cudf.pandas
import pandas as pd
Transformer by Hand✍️
Studying the transformer architecture is like opening up the hood of a car and seeing all sorts of engine parts: embeddings, positional encoding, feed-forward network, attention weighting, self-attention, cross-attention, multi-head attention, layer norm, skip connections, softmax, linear, Nx, shifted right, query, key, value, masking. This list of jargon feels overwhelming!
What are the key parts that really make the transformer (🚗) run?
In my opinion, the 🔑 key is the combination of: [attention weighting] and [feed-forward network].
All the other parts are enhancements to make the transformer (🚗) run faster and longer, which is still important because those enhancements are what lead us to "large" language models. 🚗 -> 🚚
Walkthrough
[1] Given
↳ Input features from the previous block (5 positions)
[2] Attention
↳ Feed all 5 features to a query-key attention module (QK) to obtain an attention weight matrix (A). I will skip the details here and unpack this module in a follow-up post.
[3] Attention Weighting
↳ Multiply the input features by the attention weight matrix to obtain attention weighted features (Z). Note that there are still 5 positions.
↳ The effect is to combine features across positions (horizontally), in this case, X1 := X1 + X2, X2 := X2 + X3, etc.
[4] FFN: First Layer
↳ Feed all 5 attention weighted features into the first layer.
↳ Multiply these features by the weights and add the biases.
↳ The effect is to combine features across feature dimensions (vertically).
↳ The dimensionality of each feature is increased from 3 to 4.
↳ Note that each position is processed by the same weight matrix. This is what the term "position-wise" is referring to.
↳ Note that the FFN is essentially a multi-layer perceptron.
[5] ReLU
↳ Negative values are set to zero by ReLU.
[6] FFN: Second Layer
↳ Feed all 5 features (d=4) into the second layer.
↳ The dimensionality of each feature is decreased from 4 back to 3.
↳ The output is fed to the next block to repeat this process.
↳ Note that the next block would have a completely separate set of parameters.
Together, the two key parts, attention and FFN, transform features both across positions and across feature dimensions. This is what makes the transformer (🚗) run!
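To make the walkthrough concrete, here is a minimal NumPy sketch of steps [1] to [6]. It keeps the sizes from the post (5 positions, d=3 expanded to 4 and back to 3) and stores one column per position. Everything else is an assumption for illustration: the input values and weights are random small integers, and the attention matrix A is hard-coded to reproduce "X1 := X1 + X2, X2 := X2 + X3" instead of being computed by a real query-key module.

```python
import numpy as np

D_MODEL, D_HIDDEN, N_POS = 3, 4, 5  # sizes taken from the walkthrough


def attention_weighting(X, A):
    """Step [3]: mix features across positions (columns) via Z = X @ A."""
    return X @ A


def ffn(Z, W1, b1, W2, b2):
    """Steps [4]-[6]: position-wise linear -> ReLU -> linear."""
    H = np.maximum(W1 @ Z + b1, 0.0)  # first layer (3 -> 4), then ReLU
    Y = W2 @ H + b2                   # second layer (4 -> 3)
    return H, Y


rng = np.random.default_rng(0)

# Step [1]: input features, one column per position (values are made up).
X = rng.integers(-2, 3, size=(D_MODEL, N_POS)).astype(float)

# Step [2], ASSUMED: A is hard-coded, not produced by a QK module.
# Ones on the diagonal and sub-diagonal give Z[:, j] = X[:, j] + X[:, j + 1];
# the last position has no right neighbour, so it is left unchanged.
A = np.eye(N_POS) + np.eye(N_POS, k=-1)

# Hypothetical FFN parameters (random small integers, not trained weights).
W1 = rng.integers(-1, 2, size=(D_HIDDEN, D_MODEL)).astype(float)
b1 = rng.integers(-1, 2, size=(D_HIDDEN, 1)).astype(float)
W2 = rng.integers(-1, 2, size=(D_MODEL, D_HIDDEN)).astype(float)
b2 = rng.integers(-1, 2, size=(D_MODEL, 1)).astype(float)

Z = attention_weighting(X, A)
H, Y = ffn(Z, W1, b1, W2, b2)
print(Z.shape, H.shape, Y.shape)  # (3, 5) (4, 5) (3, 5)
```

Because the same W1 and W2 are applied to every column, running `ffn` on a single position gives the same result as the matching column of `Y`, which is what "position-wise" means in step [4].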
GPT Researcher is the best open-source repo for using LLMs to generate research reports
📁Their recent update used LangChain document loaders to connect it to local files📁
Check out the release AND the awesome walkthrough video (touches on both LangGraph and LangSmith)
Of course, I don't expect software to solve this for me instantly, but do you know of any documentation on good practices for doing it at scale? (4/4)
For many days, I've been building ETLs to pre-process tons of data from different data sources, such as GitHub and StackOverflow, in order to train LLMs of different sizes. (1/4)
When should I stop doing this and start manually cleaning the data for border cases?
I am surprised I didn't find a practical guide for doing this. Everyone talks about the importance of data quality, and some practitioners implement a few common data pre-processing steps. (3/4)
I've followed a simple iterative methodology consisting of:
- Run the ETL for each dataset
- Manually review a sample of the data
- Extend the ETL to include more cleaning cases
(2/4)
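The iterative loop from the thread could be sketched roughly like this. It is only an illustration using the Python standard library: the cleaning rules, the `min_chars` value, and the reviewer callback `is_defective` are all hypothetical, and the defect-rate threshold at the end is one possible stopping heuristic, not an established practice.

```python
import random
import re


def strip_html_tags(text):
    # Naive rule: it would also mangle code such as `a < b > c`, which is
    # exactly the kind of border case a manual review tends to surface.
    return re.sub(r"<[^>]+>", "", text)


def strip_outer_whitespace(text):
    return text.strip()


def drop_if_too_short(text, min_chars=20):
    # Returning None means "drop this record".
    return text if len(text) >= min_chars else None


def run_etl(records, rules):
    """Apply each rule in order; a rule returning None drops the record."""
    cleaned = []
    for record in records:
        for rule in rules:
            record = rule(record)
            if record is None:
                break
        if record is not None:
            cleaned.append(record)
    return cleaned


def sample_for_review(records, k, seed=0):
    """Draw a reproducible random sample for manual inspection."""
    return random.Random(seed).sample(records, min(k, len(records)))


def defect_rate(sample, is_defective):
    """Fraction of sampled records that the reviewer flags as still dirty."""
    if not sample:
        return 0.0
    return sum(1 for record in sample if is_defective(record)) / len(sample)


def should_keep_extending(rate, threshold=0.05):
    # ASSUMED heuristic: keep adding rules while the sampled defect rate is
    # above the threshold; below it, hand-fix the remaining cases instead.
    return rate > threshold
```

One iteration would then be: call `run_etl` with the current rules, pass the result to `sample_for_review`, label the sample by hand, and use `defect_rate` to decide whether another rule is worth writing.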