André Jonasson @afjonasson - Twitter Profile

10 months ago

Full details in our paper: https://t.co/q0zsOFz5nA which presents further analyses of RoPE's effect on model activations.

André Jonasson @afjonasson

10 months ago

What are those two large-magnitude bands in the activations of the queries and keys of LLMs with rotary positional embeddings? 🧵

Photo: https://t.co/fHmoOgJUKZ

André Jonasson @afjonasson

10 months ago

Interesting observation: features with large activations tend to cluster near the lower bound. By feeding just one sequence (shorter than context length) and inspecting key/query activations, you can get a rough estimate of a model’s context length. (black dashes = lower bound)

Photo: https://t.co/DiMFEe6gzN
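The tweet does not define the lower bound, so the following is only a rough sketch under an assumed reading: with the standard RoPE frequencies `theta_i = base ** (-2i / head_dim)`, a rotary pair whose wavelength exceeds the context length never completes a full rotation during training. The function names, the default `base` of 10000, and the example configuration are illustrative assumptions, and the paper's actual bound may be defined differently.

```python
import math


def rope_wavelengths(head_dim: int, base: float = 10000.0) -> list[float]:
    """Wavelength in tokens of each rotary pair, using theta_i = base ** (-2i / head_dim)."""
    return [2 * math.pi * base ** (2 * i / head_dim) for i in range(head_dim // 2)]


def first_partial_cycle_pair(head_dim: int, context_length: int, base: float = 10000.0):
    """Index of the first rotary pair that cannot finish a full rotation within the context."""
    for i, wavelength in enumerate(rope_wavelengths(head_dim, base)):
        if wavelength > context_length:
            return i
    return None  # every pair completes at least one rotation


# Hypothetical configuration, not taken from any specific model.
HEAD_DIM, CONTEXT_LENGTH = 128, 4096
boundary = first_partial_cycle_pair(HEAD_DIM, CONTEXT_LENGTH)
print(f"pairs from index {boundary} onward have wavelength > {CONTEXT_LENGTH} tokens")
```

Estimating a context length from the pair index where large activations begin would mean inverting the same formula. Whether that matches the bound plotted in the photo is not something this sketch verifies.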

André Jonasson @afjonasson

about 1 year ago

We're using this as part of an implementation of an AI Assistant with python code execution as a tool. Parsing the streaming json that's being generated for the tool call allows us to display the code as it's being generated to the users.
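To make the idea concrete, here is a minimal standard-library sketch of displaying code while the tool-call JSON is still streaming. It is not how pydantic or jiter implement partial parsing: it simply closes whatever string, object, or array is still open and retries on shorter prefixes until `json.loads` succeeds. The `code` key, the chunk boundaries, and the helper names are assumptions made for illustration.

```python
import json


def _closers(prefix: str) -> str:
    """Return the characters needed to close any open string, object, or array."""
    stack = []  # closing brackets still owed, innermost last
    in_string = False
    escaped = False
    for ch in prefix:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()
    return ('"' if in_string else "") + "".join(reversed(stack))


def parse_partial(buffer: str):
    """Parse the longest prefix of `buffer` that can be closed into valid JSON."""
    for end in range(len(buffer), 0, -1):
        prefix = buffer[:end]
        try:
            return json.loads(prefix + _closers(prefix))
        except json.JSONDecodeError:
            continue  # e.g. a dangling key, comma, or half-written escape
    return None


# Simulated tool-call argument deltas; the arguments are assumed to be a JSON object.
chunks = ['{"co', 'de": "pri', 'nt(1 +', ' 2)', '"}']
buffer = ""
snapshots = []
for chunk in chunks:
    buffer += chunk
    partial = parse_partial(buffer) or {}
    snapshots.append(partial.get("code", ""))  # what the UI would show so far

print(snapshots)
```

Retrying on shorter prefixes is quadratic in the worst case, and a number cut off mid-stream (`3` of `35`) is reported as if it were complete, so a dedicated partial parser is the better choice outside of a demo.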

André Jonasson @afjonasson

about 1 year ago

Found pydantic's `from_json` function's argument `partial_mode="trailing-strings"` for parsing streaming json and therefore tool calls (e.g. code) through an obscure github issue, deserves more visibility in this context.

André Jonasson @afjonasson

about 1 year ago

https://t.co/PoMNAivgtz jiter instead of pydantic can be used if pydantic isn't being used elsewhere. jsonrepair is another option worth exploring.

André Jonasson @afjonasson

about 2 years ago

arxiv link: https://t.co/scudmDySqP

André Jonasson @afjonasson

about 2 years ago

Some additional aspects of Meta's paper on improving LLM performance using a multi-token objective function that I enjoyed:
1. Choice points, tokens that decide the future trajectory of the generation.
2. The sequential backward trick to maintain the memory consumption.

Photo: https://t.co/k78gN5XQ6A
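As a rough illustration of the sequential backward trick, the scalar toy below runs the shared trunk forward once, then processes each prediction head one at a time, accumulating only the gradient at the trunk output before doing a single backward pass through the trunk. The model (`z = w * x`, head `k` predicts `v[k] * z`, squared-error loss) and all names are assumptions made for illustration. In a real model each head produces a large logits tensor, which is where the memory saving comes from; the toy only shows the order of operations.

```python
def sequential_backward(w, v, x, targets):
    """Gradients of sum_k (v[k] * w * x - targets[k]) ** 2, computed head by head."""
    z = w * x        # forward through the shared trunk, once
    grad_z = 0.0     # the only cross-head state that is kept
    grad_v = []
    for v_k, t_k in zip(v, targets):
        y_k = v_k * z                   # forward through head k
        dloss_dy = 2.0 * (y_k - t_k)    # backward through the loss of head k
        grad_v.append(dloss_dy * z)     # gradient for head k's own weight
        grad_z += dloss_dy * v_k        # accumulate gradient at the trunk output
        # y_k and dloss_dy are no longer needed once the next head starts
    grad_w = grad_z * x                 # backward through the trunk, once
    return grad_w, grad_v


def total_loss(w, v, x, targets):
    """Reference loss, used only to check the gradient numerically."""
    return sum((v_k * w * x - t_k) ** 2 for v_k, t_k in zip(v, targets))


w, v, x, targets = 0.5, [1.0, 2.0, 3.0], 2.0, [1.0, 0.0, -1.0]
grad_w, grad_v = sequential_backward(w, v, x, targets)

# Central finite difference as an independent check on grad_w.
eps = 1e-6
numeric = (total_loss(w + eps, v, x, targets) - total_loss(w - eps, v, x, targets)) / (2 * eps)
print(grad_w, numeric)
```

The result matches what a single backward pass over the summed loss would give; only the scheduling differs.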

André Jonasson @afjonasson

over 2 years ago

@TheHamedMP Somewhat relevant tweet: sidestepping abstractions may be the best option depending on what you are doing: https://t.co/GzagBBjEgR

Jim Fan @DrJimFan

almost 3 years ago

When developing Voyager, we only used the thinnest abstraction in LangChain and didn't touch the agent API at all. Hackability is the No. 1 important feature to cutting-edge AI research & products. Libraries that augment LLMs (vector DB, search, interpreter) are more useful than wrappers around them. Don't get me wrong: LangChain is fantastic for education and well-established workflows that work out of the box. But you are better off building your own pipeline for anything beyond. This reminds me of the gazillion "high-level packages" that used to haunt Tensorflow, while nothing beats raw PyTorch code in usability, flexibility, simplicity, and elegance.

André Jonasson @afjonasson

almost 3 years ago

Link to video: https://t.co/Y5FRWxiwol Link to a related paper: https://t.co/4QiGFBzxpZ end/🧵

André Jonasson @afjonasson

almost 3 years ago

John Schulman's (OpenAI) presentation on RLHF has some great information about pitfalls when labeling Supervised Fine-tuning (SFT) answers for Large Language Models (LLMs). Here are some nuggets from the presentation. 1/🧵

Photo: https://t.co/mD6R3BjrX0

André Jonasson @afjonasson

almost 3 years ago

An immediate consequence is that Open Source Foundation Models that are fine-tuned on the data collected from the output of ChatGPT and GPT, or from any other LLM, may become incentivised to hallucinate or withhold information. 7/
