I don’t use Twitter X…
because of the crazy bullshit Elon is posting…
and I won’t buy a Tesla.
Add this to the data explaining user and sales decline.
Sad…
https://t.co/3F68zemT9Z
This 76-page paper on Prompting Techniques has become quite popular. A nice read for your weekend.
- "The Prompt Report: A Systematic Survey of Prompting Techniques": ✨
Explores structured understanding and taxonomy of 58 text-only prompting techniques, and 40 techniques for other modalities.
📌 The paper focuses on discrete prefix prompts rather than cloze prompts, because prefix prompts are widely used with modern LLM architectures like decoder-only models. It excludes soft prompts and techniques using gradient-based updates.
📌 The paper identifies 58 text-based prompting techniques broken into 6 major categories:
1) In-Context Learning (ICL) - learning from exemplars/instructions in the prompt
2) Zero-Shot - prompting without exemplars
3) Thought Generation - prompting the LLM to articulate reasoning
4) Decomposition - breaking down complex problems
5) Ensembling - using multiple prompts and aggregating outputs
6) Self-Criticism - having the LLM critique its own outputs
📌 For ICL, it discusses key design decisions like exemplar quantity, ordering, label quality, format, and similarity that critically influence output quality. It also covers ICL techniques like K-Nearest Neighbor exemplar selection.
📌 Extends the taxonomy to multilingual prompts, discussing techniques like translate-first prompting and cross-lingual ICL. It also covers multimodal prompts spanning image, audio, video, segmentation, and 3D modalities.
📌 More complex techniques like agents that access external tools, code generation, and retrieval augmented generation are also taxonomized. Evaluation techniques using LLMs are discussed.
📌 Prompting issues like security (prompt hacking), overconfidence, biases, and ambiguity are highlighted. Two case studies - benchmarking techniques on MMLU and an entrapment detection prompt engineering exercise - are presented.
There are a number of automated prompt optimization (APO) techniques emerging that focus on improving the system/instruction prompt either as an activity that occurs before the prompt is put into production🚧 or a recursive feedback loop ➿that evaluates the results of the prompt and provides suggestions to improve the system prompt based on those results.
This is an interesting example of the former... 🚧
Here is an article I wrote that calls out a number of real-world production use cases 🛞for APO where these techniques could potentially be applied. https://t.co/7e0ZBWkrWo
"With long context LLMs comes long prompts"👇
People typically just write 1- or 2-sentence quick prompts when using an LLM for a task.
How to create 1- or 2-page long prompts to boost performance?
���PromptAgent automatically writes long prompts for you!🔥
Without need of the demanding process of iteratively revising & expanding prompts manually as @AndrewYNg described below, PromptAgent uses Monte-Carlo Tree Search (the Reasoning-via-Planning algorithm) to automatically find the best, detailed prompts describing various aspects:
- Task description
- Term clarification
- Exception handling
- Priority and emphasis
- Solution guidance
- Formatting, etc.
👉Check out PromptAgent in the #LLMReasoners library: https://t.co/hQiawH2PHZ
📢Thanks @MaEnze98259 for contributing it to #LLMReasoners!
📄Papers:
- PromptAgent https://t.co/8SAwba3sn5
- Reasoning-via-Planning https://t.co/DilKTr5rIs
When I first met @shriyashku (Yash) we had this very conversation. That is, token prices and long-range views on token demand. I walked away thinking this guy is one of, if not the most, interesting founders🧠 I have met in a very long time. About 4 weeks later I joined @withmartian.🧨 I hope you enjoy reading Yash's words on economics, coal > steam engines, assembly language > python, LLM adoption and other interesting ideas! 💡💡💡
“The idea that the current state of the art LLMs would simply replace search might go down as one of most premature or potentially misguided strategy “blunders” in a long time”https://t.co/F9ID1OUAYW
If you are into prompt engineering check out this video that covers a very exciting frontier in prompt engineering - DSPy. The video was created by my good friend @CShorten30 🧠who is Head of Research @weaviate_io .
DSPy offers a new programming model, inspired by PyTorch, that gives you a massive amount of control over these LLM programs.
If you are not a developer but you are intro prompting don't be afraid. Connor does a good job of explaining the cure use cases. I found this to be one of the most fascinating things I've seen recently. Granted we have a lot of fascinating things going down in AI.
https://t.co/uBi2FUTsUo
🚨Announcing the largest study focused on *how* to optimize the prompts within LM programs, a key DSPy challenge.
Should we use LMs to… Craft instructions? Self-generate examples? Handle credit assignment? Specify a Bayesian model?
By @kristahopsalong* @michaelryan207* &team🧵
This is a breakdown of a successful tweet targeted to [AI technical audience] that got 31x amplification 🚀...
...that is @aparnadhinak the author has 4600 followers yet the post received 140k impressions!!
Killer Killer series @aparnadhinak I am FAN BOY📣
Topic: LLM as a Judge Evals ⚖️
One of the biggest open questions with AI is its impact on software business models over time. What seems to be under-appreciated about AI is how it can enable significant TAM expansion for a large number of categories over time when software can deliver outcomes, and not just enable existing work.
Right now the dominant business model in SaaS is a per seat model, which inevitably means that the total number of seats you can sell is limited to the number of employees in the organization that are relevant for your particular software. Legal software is roughly capped by the size of the legal team, audit software is capped by the size of the audit team, and so on. The implication of this is that the customer generally *already* has to have not only a need for your solution, but also the existing headcount in the organization to become users of your software. Incidentally, this is often why so many SaaS products tend to go after horizontal productivity categories, because this maximizes the number of potential users you have access to in an organization.
AI flips this on its head, especially with the power of AI Agents, and you get a new form of “outcome-as-a-service”. When AI is actually doing the work within the software itself, you're no longer constrained by the number of employees inside the organization to use the actual software. The software is quite literally bringing along the work with it and delivering a particular business outcome. It's clear the full potential of this playing out is not fully understood, as this represents a massive transformation of software as an industry.
When you are no longer constrained by the size of a team or department to use your software, markets are no longer arbitrarily capped in size. In this new era, software that powers a legal workflow actually brings the equivalent of legal knowledge work along with it, and software for audits brings the equivalent of audit work with it. All of a sudden small businesses, under-resourced teams in large enterprises, and all new geographies begin to open to up as markets. AI will enable otherwise niche categories of software to become much larger, and already large categories of software to become even bigger.
This transformation is similar to what we've seen in other markets where a new innovation has unlocked the size of a market well beyond its original demand. For instance, most investors and economists would have thought the size of Uber or Lyft's market was the size of the existing Taxi market, when in fact the market size was orders of magnitude larger once the shape of the product changed to make the offering easier to consume. We’ve seen this effect time and time again in areas like SaaS, cloud computing, a variety of mobile categories, and more.
We’re only in the earliest of stages of figuring out what this all means for the future of the software business model. Clearly all new variables of monetization will need to emerge when you start to pay software vendors for outcomes as opposed to just the software itself. But inevitably, when you remove the existing dominant constraint of enterprise software, TAM expansion will follow.
How I used GPT4o to assist writing a blog post aimed at experienced a LLM Judge AI developers. First: I did the manual work of reading all of our technical teams' related content in Slack. Then I put all of those messages in Gpt4o /// I'm out of characters should I upgrade????
@withmartian I am excited to have worked with so many progressive AI companies. The article is a good read for anyone building in AI. Challenges we all will face are described an while this is not a solutions piece some very exciting solutions are discussed such as an LLM as an Observer.
This area is a must read for anyone building in AI. I am fortunate to have access and to collaborate with these leading companies on the futures edge of AI! Excited to see where part 2 and part 3 of this research take us and the frontier of AI.
At Martian, we are fortunate to work with many of the world's most advanced users of AI. We see the problems they face on the leading edge of AI and collaborate closely with them to overcome these challenges.
In this first of a three-part series, we share a view into the future of prompt engineering we refer to as Automated Prompt Optimization (APO). In this article we summarize the challenges faced by leading AI companies including @mercor_ai , @G2dotcom, @copy_ai, @autobound_ai, @6senseInc, ZeltaLabs, @EDITED_HQ, @supernormalapp, and others.
We identify key issues like model variability, drift, and “secret prompt handshakes”. We reveal innovative techniques used to address these challenges, including LLM observers, prompt co-pilots, and human-in-the-loop feedback systems to refine prompts.
We invite the broader AI community to collaborate with us on research in this area. If you are interested in participating, please reach out to us!
https://t.co/DlcCWg1DEc