Unless you are rich, don’t make your child a sportsperson: Badminton coach Pullela Gopichand
https://t.co/u377HrwPFW
Wisdom is learning the lessons we thought we already knew. DeepSeek reminds us of three important learnings from computing history:
1) Computing obeys the gas law. Making it dramatically cheaper will expand the market for it. The markets are getting it wrong; this will make AI much more broadly deployed.
2) Engineering is about constraints. The Chinese engineers had limited resources, and they had to find creative solutions.
3) Open Wins. DeepSeek will help reset the increasingly closed world of foundational AI model work. Thank you DeepSeek team.
Buying a flat in India as an investment is a waste of money. This only makes the builder richer, not you.
There are 3 specific reasons why you should buy a flat:
1) To live in it (then it's not investing). You love the place, you buy it.
2) You can build a business on it (e.g. an Airbnb); this improves your yield.
3) You are getting a distressed deal. And, the long-term rent on it would justify the price. This usually does not happen anymore in any major city in India.
The worst reason to buy a flat is: oh in the last 3 years the prices have doubled, so.... you can imagine the next 3 years?
Buddy, that's the builder's rate: what they can sell Tower B for now, after making Tower A.
Not the value of your 3-year-old flat, which you purchased in Tower A.
After a recent price reduction by OpenAI, GPT-4o tokens now cost $4 per million tokens (using a blended rate that assumes 80% input and 20% output tokens). GPT-4 cost $36 per million tokens at its initial release in March 2023. This price reduction over 17 months corresponds to about a 79% drop in price per year. (4/36 = (1 - p)^{17/12})
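The annualized figure above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it uses the blended prices and the 17-month window quoted in the text, and the function name `annualized_price_drop` is made up for illustration.

```python
def annualized_price_drop(old_price: float, new_price: float, months: float) -> float:
    """Solve new_price / old_price = (1 - p) ** (months / 12) for p."""
    ratio = new_price / old_price
    # Raising the ratio to 12/months gives the fraction of the price retained per year.
    retained_per_year = ratio ** (12 / months)
    return 1 - retained_per_year


# Blended $/million-token prices and time span taken from the text above.
p = annualized_price_drop(old_price=36.0, new_price=4.0, months=17)
print(f"Annualized price drop: {p:.0%}")  # roughly 79%
```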
As you can see, token prices are falling rapidly! One force that’s driving prices down is the release of open weights models such as Llama 3.1. If API providers, including startups Anyscale, Fireworks, Together AI, and some large cloud companies, do not have to worry about recouping the cost of developing a model, they can compete directly on price and a few other factors such as speed.
Further, hardware innovations by companies such as Groq (a leading player in fast token generation), SambaNova (which serves Llama 3.1 405B tokens at an impressive 114 tokens per second), and wafer-scale computation startup Cerebras (which just announced a new offering this week), as well as the semiconductor giants NVIDIA, AMD, Intel, and Qualcomm, will drive further price cuts.
When building applications, I find it useful to design to where the technology is going rather than only where it has been. Based on the technology roadmaps of multiple software and hardware companies — which include improved semiconductors, smaller models, and algorithmic innovation in inference architectures — I’m confident that token prices will continue to fall rapidly.
This means that even if you build an agentic workload that isn’t entirely economical, falling token prices might make it economical at some point. As I wrote previously, being able to process many tokens is particularly important for agentic workloads, which must call a model many times before generating a result. Further, even agentic workloads are already quite affordable for many applications. Let's say you build an application to assist a human worker, and it uses 100 tokens per second continuously: At $4/million tokens, you'd be spending only $1.44/hour – which is significantly lower than the minimum wage in the U.S. and many other countries.
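The hourly figure works out as follows. This is a minimal sketch of the cost calculation, assuming the flat blended rate of $4 per million tokens and the continuous 100 tokens per second from the example; `hourly_cost` is an illustrative helper, not part of any provider's SDK.

```python
def hourly_cost(tokens_per_second: float, price_per_million_tokens: float) -> float:
    """Dollar cost of running a workload continuously for one hour."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour * price_per_million_tokens / 1_000_000


cost = hourly_cost(tokens_per_second=100, price_per_million_tokens=4.0)
print(f"${cost:.2f}/hour")  # 360,000 tokens/hour at $4 per million -> $1.44/hour
```

Plugging in your own throughput and the current price of your model gives a quick estimate before spending any effort on cost optimization.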
So how can AI companies prepare?
- First, I continue to hear from teams that are surprised to find out how cheap LLM usage is when they actually work through cost calculations. For many applications, it isn’t worth too much effort to optimize the cost. So first and foremost, I advise teams to focus on building a useful application rather than on optimizing LLM costs.
- Second, even if an application is marginally too expensive to run today, it may be worth deploying in anticipation of lower prices.
- Finally, as new models get released, it might be worthwhile to periodically examine an application to decide whether to switch to a new model either from the same provider (such as switching from GPT-4 to the latest GPT-4o-2024-08-06) or a different provider, to take advantage of falling prices and/or increased capabilities.
Because multiple providers now host Llama 3.1 and other open-weight models, if you use one of these models, it might be possible to switch between providers without too much testing (though implementation details, specifically quantization, mean that different offerings of the same model do differ in performance). When switching between models, unfortunately, a major barrier is still the difficulty of implementing evals, so carrying out regression testing to make sure your application will still perform after you swap in a new model can be challenging. However, as the science of carrying out evals improves, I’m optimistic that this will become easier.
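To make the regression-testing idea concrete, here is one very small sketch of what such a check could look like. Everything in it is an assumption for illustration: the models are stand-in Python functions rather than real API calls, each eval case is a prompt paired with a pass/fail check, and `pass_rate` and `safe_to_switch` are hypothetical names. Real evals are usually much harder to write than this.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]                    # prompt -> completion
EvalCase = Tuple[str, Callable[[str], bool]]    # (prompt, check on the output)


def pass_rate(model: Model, cases: List[EvalCase]) -> float:
    """Fraction of eval cases whose output passes its check."""
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


def safe_to_switch(current: Model, candidate: Model,
                   cases: List[EvalCase], tolerance: float = 0.0) -> bool:
    """True if the candidate scores no worse than the current model, within tolerance."""
    return pass_rate(candidate, cases) >= pass_rate(current, cases) - tolerance


# Stand-in "models" so the sketch runs without any API access (hypothetical).
def current_model(prompt: str) -> str:
    return prompt.upper()


def candidate_model(prompt: str) -> str:
    # Deliberately regresses on longer prompts.
    return prompt.upper() if len(prompt) < 10 else prompt


cases: List[EvalCase] = [
    ("hello", lambda out: out == "HELLO"),
    ("a long prompt here", lambda out: out.isupper()),
]

print(pass_rate(current_model, cases))
print(pass_rate(candidate_model, cases))
print(safe_to_switch(current_model, candidate_model, cases))
```

In practice the stand-in functions would be replaced by calls to the old and new model, and the checks by whatever correctness criteria the application cares about.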
[Original text (with links): https://t.co/txk7q32EXn ]
Thank you Meta and the Llama team for your huge contributions to open-source! Llama 3.1 with increased context length and improved capabilities is a wonderful gift to everyone.
I hope foolish regulations like California's proposed SB1047 don't stop such innovations.
The Art of Saying No: Contextual Noncompliance in Language Models
Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work primarily focuses on refusal of "unsafe" queries, we posit that the scope of noncompliance should be broadened. We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should not comply with user requests. Our taxonomy spans a wide range of categories including incomplete, unsupported, indeterminate, and humanizing requests (in addition to unsafe requests). To test noncompliance capabilities of language models, we use this taxonomy to develop a new evaluation suite of 1000 noncompliance prompts. We find that most existing models show significantly high compliance rates in certain previously understudied categories, with models like GPT-4 incorrectly complying with as many as 30% of requests. To address these gaps, we explore different training strategies using a synthetically generated training set of requests and expected noncompliant responses. Our experiments demonstrate that while direct finetuning of instruction-tuned models can lead to both over-refusal and a decline in general capabilities, using parameter-efficient methods like low-rank adapters helps to strike a good balance between appropriate noncompliance and other capabilities.