NLP Cloud is a privacy-focused OpenAI alternative.
Either use our pre-trained generative AI models through a simple API or fine-tune and deploy your own models.
GPT-OSS 120B is now available on NLP Cloud!
Our implementation is fast and reliable at scale, so we encourage you to try it on our Playground: https://t.co/JzM1Re8ZUx
If you have questions about how to integrate NLP Cloud and GPT-OSS 120B into your stack, please reach out!
🇫🇷 Article: An overview of French AI
🇫🇷 13 French AI companies you absolutely need to know: Mistral AI, Giskard, NLP Cloud…
📍 Sovereignty, performance, innovation
🔗 Discover the made-in-France gems: https://t.co/6ZO7rfG4CB
#IA #FrenchTech #MistralAI
Anthropic published a nice article about how they implemented web search in Claude using a multi-agent system:
https://t.co/maECEyK9ml
I recommend this article if you are building an agentic application, because it gives you ideas about how to architect your system. It covers things like:
- Having a central large LLM act as an orchestrator and many smaller LLMs act as workers
- Parallelized tasks vs sequential tasks
- Memorizing key information
- Dealing with contexts
- Interacting with MCP servers
- Controlling costs
- Evaluating accuracy of agentic pipelines
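To make the orchestrator/worker idea from the list concrete, here is a minimal Python sketch. It is not the architecture from the Anthropic article: `plan`, `worker`, and `orchestrate` are hypothetical names, and the LLM calls are replaced with plain stand-in functions so the example runs offline. It only illustrates the shape of the pattern: a central step splits the query into subtasks, workers handle them in parallel, and the results are merged.

```python
import asyncio


def plan(query: str) -> list[str]:
    # Stand-in for the large orchestrator LLM splitting a query into subtasks.
    # (Hypothetical: a real system would prompt a model here.)
    return [f"{query} (angle {i})" for i in range(1, 4)]


async def worker(subtask: str) -> str:
    # Stand-in for a smaller worker LLM handling one subtask.
    await asyncio.sleep(0)  # placeholder for network latency
    return f"findings for {subtask}"


async def orchestrate(query: str) -> str:
    subtasks = plan(query)
    # Workers run concurrently; gather() returns results in subtask order.
    results = await asyncio.gather(*(worker(s) for s in subtasks))
    # Stand-in for the orchestrator synthesizing a final answer.
    return "\n".join(results)


def run(query: str) -> str:
    return asyncio.run(orchestrate(query))


if __name__ == "__main__":
    print(run("TPU adoption"))
```

A sequential variant would simply await each worker in a loop instead of using `asyncio.gather`, which is the parallel vs sequential trade-off mentioned in the list.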
Multi-agent systems are clearly still in their infancy, and everyone is learning on the go. It's a very interesting topic that will require strong system design skills.
An additional take: RAG pipelines are going to be replaced with multi-agent search because it's more flexible and more accurate.
Do you agree with that?
Are you deploying your own LLMs in production?
We wrote an article summarizing the cutting-edge inference optimization techniques you can implement in 2025:
https://t.co/TjBWjvNceU
Please do not hesitate to reach out if you have questions about LLM deployment!
There has been a backlash against Cursor over the last couple of days.
It seems that the Cursor support system is 100% AI-based, and it clearly gave very bad answers to users who could not log into Cursor because of a bug, leading many customers to cancel their subscriptions.
As an AI developer myself, I think I understand the limitations of generative AI pretty well, and I find it shocking that some companies implement support chatbots to give such definitive answers without a human in the loop.
This is even more shocking when done by a team of people - the Cursor team - who know AI very well...
Unfortunately, it is tempting to cut support costs this way, but it is a big mistake in my opinion.
AI is not mature enough for that.
Google TPUs have been around for quite some time now, and I've rarely seen any company seriously use them in production...
At NLP Cloud we used TPUs at some point behind our training and fine-tuning platform. But they were tricky to set up and not necessarily faster than NVIDIA GPUs.
We also worked on a POC for TPU-based inference, but it was a failure because GCP lacked many must-have features on their TPU platform: no fixed IP addresses, no serious observability tools, slow TPU instance provisioning, XLA being hard to debug at times...
Researchers may be interested in TPUs, but is it because of the TPUs themselves or because of the generous Google TRC program that gives access to a bunch of free TPUs?
Also, the fact that Google TPUs cannot be purchased but only rented through the GCP platform might scare many organizations trying to avoid vendor lock-in.
Maybe this new generation of TPUs is different and GCP has matured its TPU ecosystem?
If some of you have experience using TPUs in production, I'd love to hear your story 🙂
Are you building an AI stack and working on inference optimization?
We just published an article about gen AI inference engines, where we compare TensorRT-LLM by @NVIDIAAI vs vLLM by @UCBerkeley vs Hugging Face TGI by @huggingface vs LMDeploy:
https://t.co/78EcLwVM4U
Hope it's useful!
NLP Cloud is proud to be named in the 2024 Gartner® Innovation Guide For Generative AI Technologies!
Thank you to all our clients for using the NLP Cloud API in their generative AI applications!
We have considerably improved our sentiment analysis API.
From now on you can use advanced models like LLaMA 405B for sentiment and emotion analysis, and specify which target the analysis should apply to (a person, a company, a concept...)!
Try it here: https://t.co/wLEVHyz1dr
Looking to deploy LLaMA 3.1 405B?
Here is a tutorial we wrote about deploying a quantized version of LLaMA 3.1 405B on @googlecloud Compute Engine, using @vllm_project:
https://t.co/28ehf07xZf
And of course you can also use our LLaMA 3.1 405B API!
LLaMA 3.1 405B is now available on NLP Cloud!
We now offer a production-ready version of this powerful, cutting-edge AI model.
The cost is the same as Fine-tuned LLaMA 3 70B: $0.0018 per 1K tokens!
You can easily try it on our Playground here: https://t.co/JzM1Re8s4Z
#AI #LLM
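For a rough idea of what $0.0018 per 1K tokens means in practice, here is a small Python sketch. The function name `estimate_cost` is only illustrative, and the post does not say whether the price applies to input tokens, output tokens, or both, so the token count passed in is assumed to be whatever total is billed.

```python
# Price quoted in the post for LLaMA 3.1 405B on NLP Cloud.
PRICE_PER_1K_TOKENS_USD = 0.0018


def estimate_cost(billed_tokens: int) -> float:
    """Return the estimated cost in USD for a given number of billed tokens.

    Assumption: a flat per-token rate with no minimums or volume discounts.
    """
    return billed_tokens / 1000 * PRICE_PER_1K_TOKENS_USD


if __name__ == "__main__":
    for tokens in (1_000, 50_000, 1_000_000):
        print(f"{tokens:>9} tokens -> ${estimate_cost(tokens):.4f}")
```

At this rate, one million billed tokens comes to about $1.80.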
How to build a social media sentiment analysis pipeline?
We show in this article how to achieve that on Reddit with #Go, #FastAPI, KWatch, and NLP Cloud: https://t.co/xftvmVPJNI
If you have questions, please don't hesitate to ask!
#AI #socialnetwork #DigitalMarketing #Reddit