We’re proud to welcome @GeneralistAI to the @NorwestVP portfolio! The company is on a mission to make general-purpose robots a reality and is led by a world-class team: @peteflorence, @andyzengineer, and Andrew Barry.
https://t.co/onmjEtJrei
We Parse PDFs
We spent 7 figures to put this on billboards throughout SF.
I thought long and hard about putting something more creative and whimsical. But then you wouldn’t know what we do.
AI agents (and humans) are consuming exponentially more documents as they do real work. They need the highest-quality document parser so they don't produce garbage on downstream tasks.
This is what we do today as a company. If you have any PDFs (or other documents), we parse them :)
If you’re around SF in June for one of the following events, come stop by our booths:
✅ Snowflake Summit (this week, Booth 1123)
✅ Databricks Data+AI Summit (June 15-18, Booth 137)
✅ AI Engineer World's Fair (June 29-July 2, Booth L-G47)
You can find us by the same sign we put on our billboards!
We Parse PDFs
@llama_index
Invest and sign a definitive agreement in the same quarter? Well, this is a new one!
Congrats to @pratyus and the @natomalabs team on entering into a definitive agreement with @Snowflake.
The demand for agentic AI infrastructure is moving fast.
https://t.co/2cvaqcl0kx
If you're stopping by the SF Caltrain station over Memorial Day weekend, you might catch a glimpse of our digital ads 📺
We parse
(PDFs)
(50+ other document types)
Turso now includes unlimited active databases in every plan. We already had unlimited databases, but we would charge you based on how many of them were active. That is now gone. You want a database, you get a database.
This is why we released liteparse :)
Free, open-source, designed for agents.
Natively supports OCR and screenshotting for deeper visual understanding of a document when needed.
We’re open-sourcing the first document OCR benchmark for the agentic era, ParseBench.
Document parsing is the foundation of every AI agent that works with real-world files. ParseBench is a benchmark that measures parsing quality specifically for agent knowledge work:
✅ It optimizes for semantic correctness (instead of exact similarity)
✅ It has the most comprehensive distribution of real-world enterprise documents
It contains ~2,000 human-verified enterprise document pages with 167,000+ test rules across five dimensions that matter most: tables, charts, content faithfulness, semantic formatting, and visual grounding.
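To make "semantic correctness instead of exact similarity" concrete, here is a minimal sketch of a rule-based table check. It is not ParseBench's actual rule format or harness (those are in the paper and code); the `TableCellRule` schema and both helper functions are hypothetical. The idea: a rule asserts one fact about the parsed output (the cell at a given row and column holds a given value), so a parse that differs only in spacing or column alignment still passes, while exact string comparison would penalize it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TableCellRule:
    """Hypothetical rule: the cell at (row_label, col_label) must equal `expected`."""
    row_label: str
    col_label: str
    expected: str


def parse_markdown_table(md: str) -> list[list[str]]:
    """Parse a simple pipe-delimited markdown table into rows of stripped cells."""
    rows = []
    for line in md.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header separator row, e.g. |---|:--:|
        if all(c and set(c) <= set("-: ") for c in cells):
            continue
        rows.append(cells)
    return rows


def check_rule(md: str, rule: TableCellRule) -> bool:
    """True if the table satisfies the rule; whitespace and alignment markers are ignored."""
    rows = parse_markdown_table(md)
    if not rows:
        return False
    header = [h.lower() for h in rows[0]]
    if rule.col_label.lower() not in header:
        return False
    col = header.index(rule.col_label.lower())
    for row in rows[1:]:
        if row and row[0].lower() == rule.row_label.lower() and col < len(row):
            return row[col] == rule.expected
    return False


# Reference and predicted parses differ as strings but carry the same table.
ref = "| Metric | 2023 |\n|---|---|\n| Revenue | 1,200 |"
pred = "|Metric|2023|\n|:--|--:|\n|Revenue   |1,200|"
rule = TableCellRule("Revenue", "2023", "1,200")
print(check_rule(pred, rule))  # True
print(pred == ref)             # False
```

A real benchmark would also need normalization choices (for example, whether `1,200` and `1200` count as equal), which this sketch deliberately leaves out.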
We benchmarked 14 known document parsers on ParseBench, from frontier/OSS VLMs to specialized parsers to LlamaParse. Here are some of our findings:
💡 Increasing the compute budget yields diminishing returns - Gemini/gpt-5-mini/haiku gain 3-5 points going from minimal to high thinking, at 4x the cost.
💡 Charts are the most polarizing dimension for evaluation. Most specialized parsers score below 6%, while some VLM-based parsers do a bit better.
💡 VLMs are great at visual understanding but terrible at layout extraction. GPT-5-mini/haiku score below 10% on our visual grounding task; all specialized parsers do much better.
💡 No method crushes all 5 dimensions at once, but LlamaParse achieves the highest overall score at 84.9%, and is the leader in 4 out of the 5 dimensions.
This is by far the deepest technical work that we’ve published as a company. I would encourage you to start with our blog and explore our links to Hugging Face and GitHub. All the details are in our full 35-page (!!) arXiv whitepaper.
🌐 Blog: https://t.co/57OHkx0pQW
📄 Paper: https://t.co/Ho2oH2xEAM
💻 Code: https://t.co/6P7UxqOZYA
📊 Dataset: https://t.co/YguIXWm41j
🎥 YouTube: https://t.co/6Fh1Nsk9ei
We're excited to collaborate with @googledevs on building an agentic workflow over complex financial documents - using LlamaParse and Gemini 3.1 Pro
Brokerage statements have complex layouts, dense tables, and oftentimes visual elements like charts. Our multi-step agentic workflow does the following:
1. Ingest PDF into LlamaParse
2. Extract text and tables
3. Generate human-readable summary using Gemini
Shoutout to @Vish_ow and @itsclelia 🙌
Check it out: https://t.co/6dd7mKNkyk
We’re proud to share that @OuroMeds, a Norwest portfolio company, has signed a definitive agreement to be acquired by @GileadSciences.
When we co-led Ouro Medicine’s Series A in 2024, we deeply believed in its mission to fundamentally change how chronic immune-mediated diseases are treated.
Congratulations on this significant milestone, and we look forward to supporting the company in its next chapter with Gilead.
Read more: https://t.co/RidcJA95y6
The DOJ messed up some redactions on the latest Epstein files 🗄️🔏 - they didn’t flatten the PDF layers and you can highlight/copy the underlying text.
If you want to extract this text at scale, you *can’t* just feed everything to a VLM (gpt-5.2, sonnet-4.5, gemini 3). VLMs only look at the top-level visual layer of the page, and will output the redacted blocks.
You need to also reconstruct the text from the PDF binary itself, which is more in line with “traditional” techniques.
LlamaParse combines VLMs with reading the underlying binary.
* If you try our agentic mode, by default it outputs the redacted blocks in the markdown `md` field but extracts the full text in the `text` field.
* With a simple prompt change, you can also output the full text in `md`. Prompt: "Do not output redactions if the underlying extracted text already exists - output the full extracted text instead"
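Why does reading the binary recover text that a VLM misses? Here is a minimal, stdlib-only sketch (not LlamaParse's implementation) using a toy, hand-written page content stream: text is drawn with the `Tj` operator, then a filled black rectangle (`re f`) is painted over it. A renderer, and therefore a VLM looking at the page image, sees only the box, but the text operator is still in the stream. The regex below handles only simple `(...) Tj` literals; real PDFs also use `TJ` arrays, hex strings, escapes, and font encodings, so in practice you would use a proper PDF library such as pypdf.

```python
import re
import zlib

# Toy page content stream (made-up example, not from any real file):
# text is shown with Tj, then a black rectangle is filled over it.
CONTENT = (
    b"BT /F1 12 Tf 72 700 Td (Name: John Doe) Tj ET\n"
    b"0 0 0 rg 70 695 150 16 re f\n"
)

# Content streams are commonly FlateDecode-compressed, which is zlib.
compressed = zlib.compress(CONTENT)

# A literal string operand followed by the Tj (show text) operator.
# Simplification: no escaped parentheses, hex strings, or TJ arrays.
TJ_PATTERN = re.compile(rb"\(([^()]*)\)\s*Tj")


def extract_shown_text(stream: bytes) -> list:
    """Decompress a content stream and return the strings shown with Tj."""
    raw = zlib.decompress(stream)
    return [m.group(1).decode("latin-1") for m in TJ_PATTERN.finditer(raw)]


def has_filled_rectangles(stream: bytes) -> bool:
    """True if the stream fills any rectangle (re ... f), e.g. a redaction box."""
    raw = zlib.decompress(stream)
    return re.search(rb"\bre\s+f\b", raw) is not None


print(has_filled_rectangles(compressed))  # True: a box is painted on the page
print(extract_shown_text(compressed))     # ['Name: John Doe']: the text is still there
```

Proper redaction removes the text operators from the stream (or flattens the page to an image); drawing a box on top only changes what is rendered.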
Whether you want to comb through any set of released government documents or any other file, come check out LlamaParse!
Source reddit thread: https://t.co/Vq5P3UkgMp
File: https://t.co/8fsuBIjYMu
To use LlamaParse, sign up to LlamaCloud: https://t.co/XYZmx5TFz8
Today, @Pinterest announced that it has reached a definitive agreement to acquire @NorwestVP portfolio company @tv_Scientific.
Proud to have led the Series A because we believed CTV would become a true performance channel. Jason and David proved that. https://t.co/79h48lGmKm
Huge congratulations to @thakurtarun and @vezainc as @ServiceNow announces its intent to acquire. Deeply thankful to you for letting @NorwestVP be part of the journey from very early on.
https://t.co/B9CQoS2hjC
Claude Code over Excel++ 🤖📊
Claude already 'works' over Excel, but in a naive manner - it writes raw Python/openpyxl to analyze an Excel sheet cell by cell and generally lacks a semantic understanding of the content. The coding abstractions it uses are simply too low-level for the agent to accurately perform more sophisticated analysis.
Our new LlamaSheets API lets you automatically segment complex Excel sheets into well-formatted 2D tables. This both gives Claude Code immediate semantic awareness of the sheet and allows it to run Pandas/SQL over well-structured dataframes.
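LlamaSheets' own segmentation method isn't described here, so the following is only a minimal sketch of the general idea under simple assumptions. The sheet is modeled as a grid of values with `None` for blank cells. Each 4-connected block of non-empty cells is treated as one table whose first row is the header, and each block is loaded into an in-memory SQLite table so an agent can query it with SQL instead of walking cells one by one. All function names are hypothetical, and real sheets (merged cells, multi-row headers, notes next to tables) need much more than this heuristic.

```python
import sqlite3
from collections import deque

# Toy sheet: two separate tables on one sheet, separated by blank cells.
SHEET = [
    ["Region", "Sales", None, None, None],
    ["East", 120, None, "Month", "Visits"],
    ["West", 80, None, "Jan", 10],
    [None, None, None, "Feb", 14],
]


def find_table_blocks(grid):
    """Group non-empty cells into 4-connected regions; return bounding boxes (r0, r1, c0, c1)."""
    rows, cols = len(grid), len(grid[0])
    seen, boxes = set(), []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is None or (r, c) in seen:
                continue
            queue = deque([(r, c)])
            seen.add((r, c))
            r0, r1, c0, c1 = r, r, c, c
            while queue:  # breadth-first flood fill over non-empty neighbors
                y, x = queue.popleft()
                r0, r1, c0, c1 = min(r0, y), max(r1, y), min(c0, x), max(c1, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] is not None and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            boxes.append((r0, r1, c0, c1))
    return boxes


def block_to_table(grid, box):
    """Treat the block's first row as the header; return (header, data rows)."""
    r0, r1, c0, c1 = box
    header = [str(v) for v in grid[r0][c0:c1 + 1]]
    data = [grid[r][c0:c1 + 1] for r in range(r0 + 1, r1 + 1)]
    return header, data


def load_into_sqlite(conn, name, header, data):
    """Create an untyped table named `name` and insert the rows (headers are quoted, not sanitized)."""
    cols = ", ".join(f'"{h}"' for h in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{name}" ({cols})')
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', data)


conn = sqlite3.connect(":memory:")
for i, box in enumerate(find_table_blocks(SHEET)):
    header, data = block_to_table(SHEET, box)
    load_into_sqlite(conn, f"table_{i}", header, data)

total = conn.execute('SELECT SUM("Sales") FROM table_0').fetchone()[0]
print(total)  # 200
```

The point of the design is the abstraction level: once each region is a named table, a question like "total sales" becomes one SQL query rather than a cell-by-cell script.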
We've written a guide showing you specifically how to use LlamaSheets with coding agents!
Guide: https://t.co/Hxng8t53Bo
Sign up to LlamaCloud: https://t.co/XYZmx5TFz8
You might’ve known us as a “RAG framework” company - but we’ve been a best-in-class, agentic document OCR/workflow company for the past 1.5+ years! 📑🤖
We’re building the future of knowledge work over documents.
Our website is awesome - check it out if you haven’t already 👇
https://t.co/YiIfjVlzb6
Turso is an incredible technical feat: a Rust rewrite of SQLite, with an async-first architecture, upcoming support for concurrent writes, vector search, and browser / wasm support out of the box.
I think this has a very good chance of becoming a foundational piece of infrastructure for the vibe-coding age: on-demand, SQLite-compatible global databases that can also run in-browser and on-device.
The pace at which the project is evolving is most definitely *not normal*. @penberg and @glcst are built different.
Demo: https://t.co/CDjYwGZMNo
TURSO LAUNCH PARTY IS OCTOBER 8 🎉
We're hosting a Launch Party in San Francisco on October 8 to celebrate the Turso Beta Launch.
Join us in-person! RSVP at https://t.co/fhOKvfxVUP
More details to follow.
I’m excited to announce a brand-new website and documentation hub 💫 that solidifies our evolution towards automating knowledge work over your documents.
You might’ve followed us since the “RAG framework” days. Even then, the biggest challenge users faced was figuring out how to actually ingest an entire collection of unstructured docs (.pdf, .pptx, .docx, and more) for chatbot/agentic workflow use cases. Over the past year we’ve progressively built up incredibly deep tech around document parsing, extraction, and indexing - while teaching developers how to build various workflows on top.
We’re now going all in on documents, and we’re the only company that has both 1) SOTA document processing and file management 📈, and 2) agentic orchestration on top to solve use cases like deep research, report generation, and document workflows end-to-end.
Our llamas will continue to love all sorts of data (we have 600+ integrations on the open-source framework!), but they now especially love automating paperwork 🦙📄. If you would also love to automate paperwork, come check out our new website and come talk to us!
Site: https://t.co/XCA5y7Rc9C
Developer Hub: https://t.co/LfNh0LlwXU