The #img2table Python library does a better job than anything else at extracting tables from PDFs and images. The only thing that comes close is a prerelease version of Azure Document Intelligence that is available only in a limited number of regions.
https://t.co/hDQ2uFCDFR
I thought I wanted to know whether a cloud service had a certain LLM, but apparently I really wanted a romantic manga series. At first, I thought this must be an AI-generated image for a presentation, but no, it's real.
https://t.co/sSKFbPtuwh
Where will you be during the eclipses in 2023 and 2024?
This map shows the path of the Moon's shadow as it crosses the contiguous U.S. during the annular solar eclipse on Oct. 14, 2023, and the total solar eclipse on April 8, 2024. Will you get a chance to see them? https://t.co/mC7CagW0AR
@duckdb vs #pyarrow
>>> import duckdb
>>> duckdb.sql("select * from read_parquet('myfile.parquet') where mycolumn = 123")
vs
>>> import pyarrow.parquet as pq
>>> import pyarrow.compute as pc
>>> t = pq.read_table('myfile.parquet')
>>> t.filter(pc.equal(t['mycolumn'], 123))
So I was at the Harvard museum and noticed a potato battery setup that wasn't working. I started to fix it, but the security guard stopped me. Apparently it was an art installation! No wonder the multimeter was set to AC... oh, artists...
You see, ODBC/JDBC wasn't designed for modern-day data use cases. For example, a client application like @tableau 📊 working with petabytes of data stored in a data lake/warehouse will run into the performance limits of these interfaces.
With STOIC's Reactive Caching Service, L0 is browser memory, L1 is browser local storage, and L2 is GPU memory. But what is L3? Obviously, it has to be CPU memory, but in what format? Arrow IPC of course! But what is L4 then? CPU memory again, but in Parquet format.