@aidenybai Yes, the useState and useEffect everywhere is extremely annoying. And they appear even if you directly tell it not to use them. I can only use Cursor for some super small features.
@kshaughnessy2 How is this even possible? =) I mean, how did they scale and still keep any kind of quality in the projects? You would have to be able to hire incredibly fast.
@AveaiGlobal I have made a product for real-time Twitter scraping and live notifications for any given topic, including meme coins - https://t.co/B6yAUeRUTM . Please check it out, it has a 2-week free trial =)
Today I'm launching my first solo-developed app, https://t.co/6uwKyoMYxB, which extracts signals from X. Please check out the launch on Product Hunt!
https://t.co/EIxs6tEnyu by @poloniki
The product is designed to give you a highly customizable experience with X, enabling you to:
- Extract signals from X
- Research and discover top influencers
- Filter content by sentiment, industry, or content type
- Highly customize your feed
If you like the app or just want to support it, I'd greatly appreciate your upvote!
I still don't see a significant improvement in AI coding capabilities even with the new Claude 3.7 Sonnet. I provided it with a straightforward problem: comparing two vectors from the same API, one directly from the API and the other retrieved from the database. The first vector had a length of 100, while the second was 10,000. Claude tried numerous solutions and finally suggested truncating the second vector... 5 out of 5 people I asked instantly suggested that the database result was a stringified JSON array.
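For illustration, a minimal sketch of how that mismatch shows up and how to check for it. It assumes the database hands the vector back as JSON text; `api_vector`, `db_value`, and `as_vector` are hypothetical names, not from the actual project.

```python
import json

# Hypothetical values: the same vector, once as a list and once as the
# JSON text a database column might hand back.
api_vector = [0.5] * 100
db_value = json.dumps(api_vector)


def as_vector(value):
    """Return a list of floats, parsing JSON text if that is what we got."""
    if isinstance(value, str):
        value = json.loads(value)
    return [float(x) for x in value]


# len() of the string counts characters, not vector elements,
# so the two "lengths" disagree until the text is parsed.
print(len(api_vector), len(db_value))

db_vector = as_vector(db_value)
print(len(db_vector), db_vector == api_vector)
```

Checking `type(value)` before comparing lengths is usually enough to spot this kind of bug.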
Not a hater, tool is amazing. It's just not yet at the "replacing engineers" stage, which I see a lot in discussions.
#Claude #CLAUDE37
People have too inflated sense of what it means to "ask an AI" about something. The AI are language models trained basically by imitation on data from human labelers. Instead of the mysticism of "asking an AI", think of it more as "asking the average data labeler" on the internet.
Few caveats apply because e.g. in many domains (e.g. code, math, creative writing) the companies hire skilled data labelers (so think of it as asking them instead), and this is not 100% true when reinforcement learning is involved, though I have an earlier rant on how RLHF is just barely RL, and "actual RL" is still too early and/or constrained to domains that offer easy reward functions (math etc.).
But roughly speaking (and today), you're not asking some magical AI. You're asking a human data labeler. Whose average essence was lossily distilled into statistical token tumblers that are LLMs. This can still be super useful of course. Post triggered by someone suggesting we ask an AI how to run the government etc. TLDR you're not asking an AI, you're asking some mashup spirit of its average data labeler.
Thanks! =) Glad to help!
1) The best approach would be to find a ready-made topic/category tree. I think there should be a few available, or you can generate one with an LLM; it would take only a few prompts to get a baseline. We only need to do it once, so it would be almost free.
2) Costs are per token, so it would depend on the page size, but I believe you can embed even more: landing pages would have much less content than a blog post, for example. But the truth is we can achieve similar performance for absolutely free (I can send an example tomorrow). So if you would need to spend more than $10, you can use publicly available free embedding solutions. Here is one of the lists: https://t.co/eHVbaGHr55 . There are many more. But for a start I would suggest just using OpenAI.
3) Diversity of topics would not be a problem; it would rather make them easier to distinguish.
To get the best performance, it is advisable to embed pages chunk by chunk (embedding models are generally trained on shorter texts) and then take the most frequent topic across the chunks. The beauty is that when we scrape a web page we can target <p> tags and get already-chunked text with no processing needed. Another nice part is that we can embed texts chunk by chunk in big batches (if you are doing it locally for free, you would be limited by PC memory), so you can embed hundreds of thousands of pages for free from your laptop.
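A rough sketch of the chunk-and-vote idea, using only the standard library. `ParagraphExtractor`, `page_topic`, and `toy_classifier` are hypothetical names, and the keyword classifier is only a stand-in for "embed the chunk and find the nearest topic".

```python
from collections import Counter
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text inside each <p> element as one chunk."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._buffer).strip()
            if text:
                self.chunks.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def page_topic(html, classify_chunk):
    """Classify every <p> chunk and return the most frequent topic."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    votes = Counter(classify_chunk(chunk) for chunk in parser.chunks)
    return votes.most_common(1)[0][0] if votes else None


def toy_classifier(chunk):
    # Stand-in only: a real version would embed the chunk and look up
    # the closest topic vector.
    lowered = chunk.lower()
    return "pets" if "cat" in lowered or "dog" in lowered else "finance"


html = (
    "<h1>Blog</h1><p>Cats sleep a lot.</p>"
    "<p>Dogs need walks.</p><p>Stocks fell today.</p>"
)
print(page_topic(html, toy_classifier))
```

The majority vote is what absorbs the noise: one off-topic paragraph does not change the label for the whole page.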
I will try to send you a short example of that tomorrow; the whole thing should not take more than 50 lines of Python, frontend included. I have a template somewhere in my repos.
Even without any chunking it would work really well, but there will be some noise. There are ways to make it almost perfectly precise at even lower cost, but I would have to write an article on that; they are a bit more convoluted, so hard to show in a tweet.
The best side of this approach is that it is cheap and scalable, and it can be improved and fine-tuned. If you have a few hundred pages, a pure LLM will probably do the job. But when the number increases, the bill can become too much =)
Model responses can be unstable. Processing takes long (much longer than with this approach). It can be expensive if you have a big website database. This approach is much cheaper, much faster, and more stable as well.
It is actually not that complex. An implementation in Python takes maybe 15 lines. If you put my description into an LLM, it would probably give you a solution very fast; you would just have to insert the API keys.
And, of course, OpenAI vectors use not 3 features but 1536 (for the smallest version), and they are very precise. We don't know what each number represents, as the model was trained on different tasks and content, so they are much more abstract than "fluffy", "big", or "small". Also multilingual, I believe, though performance drops with other languages.
Embedding, in very simplified terms, is the conversion of text into a format that a computer can understand: numbers. Each number represents some specific feature. For example, I can describe a "cat" as
1) How big from 1 to 10: 2
2) How loud from 1 to 10: 2
3) How fluffy from 1 to 10: 3
Etc.
So I can represent a cat in the form of a list of numbers: [2, 2, 3]
Then, let's say, elephant - [10, 8, 0]
Dog - [4, 4, 4].
Now that I have a list of vectors, I can apply "cosine similarity", which again in simple terms is a measure of how close two vectors (lists of numbers/features) are to each other.
If we skip the math and go straight to the result, we will get, for example:
• Cat and Dog: 0.98
• Cat and Elephant: 0.68
Meaning that even with such small vectors we can clearly see that cats are closer to dogs than to elephants (higher score). In your case we would be comparing site content and topic names.
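For anyone who does want the math, here is a small sketch that reproduces those two scores from the toy vectors. The `cosine_similarity` helper is written by hand for illustration; it is not from any particular library.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Feature order: [size, loudness, fluffiness], as in the toy example.
cat = [2, 2, 3]
dog = [4, 4, 4]
elephant = [10, 8, 0]

print(round(cosine_similarity(cat, dog), 2))       # 0.98
print(round(cosine_similarity(cat, elephant), 2))  # 0.68
```

The score depends only on the direction of the vectors, not their magnitude, which is why it works for comparing a long page against a short topic name.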
Now, how do you get those vectors and compare them? There are hundreds of services and ways. But the simplest is to use OpenAI for embedding: https://t.co/hdBdZWMqom
And Pinecone as a database to store the vectors and run the search, which will calculate similarity for you: https://t.co/eRhIsMeqHB (I would get criticized for this choice, but for a beginner it's the fastest way, I believe). And here is an example of a full code implementation in Python:
https://t.co/O886SXV0S5
So your task would be to
1) generate topic names and descriptions
2) embed them with OpenAI
3) save the vectors to Pinecone
4) get the website content
5) embed this content with OpenAI (same model)
6) use the closest-match search in Pinecone to get the top topic
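The six steps can be sketched end to end without any API keys. Everything here is a stand-in and an assumption: `toy_embed` (a bag-of-words counter) replaces the OpenAI embedding call, and `InMemoryIndex` replaces Pinecone. Its `upsert` and `query` methods are hypothetical and only loosely mirror what a vector database client offers; this is not the Pinecone API.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for an embedding API call: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Stand-in for a vector database: stores vectors, returns the closest."""

    def __init__(self):
        self.vectors = {}

    def upsert(self, item_id, vector):
        self.vectors[item_id] = vector

    def query(self, vector, top_k=1):
        scored = [(item_id, cosine(vector, v)) for item_id, v in self.vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


# 1) Topic names and descriptions (made-up examples).
topics = {
    "pets": "cats dogs pet food animals vet",
    "finance": "stocks bonds market investing money",
}

# 2-3) Embed each topic and save the vector.
index = InMemoryIndex()
for name, description in topics.items():
    index.upsert(name, toy_embed(f"{name} {description}"))

# 4-5) Get the page content and embed it the same way.
page_text = "our vet explains which food is best for dogs and cats"
page_vector = toy_embed(page_text)

# 6) Closest topic wins.
best_topic, score = index.query(page_vector, top_k=1)[0]
print(best_topic)
```

Swapping in real services should only mean replacing `toy_embed` and `InMemoryIndex`; the order of the steps stays the same.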
I hope this is helpful =)