Felipe Mello @fmello93 - Twitter Profile

15 days ago

@d1muncher69 @Im_IrushiK haiku 3.5 with low reasoning does it. It is not an intelligence issue. To have the model properly count characters, we would probably need tokens to be individual letters.

fmello93's tweet photo. @d1muncher69 @Im_IrushiK haiku 3.5 with low reasoning does it. It is not an intelligence issue. To have the model properly count characters, we would probably need tokens to be individual letters. https://t.co/wAIoVYD5Bh

1

0

30

Felipe Mello

@fmello93

16 days ago

@QGallouedec @SergioPaniego When the model is deployed, thinking will be stripped, but during training, past thinking is preserved. So the model is trained with a different distribution vs production. Do you know if anyone has analyzed if this matters?

1

0

1

281

Felipe Mello

@fmello93

7 months ago

@Acon43158243 @arcprize 😂, thats fair, second example is also like that

0

105

Felipe Mello

@fmello93

7 months ago

@arcprize on the 3rd example, why is the correct answer to fill the 2 spots at the top, and not to the bottom right?

3

0

420

Who to follow

Deep Learning Research @ https://t.co/YZ8uB4Cavn

7 months ago

@vikhyatk loss fn peak?

1

0

115

Felipe Mello

@fmello93

7 months ago

@maharshii Can you share the downsides? i.e. why isnt this the default?

1

2

0

548

Felipe Mello

@fmello93

7 months ago

@vikhyatk i was considering doing some experiments with it soon. I am surprised that you couldn't use it with torch, since they mention it in their readme

fmello93's tweet photo. @vikhyatk i was considering doing some experiments with it soon. I am surprised that you couldn't use it with torch, since they mention it in their readme https://t.co/KT6wKHN8XF

1

2

0

290

Felipe Mello

@fmello93

7 months ago

@drisspg I didn’t have a good experience with gpt-5 on cursor, but heard good things about codex from multiple people. I will have to give it a try

0

2

0

136

Felipe Mello

@fmello93

7 months ago

@redtachyon @ShashwatGoel7 We want to get this right in TorchForge (let you worry about your crazy ideas, not infra). It is still early days, so there's a lot of room on the design. Let us know if you have any ideas/would like to contribute.

0

4

0

48

Felipe Mello

@fmello93

8 months ago

@xidulu I see. If you need on the fly packing, I implemented it in tune but never merged: https://t.co/f7VQQY8vfv You can also find it here by someone else that did it: https://t.co/FE9UUrfajY

1

3

0

1

108

Felipe Mello

@fmello93

8 months ago

@redtachyon best of luck in the new journey!

0

1

0

2K

Felipe Mello

@fmello93

8 months ago

@_lewtun Glad you liked it! its WIP. I will make it a bit less crowded, give users some filtering options. Let us know if you have any feedback/ideas.

1

2

0

94

Felipe Mello

@fmello93

8 months ago

@difficultyang @testttt1236 Thanks for your service, sir

0

1

0

207

Felipe Mello

@fmello93

8 months ago

@fchollet Couldn’t the takeaway be that we can try to apply this curriculum to **any** thinking problem and have LLMs internalize thinking? Something like how Claude could achieve great results without a thinking mode.

0

1

3K

Felipe Mello

@fmello93

9 months ago

@giffmana Yeah, I feel unproductive and slower when I go pure vibe coding. Much better for me to take the lead and prompt the model every other step. I also find it helpful with planning, ie chugging thousands of lines of code and coming up with a proposal + pseudo code.

0

2

0

846

Felipe Mello

@fmello93

10 months ago

@giffmana My Meta/Google interviews were pretty standard. But there was one google interview that caught me by surprise. I was asked to write unit tests and share the hardest bug I had ever solved. I wouldnt say it was "cool", but it carried more signal than leetcode.

1

11

0

3

7K

Felipe Mello

@fmello93

10 months ago

@_xjdr Gemini cli performs waaay worse than cursor. Have been trying Claude code. I like it more than cursor, it gets further and for better price, but it’s still stupid. I think that cursors advantage is that I can query N models to revise each others plan, something I miss with Claude

0

103

Felipe Mello

@fmello93

10 months ago

@Daksh46559036 @coldhealing same idea: I think its a safe choice to do something technical, that allows you to explore multiple areas in that field, and gives you a way out if you find yourself happier at non-technical roles.

0

7

Felipe Mello

@fmello93

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users