jn @Challenging666 - Twitter Profile

4 days ago

@HuggingPapers I was concerned that loop-based models might reduce inference efficiency and that simply reducing parameters would offer limited gains. PLT seems to address this concern. If a model can be trained with 1× resources while gaining loop× inference benefits, it may be even better.

0

101

Challenging666 retweeted

DailyPapers

@HuggingPapers

4 days ago

LoopCoder-v2 is out.

A 7B model trained on 18T tokens that scores 64.4 on SWE-bench Verified with just two loops, beating models 30x larger.

Adding a third loop makes it worse.

Model and code are on Hugging Face. https://t.co/nyHlt7suMB

4

95

18

68

9K

Challenging666 retweeted

Ge Zhang @GeZhang86038849

over 1 year ago

[1/n] 🎉We are very pleased to introduce FineFineWeb, currently the largest open-source, fully automatic classification effort for fine-grained web data. Our contributions are as follows:
🔪We decompose the entire deduplicated version of FineWeb into 67 categories, each with a significant amount of seed data.
🧮We conduct a correlation analysis between vertical categories, as well as between vertical categories and common benchmarks, and also provide a distribution analysis of URLs and other content.
🧑‍⚖️We provide test sets for PPL evaluation based on the 67 selected vertical domains of FineFineWeb, and offer a "small cup" (Validation) and a "medium cup" (Test).
🪙We provide all the full-process materials for training fastText and BERT.
📅We will give suggestions on data proportioning based on our dataset. (Based on RegMix, coming soon in our report! [Computing power is tight, so it will arrive as soon as possible.])

7

160

44

77

24K

Challenging666 retweeted

Ge Zhang @GeZhang86038849

over 1 year ago

[1/n] 🔥 Happy to introduce FullStack Bench: a comprehensive evaluation dataset focusing on full-stack programming across 16 languages and more than 11 real-world application domains, such as data analysis, software engineering, and machine learning.

Is your CodeLLM a FullStack Coder rather than a LeetCode nerd?

It's time to put your code LLMs to the test!!! 📝

11

135

33

63

47K

Challenging666 retweeted

Qwen

@Alibaba_Qwen

over 1 year ago

🚀Now is the time, Nov. 11 10:24! The perfect time for our best coder model ever! Qwen2.5-Coder-32B-Instruct!

Wait wait... it's more than a big coder! It is a family of coder models! Besides the 32B coder, we have coders of 0.5B / 1.5B / 3B / 7B / 14B! As usual, we not only share base and instruct models, we also provide quantized models in GPTQ and AWQ formats, as well as the popular GGUF! 💖

👉🏻Blog: https://t.co/7FnV3SUHuD

👉🏻Tech Report: https://t.co/Y3JN2Ly7H6

👉🏻Hugging Face: https://t.co/GgfeNq0XML

👉🏻ModelScope: https://t.co/VJwMAvEaHN

👉🏻Kaggle: https://t.co/7GW9GZJYre

👉🏻GitHub: https://t.co/gMGC8b5Hwv

👉🏻Demo [chat]: https://t.co/JxAYwnLM9u

👉🏻Demo [Artifacts]: https://t.co/cyJEHV30e1

The flagship model, Qwen2.5-Coder-32B-Instruct, reaches top-tier performance, highly competitive with (or even surpassing) proprietary models like GPT-4o, in a series of benchmark evaluations, including HumanEval, MBPP, LiveCodeBench, BigCodeBench, McEval, Aider, etc. It reaches 92.7 on HumanEval, 90.2 on MBPP, 31.4 on LiveCodeBench, 73.7 on Aider, 85.1 on Spider, and 68.9 on CodeArena!

73

2K

410

761

662K

Challenging666 retweeted

ollama

@ollama

over 1 year ago

ollama run opencoder

OpenCoder is available in 1.5B and 8B models.

13

847

97

386

63K

Challenging666 retweeted

Qian Liu

@sivil_taram

over 1 year ago

🔍Why could a coding model trained on just 2.5T tokens compete with top-tier models like DeepSeekCoder (10T tokens) and QwenCoder (15T tokens)?

🌟 Curious about the answer? Check out our paper, OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models (🏠 https://t.co/Hh5otarsvx, 📑 https://t.co/mkEr6kBkjk), a new code language model with top-tier code generation performance and full openness!

In this paper, we reveal the full details of our data cleaning, processing, and synthesis pipeline: insights for code pre-training that top labs often keep under wraps! Here's what we offer:

✨ 1.5B & 8B code models supporting both English and Chinese
📚 Code to reproduce the 2.5T tokens of training data (coming soon!)
🛠️ 4.5M+ high-quality SFT examples

This work was led by the awesome @SimingHUAN38187, @crazycth0901 and @ziliwang8011184. Please find more details in this thread! 🧵

10

541

95

528

94K

Challenging666 retweeted

Ge Zhang @GeZhang86038849

over 1 year ago

[1/n] Discover AutoKaggle: Revolutionizing Data Science Competitions with Multi-Agent Collaboration! 🚀

Introducing AutoKaggle, a multi-agent framework designed to automate the full spectrum of data science competitions on Kaggle! From background understanding to model prediction, AutoKaggle takes on all phases, boosting efficiency and reducing manual overhead.

💡 Highlights of AutoKaggle:
🛠️ Phase-based workflow: Six key phases (Understanding, EDA, Cleaning, Feature Engineering, Model Building).
🤖 Five specialized agents: Reader, Planner, Developer, Reviewer, Summarizer.
🔁 Iterative debugging & unit testing for robust, correct code generation.
📊 Built-in ML tools library to handle data cleaning, feature engineering, and modeling.
🤤 Flexible customization support for the ML tools library lets you drive the workflow as you want.

7

151

36

76

15K

Challenging666 retweeted

Rohan Paul

@rohanpaul_ai

over 1 year ago

Qwen Code Interpreter, with Qwen Code 2.5 & WebLLM

Running locally in your browser

A cool @huggingface space showcasing the power of open-source models and WebLLM

-----

WebLLM is a high-performance, in-browser language model inference engine that leverages WebGPU for hardware acceleration, enabling powerful LLM operations directly within web browsers without server-side processing.

2

19

3

13

2K

jn @Challenging666

almost 2 years ago

[6/6] 🔍Deep Insights📚 FuzzCoder's fine-tuning on the Fuzz-Instruct dataset, collected from heuristic fuzzing tools, provides deep insights into the mutation process, leading to more targeted fuzzing.

GitHub: https://t.co/q5BOrQsmuj https://t.co/zWcFkNQMjL

0

1

0

43

jn @Challenging666

almost 2 years ago

💥Introducing FuzzCoder: Revolutionizing Byte-level Fuzzing with Large Language Models!🌟 Experience the future of software security with our groundbreaking approach.
🔗 Learn more: https://t.co/9eaXDmdbaU
#Fuzzing #Cybersecurity #AI #LLM #LargeLanguageModels #SoftwareSecurity https://t.co/Guq2o19Vb2

1

7

0

1

749

jn @Challenging666

almost 2 years ago

[5/6] 📊Performance📊 FuzzCoder demonstrates remarkable improvements across various input formats. Our extensive experiments show that FuzzCoder, integrated with AFL, outperforms traditional methods in effective mutation proportion and crash discovery rates. https://t.co/tAAFiKB61i

1

0

39

Challenging666 retweeted

Yujia Qin @TsingYoga

almost 2 years ago

What is being researched in the vision-language model (VLM) field? 🧐

VLMs are a field that has developed rapidly since the end of last year. For researchers, plenty of "gold mines" remain undiscovered, current exploration is still very preliminary, and the barrier to entry is relatively low for newcomers to large models 🥰

Here are recommended papers to help you quickly get up to speed on the current state of the VLM field 📰:

1. Get a macro-level overview of the concrete research directions in the field (e.g., data mixture ratios, image encoder choice, VL connector design, existing benchmarks, VLM training strategies)

a. Cambrian: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
The most complete and most up-to-date all-around exploration, bar none
Link: https://t.co/fqS9zVB5AS

b. MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Somewhat older, but still recommended reading
Link: https://t.co/Q7HSytHhwB

c. What matters when building vision-language models?
Its conclusions complement the previous two papers well
Link: https://t.co/xOTVQj8PZ6

2. VLM-specific approaches to improving inference efficiency: designing better V-L attention mechanisms

a. An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Finds that vision tokens are highly redundant, so token dropping can greatly speed up inference without hurting quality
Link: https://t.co/4hvsgy0nr7

b. VoCo-LLaMA: Towards Vision Compression with Large Language Models
Reduces the number of vision tokens through RMT-like token compression, which speeds up inference
Link: https://t.co/8227vL5Sd0

3. The effect of vision encoder resolution on model performance. The conclusion is simple and blunt: the effect is large, and higher resolution gives better results

a. InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Link: https://t.co/hcUik3aVlQ

b. DeepSeek-VL: Towards Real-World Vision-Language Understanding
Link: https://t.co/OBJRnRBlK1

4. VLM architecture choice: an all-in-one decoder (early fusion), or a separate vision encoder and language decoder?

a. Unveiling Encoder-Free Vision-Language Models
https://t.co/puhPIluZEB

b. Chameleon: Mixed-Modal Early-Fusion Foundation Models
https://t.co/p5iWwWHbMA

5. For the more mainstream separated VLM architecture, how should the vision-language connector be designed?

a. TokenPacker: Efficient Visual Projector for Multimodal LLM
https://t.co/s3N1ntrajw

6. Best training practices for the separated VLM architecture

a. Long Context Transfer from Language to Vision
https://t.co/eoKGiw9ufm

7. All papers and blog posts in the LLaVA series

Improved Baselines with Visual Instruction Tuning
https://t.co/8pLH4RNxAZ
https://t.co/amvIg2TynU
https://t.co/cfVN0Pf6at
https://t.co/CKiCyN2d0G
https://t.co/By6hrZyNyU
https://t.co/PMX6iqmwHt

8. Some recommended hands-on repositories for quickly leveling up your VLM coding skills (see image)

(The list is not exhaustive; please keep adding to it in the comments.)
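
Item 2a in the thread above mentions token dropping: removing redundant vision tokens to speed up inference. Below is a minimal Python sketch of that general idea, assuming each vision token already comes with an importance score (for example, the attention it receives at some layer). The function name `drop_vision_tokens`, the `keep_ratio` parameter, and the sample scores are hypothetical and only for illustration; this is not the implementation from the linked paper.

```python
from typing import List, Sequence, Tuple


def drop_vision_tokens(
    tokens: Sequence[str],
    scores: Sequence[float],
    keep_ratio: float = 0.5,
) -> Tuple[List[str], List[int]]:
    """Keep the highest-scoring fraction of tokens, preserving their original order.

    `scores` is an assumed per-token importance value; how it is computed
    (e.g. from attention weights) is outside the scope of this sketch.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Always keep at least one token when the input is non-empty.
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Rank indices by descending score; sorted() is stable, so ties keep position order.
    ranked = sorted(range(len(tokens)), key=lambda i: -scores[i])
    # Restore the original sequence order for the surviving tokens.
    kept_indices = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept_indices], kept_indices


patches = ["p0", "p1", "p2", "p3"]
patch_scores = [0.05, 0.40, 0.10, 0.45]  # made-up importance scores
kept, indices = drop_vision_tokens(patches, patch_scores, keep_ratio=0.5)
print(kept, indices)  # ['p1', 'p3'] [1, 3]
```

Keeping the surviving tokens in their original order matters because later layers still rely on positional information; only the sequence length shrinks.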

12

312

69

321

41K

jn @Challenging666

about 2 years ago

[10/10] 🤗 McEval Resources:
Homepage: https://t.co/KjVwoxoX6X
arXiv: https://t.co/09U2dQCPEV
Code: https://t.co/yHwRToHrEz
Leaderboard: https://t.co/puo8sgQbdc
Evaluation Data: https://t.co/OVQqzDyJsb
Instruction Data: https://t.co/dsTOGoKU8L
HF Paper: https://t.co/Q825C2ff3R

0

31

jn @Challenging666

about 2 years ago

🚀 Thrilled to introduce 🔥McEval🔥, the first massively multilingual code evaluation benchmark, covering 40 programming languages with 16K test samples across code generation, completion, and explanation tasks.

McEval: Massively Multilingual Code Evaluation
https://t.co/KjVwoxoX6X https://t.co/EeMUa6drdb

1

7

0

1

2K

jn @Challenging666

about 2 years ago

[9/10]
Based on algorithmic complexity, we classify McEval into three levels (Easy/Medium/Hard).

The performance of the CodeQwen model on code generation tasks shows that for most languages, the model can answer most easy questions but struggles with medium and hard ones. https://t.co/IbDKeS7O2Q

1

0

57
