Excited to share that Magicoder has been accepted to #ICML2024!
The idea of OSS-Instruct has been adopted by leading open-source organizations and big-tech companies, including Google CodeGemma, BigCode StarCoder2-Instruct, and more!
Check out our repo: https://t.co/sLvBM0i31c
Congratulations on the launch of CodeGemma! This is a remarkable contribution to the open-source community. It's exciting to see OSS-Instruct being adopted to improve CodeGemma's instruction-following capability!
We just released CodeGemma, a new version of the Gemma line of models fine-tuned for code generation and completion that achieves state-of-the-art results. Available in 2B and 7B sizes.
Announcing Dolphin-2.6-mistral-7b!
https://t.co/UofQXeGWLU
Full fine-tune, uncensored as always, excellent at coding, commercially friendly Apache 2.0 license. Many thanks to @Magicoder_AI and @zraytam for taking my request and loosening their licenses!
3/ Magicoder - a series of fully open-source LLMs for code that close the gap with top code models while having no more than 7B parameters.
https://t.co/8wdyQdEGVI
@_philschmid had a great thread on OSS-Instruct! And yes, we love open source and draw a lot of inspiration from it!
Find us on:
🧑‍💻 GitHub: https://t.co/orbmt5I0Vv
🤗Hugging Face: https://t.co/4wRHHzSitW
🛝Demo: https://t.co/7LUwyCZUou
Code completion tools like @github Copilot are used by over 1 million developers, helping them code 55% faster. 🧑🏻‍💻 Exciting to see open-source innovations like @Magicoder_AI's OSS-Instruct outperforming @OpenAI GPT-3.5 and @GoogleDeepMind Gemini Ultra.
details below ⬇️
🎩 Magicoder: Source Code Is All You Need @Gradio demo is out on @huggingface
demo: https://t.co/dXdJEojcTr
run with docker: https://t.co/GXCT8ehJxR
duplicate space with private gpu and no queue: https://t.co/IzXUDbYo20
We introduce Magicoder, a series of fully open-source (code, weights, and data) Large Language Models (LLMs) for code that significantly close the gap with top code models while having no more than 7B parameters. Magicoder models are trained on 75K synthetic instruction samples generated with OSS-Instruct, a novel approach to enlightening LLMs with open-source code snippets to generate high-quality instruction data for code. Our main motivation is to mitigate the inherent bias of LLM-generated synthetic data by empowering LLMs with a wealth of open-source references to produce more diverse, realistic, and controllable data. The orthogonality of OSS-Instruct and other data generation methods like Evol-Instruct further enables us to build an enhanced MagicoderS. Both Magicoder and MagicoderS substantially outperform state-of-the-art code models of similar or even larger sizes on a wide range of coding benchmarks, including Python text-to-code generation, multilingual coding, and data-science program completion. Notably, MagicoderS-CL-7B, based on CodeLlama, even surpasses the prominent ChatGPT on HumanEval+ (66.5 vs. 65.9 pass@1). Overall, OSS-Instruct opens a new direction for low-bias, high-quality instruction tuning using abundant open-source references.
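To make the OSS-Instruct idea concrete, here is a minimal Python sketch of one generation step under stated assumptions: a seed is a random run of consecutive lines taken from an open-source file, the seed is wrapped in a prompt asking a teacher LLM for a problem and a solution in two tagged sections, and the reply is split back into an instruction pair. The function names (`extract_seed_snippet`, `build_oss_instruct_prompt`, `parse_response`), the 1 to 15 line window, and the prompt wording are hypothetical illustrations, not Magicoder's released code, and the actual LLM call is left out.

```python
import random


def extract_seed_snippet(source: str, rng: random.Random,
                         min_lines: int = 1, max_lines: int = 15) -> str:
    """Return a random run of consecutive lines from an open-source file.

    The default 1-15 line window is an assumption for illustration.
    """
    lines = source.splitlines()
    if not lines:
        raise ValueError("source is empty")
    lo = min(min_lines, len(lines))
    hi = min(max_lines, len(lines))
    n = rng.randint(lo, hi)                 # snippet length in lines
    start = rng.randint(0, len(lines) - n)  # index of the snippet's first line
    return "\n".join(lines[start:start + n])


def build_oss_instruct_prompt(snippet: str) -> str:
    """Wrap a seed snippet in a hypothetical OSS-Instruct-style prompt."""
    return (
        "Please gain inspiration from the following random code snippet "
        "to create a high-quality programming problem.\n\n"
        "Code snippet for inspiration:\n"
        f"{snippet}\n\n"
        "Present your output in two sections: "
        "[Problem Description] and [Solution]."
    )


def parse_response(text: str) -> tuple[str, str]:
    """Split a teacher-LLM reply into (problem, solution) using the section tags."""
    head, sep, solution = text.partition("[Solution]")
    if not sep:
        raise ValueError("reply has no [Solution] section")
    problem = head.replace("[Problem Description]", "", 1).strip()
    return problem, solution.strip()


if __name__ == "__main__":
    rng = random.Random(0)
    source = "def add(a, b):\n    return a + b\n\nprint(add(1, 2))"
    prompt = build_oss_instruct_prompt(extract_seed_snippet(source, rng))
    print(prompt)
    # A real pipeline would send `prompt` to a teacher LLM; this reply is a stand-in.
    reply = "[Problem Description]\nWrite add(a, b).\n[Solution]\ndef add(a, b):\n    return a + b"
    print(parse_response(reply))
```

Keeping seed extraction, prompt construction, and parsing as separate pure functions makes it easy to plug in a real LLM client and to add filtering steps between them.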
Everything behind Magicoder is fully open-source, including model weights, training data, and code:
👨‍💻 GitHub: https://t.co/orbmt5IyL3
🤗 HuggingFace: https://t.co/b78uOVNmlQ
Magicoder models also show substantial improvements in multilingual coding. Notably, MagicoderS-CL-7B achieves performance comparable to WizardCoder-CL-34B with only 7B parameters.
Magicoder models trained on OSS-Instruct outperform all evaluated models with ≤ 16B parameters. Combining OSS-Instruct and Evol-Instruct gives us MagicoderS, which even surpasses ChatGPT on HumanEval+ powered by EvalPlus (https://t.co/PgWaIAbI9v).
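The HumanEval+ comparison is reported as pass@1. As a hedged illustration of what that metric measures, the sketch below implements the widely used unbiased pass@k estimator introduced with HumanEval (Chen et al., 2021): pass@k = mean over problems of 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the tests. Whether Magicoder's numbers come from greedy decoding or sampling is not stated here; with a single greedy sample per problem (n = k = 1) the estimator reduces to the fraction of problems solved. The function names are illustrative, not EvalPlus's API.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over a benchmark; each entry is (n, c) for one problem."""
    if not results:
        raise ValueError("no results")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Greedy decoding: one sample per problem, so pass@1 is the solved fraction.
    greedy = [(1, 1), (1, 0), (1, 1), (1, 1)]
    print(mean_pass_at_k(greedy, k=1))  # 0.75
    # Sampling: 10 samples, 3 correct, so pass@1 is 0.3 up to float rounding.
    print(pass_at_k(10, 3, 1))
```

Scores written as percentages (such as 66.5 vs. 65.9) are this benchmark mean multiplied by 100.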