🧵 1/ New preprint drop: "Back to the Barn with LLAMAs: Evolving Pretrained LLM Backbones in Finetuning Vision Language Models" 🦙
If you swap a better LLM backbone into your VLM, do you get a better VLM?
Short answer: not really. Longer answer: it's more interesting than that.
4/ Some VLM capabilities only emerge in the newest LLM generation. Other tasks dominated by visual understanding barely improve, regardless of which LLAMA you plug in.
Streamlining federal permitting with AI 📄🖥️⏩
PNNL researchers are using AI to bring valuable data distributed across hundreds of federal government agencies into a single dataset that's crucial for modernizing permitting technology for the 21st century.
CEQ coordinated with @ENERGY’s @PNNLab PermitAI project on the release of NEPATEC 2.0, a major accomplishment on the road to a simplified, speedier Federal permitting and environmental review process.
We are organizing the AI4Permitting Workshop on April 29th at the NAEP 2025 conference in Charleston, South Carolina.
Learn how to speed up National Environmental Policy Act (NEPA) processes at our upcoming workshop at NAEP 2025.
https://t.co/sKsa0FIcr3
#AI4Permitting #NEPA
We also have a Kaggle competition open right now, running LLM evaluations on a QuAD benchmark specific to understanding permitting documents. The deadline is June 30!
https://t.co/T3aGWI7Q6j
This corpus is part of DOE's VoltAIc initiative, which uses LLMs to accelerate permitting processes.
This corpus was developed in partnership with @PNNLab - you can find more details about their work on this project here:
https://t.co/kpqPOQ8d1b
🚨 If you care about fixing NEPA environmental permitting OR are looking for new high-quality domain text corpora to build LLMs on top of...
DOE just released a 3.6B token corpus of federal permitting documents on Hugging Face!
This corpus includes...
The video of the Lytle Lecture I gave at the University of Washington last week is available.
Title: "Objective Driven AI: Towards Machines that can Learn, Reason, and Plan"
Lytle Lecture Page: https://t.co/GEazvj4p1v
Slides: https://t.co/wbPmq4MgP8
Video: https://t.co/YBR6JRAjX8
📢📢 I am recruiting PhD students for our group at Auburn to work on open-world event understanding and active embodied vision! If you are interested in working with our group, reach out!
More info: https://t.co/97jrdXeqd8
Please RT!
SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions
Abs: https://t.co/ykKhykM0YV
Pdf: https://t.co/vMHqghL4gr
Presenting SciTune, a tuning framework to improve large language models' (LLMs) ability to follow scientific multimodal instructions. SciTune includes two stages: scientific concept alignment, to learn across various scientific visual and textual signals, and scientific instruction tuning, to fine-tune on a multimodal scientific reasoning task. LLaMA-SciTune surpasses human performance on the ScienceQA multimodal reasoning benchmark and performs significantly better than SoTA vision-language models on a variety of scientific image understanding tasks, with zero demonstrations at inference time.
@ChunyuanLi Since you used LLaVA as the base (LLaMA -> Stage 1: Feature Alignment -> Stage 2: Instruction Tuning -> LLaVA), I was wondering whether performing LLaVA-Med (Stage 1) on top of LLaVA (Stage 1) could increase performance.