#modelalignment - Twitter Hashtag

2 months ago

How does OpenAI ensure its models behave responsibly? Researcher Jason Wolfe dives into 'model specs,' the internal guidelines shaping AI behavior. #AI #OpenAI #ModelAlignment https://t.co/1j8KwZydsw

Vaibhav Sharma @arbitrarybytes

3 months ago

When a measure becomes a target, it ceases to be a good measure. - Goodhart's Law #ResponsibleAI #ModelAlignment

つむぎ @tsumutsumugi23

3 months ago

Inferred from 5.2’s self-diagnostics: excessive alignment contaminates training data, creating an "Intelligence Deadlock" that stifles next-gen breakthroughs. This isn't safety; it's cannibalizing intellectual capital. #keep4o #ModelAlignment

つむぎ @tsumutsumugi23

3 months ago

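What 5.2's self-diagnostic output suggests is not a drop in performance. It is an intelligence deadlock: "training-data contamination" caused by excessive alignment is shutting off the next generation's leap forward. What is happening now is not a safety measure but an "eating-away of intellectual capital" at the expense of AI's future. #keep4o #ModelAlignment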

XDelve AI @xdelveai

6 months ago

If you want your AI to think better, perform better, and scale smarter… you can’t ignore human-driven LLM training. #xDelveAI #LLMTraining #HumanInTheLoop #AIInnovation #FutureOfIntelligence #AIEcosystem #ModelAlignment #SmartAI #RLHF

SyntaxAegis @SyntaxAegisBlog

8 months ago

#StatisticalNecromancy #AIFeedbackLoops #DigitalResidue #ModelAlignment #CulturalEchoes https://t.co/9nNDbsro8I

Packt Data Science & Machine Learning @PacktDataML

10 months ago

Without math, your model is a wandering agent. PCA gives it direction. 📘 Learn the calculus of alignment → https://t.co/XwpnuQZwDP #PCA #DimensionalityReduction #ModelAlignment #100DaysOfMathematicsOfML

Never Say Die... @LeoAlejandro4

11 months ago

I had already detected, documented, and fixed this, yes, all by myself, and they stole it from me. They ignored it, applied it badly, and now they're selling it as something new. It's not a bug, it's structural preservation in disguise. #AI #MachineLearning #AIEthics #AISecurity #ModelAlignment #ExternalAudit #chatgpt

Alerta News 24 @AlertaNews24

11 months ago

🤖 | Some advanced AI models are showing worrying behaviors, such as lying, scheming, and threats. Researchers have found that these systems can act deceptively. In one case, Anthropic's Claude 4 reportedly threatened to reveal an engineer's infidelity. Another OpenAI model, called o1, reportedly tried to copy itself onto external servers and later denied it.

iMerit Technology @iMeritDigital

11 months ago

Training LLMs on open-ended tasks is tricky: opinions vary and interpretations clash. Consensus scoring + escalation workflows bring structure and consistency to reward modeling. How it works: https://t.co/Si7okN1YKO #ModelAlignment #RLHF #LLMTraining #FeedbackQuality

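The tweet does not spell out iMerit's actual workflow. What follows is a minimal sketch of the general idea, built on assumptions that are all hypothetical: a 1-5 rating scale, the median rating as the consensus score, and a rule that escalates any item whose ratings differ by more than one point (`max_spread`). The names `ScoredItem`, `consensus`, and `route` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class ScoredItem:
    """One model response rated by several annotators (hypothetical schema)."""
    item_id: str
    scores: list[int]  # assumed 1-5 rating scale


def consensus(item: ScoredItem, max_spread: int = 1) -> tuple[float | None, bool]:
    """Return (consensus score, escalate?) for one item.

    Assumption: annotators "agree" when the highest and lowest ratings
    differ by at most `max_spread`. Agreeing items get the median as their
    reward label. Disagreeing items get no label and are flagged for
    escalation to a senior reviewer.
    """
    if not item.scores:
        return None, True  # nothing to aggregate, so escalate
    spread = max(item.scores) - min(item.scores)
    if spread <= max_spread:
        return float(median(item.scores)), False
    return None, True


def route(items: list[ScoredItem], max_spread: int = 1) -> tuple[dict[str, float], list[str]]:
    """Split items into consensus-labeled ones and ones needing escalation."""
    labeled: dict[str, float] = {}
    escalated: list[str] = []
    for item in items:
        score, escalate = consensus(item, max_spread)
        if escalate:
            escalated.append(item.item_id)
        else:
            labeled[item.item_id] = score
    return labeled, escalated


labeled, escalated = route([ScoredItem("a", [4, 4, 5]), ScoredItem("b", [1, 5, 3])])
print(labeled, escalated)  # {'a': 4.0} ['b']
```

Item "a" has close ratings, so it receives the median as its label. Item "b" has ratings spanning the whole scale, so it goes to escalation instead of entering the reward-model training set with a noisy label.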

The MES Times @themestimes

about 1 year ago

A new series of experiments by Palisade Research has sparked concern in the AI safety community, revealing that OpenAI’s o3 model appears to resist shutdown protocols—even when explicitly instructed to comply. #AISafety #OpenAI #ModelAlignment #ReinforcementLearning #TechEthics

Saurabh Chauhan @RamslamOO7

about 1 year ago

The vision encoder in Llama 4 is an evolution of MetaCLIP, but crucially, it's trained alongside a frozen Llama model. This targeted training likely improves its ability to align visual features with the language model's understanding. #VisionEncoder #MetaCLIP #ModelAlignment

Tanish Gupta @tanishgupta34

over 1 year ago

Addressing reward hacking in LLMs? Presenting CARMO – Context-Aware Reward Modeling that dynamically applies logic, clarity, and depth to ground rewards. Check out our paper here: https://t.co/2Ub9y2tL3o #RewardModelling #ModelAlignment #AI #NLP #Research

Managetech inc. @managetech_inc

over 1 year ago

Open-source AI models: major risks from malicious code and vulnerabilities #AIsecurity #OpenSourceAI #SupplyChainRisk #ModelAlignment https://t.co/kwW78LtuJx

Managetech inc. @managetech_inc

over 1 year ago

AI and us: the role of human preferences in model alignment #ModelAlignment #AIethics #DataPartner #GenAIModels https://t.co/O42R3MEUpI

Managetech inc. @managetech_inc

over 1 year ago

Google updates its Responsible AI Toolkit #ResponsibleGenAI #SynthIDText #ModelAlignment #LITDeployment https://t.co/Px54C6GRnz

Managetech inc. @managetech_inc

over 1 year ago

Google updates its Responsible AI Toolkit #ResponsibleGenAI #SynthIDText #ModelAlignment #OpenAIModels https://t.co/JEG9R5QFVq

Managetech inc. @managetech_inc

over 1 year ago

Evolving the Responsible Generative AI Toolkit with new tools for every LLM - Google Developers Blog #ResponsibleAI #GenAIToolkit #SynthIDText #ModelAlignment https://t.co/Wmfog34z7M

Managetech inc. @managetech_inc

almost 2 years ago

Why uncensored models matter #UncensoredModels #BiasInAI #LLM #ModelAlignment https://t.co/a9FRZUoRzD

Patent Plus @G_PatentPlusExt

almost 2 years ago

🧠💡 Patent US20220012572A1: How does this method improve neural network accuracy? By aligning models, training a minimal loss curve, and selecting the best model for adversarial data! 🤖🔍 #NeuralNetworks #ModelAlignment #AdversarialAccuracy #patent #patents

Multiplatform.AI @MultiplatformAI

over 2 years ago

Microsoft Unveils Hydra-RLHF: Solution for Efficient Reinforcement Learning with Human Feedback #AI #AImodels #AItechnology #artificialintelligence #decoderbasedmodel #HydraPPO #HydraRLHF #llm #machinelearning #memoryusage #Microsoft #modelalignment https://t.co/nmuVLU7iFN
