@oxinabox_frames @apaszke @shoyer @dancherp On this view you get dual numbers by implementing an instruction tape tracer + transform, and then optimising it by fusing those steps together, without changing the semantics or PC issues.
@oxinabox_frames @apaszke @shoyer @dancherp I tend to think of the tracing and AD transform steps as orthogonal. AD transform looks similar everywhere (adjusted for Fortran AST, Jaxprs or SSA). Tracing systems are the source of PC errors, not the transform step.
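A rough sketch of the "tracer + transform, then fuse" view from the two tweets above, in plain Python. Everything here (`Tracer`, `trace`, `jvp_tape`, `Dual`, the tape layout) is a hypothetical toy made up for illustration, not the API of any real AD system. The first half records a tape and then runs a forward-mode transform over it; the second half does both at once with a dual number and should give the same answer.

```python
# Step 1: a tracer that records each primitive op onto an instruction tape.
class Tracer:
    def __init__(self, tape, slot):
        self.tape, self.slot = tape, slot

    def _emit(self, op, other):
        # Constants get their own slot so every operand is a tape index.
        if not isinstance(other, Tracer):
            self.tape.append(("const", other))
            other = Tracer(self.tape, len(self.tape) - 1)
        self.tape.append((op, self.slot, other.slot))
        return Tracer(self.tape, len(self.tape) - 1)

    def __add__(self, other):
        return self._emit("add", other)

    def __mul__(self, other):
        return self._emit("mul", other)


def trace(f):
    tape = [("input",)]
    out = f(Tracer(tape, 0))
    return tape, out.slot


# Step 2: a forward-mode transform that walks the tape, carrying a
# tangent alongside every value.
def jvp_tape(tape, out_slot, x, dx):
    vals, tans = [], []
    for instr in tape:
        op = instr[0]
        if op == "input":
            v, t = x, dx
        elif op == "const":
            v, t = instr[1], 0.0
        elif op == "add":
            a, b = instr[1], instr[2]
            v, t = vals[a] + vals[b], tans[a] + tans[b]
        elif op == "mul":
            a, b = instr[1], instr[2]
            v, t = vals[a] * vals[b], tans[a] * vals[b] + vals[a] * tans[b]
        else:
            raise ValueError(f"unknown op {op!r}")
        vals.append(v)
        tans.append(t)
    return vals[out_slot], tans[out_slot]


# Fused version: trace and transform in one pass, i.e. a dual number.
class Dual:
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)


def f(x):
    return x * x * x + x * 2.0


tape, out_slot = trace(f)
two_step = jvp_tape(tape, out_slot, 3.0, 1.0)
fused = f(Dual(3.0, 1.0))
print(two_step, (fused.val, fused.tan))
```

The per-op derivative rules are identical in `jvp_tape` and `Dual`; only where the bookkeeping lives differs, which is the sense in which the two steps look orthogonal.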
Excited to share our @UofT & @VectorInst work "Emergent Road Rules in Multi-Agent Driving Environments" w/ @PhilionJonah, Andrew Liao & @FidlerSanja.
Paper: https://t.co/wOouAT5CkF
Project Page: https://t.co/ceW3ZIZWQn
Code: https://t.co/f25FvLk5Oj
[1/5]
“The police and the national guard, who ringed every campus, keeping the students from creeping across to society like so many black rats swarming out of a leaky ship.”
“I can’t live two hours without my ID, he said to himself. I don’t even dare walk out of the lobby of this rundown hotel and onto the public sidewalk. They’ll assume I’m a student or teacher escaped from one of the campuses.” – Philip K. Dick
“He knew that at one time she had been illegally married to a student commune leader, and that for one year she had lived in the rabbit warrens of Columbia University, along with all the smelly, bearded students kept subsurface lifelong by the pols and the nats. ...
The Imperial College epidemic simulation code that I helped a little on is now public: https://t.co/LBVAgsE4AY I am a strong proponent of public code for models that may influence policy, and while this is a "release" rather than a "live" depot, it is a Good Thing.
@skornblith @StefanKarpinski The SK / Singapore approach depends on pretty extreme measures (location tracking, loose rules on gov access to medical records etc) that will be a lot harder to implement in the US (politically and logistically) than lockdown
@StefanKarpinski A lot depends on how long immunity lasts for; for some coronaviruses it's around three months, which would make things significantly harder. Hopefully more data coming out on that soon.
@ChadScherrer @theshawwn @rbhar90 That's not the only option; e.g. you can just make up a derivative that's good enough for gradient descent. And there are probably other mathematically reasonable schemes that you could choose.
@ChadScherrer @theshawwn @rbhar90 All programs have a bunch of non-differentiable operations like mod. An AD will just assume the derivative is always 0 for that function.
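To make the two options in these replies concrete, here is a minimal forward-mode sketch in plain Python. It uses `floor` as a stand-in for a step-like operation such as mod; `Dual`, `floor_zero` and `floor_made_up` are hypothetical names for this illustration, and the "made up" rule shown (pass the tangent straight through) is just one possible choice, not something the thread prescribes.

```python
import math


class Dual:
    """A value paired with its tangent (forward-mode derivative)."""

    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)


def floor_zero(x):
    # Option 1: treat the op as having derivative 0 everywhere.
    return Dual(math.floor(x.val), 0.0)


def floor_made_up(x):
    # Option 2 (an assumption for illustration): invent a surrogate
    # derivative by pretending the op is the identity for tangents.
    return Dual(math.floor(x.val), x.tan)


x = Dual(2.5, 1.0)
zero_rule = floor_zero(x) * x         # d/dx [floor(x) * x] with floor' = 0
made_up_rule = floor_made_up(x) * x   # same function, surrogate floor' = 1
print(zero_rule.val, zero_rule.tan)
print(made_up_rule.val, made_up_rule.tan)
```

Both versions compute the same value; only the reported gradient changes, which is why the choice can be tuned to whatever is useful for gradient descent.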
@Inoryy @ianadwilliamson @shoyer Julia's Python interop is actually really solid, and in some ways more convenient (no PyObject casting, more auto conversions). More experimental (but basically working) is differentiation through PyCall: https://t.co/Hfbv7fBQHQ
@ChadScherrer @oxinabox_frames A big challenge is getting efficient source transform AD and higher-order derivatives; if done naively, nesting gradients produces exponential amounts of code. You'll notice this if you try to do third-order in Zygote
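A loose analogy for the blow-up, sketched in plain Python. This is naive nested forward mode with dual numbers, not Zygote's source transform, and every name in it (`Dual`, `seed`, `const`, `nth_derivative`) is hypothetical. The point is only that when each level of nesting re-differentiates the previous level's work, the cost multiplies per level: here each scalar multiply turns into three at every extra order.

```python
class Dual:
    def __init__(self, val, eps):
        self.val, self.eps = val, eps


mul_count = 0  # number of plain float multiplications performed


def add(a, b):
    if isinstance(a, Dual):
        return Dual(add(a.val, b.val), add(a.eps, b.eps))
    return a + b


def mul(a, b):
    global mul_count
    if isinstance(a, Dual):
        # Product rule: one dual multiply becomes three component multiplies.
        return Dual(mul(a.val, b.val),
                    add(mul(a.val, b.eps), mul(a.eps, b.val)))
    mul_count += 1
    return a * b


def const(c, depth):
    # A constant lifted to `depth` levels of nesting (all tangents zero).
    if depth == 0:
        return c
    return Dual(const(c, depth - 1), const(0.0, depth - 1))


def seed(x, depth):
    # The input with an independent unit perturbation at every level.
    if depth == 0:
        return x
    return Dual(seed(x, depth - 1), const(1.0, depth - 1))


def cube(x):
    return mul(mul(x, x), x)


def nth_derivative(f, x, order):
    global mul_count
    mul_count = 0
    out = f(seed(x, order))
    for _ in range(order):
        out = out.eps
    return out, mul_count


for order in (1, 2, 3):
    value, cost = nth_derivative(cube, 2.0, order)
    print(order, value, cost)
```

For `cube` (two multiplies) the float multiply count is 2 * 3**order, so going from first to third order is a 9x increase for this tiny function.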