@DrAnneCarpenter @benbfly @aexbrown @manlius84 @nature I'm not deflecting; I have been consistent in all these threads. IMHO the decision should be based on the complexity of the data and the methods that need to be developed to analyze it. One size doesn't fit all. Treating everything the same will have unintended negative consequences.
If you are still writing your code in Matlab, please try to transition to a real open language like Python. Matlab is NOT open code, it is a walled garden. If you believe in open science then you shouldn't be using it. (As it happens Matlab is also poorly suited to big projects.)
@DrAnneCarpenter @benbfly @aexbrown @manlius84 @nature Sorry, Anne, but I wonder (1) why you don't seem to be concerned about protecting young students who collected the data and are still working on primary papers and (2) why you think immediate data dumps won't disincentivize big projects. Every decision/rule involves tradeoffs.
@benbfly @DrAnneCarpenter @aexbrown @manlius84 @nature All of our data are released on https://t.co/huu2gYNvyS. These data will be released on https://t.co/huu2gYNvyS. And the code for our whole pipeline is being released piecemeal as we make it usable for other humans, that is on our github page. We're not "hoarding" anything.
@benbfly @DrAnneCarpenter @aexbrown @manlius84 @nature I understand, but I don't think that you're fully considering the differences in data taken in different fields/labs, or the impacts on students who take a big risk in doing something really ambitious and novel that can support multiple papers but only after many years of work.
@lmeyerov @manlius84 @aexbrown @nature @imerov I am one of those open source folks! We release lots of code on github, including the best brain viewer out there right now. And our data are released on https://t.co/huu2gYNvyS. However, the timelines in CS and neuroscience are inevitably different.
@CaAl @maltoesermalte @nicebread303 @lakens @sharoz But again, all the data are being used in these papers. My lab has a longstanding policy of releasing all data once the primary papers are out, and we have a clear track record of doing this. You can check https://t.co/huu2gYNvyS for our data or github for code.
@DrAnneCarpenter @aexbrown @manlius84 @nature You seem pretty angry, Anne, but FYI: 1) our simple data policy is that we release ALL data after the primary papers are out, 2) this is to protect the students who collected the data and developed the methods, 3) doing otherwise would disincentivize highly novel/complicated studies.
@rubenarslan @lakens @sharoz I think the field is moving toward that as an alternative venue for credit assignment and I think that is a good idea in general. Then people could cite the primary research papers or the data as appropriate.
@DrAnneCarpenter @aexbrown @manlius84 @nature I am sorry that you feel that way, Anne. We have always been up front about our data sharing policy. We share all data on https://t.co/huu2gYNvyS after the primary papers have been published. This is to protect the students who collected the data until they can finish their work.
@nicebread303 @lakens @sharoz That seems like a really weird solution to the problem. If the data are completely novel and require new methods of analysis then the timeline is unknown. A better solution would be to base future funding on past evidence that data collected were eventually shared.
@nicebread303 @lakens @sharoz I argued that it should depend on the complexity (information content) of the data and the extent to which new analyses will be required to publish subsequent papers. I don't think that there is a one-size-fits-all rule. But 5 years certainly seems generous.
@jjolij @jjfahrenfort @lakens This sounds like a good policy, especially if there are boilerplate license forms available for various scenarios. It would protect the students while getting the data out quickly. However, AFAIK we have nothing like that over in the US.
@aexbrown @NeuroPolarbear @BorisBarbour @manlius84 Again we all agree that the data should be released. The question is when. Immediate release has negative repercussions for the students who collected it and disincentivizes large studies. There is no perfect solution.
@lakens @sharoz Big data sets collected by consortia won't solve the problem. They are often "designed by committee," which means they are mediocre for everything and good for nothing. That said, I would LOVE it if someone would collect a large data set for me, as long as it met my standards!
@lakens @sharoz People doing quick cutesy studies that are significant but have small effects is precisely what got us into this replication problem to begin with! We should be incentivizing high-quality science, and you can't do that quickly.
@lakens @sharoz Look, Daniel, the bottom line is that if everyone has to release all data on first publication then people will be incentivized to collect small, low-information-content data sets. You know that and I know that. And that will not be good for science.
@RomainBrette @manlius84 @aexbrown @nature We are obeying the rules. We are getting this done as fast as possible. We have a long track record of releasing data and we will release these data too, just like we always do. I feel like you are just trolling now...