@NA_user_ @statquant And 43 of those 112 PRs are tagged WIP anyway. So there are 69, not 112 (out of 1,415 closed), to be prodding about on Twitter. Have a look at merged PRs and compare them to open PRs to see why they aren't merged, and perhaps you can help.
@NA_user_ @statquant And a better opener would have been to appreciate that 1,415 PRs have been merged leaving 112 currently open. Then if you actually look at the details of what happens in the PRs maybe discussing kindly will be helpful and fruitful.
@D_O_Arantes print in j returns NULL. Try something other than NULL as the last value of {} and you'll see V1 present in the result with that type, albeit empty, for consistency. It needs to evaluate j even on empty input to know the names and types to return in the zero-row result.
@MyKo101AB @kar9222 @Bootvis @EalesJames @statquant Perhaps then I could have replied: 'standard column names' and 'internal dataset always the same' are too vague for anyone to help with, for any software. If you can provide more details using verbose=TRUE we can try to understand the query and go from there.
@MyKo101AB @kar9222 @Bootvis @EalesJames @statquant If you'd said "I've posted this query to S.O. <link> and it seems it is the fastest way in data.table. Is there any other software that can do this task faster?" then that would have come across a lot better.
@MyKo101AB @kar9222 @Bootvis @EalesJames @statquant That's unfair, because it came after I had asked you for verbose=TRUE output and you said you couldn't. You're restating it as if it were the first response, and you've dropped words from the quote that are important in context.
@MyKo101AB @kar9222 @Bootvis @EalesJames @statquant And this is where you're mistaken. We know data.table has flaws, and there are known slowdowns. I was even thinking of accelerating efforts to fix them, if you had just sent me the details I asked for. Instead it's you who got offended and called me patronising.
@MyKo101AB @kar9222 @Bootvis @EalesJames @statquant I'm not intending to be patronising. But I'm looking, again, at your original question, and you seem to be looking for a faster solution than data.table for a vague join task. Responding to check details seems utterly reasonable.
@MyKo101AB @kar9222 @Bootvis @EalesJames @statquant Again, and revised: so even if we could tell you that you were doing something inefficiently without knowing it, you didn't want to know that?
@MyKo101AB @kar9222 @Bootvis @EalesJames @statquant So even if you were hitting a known bug, known slowdown, or you didn't know you were doing something inefficiently, you didn't want to know that? (Again: I provided the names of two packages that are faster at joins in some cases, anyway, too.)
@0xeinar @MyKo101AB So often there is something wrong that we can address: an index that can be added, or a different type. That's what I was trying to get at. Even so, I pointed him to our benchmarks, which show 2 other packages that are faster on joins in some cases. I give up.
@MyKo101AB @Bootvis @EalesJames @statquant I wasn't offended before, but now I am. I pointed you to our benchmarks twice which show 2 other packages that are faster in some cases on join, and you didn't even acknowledge that.
@MyKo101AB @statquant @rstatstweet 'because of the preset column names' is a strange phrase to use. Well, anyway, if you find a way to post obfuscated verbose output (many people have) then I'm happy to look. What have you got to lose? And I pointed you to Polars and cuDF already too.
@MyKo101AB @statquant @rstatstweet And so absolutely nothing useful from verbose=TRUE? Are you merging character or factor columns, low or high cardinality, numeric, integer? You're just giving us nothing to go on.
You could try Polars and cuDF. See our benchmarks: https://t.co/1JujkTWVfJ.
@MyKo101AB @statquant @rstatstweet That would be possible. But there are more amazing things to do before that surely. Currently the queue of PRs from other people who have quietly contributed is the priority.
@MyKo101AB @statquant @rstatstweet If an SQL query was slow, would you ask the same question in the same way? I hope not. You should look at the query plan, see if indexes are there and being used, etc. Have you even turned on verbose=TRUE and looked yourself?