Just created this to answer a student's question. I do not care for social media, but maybe I can help demystify or correct some misconceptions about statistics
@SamKoebrich @qcsoarer This is actually something I was thinking of talking about this morning. A good thing to do would be to separate precinct counts from all States into urban and non-urban areas (roughly blue/red). Then conduct a Benford's analysis to determine goodness-of-fit and compare.
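A minimal sketch of the split-and-compare idea above, not the code linked in this thread. The function names (`benford_chi2`, `compare_groups`) and the group labels are hypothetical, and the input is synthetic data built only to show the mechanics; real use would pass pairs such as `("urban", count)` and `("non-urban", count)` taken from actual precinct counts.

```python
import math
from collections import Counter

# Benford's expected first-digit probabilities for digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def benford_chi2(counts):
    """Chi-squared statistic of the first-digit frequencies against Benford's."""
    digits = [int(str(c)[0]) for c in counts if c > 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one positive count")
    observed = Counter(digits)
    return sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in enumerate(BENFORD, start=1)
    )

def compare_groups(labelled_counts):
    """Group (label, count) pairs by label and score each group separately."""
    groups = {}
    for label, count in labelled_counts:
        groups.setdefault(label, []).append(count)
    return {label: benford_chi2(counts) for label, counts in groups.items()}

# Synthetic illustration only (NOT election data): a geometric sequence,
# which roughly follows Benford's, versus evenly spaced three-digit numbers,
# which do not.
synthetic = (
    [("benford_like", int(1.03 ** k)) for k in range(1, 300)]
    + [("uniform", c) for c in range(100, 1000, 3)]
)

scores = compare_groups(synthetic)
for label, stat in scores.items():
    print(f"{label}: chi-squared statistic = {stat:.2f}")
```

A larger statistic means a worse fit to Benford's, so the comparison is between the two groups' scores rather than a verdict on either group alone.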
@ReposeGuru I think it is clear we have work to do to improve our sampling processes. I would hate to think the errors we see are intentional. I think it is more likely that previously tried and true sampling techniques have not caught up to the modern world, both technologically and socially.
Thank you to whoever took this over and developed an interest in the data collection and exploration/analysis. I am glad people are getting interested in learning this stuff!
@VirtuArete I really am going to bed now but I saw this and can't help but enthusiastically say: anything by Charu Aggarwal (Outlier Analysis is particularly good!)
For the few of you that liked the stats jokes, I will end the night with one: Why was Mr. Bernoulli Bayes known as the "man about town"? (answer in the reply)
Who knew that with less than 100 lines of code you could make half the country wish you were dead, the other half appreciate math, and approximately 0.00000001% laugh at what you thought were high quality stats jokes.
@ReznoirA You are absolutely correct from an efficiency standpoint. I would even say that I can do it in one line with Perl (because I can do anything in Perl in one line). But I wrote the code the way I did for instructional purposes. Feel free to hate it.
@thingcreator Yes, but Benford's is a discrete distribution and is really easy to create. Chi-squared is also easy to code (I say that even though I used scipy, less typing). Also, in anything I am mocking up as an example that needs to collect data, I am going to use Python. But I do love R!
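To illustrate how little code the distribution and the test take, here is a hedged sketch written without scipy. It is not the linked code, and the helper names are made up for this example. The p-value helper relies on the closed form of the chi-squared survival function for an even number of degrees of freedom, fixed here at 8 (nine digit categories minus one).

```python
import math
from collections import Counter

def benford_pmf():
    """Benford's first-digit probabilities: P(d) = log10(1 + 1/d), d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(n):
    """Leading decimal digit of a nonzero integer."""
    return int(str(abs(int(n)))[0])

def chi_squared_stat(counts):
    """Pearson chi-squared statistic of observed first digits vs Benford's."""
    digits = [first_digit(c) for c in counts if int(c) != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero count")
    observed = Counter(digits)
    return sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_pmf().items()
    )

def chi2_sf_8df(x):
    """Upper-tail probability of a chi-squared variable with 8 degrees of freedom.

    For even degrees of freedom k, the survival function is
    exp(-x/2) * sum_{i=0}^{k/2-1} (x/2)^i / i!.
    """
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(4))
```

Usage would look like `p_value = chi2_sf_8df(chi_squared_stat(precinct_counts))`, where `precinct_counts` is a placeholder for whatever list of counts has been collected.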
@Prof_JTaylor I want to say thanks but....I don't think having followers is a good thing. I would rather people learned some cool statistical techniques and how to code (I mean, come on, Python is so incredibly accessible, I grew up on C and Perl).
@halfacanuck Oh my, I don't think I will be posting any others here. I'll take data collection requests and I can write analytic code, but you post the results lest someone call me Putin again.
@thecatalvarado I think a lot of people (and yes that includes you) are reading into what I've posted with their own bias. I clearly have talked about goodness-of-fit to Benford's distribution and the anomalousness relative to parallel in-context sets (i.e., one set of frequencies vs. another).
I am making these tweets to explain in one place some analysis that was done last night.
1 - I was asked offline about doing Benford's on election data. I explained that this is common and a useful way to detect anomalies in data that are driven by an artificial process (e.g., fraud).
@thecatalvarado Note, the code scrapes the data for you and runs the analysis, but feel free to evaluate it. Also, keep in mind that it's not meant to be the most efficient code (got some valid criticism there). It is written to be instructional.
@thecatalvarado Totally get it, econ rocks, you guys are basically stat cousins (and many people fail to realize so are the agriculture guys, stats foundations there!).
Here's the link to the data:
https://t.co/2CfNJvp0WJ
and here is the link to the code:
https://t.co/zO7lqY1ZGA
@thecatalvarado If my data was subject to an IRB you would be somewhat correct (there are actually other more important hurdles in that case). This was just a brief explanation of how to perform the data collection and analysis. There is no peer review in this scenario other than the PUBLIC.