Ryan Marcus

Verified account

@RyanMarcus

Assistant prof @CIS_Penn. Machine learning for systems, databases.

Philadelphia, PA

Joined March 2009

1.1K Following

1.7K Followers

673 Posts

19 days ago · Camden

I'm extremely optimistic about the future of education in the "AI era" (whatever that is), but one worrying trend I don't know how to address is that students are talking to *each other* less frequently. Study groups used to be a forcing function for that, but study groups seem to be an unfortunate casualty of our current processes.

1

13

0

0

182

19 days ago · Camden

@kn_owled_ge Relatedly, my course's TA office hours attendance is down 80-90%. Students who "just" want to get the HW done have the LLM do it. Students who want in-depth tutoring also have the LLM do it. On exams, I've had more perfect scores *and* more failing grades than ever before.

2

44

3

1

2K

22 days ago · Camden

@ShriramKMurthi @ArjunGuha Are your house rules / lab manual public? I'm curious -- I'm working on one for my group but I don't feel like I have anything coherent yet.

1

1

0

0

233

about 2 months ago · Camden

@krismicinski @ShriramKMurthi Uh oh... Is there a reason I want to be a senior member? I think I've let my ACM membership lapse every other year.

1

1

0

0

45

3 months ago

@nomad421 Here's an example of the "almost": a linked-list chained hashmap is one of the most efficient concurrent multi-map implementations! You can insert a new node into a bucket's chain with a single atomic swap. This is useful for hash joins in database systems. https://t.co/uzdNFKcXFQ

0

2

0

0

69
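The single-swap insert mentioned in the hashmap reply above can be sketched as follows. This is a minimal illustration, not the code behind the linked URL: `Node`, `ChainedMultiMap`, `insert`, `find_all`, and the fixed bucket count are all hypothetical names and choices. It assumes an insert-only build phase followed by a separate probe phase, as in a hash join.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Illustrative sketch only: every name here is hypothetical.
struct Node {
    std::uint64_t key;
    std::uint64_t value;
    Node* next;
};

class ChainedMultiMap {
public:
    explicit ChainedMultiMap(std::size_t bucket_count) : buckets_(bucket_count) {
        assert(bucket_count > 0);
        for (auto& head : buckets_) head.store(nullptr, std::memory_order_relaxed);
    }

    ChainedMultiMap(const ChainedMultiMap&) = delete;
    ChainedMultiMap& operator=(const ChainedMultiMap&) = delete;

    // Assumes no other thread is still using the map.
    ~ChainedMultiMap() {
        for (auto& head : buckets_) {
            Node* node = head.load(std::memory_order_relaxed);
            while (node != nullptr) {
                Node* next = node->next;
                delete node;
                node = next;
            }
        }
    }

    // Build phase: may be called from many threads concurrently.
    void insert(std::uint64_t key, std::uint64_t value) {
        Node* node = new Node{key, value, nullptr};
        std::atomic<Node*>& head = buckets_[bucket_of(key)];
        // The single atomic swap: install the node as head, receive the old head.
        Node* old_head = head.exchange(node, std::memory_order_acq_rel);
        // Attach the old chain behind the new node. Between the swap and this
        // store the chain is briefly cut, so probing must wait for the build.
        node->next = old_head;
    }

    // Probe phase: call only after every inserting thread has been joined.
    std::vector<std::uint64_t> find_all(std::uint64_t key) const {
        std::vector<std::uint64_t> values;
        const Node* node = buckets_[bucket_of(key)].load(std::memory_order_acquire);
        for (; node != nullptr; node = node->next) {
            if (node->key == key) values.push_back(node->value);
        }
        return values;
    }

private:
    std::size_t bucket_of(std::uint64_t key) const {
        return std::hash<std::uint64_t>{}(key) % buckets_.size();
    }

    std::vector<std::atomic<Node*>> buckets_;
};
```

Because each inserter writes only its own node's `next` pointer, no compare-and-swap retry loop is needed. The trade-off in this sketch is that a chain is momentarily incomplete during an insert, which is acceptable when probing starts only after the build finishes.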

5 months ago

@BenSManning @metrics52 Having fun is a (big!) competitive advantage. Those who succeed are likely to have several competitive advantages. So there's an "over-representation" of fun-having at the top. Of course, not everyone at the top has fun, and not everyone who has fun makes it to the top...

0

2

0

0

928

5 months ago

We conclude with a discussion about how database researchers should use industrial traces, and how we might begin to build systems that optimize for "the query the user never sends." 📄Paper: https://t.co/5LbhzV7Pxz

0

1

0

0

148

5 months ago

Most database teams optimize what they see in workload logs. But those very optimizations change what users choose to run! In our CIDR paper, we argue that industrial workloads exhibit 𝐬𝐮𝐫𝐯𝐢𝐯𝐨𝐫𝐬𝐡𝐢𝐩 𝐛𝐢𝐚𝐬: logs reflect a negotiation between users and the platform.

https://t.co/lCv6R9JaDQ

1

6

0

0

340

5 months ago

For researchers, database traces are a MAJOR upgrade compared to synthetic benchmarks (or simply making something up, which is shockingly common). We argue we need more of these workload traces to build a complete picture, and, perhaps more importantly, see what is missing.

1

1

0

0

185

about 1 year ago

For that one query that must go 𝑟𝑒𝑎𝑙𝑙𝑦 𝑓𝑎𝑠𝑡, BayesQO (by Jeff Tao) finds superoptimized plans using Bayesian optimization in a learned plan space. It’s costly, but the results can train an LLM to speed things up next time. 📄https://t.co/ZaHFBd6d7I

https://t.co/J7aaAP6fZK

0

6

0

0

319

about 1 year ago

OLAP workloads are dominated by repetitive queries -- how can we optimize them? A promising direction is to do 𝗼𝗳𝗳𝗹𝗶𝗻𝗲 query optimization, allowing for a much more thorough plan search. Two new SIGMOD papers! 🧵

1

10

0

1

616

about 1 year ago

LimeQO (by @yi_zixuan), a 𝑤𝑜𝑟𝑘𝑙𝑜𝑎𝑑-𝑙𝑒𝑣𝑒𝑙 approach to query optimization, can use neural networks or simple linear methods to find good query hints significantly faster than a random or brute force search. 📄https://t.co/WncZWqOCGe

https://t.co/JMGO8zHZ8t

1

7

0

0

419

about 1 year ago · Philadelphia

@DPearsonPHL @coryfromphilly Yeah, college-aged folks in college-adjacent stations wearing college-branded clothing seems like good evidence to make this inference. I'll report back if/when I get a response from the higher-ups.

0

6

0

0

69

about 1 year ago · Philadelphia

@DPearsonPHL @coryfromphilly Is there really a disproportionate trend of Penn students evading the fare? Not saying there isn't, I'm uneducated here. If so, I'll raise the issue with the university. I imagine I'll at least get a response. Fare evasion is clearly against the student code of conduct.

1

5

0

0

83

about 1 year ago

@alpha_convert Use RDTSCP, with an extra mfence if you want to ensure writes are flushed. This also solves the problem of different NUMA regions having different clocks. I'm not sure anyone uses RDTSC for timing on modern CPUs, but admittedly I haven't looked into it in a while.

2

4

0

0

99
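The RDTSCP advice in the reply above might look like this in practice. This is a hedged sketch: `Stamp`, `read_stamp`, `elapsed_ticks`, and the `SKETCH_HAVE_RDTSCP` macro are hypothetical names. It assumes GCC or Clang on x86-64, and falls back to `std::chrono::steady_clock` elsewhere so the example still compiles. RDTSCP also returns the `IA32_TSC_AUX` value, which operating systems typically set to identify the logical processor, so two readings can be checked for having come from the same CPU.

```cpp
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <x86intrin.h>  // __rdtscp, _mm_mfence
#define SKETCH_HAVE_RDTSCP 1
#else
#define SKETCH_HAVE_RDTSCP 0
#endif

// Hypothetical type: one timestamp reading plus the processor tag read with it.
struct Stamp {
    std::uint64_t ticks;  // TSC ticks (nanoseconds in the fallback path)
    std::uint32_t aux;    // IA32_TSC_AUX value (always 0 in the fallback path)
};

// Read a timestamp. With fence_stores set, an MFENCE is issued first so that
// earlier stores are globally visible before the counter is read.
inline Stamp read_stamp(bool fence_stores) {
#if SKETCH_HAVE_RDTSCP
    if (fence_stores) _mm_mfence();
    unsigned int aux = 0;
    const std::uint64_t ticks = __rdtscp(&aux);
    return Stamp{ticks, aux};
#else
    (void)fence_stores;  // No equivalent fence in the portable fallback.
    const auto now = std::chrono::steady_clock::now().time_since_epoch();
    const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(now).count();
    return Stamp{static_cast<std::uint64_t>(ns), 0u};
#endif
}

// Writes the tick difference and returns true only when both readings carry
// the same aux tag and the counter did not go backwards.
inline bool elapsed_ticks(const Stamp& start, const Stamp& end, std::uint64_t& delta) {
    if (start.aux != end.aux || end.ticks < start.ticks) return false;
    delta = end.ticks - start.ticks;
    return true;
}
```

A caller would take `read_stamp(true)` before the measured region and `read_stamp(false)` after it, and discard the sample when `elapsed_ticks` returns false. Ticks are not nanoseconds; converting them requires the TSC frequency, which this sketch does not attempt to determine.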

about 1 year ago

@justinjaffray I think the main reason it's called "JIT" is because it uses the LLVM/GCC APIs that are used for implementing JITs. Obviously if I use a screwdriver to hammer in a nail, that doesn't make the nail a screw, but calling it a "screwed in nail" isn't too far from the truth :D

0

0

0

0

50

over 1 year ago

Pair(akeet) programming.

0

13

1

0

817

over 1 year ago

@fluxtheorist @fizziksBoris @atheorist Oral exams, formal or informal, are a staple of any PhD program and, in my experience, work very well. But I don't know how to scale it up to a class of 300-400.

1

1

0

0

36
