on the AI front - devs take 19% longer w/AI - the Polo Grounds

https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

Core Result

When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.

Note, this test was run on tickets or tasks that developers anticipated taking about 2 hours apiece. So that wouldn't cover AI helping a front-end developer trying to do something novel in the back end.
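
For a rough sense of scale, here is a back-of-the-envelope sketch in Python that applies those percentages to a single 2-hour task. It assumes "19% longer" means 1.19x the no-AI time and that "sped up by 24%" (or 20%) means a 24% (or 20%) cut in completion time. The function name and the 120-minute baseline are made up for illustration and are not from the study.

```python
# Assumption: percentages are changes in completion time relative to
# doing the same task without AI. Positive = slower, negative = faster.
BASELINE_MINUTES = 120  # the roughly 2-hour tasks mentioned above


def scaled_minutes(baseline, pct_change):
    """Return the task time after a percentage change in completion time."""
    return baseline * (1 + pct_change / 100)


observed = scaled_minutes(BASELINE_MINUTES, 19)    # measured: 19% longer
expected = scaled_minutes(BASELINE_MINUTES, -24)   # forecast before the tasks
perceived = scaled_minutes(BASELINE_MINUTES, -20)  # belief after the tasks

for label, minutes in [
    ("observed", observed),
    ("expected", expected),
    ("perceived", perceived),
]:
    print(f"{label}: {minutes:.1f} min")
```

Under those assumptions, a task that would take 2 hours without AI comes out at roughly 143 minutes with it, against about 91 minutes expected beforehand and 96 minutes believed afterward.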

--
I'm a bad take machine!

Gauntlet AI

It's a 12-week intensive AI coding bootcamp. If you are invited and complete the course, you are guaranteed a $200K job with one of the sponsoring companies.

It was created by the guy behind BloomTech, which sought to disrupt the traditional college track for developers/coders/etc.

They publish videos nightly of that day's builds and the results are pretty impressive, on the surface, at least. It's harder to say whether the apps are really enterprise-ready.

I wonder if the 19% degradation reflects lack of training on how best to use the AI tools, at least in part.

--
I think the children like it when I "get down" verbally.

working over 90 hours a week because he loves it. Dude, you're working the equivalent of 2 full-time jobs. No thanks.

Not quite the same value prop.

--
I think the children like it when I "get down" verbally.

After using AI for a while as a developer, my feeling is:

It's really good in the initial scoping, design, and R&D phases. Initial implementations are pretty good as well -- how to lay out a set of functions or classes that form the scaffolding for your project. This is stuff that architects and developers could sink a month into. A single, good developer could use AI here and come out with an overall design that's probably as good or better in a couple hours.
Very good at basic/intermediate coding. Saves a good amount of time here.
It seems good at the QA/unit testing phase, but it lets a surprising number of things through.
It's good at basic debugging -- "this error message/exception at this line means you need to fix your code this way".
On more complex debugging it gets into cycles and thrashes. I've had a number of instances where I've spent 2-8 hours and it only ends because of human intervention/observation. That's a "me problem" to some degree, but there's a very real skill to knowing when you have to step on it pretty hard to get it pointed in the right direction.

--
I'm a bad take machine!

Your observations are in line with what I've seen.

I expect improvement in this space. It will take time, but these solutions will evolve and slowly get better at assessing some of the more complex issues.

I just hope that in the meantime product people put out better guidelines describing the risks, the best practices, and where these limitations are.

--
"2020 ... Let's win it all ..."

News at 11.

Half joking. I will be interested in reading this.

The 19% is based on comparing work done with AI against a without-AI control.

It's a really interesting read.

--
I'm a bad take machine!

Though it somewhat aligns with my suspicion that AI makes building prototypes of new products very fast, while it might only marginally help with fixing bugs or building features in existing products.

I have been vocal on here about my disdain for AI, particularly in domains where accuracy matters. But I have also managed to gin up prototypes for features to show my engineers, "This is how it should work" incredibly quickly. Still, the prototypes adhere to practically no rules of engineering we have for code that gets into our product, so the team can't just drop it in. Usually they just start over from scratch entirely. And the AI just simply refuses to follow the rules and guidelines we have for product engineering, so we don't use it on actual production code.

Still, I'd have guessed there would be at least a trace improvement in velocity.

Still don't want it answering any questions that can't immediately be validated with linting and compiling though.

--
Sometimes I rhyme slow sometimes I rhyme quick.

Incredibly helpful in the sales process as well as validating features and the underlying technology that will enable them. Production code it is not.

- No text -

I think it has the best agents built around it.

--
I'm a bad take machine!

...the game.

It's pretty nuts what you can prototype with a plan, code, test pattern. I usually burn thru Claude's limits and then hit Gemini (which currently has no limits, but doesn't work quite so well).

--
Sometimes I rhyme slow sometimes I rhyme quick.

- No text -

the Polo Grounds - 20250422


What do you make of the claimed success of things like

I love how they use a testimonial from a guy

No kidding - two $100K jobs

I wonder where they found Roger

I agree w this

Or, devs take 19% longer than they estimated.

not quite

I'm pretty surprised.

That is how we use it.

Which LLM do you prefer for coding, and why?

4o

The Claude Code and Gemini CLI agents have really upped...

Thanks, appreciate it!