Why don't AI and LLMs work for producing code?
The 2024 DORA report shows that the use of AI worsens software delivery. The report claims this is due to a lack of maturity in AI and LLM adoption. I believe it is caused by the tools themselves.
DORA report hypothesis
I read the 2024 DORA report. It provides interesting insights into the use of AI within software-producing organizations.
People experience an improvement in productivity, at both the individual and team levels, when using AI tools to generate code, tests, and documentation.
However, AI usage does not improve velocity or software stability. On the contrary, the metrics analyzed by DORA get worse when AI tools are used.
The report's authors hypothesize that AI adoption lacks maturity and that improvements are expected in the coming years. My hypothesis is that people are hallucinating more than their favorite LLM.
Using AI to produce code improves neither productivity nor code quality
AI does not improve productivity
AI does not improve productivity because it inserts an intermediary between the developer's thoughts and the produced code. In the classic workflow, the developer knows roughly what she wants the computer to do and refines her thoughts and her program step by step. With an LLM, the developer has to express her thoughts in a prompt that produces a large amount of code in one shot, then refine the prompt if the result does not match the expected feature. It is a tough intellectual shift, and refining the prompt takes time.
AI does not improve quality either
Anyway, the developer could train herself to prompt an LLM to produce the expected code. She still has to ensure that the produced features meet the organization's quality criteria through testing and code review. From experience, I know developers are bad at reviewing code written by others. The larger the amount of code to review, the fewer errors the review catches. That's why practices like pair, mob, and ensemble programming have emerged: to review code while it is being typed.
Quality is also ensured by testing the feature and, in the long run, by automated testing. Are developers and testers willing to write automated tests when they have an LLM that can generate them? Are they able to review the test code? See the previous paragraph.
What should developers do?
The promise of AI is to produce code faster and with better quality than a human being. Maybe it could, but to do so, it would have to be prompted with a very fine degree of precision. We already have tools for that purpose, and they are far cheaper than AI. They're called programming languages.