Agentic coding and Free Software
Through work, I have paid license to windsurf (recently renamed to "devin"), an application for LLM-based (aka, "Agentic") development.
I hadn't been using it that much, but in an effort to more clearly understand how this whole AI development thing works, I decided to give it a closer look recently.
My conclusions:
In its current form, this whole LLM wave is problematic for multiple reasons. But ignoring that, and looking at the technology only, I can say that:
- it is a paradigm shift;
- it is, at the technological level, a positive evolution;
- and it is a threat to free software.
Problems
Lest someone (incorrectly) assume that I am arguing in favour of the current state of affairs with regards to LLMs, let me state this first.
The way LLMs are built today is highly parasitic. Websites are downloaded in whole, at unsustainable rates, regardless of the consent of the people who made the original content. The result is predictable: servers get overloaded, server administrators attempt to implement various mitigations. Some of these mitigations work; some do, for a while; some are entirely useless. In actual fact, the mitigations are an arms race -- if too many people implement the same mitigation, then the people who try to build yet another LLM so they can extract rent will just try to work around the mitigation, eventually they will succeed, and you'll just have to come up with another mitigation. It's a bit like spam; you introduce regex-based spam filters, they introduce spelling mistakes, you introduce bayesian filters, they add a large batch of markov chain-generated semi-nonsense words made invisible by markup, you add filters to block emails with such markup, they move the text into an image. We have working mitigations today, but eventually we'll run out of ideas.
LLMs glob up everything they can while ignoring the license of the source material. The people who push those LLMs claim that pushing the source material through the machine learning algorithms makes the output of the algorithm distinct enough from the source material that the license no longer applies; I'm not so sure that this is true. I guess the New York Times v OpenAI lawsuit will teach us some of the answer to that question here, but even so the ethical questions about "is it OK to bring down another server just so we can download the internet for another for-pay LLM" are still open. And regardless of what the law states, my opinion on "you're using my copyleft code to generate code under a different license" is not something you might like if you agree with the rent seekers' opinion on the subject.
That all being said and true, the technology works. You can have a "conversation" with an LLM that resembles a human one. If you pass it some data, you can use plain english to ask it questions about that data, which is a lot easier than to ask it about that in a formal way. You can request it to generate some code, and it will generate something that looks like what you need and that will be mostly correct for like 95% of the time.
Now, yes, 95% of the time is not 100% of the time, and no, you can't ask it to "write me a piece of software that implements this 300-page requirements document and get back to me when you're done", because it will fail, and you won't know where it has failed, and you'll take it into production and expect everything to be fine because it won't and this one minor logic bug will cause half your servers to spin and consume credits with your infrastructure provider with nothing to show for it.
But that doesn't mean you can't use an LLM to build a large piece of software. It just means you have to understand the LLMs limitations and strenghts, and use them correctly.
Here's what an LLM is good at:
- Generating plausible text
- Interpreting text to figure out what a plausible meaning or summary of that text is
- Giving vague indications as to what the probable context of a given body of text is.
It turns out that that's enough to use the LLM to build a reliable piece of software, provided you do it right.
Paradigm shift
An LLM can generate text by the truckful. The generated text could be code. Given a good enough LLM, the generated text might even run and do something useful.
You can try to blindly run the code, and if it doesn't run correctly, you can paste the error message to the LLM, and it can tell you what went wrong and how you could possibly fix it. This creates a feedback loop: you ask it for an amount of code, you run the code, you receive an error, you tell it that the code is problematic and give it the error message, it makes changes to the code, now you have something that at least no longer fails at startup.
If you ask it to add tests to make sure that your code acts as per your specification, now you get an error if and when the code doesn't act as per your specification. Or, well, at least not as per the part of the specification that was correctly turned into a unit test by the LLM.
LLMs have a context window, so if the error message is pasted in the same conversation as where the code was generated, it is able to reuse the earlier prompts to refine how it should interpret the error message that you received.
You can't really paste the source code of an entire application into the prompt of your LLM, that would quickly overrun its context window. But LLMs also allow you to provide some form of background information -- a document, say -- on which you ask it to reason. It will interpret that document, but doing so uses less of the LLMs context window. So providing the LLM with your application's source code as background information can help it understand better how your code interacts. This is especially helpful if you only provide the LLM the background information relevant to the actual question.
So now if you are able to:
- Create background context with your application's source code
- Have the LLM generate a first draft of your requested change, plus the tests to make sure it works
- Compile (if applicable) the generated code (and tests) and run said tests
- Return any error messages to the LLM with a request to correct the error
Then the combination of "getting it 95% right off the bat" and the above feedback loop means you can generate syntactically correct code, that probably does what you need, in minutes.
I say "probably" for a reason. There are going to be cases where you specify a request without a number of details (because they are implied), and the LLM will get most of those details right but just not implement the one bit because it's an automaton and it doesn't think. Or you will ask it to make sure that two bits of the application look exactly the same, without specifying that they must act the same, now and in the future, and it will just generate the same block of code twice and then in a future change it will change one but not the other.
But if you review the changes, and you have experience as a programmer, you will be able to spot most cases where the LLM got it wrong. And so it's possible, if not necessarily easy at first, to use an LLM to generate mostly correct code.
There are certain places where "mostly correct" code is not desireable. But equally, there are also cases where, "mostly correct" is good enough.
After all, most of the software you run today -- the bits of it that weren't, yet, generated by an LLM -- is only "mostly correct", too, because to err is human and we all make mistakes. If not, there wouldn't be any CVEs and your software would never do anything wrong.
Now, doing the feedback loop described above is certainly something you could do manually. You could open an account on one of the LLM websites, upload the source code of your application, ask it to generate some new feature, download the newly generated feature, run it, and then copy/paste any error messages back into the LLM.
But that's a lot of manual work of the type that computers are pretty good at. So that's what the "windsurf" tool helps you with: you run it inside your IDE -- either a VSCode-based tool that you download from their website which comes with their product preinstalled, or a separate JetBrains plugin that you can install. You can then open your entire relevant codebase in a workspace in your IDE. You then ask the LLM, through the IDE, to generate a new feature in your codebase, and to also generate the test while it's at it. It will use a mixture of LLM interpretation and non-LLM functionality to scoop out the relevant bits of your codebase to send to the LLM as background information, will send it your prompt, will download the generated code and patch or create files, will compile (if required) and run the newly generated code and tests, and will refine the generated code if the tests produce any errors. All mostly automatic; by default, running anything requires explicit confirmation. You can turn that off completely (probably not a good idea), or you can give it a whitelist of things that you don't want to confirm (perhaps OK), and the tool also passes standing instructions to the LLM to never generate any command that deletes a file (which, like with any LLM, can be overridden, but it requires you to be very stubborn and to use more credits than you'd probably like).
All this put together means you can build something without writing any piece of code, provided you do it right.
A technically positive evolution
Don't go and say, "here's a 300-page document, read it and write whatever the document says". It will get it wrong, it will write a massive test suite that it will only run at the end, it will choke itself up trying to interpret the massive amount of failures it encounters, it will fill up its context window and it will start to forget some of the requirements. That won't work.
But what you can do -- what I did, in fact -- is this.
First, create an empty workspace. Don't put any code in it.
Then, tell the LLM to generate a backend framework using technology X and a frontend framework using technology Y that initially only says "hello, world". Also add tests to it, and run the tests.
It will do that. You'll not get much, but it will work.
Then, ask it to add some UI elements. A login page, perhaps. A navigation bar. Small things. Most of it doesn't have to be functional -- but tests must be there for the bits that are, and have it run the tests and evaluate the results.
Rinse, repeat, until you have a working application.
Importantly, in between the steps, you should also run the application
yourself and see if the change was implemented correctly. Sometimes it
won't be. Sometimes there will be a subtle bug -- I at one point had a
the application hang after a few minutes. Sometimes you tell it that
there's a subtle bug, and it will discover it more quickly than you
could, and it will fix it, and in implementing the fix it will uncover
another bug, and then you have to fix that one -- the fix it came up
with for the hang was to move something to an async process on the
server, which caused the application to start spinning while trying to
create hundreds of async jobs (this is when I realized that the hang was
a deadlock due to some part of the codebase doing something that
indirectly triggered itself). Sometimes it will try to fix the bug you
tell it about, and you'll see that it's going off on a tangent that has
nothing to do with what you're seeing. It's important to keep an eye on
what it's doing, so you can guide it back on track when that happens --
when I told it about the hang, it started investigating the part of the
code which sends out emails, thinking that it could hang while waiting
for sendmail to finish, but the hang was happening when the
application was idle, not when it was sending out emails, and only
when I told it about it happening when it was idle did it find the
deadlock.
So it's not a fully automatic process, and it needs to be guided by someone who knows what they're doing. But if that is the case, you can come up with something that works. I spent evenings and breaks for about a week, and I managed to create a working application which, had I written it by hand, would have taken me a few months of full-time work to come up with. And I now have a side project, fully complete and working, that I had been thinking about doing for more than a decade, but never got around to actually doing, because of all the work that would be involved and I just didn't see myself having the time for.
It's not perfect code. But it's mostly good enough, and it will perform the job it needs to. And it looks far slicker than most of the side projects I've done in the past, because in the past I would prioritize between implementing new features or making something look slick, and I would decide that the new feature was more important because it's only for me and there's only me and nobody cares if it looks good or not and I don't have three weeks to come up with something that looks better. But here, I found myself sometimes spending 10 minutes writing a prompt with instructions on making things look better. Because what's 10 minutes when you just spent an hour writing down and refining specifications for functionality and tests?
There are a number of other things in which an LLM can help a programmer.
For instance.
I received a bug report recently in a project I'm paid to maintain that I couldn't make heads or tails of. I opened the source code in my windsurf IDE, pasted the bug report in the prompt, and then requested the tool to analyze the source code and the associated logs and tell me how the described behavior could be happening. It turned out that I had overlooked something, but with the help of the tool, I found the bug in minutes.
I was trying to understand a particular part of a large codebase that I didn't really grasp very well. I loaded the codebase in the tool, and asked it to explain to me how a particular action is performed by the code. I requested specific functions and line numbers. I now have a far better understanding of how the code works, and will be able to write that patch that I've been wanting to write for years -- without using the LLM.
I have been struggling for, literally, years with understanding why another tool that I maintain was misbehaving in a particular way but only in Firefox. I opened the codebase in Firefox, explained the buggy behavior in plain English, and asked it to explain how this could be happening. It picked up some obscure corner case behavior of ffmpeg and mp4 containers that I was not aware of and that perfectly explained why things were misbehaving in the way that they were.
At the same time, there are limitations. Giving an LLM a codebase that was originally generated by an LLM (either the same one or another one) seems to work well. Giving it a codebase that was written by a human and expecting it to correctly update it seems to be more error-prone. I did one or two of those as a trial, and it is more problematic than anything.
An LLM is also not intelligent, notwithstanding the popular term of "Artificial Intelligence". On multiple occasions, I've asked it to write a test case for some code that was not set up to do so; and rather than suggesting a refactor is required, it would instead copy the code that needed to be tested and then test the copy, rather than the original. The tool has made multiple similar errors. I have sometimes people describe agentic coding as "similar to interacting with junior programmers", but that is not the case. A junior programmer will either fill in the gaps in your specifications, or ask for clarification when something seems off. The LLM will not do that; it will do what you ask, exactly that and nothing more. If you missed a corner case in your specification, then all bets are off.
I remember learning about programming language generations in college. A first-generation language is "machine code", a second-generation language is "assembler", a third-generation language is any high-level language such as C, Perl, or Pascal. I've forgotten what set a 3rd-generation language apart from a 4th-generation language. But I remember the definition they gave me for a 5th-generation language: "you tell the computer what to do, and it will do it". At the time, I thought it was ridiculous. Nobody could ever write something like that.
But it's here.
And it's a threat to free software.
A threat to free software?
Yes.
There is the obvious part where most of the well-known LLMs are non-free software. I mean, there are some "open source" LLM models. The windsurf tool that I used doesn't allow you to use them (directly), but they're there. There are also open source applications that implement what the windsurf editor does. So it's definitely possible to work like this without resorting to non-free software and non-free services, even though the non-free LLMs might be a bit ahead of the curve of the free ones. But that's not what I mean.
And there is also the obvious thing which I mentioned earlier in this post, which is that the people who try to build LLMs are doing it in unethical, disgusting ways, causing downtimes and disregarding licenses for whatever they can get their grubby hands on. Ideally we wouldn't be in that situation, and ideally this wouldn't be a problem, but we are where we are.
And there's the obvious thing where the OSI sold itself out and declared that a machine learning program can be open source even when the very things it was built from -- the training data -- is not available. That's a major issue that the free software community needs to fight against, but there's not really anything that that is a threat to free software. You just build your own, free software, LLM, and you're done.
The actual threat is in funding and developer support.
Most large businesses do not care about free-as-in-freedom software. They like the free-as-in-beer part, and they appreciate that the free-as-in-freedom bits can make the software more customizable. They are (mostly) happy to do sponsorships of the free-as-in-freedom projects that they use if that means their free-as-in-beer usage of the software gets improved.
But why would you care about all that when you can just generate the code you need, rather than interacting with an open source community that may or may not care about your business's interests?
Where to go from here
Although I think the moral and environmental issues with LLMs are real and problematic, given the experiments I did I am not convinced that the concept of interacting with a computer system in natural language and to use it to generate code is necessarily deficient. There are pitfalls, but they can be managed. It is possible to use such a system to create throwaway, proof-of-concept type "good enough" code bases. It can be used to interpret code bases and to understand bug reports.
I believe that the major issue with LLMs has to do with that saying about hammers and nails:
If all you have is a hammer, then everything looks like a nail.
LLMs are an outgrowth of machine learning, pushed by large corporations. These large corporations have a lot of money. If all you have is money, then every problem can be fixed by throwing more money at it. The initial language models were promising but not (yet) good enough, and it seemed that one way in which they could be improved was to increase the scale of the statistics: throw more hardware (and thus money) at it, and rather than improving the efficiency of the models, just scale up.
Scaling up is something that megacorporations are very good at. It's only a money problem, after all. Does that mean that "scaling up" is the only way to improve the models, though? I'm not convinced.
Some hardware, such as most modern Apple and Samsung devices, ship with accelerator hardware for machine learning algorithms. There are some models that are small enough to be able to run on these devices. I don't see why it should not be possible to create a small(er) language model that can do some useful part of the above-described use cases; if not locally, then at least on a server that one can run on-prem rather than requiring that you pay rent to one of the LLM companies.
The Software Freedom Conservancy has published an aspirational statement on machine learning-assisted programming that, I think, gets a lot right. It's not quite a definition, but it's something to keep in mind.
Perhaps that's the way forward?
More questions than answers at this point, anyway.




































