TLDR: Using agents for programming can be great, but I fear for the maintainability of the code that's produced. As I'm currently looking for work, this lets me see the value my experience can bring, while also making me fear a future of having to work with a lot of code that was quickly written but is mediocre (at best) and hard to maintain.
I'm currently looking for work. It's frustrating and demotivating. I haven't been in this position in a long time, and it's worse than I remember.
I find it useful to take a break and "relax." Sometimes, I like to do that by writing some code, often for a quick side project.
Yesterday was one such day.
I'd recently been using a library that broke something (deleted a public API) without warning. As part of discussing this issue with the maintainer, I said, "They should use automated tools to check for the removal of public methods." They responded that it was a good idea, but they didn't know where to begin with adding such a check.
Consider me nerd-sniped.
In theory, it's not a complicated task. Simply analyze the diff of a PR and report accordingly.
However, I'd previously tried making my own GitHub Actions (to track changes in the number of tests in a PR) and was unsuccessful.
Then I remembered that "AI will solve all our problems."
I was also keen to see what agent-based coding of a new project was like. True "vibe coding": start with nothing and get the "AI" to do all the work. Can it really work? Even on "new" things with little existing code on the internet for the LLM to copy from?
So, to distract myself from an unsuccessful day's job hunting, could I kill multiple birds with one stone and gain more personal experience with "vibe coding"?
I thought I'd give it a try.
As agent support isn't yet in Visual Studio, I decided (had) to use VSCode. I've not really used VSCode for anything heavy on the C# front, and this is the first time I've built anything new with it. (Previously, if I was doing something with C# in VSCode, it was to debug MAUI apps on a Mac.)
So, I created a program that could run as a GitHub Action. I had these goals:
- Be able to analyze diffs on a PR to detect added tests, removed tests, and deleted public methods. (There's a rough sketch of this idea after the list.)
- Be written in C#, so I could understand what was happening and potentially be able to change/extend the code in the future.
- See how good Copilot could be as an agent. (Tell it what I wanted and let it write the code.)
- Gain practical experience with "vibe coding". Recently, I heard it said that being a good craftsperson is about knowing how to make the best use of the tools available, so I thought it would be good to get more hands-on experience.
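To give a sense of what I was asking for, here's a rough sketch of the core idea behind the first goal: walk the lines of a unified diff and flag removed lines that look like public method declarations or test methods. To be clear, this is my own illustration, not the code from the repo, and the regexes are deliberately naive (real C# method detection needs a lot more care).

```csharp
// Rough illustration only (not the code from the repo): scan unified-diff
// text for removed public methods and added/removed test methods.
using System;
using System.Text.RegularExpressions;

static class DiffScanner
{
    // Deliberately naive patterns; they miss generics, multi-line signatures, etc.
    static readonly Regex PublicMethod = new(@"^\s*public\s+.*?\b(\w+)\s*\(");
    static readonly Regex TestAttribute = new(@"^\s*\[(Fact|Theory|Test|TestMethod)\]");

    public static void Scan(string unifiedDiff)
    {
        foreach (var line in unifiedDiff.Split('\n'))
        {
            // In a unified diff, '-' lines are removals and '+' lines are additions
            // ('---'/'+++' are file headers, so skip those).
            if (line.StartsWith("-") && !line.StartsWith("---"))
            {
                var removed = line[1..];
                if (PublicMethod.IsMatch(removed))
                    Console.WriteLine($"Removed public method: {PublicMethod.Match(removed).Groups[1].Value}");
                if (TestAttribute.IsMatch(removed))
                    Console.WriteLine("Removed a test");
            }
            else if (line.StartsWith("+") && !line.StartsWith("+++"))
            {
                if (TestAttribute.IsMatch(line[1..]))
                    Console.WriteLine("Added a test");
            }
        }
    }
}
```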
I eventually got enough of a solution working that I feel my itch has been suitably scratched. If you want to see the current state of the code, it's at https://github.com/mrlacey/MattsPullRequestHelper
A good example of what it can do can be seen in this PR (created while debugging).
What follows are some miscellaneous notes and observations from the process.
- VSCode was not as productive for me as VS. Some of this may be down to muscle memory and not having the things I'm used to. I'm not big on command lines and would much rather work with a GUI than have to keep switching to the terminal. There were also editor features, extensions, and tool windows that I really missed. The biggest gaps were probably the Test Explorer window and the live test runner, which aren't in VSCode.
- Copilot was too happy to give the equivalent of a vague solution to a request. Take, for example, the code it initially generated to get the names of the files that had been changed as part of the PR. I asked it to write the code to get the list of files, and it just produced a placeholder. How is that useful?
This reminded me of a time when I was interviewing someone for a job. When asked how they would solve a technical problem, their answer was that they "would write an algorithm." When pushed for details, they couldn't give any. Yes, Copilot reminds me of a less experienced developer who knows the right words but not how to do the work. A sketch of what I actually needed follows.
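Getting the list of changed files in a PR doesn't need much code at all. Below is my own rough sketch, calling the GitHub REST API directly with HttpClient (the actual project may do this differently); the owner, repo, PR number, and token would come from the Action's environment.

```csharp
// My own sketch (not necessarily how the repo does it): list the files
// changed in a pull request using the GitHub REST API.
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;
using System.Threading.Tasks;

static class PullRequestFiles
{
    public static async Task<List<string>> GetChangedFilesAsync(
        string owner, string repo, int prNumber, string token)
    {
        using var client = new HttpClient();
        client.DefaultRequestHeaders.UserAgent.ParseAdd("MattsPullRequestHelper");
        client.DefaultRequestHeaders.Accept.ParseAdd("application/vnd.github+json");
        client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", token);

        // GET /repos/{owner}/{repo}/pulls/{number}/files returns one JSON object per changed file.
        // (Real code would also need to handle pagination for PRs with more than 100 files.)
        var url = $"https://api.github.com/repos/{owner}/{repo}/pulls/{prNumber}/files?per_page=100";
        var json = await client.GetStringAsync(url);

        var files = new List<string>();
        using var doc = JsonDocument.Parse(json);
        foreach (var file in doc.RootElement.EnumerateArray())
        {
            files.Add(file.GetProperty("filename").GetString()!);
        }
        return files;
    }
}
```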
- Knowing how to phrase things so that it produces what I want/need will take time. There were points where I thought it would be quicker to write something myself rather than try to work out how to phrase a request so I got what I wanted. But I pushed through and, in most cases, got there.
- When I asked it to fix problems, it tended to address the symptoms rather than the cause. If I hadn't known it was doing something bad/foolish, who knows what I would have ended up with, or whether it would ever have produced a working solution.
- There were a couple of times it was unable to fix an error and eventually gave up. What would someone who didn't know the programming language do here?
- It tried to do some things that are just fundamentally bad. A few I left it to see if it would fix them. It didn't. I should probably go back and address them if I want to maintain this code.
- I did have to go to the docs to find solutions to some challenges. I reached a point where I thought it would never get there by itself. Maybe this was because its training data was out of date. Maybe it just wasn't capable of doing some things. Maybe I just couldn't explain it in a way it could understand. Maybe it was just a result of the randomness in the responses. Perhaps it would have got there if I'd kept trying long enough.
- It struggled with version numbers and compatibility issues. It initially created the two projects targeting different .NET versions. This is the kind of abstract, high-level knowledge that is missing, or at least different from what a "real" person would bring.
- Some of the justifications for the changes it made and recommendations it gave were just wrong or straight-up anti-patterns. I'm glad I knew this and could avoid some of the bad code it produced.
- Working with regular expressions was a lot easier than "normal", especially once I had appropriate tests in place to verify that the changes it made were correct/appropriate. (A small example of the kind of test I mean is below.)
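For illustration, this is the sort of test I'm talking about, assuming an xUnit test project and the deliberately naive pattern from the earlier sketch (the names here are mine, not the project's):

```csharp
// A small example (my own, using xUnit) of the kind of test that made
// iterating on the regexes with Copilot feel safe.
using System.Text.RegularExpressions;
using Xunit;

public class PublicMethodPatternTests
{
    // The same deliberately naive pattern as in the earlier sketch.
    private static readonly Regex PublicMethod = new(@"^\s*public\s+.*?\b(\w+)\s*\(");

    [Theory]
    [InlineData("public void DoWork()", true)]
    [InlineData("public static int Add(int a, int b)", true)]
    [InlineData("private void Hidden()", false)]
    [InlineData("public int Count { get; set; }", false)]
    public void DetectsPublicMethodDeclarations(string line, bool expected)
    {
        Assert.Equal(expected, PublicMethod.IsMatch(line));
    }
}
```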
- Being able to say "update the code to make all the tests pass" and have it actually work was something I've wanted for years. (I tried with LLMs in the past but never got there.) To have it working was kinda magical.
- The tests it produced were rubbish. They were less than worthless and more likely to cause problems in the future than help avoid them. I'm concerned for the people who see AI as a way to avoid having to write tests themselves. I see the opposite. Developers being able to write tests is going to become more important as they'll need to verify that the code does (and continues to do) what it's supposed to. Writing a load of test cases and then having the tool write the code, which I would then verify, felt very productive. I'll definitely do this more in the future.
- The generated code was not as well structured or tested as I'd like, or as it would have been if I'd written it myself. Part of me thinks this could be a problem, but I'm also conscious that I might be in a similar position if another person had written the code. Apart from the testability issue, I want to say this isn't all that important. Having working and maintainable code is more important than it looking a certain way.
- On several occasions, it rewrote files and removed changes I'd made to what it did previously. This included removing comments I added and re-adding incorrect and unneeded code that I had removed. I think it's going to be useful to make commits before asking it to change the code, so I can more easily track anything it might change that I miss.
- Reviewing changes, including spotting small but important details amongst a large change, is going to be increasingly important. I will benefit from more tooling to do this. This doesn't just apply to formal code reviews.
- It can easily introduce broad changes to the code base, and this makes differences hard to spot.
- It would sometimes create duplicate code, sometimes even within a single method.
- The most frustrating of the unrelated changes it would make was altering the accessibility of methods and classes. It would create them as private. I'd make them public so they were easier to test. It would then revert them to private and stop the tests from compiling.
- There were many times it broke things. If not for my tests, I wouldn't have known, and I could easily have wasted lots of time going round in circles, making changes and introducing regressions while fixing other things.
- It really made me wonder how some developers can be productive without a large suite of high-quality tests. They must find development slow and error-prone. (Oh yes, many do!)
In the end, I felt that I'd gotten about as far in the time as I would have if I hadn't used Copilot. The big difference is that I spent my time thinking about different things. There was a lot more time spent reading code written by Copilot and trying to understand it. This offset some of the time I would have spent looking at docs (it managed to do some things for me that I didn't have to look up).
Some of this time was also because I was doing something new for the first time. Most coding starts with an existing code base, so that overhead might not be there in future. I certainly feel it could save me more time in future.
Takeaways
- [Full] Visual Studio (rather than Code) is so much more productive for C# development.
- Copilot (especially in agent mode) has the potential to be useful to any/every developer.
- It really feels like working with a less experienced developer who doesn't always listen to (or ensure they fully understand) the requirements before producing a lot of code.
- It reinforced my impression that carefully reviewing changes will become even more important in the future. Having additional tools to help with this (both AI-powered and not) will be helpful.
- Having high-quality tests (not those written by Copilot) and good code coverage will become increasingly important.
- I'm sure there is more that I can do/learn to get better at using and configuring the agent to do more of what I want the first time.
There is another project I've been thinking of experimenting with that might be even more of a challenge for Copilot. I'm now more inclined to try it soon...