Proper Nouns: A Case Study in Agile LLM-assisted Development

I’ve released version 1.0.1 of the proper nouns library.

This is a new free-as-in-speech Java library that I wrote. Well, I sort of wrote it. Truthfully, GitHub Copilot and whatever LLM sits behind it wrote quite a bit of it. But anyway, proper nouns is a library I wrote to scratch an itch. You feed it a word, and the library tells you whether the word is very likely to be a name and very unlikely to be anything else. For instance, it will tell you that Robert is a name and April is a name, but it will not tell you that Dawn is a name, because dawn is also commonly used as an ordinary noun in English. It will tell you that Smith is a name because, although smith is a perfectly valid common noun, it’s far more commonly seen as a name in the 21st century.

I wrote this library to be the simplest thing that could possibly work. It has one public method that returns true or false. It does not recognize all names, though it does recognize a very large number of them. It recognizes names in multiple languages, including French, German, and English. It does not recognize names written in scripts such as Arabic, Hebrew, and Chinese, which don’t distinguish upper and lower case. I created it to determine whether a word that would otherwise be lowercase might need to be capitalized because it’s a proper name, so that’s what it does.
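A minimal sketch of that kind of single-method check might look like the following. It assumes the decision comes down to two lookups: a list of known names, and a list of names that are more often used as ordinary words (like dawn) and should therefore be rejected. The class name NameCheckSketch, the method isLikelyName, and both tiny word lists are hypothetical illustrations, not the proper nouns library’s actual API, data, or algorithm.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of a one-method name check. The class name, method name,
 * and word lists are illustrative assumptions, not the real library's API or data.
 */
public final class NameCheckSketch {

    // Assumed: a (tiny) list of known given names and surnames, stored lowercase.
    private static final Set<String> NAMES = Set.of("robert", "april", "dawn", "smith");

    // Assumed: names that are used more often as ordinary nouns, so they are
    // rejected to avoid capitalizing a common word by mistake.
    private static final Set<String> MOSTLY_COMMON_NOUNS = Set.of("dawn");

    private NameCheckSketch() {}

    /** Returns true only if the word is very likely to be a name. */
    public static boolean isLikelyName(String word) {
        if (word == null || word.isEmpty()) {
            return false;
        }
        // Scripts without letter case (Arabic, Hebrew, Chinese) are out of scope.
        if (!hasCasedLetter(word)) {
            return false;
        }
        String key = word.toLowerCase(Locale.ROOT);
        return NAMES.contains(key) && !MOSTLY_COMMON_NOUNS.contains(key);
    }

    // True if at least one code point is an uppercase or lowercase letter.
    private static boolean hasCasedLetter(String word) {
        return word.codePoints()
                .anyMatch(cp -> Character.isUpperCase(cp) || Character.isLowerCase(cp));
    }

    public static void main(String[] args) {
        for (String w : new String[] {"robert", "april", "dawn", "smith", "table"}) {
            // Capitalize only the words the check flags as likely names.
            String out = isLikelyName(w)
                    ? w.substring(0, 1).toUpperCase(Locale.ROOT) + w.substring(1)
                    : w;
            System.out.println(w + " -> " + out);
        }
    }
}
```

Keeping the answer a plain boolean leaves every further decision to the caller; here main simply capitalizes the flagged words, which is the capitalization use case described above.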

There are, of course, many other uses for a name detection library that require more finesse, such as a probabilistic rating of how likely a word is to be a name. There might be a need for a library that checks more than a single word, or that considers the human language the string is written in. However, none of that was something I needed right now or would obviously need in the near future. For my purposes, a simple list of names and a couple of characteristics was more than sufficient. So that’s what I shipped.

Of course, if more uses are discovered later, and someone is willing to contribute the code or the resources to implement further functionality, I can certainly consider it. But I didn’t want to build a gold-plated system that did far more than I actually needed and would take longer to finish than it was worth. It was much more helpful to ship sooner with basic functionality than to spend a very long time creating an absolutely perfect system that probably isn’t even possible. This is very much an example of not letting the perfect be the enemy of the good.

What made this project much simpler and easier to create than it would have been a year ago was GitHub Copilot. While there were a few places where Copilot got confused, or went off on a tangent and had to be manually corrected, most of the time I could just assign an issue to Copilot and let it write the code.

None of this was anything that I could not have written on my own. However, it’s not a particularly efficient or effective use of my time to set up yet another Maven project with yet another .gitignore file, yet another README, yet another set of release instructions for Maven Central, and all of the usual boilerplate. Copilot can create a lot of that very easily and very quickly. Copilot can also write code, implement methods, add features, and add tests. I didn’t vibe code this or, more precisely, I didn’t one-shot it. I didn’t just tell Copilot to give me a library that would check whether a string was a proper name. I broke the design up into individual issues that I gave Copilot one at a time. (Mostly one at a time; Copilot can work on several independent tasks at once.) Then I reviewed Copilot’s code. Occasionally Copilot would give me an initial PR that was good enough to commit. More often it took a few rounds of code review, much like working with a junior developer. GitHub calls Copilot “your pair programmer”, but I wasn’t really pairing with it. It was more like assigning tasks to a junior developer and then reviewing their work. That’s what coding with Copilot feels like.

When I started this project, I initially tried to create it with Kiro using spec-driven development. However, Kiro came up with a design that was far more complex than I needed. It would’ve been a lot more work to implement, even with LLM assistance. It didn’t just give me a simple boolean answer of whether a given string was a proper name. It output a probability. It wanted more input information than the string itself. It asked for a lot of factories and extensibility. Kiro’s spec was way over-engineered for the purpose, far beyond the simplest thing that could possibly work. Instead, I threw all that away and used Copilot to draft a very basic version. Of course, it helped that I knew exactly what I wanted: a single method in a single class that takes a string as input and returns true only if it’s pretty likely to be a proper noun. When I started, I also had some inkling of the basic algorithms I would use. I discovered a few more along the way.

Is the library perfect? No. Is it finished? No. Is it useful and shipped today? Yes. That’s an important part, maybe the most important part, of agile development, whether you’re developing with an LLM or not. Do what you need. Do what you need now. Get version 1.0 into production fast. And then iterate.

Even if the product plan lists 1,000 useful features, a journey of 1,000 features still begins with a single unit test. Ship the first feature that delivers value as soon as you can, and then add the other features as time and customer demand suggest. And surprisingly often, you will discover something about those other 999 features you’ve planned. You don’t need about 875 of them, but there are another 52 that are far more important than anything you thought of in the first design.
