Large language models won't replace programmers tomorrow

1. Introduction

[Image: IBM "150 Extra Engineers" advertisement, 1951]

Hi!

2. The fickle genie

Odds are, you're fuming with anger, having clicked on obvious clickbait trying to capitalise on… shall we say… the mass disaster of the day. After all, many underestimated the development of ML models, and some remain sceptical to this day. Some of the artists who were laughing years ago are protesting today, boycotting the use of statistical models and generative "art". Still, there is a but.

LLMs generate numbers. Those numbers can represent text tokens, pixels, sound, 3D models, or machine instructions, including code and the outputs of compilers and systems like Verilog. The efficacy of LLMs is predicated on the tooling around them. And, you see, programming is not just about writing code and knowing the proper names of certain things. Otherwise, compilers would have replaced programmers long ago. Oh no, programming is a language skill, coupled with real-world experience, that lets a programmer spot a mistake in the technical specification of the code they are asked to write; more often than not the program does exactly what you asked it to do, but, just like a fickle genie, it doesn't correct for common sense.

2.1. Full self-driving still needs a driver

The guiding principle of transformers and transformer-like models is next-token prediction. Imagine you remember a lot and speak in long sentences without thinking much: that is rather similar to the process most LLMs use to generate their "magical" results. These methods can be weaponised to search text, or they can be used for word substitution.
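To make "next-token prediction" concrete, here is a deliberately tiny sketch in Python: a bigram table counts which word follows which in a training text, and generation just keeps picking the most probable successor. Real transformers condition on a much longer context through attention, but the generation loop has the same shape. Everything in this sketch (the corpus, the function names) is made up for illustration.

  from collections import Counter, defaultdict

  corpus = "the cat sat on the mat and the cat slept on the mat".split()

  # Count how often each word follows each other word (a bigram model).
  successors = defaultdict(Counter)
  for prev, nxt in zip(corpus, corpus[1:]):
      successors[prev][nxt] += 1

  def generate(start, length=6):
      """Greedy next-token prediction: always pick the most frequent successor."""
      out = [start]
      for _ in range(length):
          followers = successors[out[-1]]
          if not followers:
              break
          out.append(followers.most_common(1)[0][0])
      return " ".join(out)

  print(generate("the"))  # prints a plausible-looking but purely statistical continuation

There is no plan anywhere in that loop; the continuation merely looks like one to the reader.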

But, I argue, that's still far from "solving a problem": it's not building an action plan, executing the actions and tracking their flow, just stochastic parroting. It can mimic structures that resemble those qualities, or rather that allow us to project them, when in fact the underlying model has no concept or understanding of what a plan would even mean. Some may disagree and point to the concept of multi-headed attention1 as a way of imitating a thinking process. I disagree; as I argued earlier, this is a function of observer projection, not genuine procedural conceptualisation.

You can indeed ask an LLM to criticise your code or its documentation. A (relatively) fresh LLM trained on a language's code and documentation might even be helpful, but only to someone in the first stages of learning, or someone deeply invested in seeing competence. You can query it for debugger interactions and it might help. There's also a chance that it is hallucinating and will lead you on a wild goose chase, though if it does, you can simply repeat the request. The system still requires a human to oversee the operation of the LLM. It's no autopilot if you still need a human to take over the driving whenever the "full self-driving" does something so stupid that it could have been caught algorithmically.

So LLMs are not as thorough as advertised and cannot, at present, fully replace the human behind a screen doing the legwork. That hasn't stopped the marketing misnomer of calling LLMs "AIs" from becoming popular. Let us suppose that all thinking is algorithmic (which is not a falsifiable theory), and also suppose that a thinking machine might think without an explicit model of the world, only one implicit in its weights. Can LLMs answer questions about programming in languages that hadn't been invented at the time of their training? Will they be as accurate? Can they work off limited training data, as most humans do? It still doesn't cut it.

Rather than regurgitating our earlier discussion in the old CyberLounge, we will focus on newer developments. The conclusion hasn't changed, but the inputs might fool you into thinking that it has.

So without further ado.

2.2. Next token prediction is not thinking

To find out why, let's get meta: think about how we think. "I'd put this word here, because it looks relatively appropriate"? Maybe that's something I do occasionally when writing documentation in a foreign language, but not routinely. Our way of solving problems is spread over several distinct layers of abstraction, starting with concepts and ending in language. We descend from the conceptual stage, where we have plans and actions, concepts and inferences: what I would argue is thought in its purest and most abstract form. At this stage we can think in vague pictures, or formulae, or even nothing explicit, nothing we can verbalise. Case in point: Feynman thought in pictures, and his diagrams were helpful, but that abstraction was not a pre-existing statistical result being extrapolated; it was an emergent consequence of familiarity with Wick's theorem. LLMs have no room for such an abstract process, or at least it's hard to tell. True, they could have it, but that is an extraordinary claim, backed at best by a lack of evidence to the contrary.

Too vague? Let's consider an example and assume you're making a small UI component using HTML+CSS or some GUI library. The ML model will be able to generate the form code, but only sometimes, and often only semi-accurately. A week ago, one of us (Greybeard) asked GPT-4 to write the code for a CSS-only image gallery. The code simply did not work. It was like a straight-F student copying the work of a straight-A student without even remotely understanding what they were doing. It would be an automatic fail for anyone reasonable, but let's indulge the "true believers".

Let's assume there's the tiniest bit of design involved: field widths are limited and the designer decided to integrate the icons into the form fields. How would an LLM be sure the result matches the criteria without seeing it? One could think of things like CLIPort, made for robotic manipulators, or LLaVA, made for vision-related tasks, but modifying GPT to interact with them and to reason about the design is not the easiest task. As a human, and as someone who has worked with HTML for a very long time, I can look at the structure of the document and project what it will look like; I can do almost as well in my mind's eye as the browser can in its canvas. The LLM could in principle interface with the browser to render the results exactly, yet it doesn't even "think" to do that. Predictably, it will often horrendously misinterpret the constraints, and sometimes ignore them completely.
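Closing that loop is not magic, either; a human (or a wrapper script) can render the generated markup and measure it. Here is a rough sketch of the idea, assuming the Playwright Python package is installed; the markup string, the "#account" id and the 300-pixel limit are my own placeholders, not anything an LLM product actually does.

  from playwright.sync_api import sync_playwright

  html = '<form><input id="account" style="width: 320px"></form>'  # LLM-generated markup would go here
  MAX_WIDTH = 300  # the designer's constraint on field width, in pixels

  with sync_playwright() as p:
      browser = p.chromium.launch()
      page = browser.new_page()
      page.set_content(html)
      box = page.locator("#account").bounding_box()  # measured, rendered geometry
      browser.close()

  if box["width"] > MAX_WIDTH:
      print(f"Constraint violated: field is {box['width']}px wide, limit is {MAX_WIDTH}px")

The point is that the check is algorithmic and cheap; the LLM just never reaches for it on its own.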

Let's go further. A human can modify the page incrementally, changing the design step by step. Can an LLM do the same? It can generate the code wholesale, but not make surgical adjustments: that would require the model to find precisely where to select the text and then to apply an improved word-mask model to alter it at least slightly more effectively than today. Using an LLM with a prompt fed to it to alter the same section leads to multitudes of hallucinated iterations to handle, and handling them is no fun whatsoever. ThePrimeagen demonstrated the problems of using GitHub Copilot: in the video, the LLM simply ignores some of the constraints, generating a frames-per-second counter where the time was measured in milliseconds. I know of models that guess a masked word[Bertword-masking][fill-mask], but doing the inverse, with a set goal, consistently? It's not impossible, but it may very well be tedious to tune, even if said models could be reused. Creating a corpus for these models is a massive amount of work, and covering all the edge cases takes many models. According to TIME, "OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic". Are there enough people to work on all of these tasks?
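For reference, the masked-word models mentioned above are easy to try yourself; a minimal example with the Hugging Face transformers library (the model choice and the prompt are just illustrative):

  from transformers import pipeline

  # BERT-style models predict the word hidden behind the [MASK] token.
  fill = pipeline("fill-mask", model="bert-base-uncased")

  for guess in fill("The gallery switches images using only [MASK] selectors."):
      print(f"{guess['token_str']:>12}  score={guess['score']:.3f}")

Filling one blank is the easy direction; steering a whole edit toward a goal with such models is the part nobody has made routine.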

3. Minor complaints

It gets sillier! Often enough, LLMs simply stop writing mid-text and you need to make them continue from that point manually! I haven't yet seen a catch-all method that lets LLMs start and stop automatically, GPT-4 included. Maybe GPT-5 will manage that? ChatGPT in particular sometimes breaks its formatting and continues the code after the highlighted block, so even with direct API access, weaponising this to replace an engineer would be a monumental task, defeating the original intention.

3.1. Some background on Neuroscience

Our brains remember related information and perform action selection2 based on the outside context provided by our senses, while filtering inappropriate actions out. That is quite different from LLMs, which simply generate the most probable next token. Besides, modern LLMs are limited to the data provided in the training dataset: they don't retrieve new information3. We're still stuck with machine learning methods that can't learn in real time and that require immense arrays of hardware for training. The popular ChatGPT failed to cobble together a word of a given length out of the letters I picked, something Python (considered slow by many) does in less than a second on my cheap laptop. It failed several times in a row, because I wanted to be fair to it and repeated the test. That is not a description of super- (or even human-level) intelligence, really.
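For the curious, the letter test is trivial to script. A sketch, assuming a Unix word list at /usr/share/dict/words (any word list file will do; the letters below are arbitrary):

  from collections import Counter

  def words_from_letters(letters, length, wordlist="/usr/share/dict/words"):
      """Return dictionary words of the given length spellable from the letters."""
      available = Counter(letters.lower())
      result = []
      with open(wordlist) as f:
          for word in f:
              word = word.strip().lower()
              # Counter subtraction is empty only if every letter is available.
              if len(word) == length and not (Counter(word) - available):
                  result.append(word)
      return result

  print(words_from_letters("tralepons", 6)[:10])

A linear scan over a word list, nothing more; it finishes before the chatbot has produced its first wrong answer.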

Sure, you could argue that some vague future models might approach the problem better. I will revisit this discussion at that time, because right now we are projecting superhuman intelligence onto a stochastic parrot. Besides, given the no-free-lunch theorem, if there is ever an artificial general intelligence, it will have to be only partially statistical in nature. And there's a good chance that by the time we have something like AGI, we will have deepened our own knowledge, and there will still be something somewhere that the artificial intelligence does worse than a human (for one, our brains have exceptional power efficiency).

3.2. Ground for improvement

Now, let's talk about a thing to improve. LLMs need to be able to assess what they write. If an LLM writes five- or seven-letter words when asked for six-letter words, it lacks self-assessment. If it can't plan to read a codebase's files and pick the one that needs changing, it lacks planning. Planning does not require interaction with third-party systems, but such interaction would help. And yes, since your LLM isn't typically connected to the OS in any way, it won't interact with the project files or create a project for you. So no, LLMs won't replace the human programmer, not yet. They would need more parts attached. It's not all doom and gloom; many people are thinking about the capabilities LLMs lack. The Parsel project partially addresses this problem. It is described as:

"A framework enabling automatic implementation and validation of complex algorithms with code LLMs, taking hierarchical function descriptions in natural language as input"

While this sounds complex, Parsel solves an important task: generating code from a natural-language description under constraints.
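A crude way to bolt the missing self-assessment on from the outside is a generate-check-retry loop. The sketch below is not how Parsel works; it assumes a hypothetical ask_llm() call and uses word length as the stand-in constraint, since that was the failing test earlier.

  def ask_llm(prompt):
      """Placeholder for a real model call (hosted API, local model, etc.)."""
      raise NotImplementedError

  def generate_with_check(prompt, is_valid, retries=5):
      """Keep asking until the output passes an external, programmatic check."""
      for _ in range(retries):
          candidate = ask_llm(prompt)
          if is_valid(candidate):
              return candidate
          prompt += f"\nYour previous answer '{candidate}' was rejected; try again."
      raise RuntimeError("no valid answer within the retry budget")

  # Example constraint: the answer must be a single six-letter word.
  six_letters = lambda text: len(text.strip()) == 6 and text.strip().isalpha()
  # generate_with_check("Give me one six-letter English word.", six_letters)

Note that the "assessment" here lives entirely outside the model; the model itself still has no idea whether it passed.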

We also need to feed in data to provide context. The "Talk to papers" and "Talk to code" demos show an important part of the process: the use of text embeddings (vectors that place semantically similar pieces of text close together) to look up related information. It is a small part, but an important one for navigating a project's source code, although it is best combined with other search algorithms.
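The lookup itself is simple enough. A sketch with the sentence-transformers package; the model name is one commonly used for this purpose, and the code snippets and query are my own placeholders.

  from sentence_transformers import SentenceTransformer, util

  model = SentenceTransformer("all-MiniLM-L6-v2")

  # In a real project these would be function bodies, docstrings, or paper sections.
  snippets = [
      "def parse_config(path): ...",
      "class FormRenderer: draws the account details form",
      "def connect_to_database(dsn): ...",
  ]

  query = "where is the bank form drawn?"
  query_emb = model.encode(query, convert_to_tensor=True)
  snippet_embs = model.encode(snippets, convert_to_tensor=True)

  # Cosine similarity ranks which snippet is most related to the query.
  scores = util.cos_sim(query_emb, snippet_embs)[0]
  best = int(scores.argmax())
  print(snippets[best], float(scores[best]))

The retrieved snippet is then pasted into the prompt as context; the language model never "knows" the codebase, it is only shown the nearest pieces of it.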

Imagine we want our LLM to draw a form for entering bank account details. It will manage a basic one. It will be able to mock something up using the Bootstrap CSS framework. But it will not see anything, unless it is connected to another neural net that has such a modality. CLIP and similar neural networks can connect text and images, often at limited resolution, and may already help a bit. The whole field advanced slightly when multimodal neurons representing concepts were located. Otherwise, I'd simply say our civilisation has only just started tinkering with multiple modalities.
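For a taste of what "connecting text and images" looks like in practice, here is a small sketch using the openly released CLIP weights via the transformers library; the screenshot filename and the candidate captions are mine, chosen to match the form example above.

  from PIL import Image
  from transformers import CLIPModel, CLIPProcessor

  model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
  processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

  image = Image.open("rendered_form.png")  # e.g. a screenshot of the generated form
  captions = ["a bank account details form", "an image gallery", "a login page"]

  # CLIP scores how well each caption matches the image.
  inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
  probs = model(**inputs).logits_per_image.softmax(dim=1)[0]

  for caption, p in zip(captions, probs.tolist()):
      print(f"{p:.2f}  {caption}")

That gives a coarse "does this look like a bank form at all" signal, which is a long way from checking that the icons sit inside the fields.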

Now we're getting to the interesting part. How does our system select actions? How does it even know what actions it can perform? Through some API bindings that allow it to work with a codebase? That's not even close to what LLMs currently have. There are many ML solutions for selecting an action, starting with reinforcement learning agents and ending with exotic ideas like animats. There's even the SayCan assistant, which has this exact ability. The problem is that RL agents know their possible actions exactly, while with code the action set is much vaguer.
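To make the contrast concrete: an RL agent's action set is an explicit, finite list, and selection is just picking from it. A toy epsilon-greedy sketch (the action names and values are made up):

  import random

  # An RL agent knows its action set exhaustively and in advance.
  ACTIONS = ["open_file", "edit_line", "run_tests", "commit"]
  value = {a: 0.0 for a in ACTIONS}   # learned estimate of each action's worth

  def select_action(epsilon=0.1):
      """Epsilon-greedy: usually exploit the best-known action, sometimes explore."""
      if random.random() < epsilon:
          return random.choice(ACTIONS)
      return max(ACTIONS, key=lambda a: value[a])

  def update(action, reward, lr=0.1):
      """Nudge the estimate toward the observed reward."""
      value[action] += lr * (reward - value[action])

In a codebase, nobody hands you such a tidy ACTIONS list; enumerating what can sensibly be done is most of the problem.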

And there's much more to machine learning than any large language model has achieved! LLMs are only a small part of what's being done, and not every part is easy to understand and appreciate. We're only getting started, and it's naive to assume we'll get a complete imitation of our thinking, or an improvement on it, this decade.

OpenAI, the same company that created ChatGPT, made a great demo4 with robots and reinforcement learning, but people outside the company don't interact with those proprietary networks much, so for now the fate of this technology is to be seen as «fun videos on YouTube with robots playing hide and seek». And that for a story in which robots learned how to use tools, something many biological species can't do!

Everyone is talking about ChatGPT, while the same company has GPT-instruct, which can learn from a provided set of ideas and, as a less popular product, has far fewer limitations. While one thing is being polished for public use, the thing that gives better results is discussed less! It makes me smile when a newbie does that, but when businesses change their strategies over ChatGPT while ignoring everything that came before it, it is simply hilarious.

It is both amusing and bemusing that some people even consider replacing part of their software engineering teams with "A.""I.". You see, if we approach this in the straightforward way, the very people who work with ML models should be the first to be replaced, given the sheer amount of ML code available as data. But does the code itself represent the whole process here? Given how much is hidden in the dataset and the model configuration, I highly doubt it. The code is not guaranteed to be straightforward or well-architected; it is not even guaranteed to make much sense at first glance, yet there is a place and time for the "scientific style of programming" we often see in ML. But let's not stop here; let's pick something much easier. Historically, code that writes code has gone by different names, for example "symbolic regression" and "genetic programming". And heck, given how much goes into picking data and tuning the genetic programming libraries, I dream about that being automated. The code is short, usually some visualisation and a config parser. And yet, each time there is still some small trick to the data, something to optimise. LLMs won't infer formulas and won't configure Cartesian Genetic Programming systems to make a DSP filter for sound or images any time soon. For now, they'll help generate the glue code.
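As a point of comparison, classic "code that writes code" looks like this; a sketch assuming the gplearn package, evolving a formula that fits samples of x² + x (the target function and parameters are just an example):

  import numpy as np
  from gplearn.genetic import SymbolicRegressor

  # Target relationship the system has to rediscover: y = x**2 + x
  X = np.linspace(-3, 3, 50).reshape(-1, 1)
  y = X[:, 0] ** 2 + X[:, 0]

  est = SymbolicRegressor(population_size=500, generations=20, random_state=0)
  est.fit(X, y)
  print(est._program)  # the evolved expression tree, e.g. add(mul(X0, X0), X0)

The interesting work is everywhere except these lines: choosing the data, the function set, and the fitness measure, which is exactly the part nobody has automated.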

3.3. The way forward

Finally, scientists are tinkering with ideas that may put these technologies in our homes, instead of keeping them in large research labs with massive funding.

Let's discuss something called a memristor, or memory resistor, starting from the basics. Normal resistors reduce the current flow in electronic circuits and do a lot of other useful things by converting electric power to heat. So far, nothing new; but at some point the transistor appeared: something that acts like a resistor but can be controlled by applying electric power. With the ability to build something complex, like logic gates, people tinkered with the technology more and more, made it smaller and smaller, integrated gates into complex circuits, and now we have powerful computers in our pockets. What crazy networks with myriads of parts can we expect from yet another «more complex resistor sibling», then? Memristors have great potential for machine learning, because each of them can store information, while its resistance can be used to process it in an analog way. This is quite similar to what neurons in our brains do. Memristor development ran partially in parallel with that of transistors, the term having been coined in 1971 by Leon Chua. I wanted to add a single reference to an article on a single-molecule memristor that can be inkjet-printed, but there now seems to be more than one type, plus one that can be tuned by light and another with a magnetic spin. More importantly, there is now an article describing the online learning ability of memristor networks5. Memristors may well give us the ability to train such networks in the comfort of our homes at some point in the future. But for now, we've got disconnected ML models doing some parts of the whole we need.

Besides training, other companies are showing impressive results, for example Optalysis. They use the Fourier transform performed by an optical system to carry out ML inference tasks almost instantaneously. In their article, "Optalysis and Fourier-based transformers", they claim to have accelerated transformer inference impressively. While this is nowhere near what is needed for training, such devices may soon be an amazing extension for workstations, and someday home computers, also allowing us to run these networks locally. MythicAI has demonstrated a way to run ML tasks on a RAM chip, using its other properties; this can be an alternative to what Optalysis is doing with Fourier optics.

4. Conclusion

We have demonstrated that, at present, we meatbags can look forward to a new type of work: fixing what the LLM has generated, instead of writing it ourselves. Human programmers will be a tad more productive; naturally, this will not result in higher compensation. We live in a perverse world, and a 10x improvement in productivity won't give most software engineers 10x the pay, though it should, and under a different economic system, one the US had before 1972, it would.

The advent of LLMs will not reduce the number of workplaces for people of the software-engineering bent. What it will mean is that you no longer have to write a dumb function to do something simple; instead, you oversee that the function the LLM generated isn't too dumb.

Fearmongering and perverse incentives will make most script kiddies nervous, because what takes them several hours, Copilot or ChatGPT will do in a fraction of a second. But guess what: there used to be a profession called "computer", in which humans did computations by hand, for instance figuring out what \(\log_{10} 3.1416\) is for a logarithmic table or a slide rule. The kind of work changed, but the mathematical profession needed for automation never went away. Software engineering will likely rebrand into something else, but people with the particular skills and proclivities will find a position managing the automatic tools.

Footnotes:

1

Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017

2

Computational Cognitive Neuroscience, 4th Edition by R. C. O'Reilly, Y. Munakata, M. J. Frank, T. E. Hazy, & Contributors, "Chapter 7: Motor Control and Reinforcement Learning", "Basal Ganglia, Action Selection and Reinforcement Learning"

3

I hope that ChatGPT will use the results of the user's estimation as the training data, but we'll see.

5

"Memristor-Based Multilayer Neural Networks With Online Gradient Descent Training" by Daniel Soudry, Dotan Di Castro, Asaf Gal, Avinoam Kolodny, and Shahar Kvatinsky

Date: 03.01.2024

Author: Aleksandr Petrosyan and Viktor Gridnevsky

Email: ap886@cantab.ac.uk

Created: 2025-01-20 Mon 00:07