Learnings in 2025
2025 was yet another year in which the conversation around Generative AI dominated the noise. Newer versions of the frontier models were released, and software engineers got better tools—or shall I say, everyone else got tools to write software without software engineers. The debate on AGI raged on, and concerns about an AI bubble grew louder. I will restrict my notes to what I read and what I saw on the ground, garnished with some of my own observations. Each of the topics below deserves its own post (or multiple posts), and I plan to be more regular in 2026.
Confusion in defining the components of the AI ecosystem
A lot of new terms have come up in this fast-evolving AI ecosystem, and they are often a cause of confusion. As everyone races breathlessly to build their own assistants, teams of agents, custom models, and platforms, I expect we will eventually converge on much cleaner definitions. Here is a sample of the kinds of questions that confuse everyone:
- Are AI agents the same as Frontier Models?
- What constitutes an AI agent? Is it just a call to a Frontier Model, or is it a combination of Model, RAG, and Tool Calling?
- What is the difference between an AI Agent and an Agentic Application?
- What is the difference between Skills, Tools, and the Model Context Protocol?
- What is the difference between Agent Runtimes, Agent Frameworks, and Agent Harnesses?
An agentic application is now composed of multiple components: prompts, context, tools, memory, guardrails, and file systems. Everyone today is trying to sell the One AI Platform to rule them all. Based on my own experience building applications that use some or all of these components, and based on the online chatter, a good understanding of the basic building blocks is a must for building production-ready applications. It is all the more important before investing in a platform that promises “all the capabilities.”
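To make those building blocks concrete, here is a deliberately toy sketch of an agent loop in Python. It is not any particular framework's API: `call_model()` is a hypothetical stand-in for a frontier-model call, and `get_time` is an invented tool. The point is only to show where prompt, tool calling, and memory sit relative to each other.

```python
# Toy agent loop: prompt + tools + memory, with a bounded step budget.
# call_model() is a HYPOTHETICAL stand-in for a real frontier-model API.

def get_time(_: str) -> str:
    """A trivial example tool the agent can call."""
    return "2025-12-31T23:59:59Z"

TOOLS = {"get_time": get_time}

def call_model(messages):
    """Fake model: returns either a tool request or a final answer."""
    last = messages[-1]["content"]
    if last.startswith("user:") and "time" in last:
        return {"type": "tool_call", "name": "get_time", "args": ""}
    return {"type": "final", "content": f"Done. Context so far: {len(messages)} messages."}

def run_agent(user_input: str, max_steps: int = 5) -> str:
    # "Memory" here is just the growing message list (the context).
    memory = [
        {"role": "system", "content": "You are a helpful agent."},  # the prompt
        {"role": "user", "content": f"user: {user_input}"},
    ]
    for _ in range(max_steps):  # a step budget is a crude guardrail against runaway loops
        reply = call_model(memory)
        if reply["type"] == "tool_call":
            result = TOOLS[reply["name"]](reply["args"])  # tool dispatch
            memory.append({"role": "tool", "content": result})
        else:
            return reply["content"]
    return "Step budget exhausted."
```

Real agents layer context management, persistent memory, file-system access, and stronger guardrails on top of this skeleton, which is exactly why the terminology questions above keep coming up.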
Another recurring topic of conversation is building production-ready applications. The questions on everyone's mind include:
- How do I test my agents?
- How can I ensure my agentic/LLM-based application has a sub-second response?
- What does it take to deploy to production?
AI coding tools (more on them in the next section) have made building POCs cheap. It is easy to build POCs that are mostly CRUD or make an API call. However, best practices for building applications that rely on Frontier Model APIs and are expected to behave consistently are still a work in progress. Though most of us are not training models from scratch, treating agentic/LLM-based applications like any other ML model deployment ensures we follow the fundamentals of model testing: building ground truths, baselines, and metrics (none of which is obvious when generating POC code with AI coding tools).
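A minimal sketch of what "fundamentals of model testing" can mean in practice: a small hand-labelled ground-truth set, a trivial baseline the LLM must beat to justify its cost, and a shared metric. The task (intent classification), the labels, and `baseline()` are all illustrative assumptions; in a real system the LLM-backed classifier would be evaluated with the same `accuracy()` function.

```python
# Hand-labelled ground truth, collected BEFORE shipping (illustrative data).
GROUND_TRUTH = [
    ("refund my order", "refund"),
    ("where is my package", "tracking"),
    ("cancel subscription", "cancel"),
    ("I want my money back", "refund"),  # keywords alone won't catch this one
]

def baseline(text: str) -> str:
    """Naive keyword baseline; any LLM-backed classifier should beat this."""
    for label in ("refund", "cancel"):
        if label in text:
            return label
    return "tracking"

def accuracy(predict) -> float:
    """Shared metric: fraction of ground-truth examples predicted correctly."""
    hits = sum(predict(text) == label for text, label in GROUND_TRUTH)
    return hits / len(GROUND_TRUTH)
```

The baseline scores 0.75 on this toy set; an LLM-based `predict` plugged into the same harness gives a like-for-like comparison, which is precisely the discipline that POC-generating tools tend to skip.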
Another emerging perspective is the idea of long-running agents—given the new agentic tools and model response times, it is debatable whether every AI application should be held to high standards of sub-second response, or if the baselines should look at human performance in some of those use cases.
To practitioners and developers, the technical answers to these questions are clear. I expect 2026 to see more streamlined discussions and implementations of applying Generative AI models and tools to drive business outcomes. As Andrej Karpathy said, 2025-2035 will be the decade of agents, and I expect incremental improvement with interesting peaks and troughs along the way.
On Software Engineering
The one place where frontier models have had a big impact is software development. There is absolutely no doubt that building POCs or starting new projects is much easier with the help of coding IDEs such as Cursor and Antigravity, or coding agents such as Claude Code, Codex, Amp, or OpenCode. Questions about the usefulness of the generated code haven't gone away, and in my conversations with peers and team members, there is still a lack of trust in relying on AI coding tools for production-grade code. Many of them do not want AI coding tools anywhere near an existing large codebase. However, some of the anecdotal comments from experts on X hint that this may change.
The debate on X rages on, with plenty of tall claims, pushback, and “wow” moments. In my view, Big Tech is investing considerable resources in improving both the models and the tools for code generation. The AI-led approach to software development is here to stay, and proficiency with these tools will be a required skill in the future. Engineers and engineering teams need to learn to handle the non-determinism inherent in the output of these models, and to build best practices for dealing with hallucinations and runaway changes that sweep through a large codebase just to finish a task. Robust tests will become all the more important, and best practices will include shared prompts/AGENT.md files to ensure that commits from different team members (some of them potentially AI agents) follow the same coding guidelines.
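As an illustration of what such a shared guidelines file can look like, here is a hypothetical example. Filenames and conventions vary by tool (Claude Code reads CLAUDE.md, several other agents read AGENTS.md), and the sections below are my own invented layout, not a standard:

```markdown
# AGENTS.md — shared guidelines for humans and coding agents

## Coding conventions
- Follow the existing module layout; do not introduce new top-level packages.
- Every change must include or update unit tests.

## Guardrails
- Never modify files under `migrations/` or delete tests to make a build pass.
- Prefer small, reviewable diffs; split large refactors into separate changes.

## Verification
- Run the full test suite before proposing a commit.
```

Checking a file like this into the repository root gives every contributor, human or agent, the same starting context for a task.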
Improvements in these tools have made a lot of folks (including me) anxious about the future of software development. There is definitely an impact on the productivity of engineering teams, and it is not uniformly positive. There is news of team downsizing, hiring freezes, and a bleak future for CS grads. I would rather avoid playing fortune teller and let the future unfold. I have mixed opinions on what could happen, but I will leave this section with the questions I think about every day:
- Will the Jevons paradox be applicable here, and will we see demand surge as software development becomes more ubiquitous and faster?
- Will we need software engineers in the future? Or will the definition of a software engineer change as these tools become more mature? Two experts whom I look forward to listening to more on this topic in 2026 are Grady Booch and Martin Fowler.
- A lot of software development was the sheer grit of solving complex problems by understanding the interplay of computers, the internet, and software. The more time someone spent understanding the nuts and bolts of a problem, the better they became at software development. Thanks to open source, all one needed to learn programming was a computer and an internet connection, and the IDEs were freely available. That has changed. Coding tools such as Claude Code and Codex come with multiple subscription tiers, and the empirical observation is that the more tokens one can spend, the better these tools tend to perform. Does this mean that an average developer like me with $200 per month to spare (I don't have it) can outdo an expert?
On AI slop
AI slop in images, videos, and online articles has become a huge concern. However, I see another version of AI slop that everyone will eventually have to deal with daily: emails, documents, and code exchanged without much thought. I am quite sure everyone has already seen some version of this, receiving something that clearly looks AI-generated—too generic, half-baked, and padded with frivolous words. As output becomes cheaper, we run the risk of being overwhelmed with AI slop everywhere. Everything from code reviews to team communication will need guardrails to ensure it does not hurt productivity.
A recent update to the arXiv submission guidelines for the CS category, and an acknowledgement of increasing AI slop in science paper submissions, are clear signs of things to come. A few thoughts that occupy a share of my mindspace:
- Assuming good intentions, is this a temporary issue? Just as word processors made it easier to write content and undo mistakes, will we see folks becoming better at using AI to generate better-quality output?
- There are now AI code review tools that review the code generated by another AI tool. How reliable is that approach?
- There have always been experts who review every word/sentence or every line of code to ensure the correctness of a document. Will that be a thing of the past?
- As it becomes easier to generate more output, what downstream impact will that have on the teams and systems that are responsible for ensuring the correctness of the output, or that depend on it?
The online chatter on AI slop belongs to one of the following categories:
- AI slop is ok as long as the shipping velocity is high.
- AI slop can be tackled with real taste, and that needs experience/talent.
- AI slop is the blocker for adoption, and only when the vendors solve the AI slop problem will the tools be ready for actual use.
I expect that we will tumble along to find approaches that improve the output of AI tools and new ways of handling the still-existing AI slop.
My wishlist for 2026
- As coding tools improve, I would like to see studies that provide a critical view of their usage in existing large codebases.
- Setting the AI hype aside for a minute, we have been using AI capabilities for quite some time. All our mobile phones are equipped with state-of-the-art Machine Learning models to process images and audio. Machine Learning models have been used heavily in modern scientific work and continue to have an impact on ongoing projects. I did see some news on the impact of Generative AI on scientific breakthroughs, but I would like to see a more comprehensive study of its exact contribution.
I sign off with the hope that we get more significant breakthroughs in frontier models as well as breakthroughs using those frontier models in 2026.