The Unreliability of AI

Jul 6, 2023

Since the revolutionary public release of generative AI models, there has been an incredible wave of interest, hype, and expectation around their capabilities. Millions of people around the world are harnessing the power of text-to-image and language models to explore new ideas and accomplish tasks that would have been impossible only a few years ago. These technologies sprang up so fast that we have yet to internalize the full extent of their possibilities, limitations, and dangers. And this is only the beginning.

We have all likely tried asking ChatGPT a myriad of questions, from simple queries to relatively complex ones. These models can answer questions about history, summarize difficult concepts, solve basic cognitive problems, write comprehensive essays faster than any human, and even generate working code. With use cases this powerful and flexible, people have started to dream up ambitious applications, such as:

  • Q&A chatbots that can answer questions about PDFs and websites
  • Code generation platforms
  • Legal document generators
  • Information serializers and summarizers
  • Copywriting tools
  • Agents that can accomplish difficult tasks like booking flights, conducting research, and planning your calendar
  • UI/UX design drafting

The commercial space has blown up with a tsunami of new startups looking to build applications on top of generative AI, and established companies are scrambling to integrate AI-based features into their existing products and platforms. There has been an incredible surge of excitement and anticipation among investors, with over 40 billion dollars¹ invested in AI in the past six months alone. The new AI revolution easily surpasses the crypto hype and is shaping up to be the greatest boom since the dot-com bubble. On your Twitter or LinkedIn feed, you’ve probably seen hundreds or even thousands of accounts posting about AI and its capabilities. Every day, dozens of new accounts post demos touting their new products. Despite this flood of new companies, very few have shipped a complete product to the public. Why is this?

The simple answer is that AI is unreliable. If you look only at the best-case scenarios, it’s easy to believe that AI can do anything. That’s why so many startups cherry-pick the examples they show on Twitter and in their promo videos: they are meant to show the best that AI has to offer. But when you sign up for the beta, the product either doesn’t exist, barely works, or you never hear back at all.

When it comes to traditional products such as social media platforms, API back ends, or video streaming services, you expect them to be reliable essentially 99.9% of the time. Failures come and go, but they are typically minor or fixable within a reasonable timeframe. You never expect them to fail consistently, and serious concerns arise when they do. The technology driving most products is predictable, well tested, and reliable to a (mostly) known extent. Generative AI, on the other hand, is not an exact science. Language models can fail frequently, in ways both predictable and unpredictable. When they fail, it can be hard to know why, and even if you can isolate the cause, it’s often difficult to find a fix. You’ve likely seen ChatGPT fabricate statistics or data, say something blatantly false, or simply fail to give an answer. Here are some things language models cannot do well:

  • Solve complex problems with multiple layers and variables
  • Perform simple arithmetic
  • Reverse a word (try it yourself; see the sketch after this list)
  • Answer questions about information that is new or outside their training data

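If you want to try the word-reversal failure yourself, here is a minimal sketch using the OpenAI Python client. The model name, prompt wording, and choice of word are my own illustrative assumptions, and results will vary from run to run:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

word = "unreliability"
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative choice; any chat model works here
    messages=[{
        "role": "user",
        "content": f"Reverse the word '{word}'. Reply with only the reversed word.",
    }],
)

# Compare the model's answer against the ground truth computed locally.
answer = response.choices[0].message.content.strip()
print(f"Model says: {answer}")
print(f"Correct:    {word[::-1]}")
```

The interesting part is that the correct answer takes one line of ordinary code, yet a model that writes essays will often get it wrong.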
There is a layer of uncertainty and unpredictability in AI that simply isn’t present elsewhere. When you book a plane ticket, the process almost HAS to be 100% reliable. You click the airline you want, the flight you want, and the type of ticket you want; you make a reservation and you pay. If you ask an AI agent to perform the same task, there’s a good chance it might not find the right flight, might accidentally book first class, or might fail outright. When drafting a legal document with some LawyerAI software (because lawyers are expensive), it might produce legal jargon or cite a federal statute that isn’t even real. In these edge cases (and, a lot of the time, perfectly normal cases), people cannot tolerate this level of inconsistency.

This barrier is likely what most AI companies (including the startup I am working at) face when creating a new product. It is by no means an impossible obstacle. AI engineers are working every day to minimize errors, hallucinations, and variability in AI outputs. Strategies like chain-of-thought prompting (sketched below) can increase language model reliability, and researchers are building newer, more powerful, and more ethical models with safeguards to reduce bias and improve accuracy. Still, getting a product to a level suitable for production is very difficult. Anyone can have an idea, but it takes a lot of effort to make that idea a reality. We’re very early in the new AI revolution, though. The hope is that as the technology advances, AI can come to be seen as just as reliable as the traditional technologies that came before it.
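
As a rough illustration of chain-of-thought prompting, here is a hypothetical sketch comparing a direct prompt against one that asks the model to reason step by step. The question, model, and prompt wording are placeholders of my own, not a prescribed recipe:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTION = "A store sells pens at 3 for $2. How much do 21 pens cost?"

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative choice of model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Direct prompt: the model must commit to an answer immediately.
direct = ask(QUESTION + " Answer with only the final number.")

# Chain of thought: asking for intermediate reasoning steps tends to
# improve reliability on multi-step problems, at the cost of longer output.
step_by_step = ask(QUESTION + " Think through the problem step by step, then state the final answer.")

print("Direct answer:", direct)
print("Chain of thought:", step_by_step)
```

The obvious trade-off is longer, slower, and more expensive responses, which is part of why reliability improvements rarely come for free.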


  1. Krystal Hu, Reuters: “Venture capital funding plunges globally in first half despite AI frenzy”. https://www.reuters.com/business/finance/venture-capital-funding-plunges-globally-first-half-despite-ai-frenzy-2023-07-06/ ↩︎

