On Word Calculators
The release of GPT-3 marked a transformative moment in computer science with the advent of a tool capable of summarizing, analyzing, and predicting text completions with unprecedented accuracy. This model generates text that closely mimics human dialogue, achieving a level of coherence and sophistication previously unimaginable. However, it's crucial to understand that autoregressive transformer models, like GPT-3, are not true intelligences but sophisticated word calculators.
Just as calculators excel at arithmetic without being mathematicians, LLMs are adept at word manipulation but lack genuine understanding. They are powerful tools for abstract language tasks but fall short in areas requiring general reasoning. While they perform well in pattern matching, linguistic manipulation, and style transfer, they struggle with long chains of deduction and reasoning about real-world contexts. They lack the fundamental common sense reasoning capabilities that humans take for granted.
LLMs do not possess consciousness, understanding, or beliefs. They operate by analyzing data patterns and generating outputs based on statistical correlations, without any internal understanding of truth or knowledge. This lack of internal understanding aligns with Harry Frankfurt's concept of "bullshit" as described in his essay On Bullshit. Frankfurt argues that a bullshitter is indifferent to the truth and focuses instead on producing persuasive or authoritative-sounding statements without regard for accuracy. Similarly, LLMs generate text that may sound authoritative, yet they have no genuine conception of truth or justification. They can accidentally produce correct guesses, but they have no capacity to know or justify those guesses as true or false.
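To make "generating outputs based on statistical correlations" concrete, here is a minimal sketch. It is a drastically simplified stand-in for an LLM: a word-level bigram model over a tiny invented corpus rather than a transformer over trillions of tokens. The corpus and variable names are made up for illustration, but the generation step is the same in spirit, which is to sample whichever continuation is statistically likely, with no check against reality.

```python
import random
from collections import defaultdict

# A tiny invented corpus. Note that it deliberately contains a falsehood.
corpus = (
    "the moon orbits the earth . "
    "the earth orbits the sun . "
    "the sun orbits the earth . "   # false, but statistically just another sentence
).split()

# "Training": count which word follows which.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(start, length=8):
    """Emit a fluent-looking continuation by sampling next-word frequencies."""
    out = [start]
    for _ in range(length):
        followers = counts[out[-1]]
        if not followers:
            break
        words, weights = zip(*followers.items())
        out.append(random.choices(words, weights=weights)[0])
    return " ".join(out)

print(generate("the"))
# e.g. "the sun orbits the earth . the moon orbits"
```

The falsehood in the corpus is reproduced just as fluently as the truths, because frequency, not accuracy, is the only signal the sampler ever sees.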
Knowledge is justified true belief, and LLMs fundamentally lack the machinery to have true beliefs. LLMs are extremely high-dimensional curve fits against very large data sets. This is not to belittle the achievement of creating such a tool, but it is important to recognize the limits of how far such high-dimensional curve fitting can take us. It is very good at producing unlimited amounts of stochastic bullshit, and it is important to understand that this is precisely what it outputs.
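The training side can be sketched just as simply. The snippet below fits a toy next-token classifier by gradient descent on cross-entropy; the vocabulary size, dimensions, and data are all invented for illustration, and a real transformer differs by many orders of magnitude in scale and architecture, but the objective is the same kind of curve fit: match the observed distribution of continuations. Nothing in the loss distinguishes true statements from false ones.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 5, 8                      # toy vocabulary and context size (arbitrary)
X = rng.normal(size=(200, dim))        # "contexts" standing in for embeddings
y = rng.integers(0, vocab, size=200)   # "next tokens" observed in the data

W = np.zeros((dim, vocab))             # the parameters of the curve being fit
for _ in range(500):
    logits = X @ W
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # Gradient of mean cross-entropy with respect to W.
    grad = X.T @ (probs - np.eye(vocab)[y]) / len(X)
    W -= 0.5 * grad

loss = -np.log(probs[np.arange(len(y)), y]).mean()
print(f"final cross-entropy: {loss:.3f}")
# The objective only rewards matching the observed distribution of next
# tokens; there is no term anywhere that encodes truth or justification.
```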
However, stochastic bullshit engines can still be incredibly useful word calculators when applied tactically. There may even be vast economic value in being able to produce large amounts of high-quality bullshit on demand, since a non-trivial share of our economy is predicated on bullshit jobs. But there is no path from word calculator to general intelligence without a world model that incorporates an internal conception of truth, and autoregressive transformers in their current form cannot do this. Maybe future systems, ones not built entirely on autoregressive transformers, will be able to, but as of today they cannot.