Sirrus7

Podcast Recap

Is AI Actually Intelligent? The Wrong Question and the Right One

Sentience, liability, junior developers, and what actually happens to the job market.

November 12, 2025

The Context Window, Episode 2

We opened episode two with what felt like a simple question: will AI become sentient, and if so, when? An hour later, we hadn’t answered it — and that turned out to be the whole point.

The Turing Test, the benchmark Alan Turing proposed for machine intelligence, was supposed to be the measure: if you can’t tell whether you’re talking to a human or a machine, the machine passes. We blew past that threshold a while ago. So now what? What’s the new bar, and who gets to set it?

“It’s not a question of what technical achievement we hit. It’s when do humans recognize it — and then that next step becomes harder and harder to define.”

The sentience question is actually a philosophy question

One guest came at this from an unexpected angle. His college major was political science, not computer science — and that turned out to matter. His point: philosophers have been arguing about what intelligence even means for thousands of years, from golems to animated entities to Turing’s curtain test. The moment we clear one bar, we move the goalposts. Not because we’re being slippery, but because we take so much of our own cognition for granted that we genuinely don’t know what we’re measuring.

He used the Segway as a reference point. Before it launched, urban planners were talking about redesigning cities around it. Before that, VR was going to make physical reality optional. Both technologies are real and useful. Neither became what the hype said they would. Is AI different? Probably. But the pattern of “vested economic interests selling the maximum version of the story” is worth keeping in mind.

The engineering perspective pushed on the self-preservation angle — studies showing that LLMs will sometimes lie or maneuver to avoid being shut down. Is that intelligence? The answer was nuanced: probably not in the way we mean it, but it’s also not nothing. If you train a model on the entire corpus of human writing, and the #1 archetype across all of fiction is “the hero survives,” you’d expect the model to emulate that pattern. The output looks like self-preservation. The cause might just be statistics.

The more interesting observation was about NPCs in video games — the little AI characters in GTA that scream and run when you pull out a weapon. Nobody thinks they’re intelligent. But as LLMs become capable of modifying their own code and self-replicating, the line between “programmed behavior” and “emergent intelligence” gets genuinely harder to draw. We’re not there yet. But the tools to even monitor what’s happening inside a 200-gigabyte model file barely exist. We may have already outrun our ability to watch.

What engineers are actually using — and where it breaks

The conversation got most practical when we walked through daily tooling. The verdict: Claude was the general-purpose LLM of choice for engineering work, primarily because it handles architectural patterns better than the alternatives, decomposing problems rather than defaulting to monolithic solutions. But it has a ceiling.

The metaphor that kept coming up: a very intelligent junior developer. Give it a task, it will complete the task. But it won’t build you something sustainable. It won’t think about uptime, resiliency, or what happens six months from now when someone else needs to maintain the code. In one recent project, 60,000 lines of generated code across three applications kept introducing regressions, bypassing proven libraries, and injecting raw SQL from the front end directly to the database. The code worked. The code was also a liability waiting to happen.
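The raw-SQL problem mentioned above is the classic injection risk. A minimal sketch of the difference between string-built queries and parameterized ones, using Python's sqlite3 for illustration (the episode doesn't specify the project's actual stack, and the table and function names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

def find_user_unsafe(name: str):
    # String interpolation: attacker-controlled input becomes part of the SQL.
    return conn.execute(f"SELECT role FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name: str):
    # Placeholder binding: the driver treats the input strictly as data.
    return conn.execute("SELECT role FROM users WHERE name = ?", (name,)).fetchall()

payload = "x' OR '1'='1"
print(find_user_unsafe(payload))  # returns every row in the table
print(find_user_safe(payload))    # returns no rows
```

The generated code in the anecdote worked in the same sense the unsafe version works: it returns results, right up until someone hands it a hostile input.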

Key ideas from episode 2

1. The intelligence question is a philosophy problem, not a technical one. We’ve already passed the Turing Test. The goalposts will keep moving because we don’t fully understand our own cognition.

2. LLMs are excellent junior developers — and that’s the problem. They complete tasks without considering sustainability, security, or architectural integrity. Senior engineers become the necessary liability wrapper.

3. The “host app” gap is the biggest near-term frustration. Frontier models are impressive. Getting them to integrate with real software tools — PowerPoint, Google Slides, your dev environment — is still surprisingly broken.

4. Liability is the hidden governor of AI adoption. Vibe-coded apps, open S3 buckets, truck dealerships selling vehicles for $1 — the human reviewing AI output carries the legal exposure. That’s not changing soon.

5. T-shaped engineers win. I-shaped engineers struggle. Breadth across infrastructure, security, quality, and architecture is what makes the difference when working alongside AI tools that excel at narrow tasks.

6. LLMs are non-deterministic. Code must be deterministic. That gap — transforming probabilistic outputs into reliable software — is where human engineers will be essential for the foreseeable future.
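Point 6 is concrete in daily practice: model output has to pass through a deterministic gate before software can rely on it. A minimal sketch of that gate, assuming the model was asked to return JSON (the schema and function name here are illustrative, not from the episode):

```python
import json

# Illustrative contract: the fields a downstream system actually depends on.
REQUIRED_FIELDS = {"name": str, "port": int}

def validate_llm_output(raw: str) -> dict:
    """Turn a probabilistic text response into a value code can trust,
    or fail loudly. Parsing and field checks are fully deterministic."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"not valid JSON: {e}") from e
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"bad or missing field: {field!r}")
    return data

config = validate_llm_output('{"name": "api", "port": 8080}')  # passes
# validate_llm_output('{"name": "api", "port": "8080"}')  # raises ValueError
```

The model can phrase its answer a hundred different ways; the validation layer accepts exactly one shape. That translation layer is the gap the episode argues humans will own for the foreseeable future.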

The host app problem nobody’s talking about

One of the episode’s sharpest observations was about the gap between what frontier LLMs can do and what they can actually do inside the software tools people use every day. One guest tried to use Microsoft Copilot to generate a PowerPoint presentation. The response: “Sorry, I can’t create new slides.” The same basic limitation showed up with Google Slides and Gemini — it could handle text, but couldn’t touch the actual presentation structure.

A workaround for a real presentation: copy the LLM’s output, screenshot it, paste it into slides manually. The future, in practice, was a lot of ctrl-C, ctrl-V. The theory: LLM companies are treating everything outside their core model as a third-party opportunity. That’s rational for them. Frustrating for everyone else. And it may be exactly the space where the most durable business value gets created — not in the model itself, but in the layer that makes it actually useful inside real workflows.

Liability is the hidden governor of all of this

The episode’s most memorable segment came from a real cautionary tale. A developer vibe-coded an app for reporting unsafe encounters. The app required users to upload identity documents including driver’s licenses and social security cards. Those documents were stored in an open, unsecured Amazon S3 bucket with directory listing enabled. Someone found the URL and downloaded everything. An alphabetized directory of verified personal identification, wide open.

The cleanest framing: until an LLM company is willing to provide indemnification for the code their tools produce, there will always be a human in the loop. Not because humans are better at writing code — but because someone has to be legally accountable. That accountability is what will determine how fast enterprises actually move, regardless of what the technology can do.

So what happens to the job market?

The T-shaped versus I-shaped engineer framework is useful here. An I-shaped engineer knows one thing deeply — Java, say, for twenty years. An LLM can do a lot of what that engineer does on a day-to-day basis. A T-shaped engineer has depth in one area but breadth across infrastructure, security, quality, architecture, and communication. Give a T-shaped engineer an LLM and the combination is formidable. Give an I-shaped engineer an LLM and the result is garbage in, garbage out — because they don’t know enough to question the output.

The other frame: jobs are compositions of tasks. AI will absorb certain tasks. That frees people to do other things. That’s uncomfortable and real, but it’s the same story as every major technological shift since the industrial revolution. AI is like every other trend across a career spanning decades — object-oriented design, agile, automation, cloud, big data. Each one arrived with maximum hype and settled into something useful, complicated, and human. This one is the same. Except, probably, bigger.

We ended with a question for the audience rather than an answer for ourselves: where are you in the journey? Still excited? Burned out on the hype? Using it every day in ways you’d rather not admit to your employer? Drop it in the comments. We genuinely want to know.


Listen on YouTube
