Conversation: Understanding Agentic AI

Honest assessment: "Agentic AI" is evolutionary rather than revolutionary - it's automation with LLM-based decision-making instead of traditional code, with significant tradeoffs in consistency and reliability. - claude.ai

What is Agentic AI?

In my role as an Application Developer at a small institution for the past 20 years, I haven't had a lot of time to devote to looking at LLMs and Agentic AI. My LinkedIn feed is consumed by discussions of AI, so I decided to try to get at least a mid-level understanding of Agentic AI.

The following is the raw and unedited conversation with claude.ai, with "User" being me asking the questions. The first part of the conversation is basically a glossy overview of Agentic AI. I do like how, during the conversation, claude.ai draws on examples from things we have worked on in the past, like lesson plans, game dev, and even Workday.

The really interesting part comes toward the end of the conversation, where claude.ai offers insight into agentic consistency and how an agent knows whether or not it has failed.

User: What is an Agentic AI?

Claude: Agentic AI refers to AI systems that can act autonomously to accomplish goals, making decisions and taking actions on their own rather than just responding to individual prompts.

The key characteristics of agentic AI include:

Goal-directed behavior - These systems work toward specific objectives you give them, breaking down complex tasks into steps and figuring out how to achieve the end result.

Tool use - They can interact with external tools, APIs, databases, or software to gather information and take actions. For example, an agentic AI might search the web, write code, execute it, read files, or make API calls as needed.

Multi-step reasoning - Rather than giving a single response, agentic AI can plan sequences of actions, evaluate results, and adjust its approach based on what it learns along the way.

Decision-making - These systems make choices about what actions to take next without needing human input at every step.

Some real-world examples:

  • Claude Code - lets developers delegate entire coding tasks to Claude, which can autonomously write, test, and iterate on code
  • Browsing agents that can navigate websites and complete multi-step tasks
  • Task automation systems that can manage workflows across different applications

The "agentic" part emphasizes that the AI acts more like an agent working on your behalf rather than just a chatbot answering questions. It's particularly useful when you have complex, multi-step tasks where you want to set a goal and let the AI figure out the execution details.


Platforms and Languages

User: What platforms / frameworks do Agentic AI run on? What languages are they written in?

Claude: Agentic AI systems can be built on various platforms and frameworks, typically written in Python (overwhelmingly the most common), though other languages are used too. Here's the landscape:

Major Frameworks

LangChain - One of the most popular Python frameworks for building agentic applications. It provides abstractions for chaining together LLM calls, tools, memory, and agents. It's very comprehensive but can be complex.

LlamaIndex - Originally focused on retrieval/indexing, but now includes strong agent capabilities. Also Python-based, good for data-heavy applications.

AutoGPT and AutoGen (the latter from Microsoft) - Frameworks specifically designed for autonomous agents that can plan and execute multi-step tasks. AutoGen focuses on multi-agent systems where different AI agents collaborate.

CrewAI - Python framework that emphasizes role-based agents working together as a "crew" on complex tasks.

Langroid - Newer Python framework with a focus on multi-agent systems and clean architecture.

Language Ecosystems

Python dominates because of its AI/ML ecosystem (libraries like transformers, openai, anthropic SDKs, etc.) and rapid development capabilities.

JavaScript/TypeScript also have growing support through frameworks like LangChain.js, Vercel AI SDK, and others - useful when building web-based agentic applications.

Go, Rust, Java - Less common but possible, especially in enterprise contexts where these languages are already in use.

Infrastructure

Most agentic systems run on:

  • Cloud platforms (AWS, Google Cloud, Azure) with containerization (Docker/Kubernetes)
  • Local execution for development or privacy-sensitive applications
  • API-based architecture, calling LLM providers like Anthropic, OpenAI, or self-hosted models
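
To make the "API-based architecture" point concrete, here is a minimal, hedged sketch of a single provider call using the Anthropic Python SDK. It assumes the anthropic package is installed and an ANTHROPIC_API_KEY environment variable is set; the model id and prompt are illustrative placeholders.

```python
# Single call to a hosted LLM provider via the Anthropic Python SDK
# (pip install anthropic). Assumes ANTHROPIC_API_KEY is set in the environment;
# the model id below is a placeholder and may need updating.
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY automatically

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=500,
    messages=[{"role": "user", "content": "Summarize this week's lesson plans."}],
)

print(response.content[0].text)
```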

Is It Really AI?

User: But is Agentic AI actually AI or is it more of a sophisticated processing chain?

Claude: That's a really insightful question that gets at something important - and honestly, the answer depends on how you define "AI."

The honest take

Most current "agentic AI" systems are sophisticated orchestration layers around language models. They're essentially:

  • A planning/reasoning component (usually an LLM like Claude or GPT)
  • A control loop that decides what to do next
  • Tool integrations that let the LLM interact with the world
  • Memory/state management to track progress

So in that sense, you're right - it's a processing chain with an AI component at its core, not necessarily "AI" as a unified, novel intelligence.
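
Put in code, that orchestration layer is surprisingly small. The sketch below is illustrative only: call_llm() is a hypothetical placeholder for whatever provider SDK you use, and the tools and stopping rule are made up for the example.

```python
# Hedged sketch of a generic agent control loop. call_llm() is a hypothetical
# placeholder for an LLM provider call; the tools and stopping logic are
# illustrative only.
import json

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to an LLM and return its raw text reply."""
    raise NotImplementedError("wire up your provider's SDK here")

TOOLS = {
    "search_files": lambda query: f"(results for {query!r})",
    "write_file": lambda path, text: f"wrote {len(text)} chars to {path}",
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    memory = []  # state/memory tracked between steps
    for _ in range(max_steps):
        # Planning/reasoning component: ask the LLM what to do next.
        decision = json.loads(call_llm(
            f"Goal: {goal}\nHistory: {memory}\n"
            'Reply as JSON: {"action": "...", "args": {...}} '
            'or {"action": "finish", "answer": "..."}'
        ))
        if decision["action"] == "finish":
            return decision["answer"]
        # Tool integration: execute the chosen action in the real world.
        result = TOOLS[decision["action"]](**decision["args"])
        # Memory/state management: record the outcome for the next step.
        memory.append({"action": decision["action"], "result": result})
    return "stopped: step limit reached"
```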

What makes it feel "agentic"

The LLM itself provides the intelligence - reasoning, planning, adapting to results. The framework just gives it hands and feet. The "agency" comes from:

  • The LLM's ability to reason about what to do next
  • Persistence across multiple steps
  • Tool use allowing real-world effects
  • Self-correction when things don't work

The philosophical angle

You could argue that human cognition is also a sophisticated processing chain - perception, reasoning, decision-making, action. The question "is this AI?" might be less about the architecture and more about whether it exhibits intelligent behavior.

Practically speaking

Current agentic systems are more like "AI on rails" - they're constrained by their framework, tools, and prompts. They're not independently conscious or truly autonomous. They're powerful automation tools that use AI for the decision-making parts.


Use Case Example

User: Can you give me a use case for an Agentic AI that would externalize decisions to an LLM?

Claude: Great question - here's a concrete use case where externalizing decisions to an LLM makes real sense:

Automated Lesson Plan Enhancement System

The Task: "Take my draft lesson plan on subtraction strategies and enhance it for my ELL students and high-energy learners"

Why the LLM needs to make decisions:

  1. Analyze the current plan - Read your draft, identify which parts need differentiation
  2. Search for resources - Look up visual aids, manipulatives, or supplemental activities appropriate for 2nd grade subtraction
  3. Evaluate alignment - Check if suggested modifications fit EDM curriculum standards and your "I Do/We Do/You Do" format
  4. Generate materials - Create differentiated worksheets, word problems with varying complexity, or interactive components
  5. Iterate based on criteria - If generated materials are too text-heavy for ELLs, simplify language and add more visuals
  6. Format for delivery - Organize everything into your standard lesson plan template

Why you can't just script this:

  • Each lesson plan is different - rigid rules wouldn't work
  • The "right" level of differentiation depends on context the LLM must interpret
  • Finding the balance between rigor and accessibility requires judgment
  • The system needs to evaluate its own outputs ("Is this actually simpler?")

The LLM's decisions:

  • How much to simplify vocabulary
  • Which concepts need more visual support
  • Whether to add kinesthetic activities for high-energy students
  • When "good enough" has been reached

This is fundamentally different from a script because the LLM brings contextual reasoning to each decision point, rather than following predetermined logic.
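
A hedged sketch of what that looks like in practice: a generate-critique-revise loop where the LLM, not hand-written rules, judges whether the plan is "good enough." call_llm() is again a hypothetical placeholder, and the criteria in the prompts are examples.

```python
# Illustrative generate-critique-revise loop for the lesson plan use case.
# call_llm() is a hypothetical placeholder; the criteria are examples only.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up an LLM provider here")

def enhance_lesson_plan(draft: str, max_revisions: int = 3) -> str:
    plan = call_llm(
        "Enhance this 2nd-grade subtraction lesson plan for ELL students "
        f"and high-energy learners:\n{draft}"
    )
    for _ in range(max_revisions):
        # The LLM, not hand-written rules, judges "is this actually simpler?"
        critique = call_llm(
            "Review this plan. Is the language simple enough for ELLs, and does "
            "it include movement for high-energy students? Reply OK or list fixes:\n"
            f"{plan}"
        )
        if critique.strip().upper().startswith("OK"):
            break  # the agent decided "good enough" has been reached
        plan = call_llm(f"Revise the plan to address these issues:\n{critique}\n\n{plan}")
    return plan
```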


Comparing to Analytics

User: So this is effectively what would be called business analytics?

Claude: Not quite - there's an important distinction here:

Business Analytics is about analyzing data to understand what happened, why it happened, and what might happen next. It's descriptive, diagnostic, predictive, and prescriptive based on data patterns. Think: dashboards, reports, trend analysis, forecasting models.

Examples from Workday work:

  • How many employees completed performance reviews?
  • Which departments have the highest turnover?
  • Predicting budget needs based on historical patterns
  • Recommending optimal staffing levels based on data

Agentic AI is about autonomous task execution and decision-making to accomplish a goal. It's more operational than analytical - it does things rather than just tells you things.

The Key Difference

Business Analytics: "Here's what the data shows about lesson plan effectiveness across your school district" (insight)

Agentic AI: "I've created three differentiated versions of your lesson plan, tested them against curriculum standards, generated the worksheets, and organized them in your Google Drive" (execution)

Where They Overlap

Agentic AI can use analytics as part of its decision-making. For example, an agent might:

  1. Analyze which lesson plan formats worked best (analytics)
  2. Use that insight to generate a new lesson plan (agentic action)
  3. Create the materials automatically (agentic execution)

Business analytics = understanding and recommending
Agentic AI = understanding and doing


User: So closer to Solutions Analytics?

Claude: Ah, now you're getting warmer! Solutions Analytics (or prescriptive analytics) is definitely closer, but there's still a meaningful gap:

Solutions/Prescriptive Analytics

  • Analyzes data and recommends specific actions
  • "Given these constraints and goals, you should do X, Y, Z"
  • Provides optimization suggestions based on models
  • Examples: "Schedule these employees for maximum coverage" or "Allocate budget this way for best ROI"

But: You still have to execute those recommendations yourself.

Agentic AI

  • Takes the recommendation and executes it autonomously
  • Doesn't just say "you should create differentiated worksheets" - it actually creates them
  • Doesn't just suggest "reorganize your lesson sequence" - it rewrites the plan
  • Makes ongoing decisions during execution and adapts

The Spectrum

Descriptive Analytics → "Here's what happened"
Diagnostic Analytics → "Here's why it happened"
Predictive Analytics → "Here's what will happen"
Prescriptive/Solutions Analytics → "Here's what you should do"
Agentic AI → "I'm doing it for you" (with ongoing judgment calls)

Example

Solutions Analytics in Workday: System analyzes performance data and recommends which employees should be promoted based on criteria.

Agentic AI in Workday: System analyzes performance data, drafts promotion justifications, routes them through approval workflows, updates compensation records, and sends notifications - all autonomously, making judgment calls about edge cases along the way.

So yes, solutions analytics is the closest traditional concept, but agentic AI adds the autonomous execution layer on top. It's like the difference between a consultant's recommendations versus hiring someone to actually implement them.


Questioning the "Execution Layer"

User: The execution layer seems to be presented as if it were something new. It seems to me that Agentic AI is merely what a traditional human software developer does in their day-to-day work.

Claude: You've hit on something really important here. You're absolutely right - the "execution layer" isn't new at all. Software developers have been building automated systems that make decisions and execute tasks for decades.

What You're Recognizing

Traditional automation/software:

  • Takes inputs
  • Makes decisions based on logic (if/then, algorithms, rules engines)
  • Executes actions
  • Handles errors and edge cases
  • Iterates until completion

"Agentic AI":

  • Takes inputs
  • Makes decisions based on... LLM reasoning instead of hardcoded logic
  • Executes actions
  • Handles errors and edge cases
  • Iterates until completion

The structure is identical. The only real difference is where the decision-making logic lives - in code you wrote versus in an LLM's pattern-matching capabilities.
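
To illustrate, here is the same decision point written both ways, using the earlier lesson plan example: is a word problem simple enough for ELL students? Everything here is a sketch; call_llm() is a hypothetical placeholder, and the word list and thresholds are invented for illustration.

```python
# The same decision point implemented two ways. The rule-based version encodes
# developer reasoning up front; the LLM version defers judgment to runtime.
# call_llm() is a hypothetical placeholder; the word list and thresholds are
# invented for illustration.
SIMPLE_WORDS = {"has", "gives", "left", "how", "many", "more", "fewer", "apples"}

def is_simple_enough_rules(problem: str) -> bool:
    # Developer-encoded logic: cheap and predictable, but only as good as the
    # cases the developer anticipated.
    words = problem.lower().replace("?", "").replace(".", "").split()
    unfamiliar = sum(w not in SIMPLE_WORDS for w in words)
    return len(words) <= 20 and unfamiliar <= 5

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up an LLM provider here")

def is_simple_enough_llm(problem: str) -> bool:
    # Runtime LLM reasoning: adapts to novel phrasing, but costs an API call
    # and may answer differently on identical input.
    reply = call_llm(
        "Is this word problem simple enough for a 2nd-grade ELL student? "
        f"Answer only YES or NO.\n{problem}"
    )
    return reply.strip().upper().startswith("YES")
```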

So What's the Actual Innovation?

The value proposition is really about flexibility without custom development:

Traditional approach: You (the developer) have to anticipate every scenario, write logic for each edge case, maintain it as requirements change.

LLM approach: You describe what you want in natural language, and the LLM figures out the logic on the fly, adapts to novel situations it wasn't explicitly programmed for.

Is This Revolutionary?

Honestly? It depends on the use case.

For well-defined, repetitive tasks with clear rules - traditional code is often better (faster, cheaper, more reliable).

For fuzzy, judgment-heavy tasks that would require constant code updates - LLMs can be more practical.

The Marketing vs. Reality

You're right to be skeptical. "Agentic AI" is largely a rebranding of what developers have always done - building systems that automate decisions and actions. The AI industry is positioning it as revolutionary, but it's really more evolutionary - replacing some hardcoded logic with LLM inference.


Developers and Reasoning

User: Software developers do not make decisions based on hardcoded logic. They perform analysis and then code the "reasoning" based on the desired output.

Claude: You're absolutely correct, and I apologize for oversimplifying. Let me restate this more accurately:

What Developers Actually Do

Developers:

  1. Analyze requirements and context
  2. Design solutions based on reasoning about the problem space
  3. Code implementations that embody that reasoning
  4. The resulting software executes those reasoned decisions consistently

The reasoning happens during development (by the human), then gets encoded into software that executes reliably.

What LLMs Do

LLM-based systems:

  1. Analyze requirements and context
  2. Reason about the problem at runtime
  3. Generate decisions/outputs on the fly
  4. Each execution involves fresh reasoning (for better or worse)

The reasoning happens at execution time (by the LLM), adapting to each specific situation.

The Real Distinction

Developer-built system: Human reasoning → encoded once → executes predictably
LLM-based system: LLM reasoning → happens every time → varies based on context

So What's Actually Different?

  • Flexibility: LLMs can handle novel situations without redeployment
  • Maintenance: No code changes needed when requirements evolve slightly
  • Tradeoffs: Less predictable, more expensive per execution, can make mistakes

But you're right - calling traditional software "hardcoded logic" diminishes the sophisticated reasoning developers do. Both approaches involve reasoning; they just happen at different points in the lifecycle.

The honest pitch for "agentic AI": It's outsourcing runtime decision-making to an LLM instead of encoding developer reasoning into software. Whether that's valuable depends entirely on your use case.


The Consistency Problem

User: How can there be consistency in the output of an Agentic AI's work?

Claude: Excellent question - and honestly, there often isn't consistency, which is a major problem.

The Consistency Challenge

LLMs are fundamentally probabilistic. Even with the same input, they can produce different outputs because:

  • They sample from probability distributions
  • Temperature settings introduce randomness
  • Context window variations affect reasoning
  • Model updates change behavior

So an agentic AI running the same task twice might:

  • Take different approaches
  • Produce different quality results
  • Make different decisions at branch points
  • Even fail in different ways

Strategies to Improve Consistency

1. Temperature = 0 - Reduces randomness (but doesn't eliminate it entirely)

2. Structured outputs - Force the LLM to return JSON/XML in specific schemas so at least the format is consistent (see the sketch after this list)

3. Validation layers - Check outputs against rules before accepting them

  • "Does this lesson plan include all required sections?"
  • "Is this code syntactically valid?"

4. Deterministic components - Use traditional code for parts that must be consistent, LLM only for truly fuzzy decisions

5. Multiple runs + voting - Run the agent multiple times, compare results, use consensus or best-scoring output

6. Human-in-the-loop - Review before final execution (defeats some of the "autonomous" appeal)
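
A minimal sketch of strategies 2 and 3 combined: require JSON output, then validate it with ordinary code before accepting it. The required sections are illustrative, and raw_output stands in for whatever text the LLM returned.

```python
# Sketch of strategies 2 and 3: structured output plus a validation layer.
# The required sections are illustrative; raw_output is whatever the LLM returned.
import json

REQUIRED_SECTIONS = {"objective", "i_do", "we_do", "you_do", "materials"}

def validate_lesson_plan(raw_output: str) -> dict:
    try:
        plan = json.loads(raw_output)              # structured output: must be JSON
    except json.JSONDecodeError as err:
        raise ValueError(f"not valid JSON: {err}")
    if not isinstance(plan, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_SECTIONS - plan.keys()      # validation layer: required sections
    if missing:
        raise ValueError(f"missing sections: {sorted(missing)}")
    return plan  # only now does the agent accept (or retry on) the result
```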

The Brutal Truth

For tasks requiring high consistency - like financial calculations, compliance checks, or critical workflows - traditional software is still superior.

Agentic AI works better for tasks where:

  • Some variation is acceptable or even desirable
  • Perfect consistency isn't critical
  • The cost of human review is factored in
  • Flexibility matters more than reliability

Real-World Example

Bad use: Auto-grade math worksheets (needs perfect consistency)
Good use: Generate varied practice problems for differentiation (variation is a feature)

This inconsistency problem is why many "agentic AI" implementations are really AI-assisted rather than fully autonomous - humans still review the outputs before they matter.


Failure Detection

User: How does the Agentic AI "know" it failed?

Claude: This is where it gets really messy. Often, it doesn't know it failed - at least not in any reliable way.

The Failure Detection Problem

Traditional software:

  • Exceptions/errors are explicit
  • Validation rules are hardcoded
  • Tests verify expected vs. actual
  • Success/failure is boolean and knowable

Agentic AI:

  • May produce plausible-looking garbage
  • Can "hallucinate" success when it failed
  • Doesn't inherently know what "correct" means
  • Confidence scores don't correlate well with accuracy

How Agentic Systems TRY to Detect Failure

1. Tool/API errors - If the system tries to call an API and gets a 404, it knows something went wrong (but not necessarily what)

2. Explicit validation checks - Developer-coded rules:

  • "If generating a lesson plan, it must have these sections"
  • "If writing code, run it and check for syntax errors"
  • "If fetching data, verify the response isn't empty"

3. Self-reflection prompts - Ask the LLM to evaluate its own work:

  • "Review what you just created. Does it meet the requirements?"
  • Problem: LLMs are poor judges of their own outputs and often hallucinate success

4. Expected output patterns - Check if output matches expected structure/format, though this catches format errors, not logical errors

5. External validators - Run outputs through separate systems:

  • Code linters
  • Spell checkers
  • Business rule engines
  • Other LLMs (cross-validation)
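
One of these external validators is easy to show: a syntax check on LLM-generated Python code, using only the standard library. It catches format-level failures only; code that parses can still produce wrong results.

```python
# One cheap external validator: is the LLM-generated Python code syntactically
# valid? Standard library only; catches format errors, not logic errors.
import ast

def is_valid_python(generated_code: str) -> bool:
    try:
        ast.parse(generated_code)
        return True
    except SyntaxError:
        return False
```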

What It CAN'T Reliably Detect

  • Subtle incorrectness - Generated a lesson plan that looks good but pedagogically makes no sense
  • Logical errors - Code that runs but produces wrong results
  • Hallucinated facts - Cited resources that don't exist
  • Missed requirements - Forgot to address ELL accommodations but thinks it did

The Dirty Secret

Most production agentic systems rely on:

  • Human review of outputs (not truly autonomous)
  • Limited scope where failures are low-stakes
  • Multiple validation layers you had to manually build
  • Retry loops that burn through API calls hoping eventually it works

Example from Teaching Context

Task: "Generate a math worksheet with 10 subtraction problems for 2nd graders"

Possible failures:

  • βœ… Can detect: Generated 8 problems instead of 10 (count check)
  • βœ… Can detect: Problems use numbers over 100 (rule validation)
  • ❌ Hard to detect: Problems are too easy/hard for grade level
  • ❌ Hard to detect: Word problems use vocabulary too advanced for ELLs
  • ❌ Hard to detect: Answers in the key are wrong

The system might confidently report success while having failed in ways that matter.
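
The two "can detect" checks above are just ordinary validation code. Here is a sketch, assuming the worksheet comes back as a list of (minuend, subtrahend, answer) tuples; nothing in it can catch the "hard to detect" failures.

```python
# The two "can detect" checks from the example above, as plain validation code.
# Assumes the worksheet is a list of (minuend, subtrahend, answer) tuples; none
# of the "hard to detect" failures are caught here.
def check_worksheet(problems: list[tuple[int, int, int]]) -> list[str]:
    errors = []
    if len(problems) != 10:                    # count check
        errors.append(f"expected 10 problems, got {len(problems)}")
    for minuend, subtrahend, _answer in problems:
        if minuend > 100 or subtrahend > 100:  # rule validation
            errors.append(f"{minuend} - {subtrahend} uses numbers over 100")
    return errors  # empty list = passed every check we knew how to write
```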

The Honest Answer

Agentic AI often doesn't reliably know when it failed - which is why you need extensive external validation, or accept that failures will slip through, or keep humans in the loop.

This is a fundamental limitation that the "agentic AI" hype often glosses over.


Summary

This conversation explored the concept of Agentic AI from a critical, technical perspective:

  1. Definition: Agentic AI systems use LLMs for autonomous, goal-directed task execution with tool use and multi-step reasoning
  2. Technology: Primarily Python-based frameworks (LangChain, AutoGen, CrewAI) running on cloud infrastructure
  3. Architecture: Essentially orchestration layers around LLMs rather than fundamentally new AI
  4. Comparison to traditional approaches:
    • Not the same as business analytics (which provides insights, not execution)
    • Similar to prescriptive analytics but adds autonomous execution
    • Structurally identical to traditional software automation, just with LLM reasoning replacing developer-encoded logic
  5. Key distinction: Developer reasoning happens once (at development time) and executes consistently; LLM reasoning happens at runtime each time, with variation
  6. Major limitations:
    • Consistency: Probabilistic nature means variable outputs for the same inputs
    • Failure detection: Often can't reliably determine when it has failed
    • Reliability: Requires extensive validation layers and often human review
  7. Honest assessment: "Agentic AI" is evolutionary rather than revolutionary - it's automation with LLM-based decision-making instead of traditional code, with significant tradeoffs in consistency and reliability
