Content is user-generated and unverified.

What Work is AI Actually Doing? A Dinner Table Guide

Paper Link: ArXiv preprint 2510.23669v2

Executive Summary (The Elevator Pitch)

Researchers analyzed 4 million real conversations with Claude AI and found that people primarily use AI for complex, creative tasks like brainstorming and synthesis rather than routine work, with just 5% of all work tasks accounting for 59% of AI usage. The study identifies three distinct types of work tasks and finds that "dynamic problem-solving" tasks attract the most AI assistance, while tasks requiring social intelligence show almost no correlation with AI adoption. This suggests AI is becoming a "thinking partner" for cognitive heavy lifting rather than a replacement for routine tasks or human interaction.

Authors & Institutions

Authors:

Peeyush Agarwal - Netaji Subhas University of Technology (NSUT), Delhi, India
Harsh Agarwal - Adobe Inc., Noida, India
Akshat Rana - Netaji Subhas University of Technology (NSUT), Delhi, India

Institutional Context: This is a collaboration between an Indian university (NSUT Delhi) and Adobe's India office, bringing together academic research and industry perspective.

Conflicts of Interest Assessment

Potential Conflicts:

One author (Harsh Agarwal) works for Adobe, a major AI product company, though Adobe's products weren't specifically studied
The study uses Anthropic's data exclusively, which could create selection bias toward Anthropic's user base
No funding sources are explicitly disclosed in the provided excerpt

Overall Assessment: The conflicts appear minimal and manageable. The paper is transparent about using Anthropic's dataset, and the methodology appears independent of commercial interests.

The Data: What They Actually Studied

Primary Dataset:

4 million anonymized conversations with Claude AI
Mapped to 3,514 work tasks from the U.S. Department of Labor's O*NET database
O*NET breaks down hundreds of jobs into thousands of specific tasks (like "analyze data to determine feasibility of product proposals")

How They Measured Tasks:

Created a framework with 7 key dimensions: Routine, Cognitive, Social Intelligence, Creativity, Domain Knowledge, Complexity, and Decision Making
Broke each dimension into 5 specific parameters (35 total measurements)
Used Google's Gemini 2.5 Pro AI to score each task on all 35 parameters on a 1-10 scale
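To make the 7 x 5 scoring grid concrete, here is a minimal Python sketch. The dimension names come from the summary above, but the parameter names are placeholders (the full list of 35 is not given here), and averaging scores per dimension is an illustrative assumption, not necessarily how the authors aggregated them.

```python
# Dimension names are from the summary; everything else is a placeholder.
DIMENSIONS = [
    "Routine", "Cognitive", "Social Intelligence", "Creativity",
    "Domain Knowledge", "Complexity", "Decision Making",
]
PARAMS_PER_DIMENSION = 5


def parameter_grid():
    """Return {dimension: [placeholder parameter names]}, 7 x 5 = 35 in total."""
    return {
        dim: [f"{dim} / parameter {i}" for i in range(1, PARAMS_PER_DIMENSION + 1)]
        for dim in DIMENSIONS
    }


def dimension_means(scores):
    """Check one task's 35 scores against the 1-10 scale, then average per dimension."""
    means = {}
    for dim, params in parameter_grid().items():
        values = [scores[p] for p in params]  # KeyError if a parameter is missing
        for value in values:
            if not 1 <= value <= 10:
                raise ValueError(f"score {value} is outside the 1-10 scale")
        means[dim] = sum(values) / len(values)
    return means


# Hypothetical task: every parameter scored 5, except Creativity scored 9.
example_scores = {
    param: (9 if dim == "Creativity" else 5)
    for dim, params in parameter_grid().items()
    for param in params
}
example_means = dimension_means(example_scores)
```

In the study itself the scores came from an LLM rather than a hand-written dictionary; the sketch only shows the shape of the resulting data.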

What They Found:

AI usage follows an extreme "long tail" distribution—most tasks get little use, a few get massive use
Highest correlations with AI usage: idea generation, information processing, and originality
Lowest correlations with AI usage: predictable outcomes and repetitive tasks
Three distinct "archetypes" of work emerged from clustering analysis
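The two headline statistics above, usage concentration and score-to-usage correlation, can be sketched in a few lines of Python. All numbers below are made up to reproduce the "5% of tasks, 59% of usage" shape, and the use of a Pearson coefficient is an assumption; this summary does not say which correlation measure the authors used.

```python
import math


def top_share(usage_counts, fraction):
    """Share of total usage captured by the most-used `fraction` of tasks."""
    ranked = sorted(usage_counts, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical usage counts for 20 tasks: one task dominates.
usage = [590, 50] + [20] * 18
share = top_share(usage, 0.05)  # top 5% of 20 tasks = 1 task

# Hypothetical 1-10 scores on one parameter for the four most-used tasks.
idea_generation = [9, 7, 3, 2]
correlation = pearson(idea_generation, usage[:4])
```

A high `share` only says that usage is concentrated; it says nothing about whether the heavily used tasks are the most valuable ones.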

Strengths: What They Got Right

Real-World Data vs. Theory: This is the first large-scale study using actual AI usage patterns rather than expert predictions or surveys about what might happen. They analyzed what people are actually doing with AI across millions of conversations.

Comprehensive Framework: Instead of simple "routine vs. non-routine" categories, they developed a 7-dimension, 35-parameter system that captures more of the nuance of modern knowledge work. This is like going from "hot or cold" to a full weather forecast.

Large Sample Size: With 4 million conversations across 3,514 different work tasks, this isn't a small pilot study. It's big enough to detect real patterns rather than random noise.

Multiple Statistical Techniques: They didn't just count usage; they used Principal Component Analysis to find hidden patterns, K-Means clustering to identify task types, and MANOVA to test whether the clusters are genuinely different. This is rigorous multivariate analysis, not just descriptive statistics.
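For readers unfamiliar with K-Means, the clustering step can be sketched in plain Python as follows. This is a bare-bones version of Lloyd's algorithm on made-up two-dimensional scores; the PCA and MANOVA steps are omitted, and the data, the deterministic initialization, and the function name `kmeans` are illustrative assumptions rather than the authors' pipeline.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points serve as the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical (creativity, routine) scores for nine tasks, forming three groups.
task_scores = [
    (9, 2), (5, 5), (2, 9),
    (8, 3), (6, 5), (1, 8),
    (9, 3), (5, 6), (2, 8),
]
labels, centroids = kmeans(task_scores, k=3)
```

With real data the groups are rarely this clean, which is why a follow-up test such as MANOVA is needed to check that the clusters differ on the measured dimensions.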

Transparent Methodology: They clearly explain their LLM-based scoring approach, acknowledge its limitations, and provide detailed information about their analytical choices (like why they chose 3 clusters). Replication would be possible.

Actionable Insights: The three task archetypes (Dynamic Problem Solving, Procedural & Analytical Work, Standardized Operational Tasks) provide a practical framework that businesses and policymakers can actually use.

Weaknesses: Where to Be Skeptical

Single AI Platform Bias: They only studied Claude AI users, who might be systematically different from users of ChatGPT, Gemini, or other AI tools. It's like studying iPhone users and assuming all smartphone users behave the same way.

"AI Scoring AI" Problem: They used one AI (Gemini) to judge work tasks, then analyzed how another AI (Claude) is used for those tasks. This creates potential circular reasoning and may embed Silicon Valley assumptions about what counts as "complex" or "creative" work.

Snapshot, Not a Movie: This is cross-sectional data from one point in time. AI usage patterns may have changed dramatically since data collection, and we can't see how adoption evolves as people get more sophisticated with the tools.

The Missing "Why" Question: The study shows what tasks attract AI usage but relies heavily on correlation to infer why. They assume high usage for creative tasks means people want "cognitive offloading," but users might be experimenting, procrastinating, or using AI poorly for these tasks.

O*NET May Be Outdated: The O*NET task taxonomy was designed before modern AI existed. It might not capture new forms of work or might group together activities that AI affects very differently. It's like using a 1990s map to navigate today's city.

Representative Sample Uncertainty: We don't know if Claude users are representative of all workers. They might be early adopters, tech workers, or specific demographics that differ from the broader workforce. The study can't weight results to match actual labor market composition.

Social Intelligence Puzzle: The finding that social intelligence shows "near-zero correlation" is presented as clear-cut, but this could mean (a) AI can't do social tasks, (b) people don't trust AI for social tasks yet, or (c) O*NET's definition of "social intelligence" doesn't match how people actually use AI socially. The study doesn't distinguish between these explanations.

Concentration vs. Value Confusion: Just because 5% of tasks account for 59% of usage doesn't necessarily mean those are the most "important" or "valuable" tasks. They might just be the easiest to delegate or the most fun to experiment with.

The Bottom Line

This is solid empirical work that moves the AI-and-work conversation from speculation to data. The methodology is sophisticated and mostly sound, though the reliance on AI-generated task scoring and the single-platform limitation should make us cautious about overgeneralizing. The finding that AI is primarily used for complex cognitive tasks rather than routine work is genuinely surprising and well-supported, but the "why" behind these patterns needs more investigation.
