93 archived articles for May 2026, grouped by publication day.
8 active days · 8 sources
arXiv cs.AI: 69 articles (latest: May 16, 2026)
OpenAI: 6 articles (latest: May 15, 2026)
The Decoder: 5 articles (latest: May 16, 2026)
Towards AI: 3 articles (latest: May 16, 2026)
TechCrunch: 3 articles (latest: May 15, 2026)
Hugging Face: 2 articles (latest: May 14, 2026)
Hacker News: 1 article (latest: May 15, 2026)
The Verge: 1 article (latest: May 15, 2026)
A three-person team led by Peter Steinberger keeps about 100 Codex instances running for the open-source project OpenClaw, driving OpenAI API spend to $1.3 million a month. Steinberger frames the bill as a research inve…
Multi-repo agent envs, build-scoped secrets, and Dockerfile layer caching shipped on May 13. I ran every workflow I hated against them. Continue reading on Towards AI »
Ollama just got 93% faster on every Apple Silicon Mac, and it did it without touching the model, the quantization, or the hardware. Continue reading on Towards AI »
Part 1: The Era of Naive MoE Scaling Continue reading on Towards AI »
Researchers at the Allen Institute for AI and UC Berkeley have built EMO, a mixture-of-experts model whose experts specialize in content domains instead of word types. That lets you strip out three-quarters of the exper…
arXiv:2605.13850v1 Announce Type: new Abstract: Existing frameworks for LLM-based agent architectures describe systems from a single perspective: industry guides (Anthropic, Google, LangChain) focus on execution topolog…
arXiv:2605.14033v1 Announce Type: new Abstract: Scientific theory shift in AI agents requires more than fitting equations to data. An artificial scientific agent must detect whether an existing representational framewor…
arXiv:2605.14034v1 Announce Type: new Abstract: Wide applications of LLM-based agents require strong alignment with human social values. However, current works still exhibit deficiencies in self-cognition and dilemma de…
arXiv:2605.14038v1 Announce Type: new Abstract: Large language models (LLMs) increasingly act as autonomous agents that must decide when to answer directly vs. when to invoke external tools. Prior work studying adaptive…
arXiv:2605.14051v1 Announce Type: new Abstract: Industrial LLM agent systems often separate planning from execution, yet LLM planners frequently produce structurally invalid or unnecessarily long workflows, leading to b…
arXiv:2605.14062v1 Announce Type: new Abstract: While synthetic data generation with large language models (LLMs) is widely used in post-training pipelines, existing approaches typically generate full outputs before app…
arXiv:2605.14133v1 Announce Type: new Abstract: Interactive agent benchmarks face a tension between scalable construction and realistic workflow evaluation. Hand-authored tasks are expensive to extend and revise, while…
arXiv:2605.14141v1 Announce Type: new Abstract: We study learning when the learned object is executable solver code rather than a predictor. In this setting, correctness is not enough: two solvers may both return valid…
arXiv:2605.14164v1 Announce Type: new Abstract: The primary way to establish and compare competencies in foundation and generative AI models has shifted from peer-reviewed literature to press releases and company blog p…
arXiv:2605.14175v1 Announce Type: new Abstract: In long conversations, an LLM can produce a next utterance that sounds plausible but rests on premises the conversation has already abandoned. Context-manipulation attacks…
arXiv:2605.14266v1 Announce Type: new Abstract: The integration of artificial intelligence (AI) agents in higher education is transforming teaching, learning, and administrative processes. Although existing AI agents effectiv…
arXiv:2605.14420v1 Announce Type: new Abstract: Current Large Language Models (LLMs) typically rely on coarse-grained national labels for pluralistic value alignment. However, such macro-level supervision often obscures…
arXiv:2605.14458v1 Announce Type: new Abstract: Omni-modal large language models have demonstrated remarkable potential in holistic multimodal understanding; however, the token explosion caused by high-resolution audio…
arXiv:2605.14537v1 Announce Type: new Abstract: We introduce Cattle Trade, a multi-agent benchmark for evaluating large language models (LLMs) as agents in strategic reasoning under imperfect information, advers…
arXiv:2605.14556v1 Announce Type: new Abstract: Symmetrical Reality (SR) is emerging as a future trend for human-agent coexistence, placing higher demands on agents to acquire human-like intelligence. It calls for riche…
arXiv:2605.14604v1 Announce Type: new Abstract: This position paper argues that effective tutoring requires corrective friction: surfacing misconceptions and challenging them supportively to drive conceptual change. Yet…
arXiv:2605.14667v1 Announce Type: new Abstract: A main barrier for the deployment of AI radiomic systems in clinical routine is their drop in performance under heterogeneous multicentre acquisition protocols. This work…
arXiv:2605.14723v1 Announce Type: new Abstract: Sepsis management in the ICU requires sequential treatment decisions under rapidly evolving patient physiology. Although large language models (LLMs) encode broad clinical…
arXiv:2605.14802v1 Announce Type: new Abstract: Large language models often suffer from fact loss, timeline confusion, persona drift, and reduced stability during long-range interaction, especially under high-noise know…
arXiv:2605.14892v1 Announce Type: new Abstract: LLM-based autonomous agents have demonstrated strong capabilities in reasoning, planning, and tool use, yet remain limited when tasks require sustained coordination across…
arXiv:2605.15041v1 Announce Type: new Abstract: Tool use extends large language models beyond parametric knowledge, but reliable execution requires balancing appropriate reasoning depth with strict structural validity.…
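The "strict structural validity" requirement in the abstract above can be illustrated with a generic check (a hedged sketch with hypothetical tool names, not this paper's method): before executing a model-emitted tool call, verify that it parses as JSON, names the expected tool, and supplies every required argument.

```python
import json

def validate_tool_call(raw: str, schema: dict) -> bool:
    """Minimal structural check on a model-emitted tool call.
    The schema format here is hypothetical: a tool name plus a
    list of required argument keys."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False  # not even well-formed JSON
    return (
        call.get("name") == schema["name"]
        and all(k in call.get("arguments", {}) for k in schema["required"])
    )

schema = {"name": "get_weather", "required": ["city"]}
ok = validate_tool_call('{"name": "get_weather", "arguments": {"city": "Oslo"}}', schema)
bad = validate_tool_call('{"name": "get_weather", "arguments": {}}', schema)
# ok is True, bad is False
```

Real systems typically enforce far richer constraints (types, enums, nesting) via JSON Schema; this sketch only shows why validity is a separate axis from reasoning quality.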
arXiv:2605.09027v2 Announce Type: cross Abstract: In multi-agent systems (MAS), a single deceptive agent can nullify all gains of an agentic AI collective and evade deployed defenses. However, existing adversarial studi…
arXiv:2605.13909v1 Announce Type: cross Abstract: Negotiation is a central mechanism of economic exchange, shaping markets, procurement, labor agreements, and resource allocation. It is also a canonical testbed for agen…
arXiv:2605.13915v1 Announce Type: cross Abstract: Quantization is essential for efficient large language model (LLM) inference, yet the dequantization step, converting low-bit weights back to high precision for matrix mu…
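The dequantization step this abstract refers to is, in the common affine scheme, a simple per-weight transform (a generic illustration, not this paper's contribution): a stored low-bit integer code, a scale, and a zero point reconstruct an approximate float weight.

```python
def quantize(w: float, scale: float, zero_point: int, bits: int = 4) -> int:
    """Round a float weight to the nearest integer code, clamped to
    the signed range of the given bit width (affine quantization)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, round(w / scale) + zero_point))

def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map a low-bit integer code back to an approximate
    high-precision weight for matrix multiplication."""
    return scale * (q - zero_point)

# Round-trip example: reconstruction error is at most scale / 2
# for weights inside the representable range.
scale, zp = 0.05, 0
w = 0.23
q = quantize(w, scale, zp)        # q = 5
w_hat = dequantize(q, scale, zp)  # w_hat = 0.25
assert abs(w - w_hat) <= scale / 2
```

Because this transform runs once per weight on every matrix multiply, fusing or accelerating it is exactly where dequantization overhead becomes a systems problem.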
arXiv:2605.13936v1 Announce Type: cross Abstract: The recent success of large language models (LLMs) has been largely driven by vast public datasets. However, the next frontier for LLM development lies beyond public dat…
arXiv:2605.13941v1 Announce Type: cross Abstract: Long-term memory is essential for LLM agents that operate across multiple sessions, yet existing memory systems treat retrieval infrastructure as fixed: stored content e…
arXiv:2605.13950v1 Announce Type: cross Abstract: Autonomous language-model agents are increasingly evaluated on long-horizon tool-use tasks, but existing benchmarks rarely capture the complexity and nuance of real scie…
arXiv:2605.13981v1 Announce Type: cross Abstract: The rise in deployment of large language models has driven a surge in GPU demand and datacenter scaling, raising concerns about electricity use, grid stress, and the imp…
arXiv:2605.14117v1 Announce Type: cross Abstract: An AI system for professional floor plan design must precisely control room dimensions and areas while respecting the desired connectivity between rooms and maintaining…
arXiv:2605.14153v1 Announce Type: cross Abstract: Exploitation is not a binary event. It is a ladder of acquiring progressive capabilities, from executing a single buggy line of code to taking full control of the target…
arXiv:2605.14202v1 Announce Type: cross Abstract: Malformed, missing, or boundary-value inputs in microservice APIs can cascade across dependent services, threatening reliability. Robustness testing systematically exerc…
arXiv:2605.14220v1 Announce Type: cross Abstract: Modern LLM RL systems separate rollout generation from policy optimization. These two stages are expected to produce token probabilities that match exactly. However, imp…
arXiv:2605.14418v1 Announce Type: cross Abstract: "Oh-Oh, yes, I'm the great pretender. Pretending that I'm doing well. My need is such, I pretend too much..." summarizes the state in the area of jailbreak creation and…
arXiv:2605.14421v1 Announce Type: cross Abstract: We introduce MemLineage, a defense for LLM agent memory that attaches both cryptographic provenance and LLM-mediated derivation lineage to every entry. Recent and concur…
arXiv:2605.14543v1 Announce Type: cross Abstract: Inpatient medication recommendation requires clinicians to repeatedly select specific medications, doses, and routes as a patient's condition evolves. Existing benchmark…
arXiv:2605.14679v1 Announce Type: cross Abstract: Cultural heritage institutions increasingly disseminate research and interpretive materials globally, but multilingual dissemination is constrained by limited budgets an…
arXiv:2605.14744v1 Announce Type: cross Abstract: Large language models in regulated financial workflows are governed by natural-language policies that the same model interprets, creating a principal--agent failure: out…
arXiv:2605.14766v1 Announce Type: cross Abstract: Normally, a system that translates speech into text consists of separate modules for speech recognition and text-to-text translation. Combining those tasks into a Speech…
arXiv:2605.14786v1 Announce Type: cross Abstract: As LLM-based agents increasingly browse the web on users' behalf, a natural question arises: can websites passively identify which underlying model powers an agent? Doin…
arXiv:2605.14844v1 Announce Type: cross Abstract: We introduce XFP, a dynamic weight quantizer for LLM inference that inverts the conventional workflow: the operator specifies reconstruction quality floors on per-channe…
arXiv:2605.15000v1 Announce Type: cross Abstract: Premature closure, or committing to a conclusion before sufficient information is available, is a recognized contributor to diagnostic error but remains underexamined in…
arXiv:2605.15044v1 Announce Type: cross Abstract: As audio-first agents become increasingly common in physical AI, conversational robots, and screenless wearables, audio large language models (audio-LLMs) must integrate…
arXiv:2605.15053v1 Announce Type: cross Abstract: Continually pre-training a large language model on heterogeneous text domains, without replay or task labels, has remained an unsolved architectural problem at LLM scale…
arXiv:2605.15077v1 Announce Type: cross Abstract: Function calling, also known as tool use, is a core capability of modern LLM agents but is typically constrained by synchronous execution semantics. Under these semantic…
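The synchronous-vs-asynchronous contrast in the abstract above can be sketched generically (tool names here are invented for illustration and are not the paper's system): under synchronous semantics an agent blocks on each tool call in turn, while an async runtime dispatches independent calls concurrently and gathers their results.

```python
import asyncio

async def call_tool(name: str, latency: float) -> str:
    """Stand-in for a tool invocation; sleep simulates I/O wait."""
    await asyncio.sleep(latency)
    return f"{name}: done"

async def main() -> list[str]:
    # Synchronous semantics would await each call one by one,
    # paying 0.1 + 0.1 = 0.2s of wall-clock wait here.
    # Concurrent dispatch overlaps the waits, paying max(0.1, 0.1) = 0.1s,
    # and gather preserves the order of the calls in its result list.
    return await asyncio.gather(
        call_tool("web_search", 0.1),
        call_tool("calculator", 0.1),
    )

results = asyncio.run(main())
print(results)  # ['web_search: done', 'calculator: done']
```

The benefit grows with the number of independent tool calls per agent step, which is why execution semantics matter for function-calling throughput.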
arXiv:2605.15152v1 Announce Type: cross Abstract: LLM quantization has become essential for memory-efficient deployment. Recent work has shown that quantization schemes can pose critical security risks: an adversary may…
arXiv:2508.06226v4 Announce Type: replace Abstract: Geometry problem solving (GPS) poses significant challenges for Multimodal Large Language Models (MLLMs) in diagram comprehension, knowledge application, long-step rea…
arXiv:2601.21714v4 Announce Type: replace Abstract: The evolution of Large Language Model (LLM) agents towards System 2 reasoning, characterized by deliberative, high-precision problem-solving, requires maintaining rigo…
arXiv:2602.02711v2 Announce Type: replace Abstract: Large language models (LLMs) achieve strong performance in long-horizon decision-making tasks through multi-step interaction and reasoning at test time. While practiti…
arXiv:2602.15019v5 Announce Type: replace Abstract: Bio-pharmaceutical innovation has shifted: many new drug assets now originate outside the United States and are disclosed primarily via regional, non-English channels.…
arXiv:2603.16659v3 Announce Type: replace Abstract: Reinforcement-learned reasoning has powered recent AI leaps on verifiable tasks, including mathematics, code, and structure prediction. The harder bottleneck is evalua…
arXiv:2605.01847v3 Announce Type: replace Abstract: Outcome-only evaluation under-specifies whether an evaluated agent profile preserves the commitments required to solve a multi-turn task coherently. NeuroState-Bench i…
arXiv:2605.03596v4 Announce Type: replace Abstract: Workspace learning requires AI agents to identify, reason over, exploit, and update explicit and implicit dependencies among heterogeneous files in a worker's workspac…
arXiv:2605.09038v2 Announce Type: replace Abstract: Teaching language models to use search tools is not only a question of whether they search, but also of whether they issue good queries. This is especially important i…
arXiv:2410.02091v3 Announce Type: replace-cross Abstract: Generative artificial intelligence (AI) facilitates content production and enhances ideation capabilities, which can significantly influence developer productivi…
arXiv:2504.11703v3 Announce Type: replace-cross Abstract: AI agents interact with external environments through tool calls, exposing them to attacks like indirect prompt injection that can trigger unauthorized actions.…
arXiv:2510.15982v3 Announce Type: replace-cross Abstract: Autoregressive large language models (LLMs) have achieved remarkable improvement across many tasks but incur high computational and memory costs. Knowledge disti…
arXiv:2511.18903v3 Announce Type: replace-cross Abstract: Due to the scarcity of high-quality data, large language models (LLMs) are often trained on mixtures of data with varying quality levels, even after sophisticate…
arXiv:2601.16312v2 Announce Type: replace-cross Abstract: Research in AI4Science has shown promise in many science applications, including polymer design. However, current LLMs are ineffective in this problem space beca…
arXiv:2601.19924v2 Announce Type: replace-cross Abstract: We investigate the capabilities and scalability of Large Language Models (LLMs) in optimization modeling, a domain requiring structured reasoning and precise for…
arXiv:2602.04265v3 Announce Type: replace-cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for enhancing reasoning in Large Language Models (LLMs). However, exist…
arXiv:2603.04601v3 Announce Type: replace-cross Abstract: Code generation has emerged as one of AI's highest-impact use cases, yet existing benchmarks measure isolated tasks rather than the complete "zero-to-one" proces…
arXiv:2603.12554v2 Announce Type: replace-cross Abstract: Reinforcement learning (RL) has been effective for post-training autoregressive (AR) language models, but extending these methods to diffusion language models (D…
arXiv:2603.20334v3 Announce Type: replace-cross Abstract: In high-complexity abstract reasoning, a system must infer a latent rule from a few examples or structured observations and apply it to unseen instances. LLMs ca…
arXiv:2604.05306v2 Announce Type: replace-cross Abstract: Large language models (LLMs) often produce confident yet incorrect answers, which can lead to risky failures in real-world applications. We study whether post-tr…
arXiv:2605.04215v2 Announce Type: replace-cross Abstract: Diffusion-based Large Language Models (D-LLMs) represent a promising frontier in generative AI, offering fully parallel token generation that can lead to signifi…
arXiv:2605.11453v2 Announce Type: replace-cross Abstract: Practitioners deploying multi-agent large language model (LLM) systems must currently choose between communication topologies such as chain, star, mesh, and rich…
arXiv:2605.11853v2 Announce Type: replace-cross Abstract: Reinforcement learning has become a widely used post-training approach for LLM agents, where training commonly relies on outcome-level rewards that provide only…
arXiv:2605.12394v2 Announce Type: replace-cross Abstract: Training Neural Networks (NNs) without overfitting is difficult; detecting that overfitting is difficult as well. We present a novel Random Matrix Theory method…
arXiv:2605.12484v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL). However, updating parameters forces them to absorb tas…
Points: 96 · Comments: 15
OpenAI announced yet another reorganization Friday, consolidating certain areas and making company president Greg Brockman the official lead of all things product. In a memo viewed by The Verge, Brockman wrote that sinc…
OpenAI is turning ChatGPT into a personal financial assistant. Pro users in the US can now connect their bank accounts through Plaid to get personalized analysis based on real transaction data. The feature runs on GPT-5…
Once users connect their accounts, they will see a dashboard of their portfolio performance, spending, subscriptions, and upcoming payments.
Elon Musk's AI company x.AI is jumping into the coding agent space with Grok Build, a new terminal-based tool. The article x.AI plays catch-up with Grok Build, its first terminal-based coding agent appeared first on The…
Thousands of Microsoft developers used Anthropic's Claude Code for programming. Now the company is revoking licenses and betting on GitHub Copilot CLI. The article Microsoft pulls Claude Code licenses and pushes develop…
Databricks uses GPT-5.5 for enterprise agent workflows after the model set a new state of the art on the OfficeQA Pro benchmark.
The update gives users enhanced flexibility over how they can manage their workflows.
A new open source gadget called Clawdmeter turns Claude Code usage stats into a tiny desktop dashboard for AI coding power users.
Release: llm-gemini 0.31. gemini-3.1-flash-lite is no longer a preview. Here's my write-up of the Gemini 3.1 Flash-Lite Preview model back in March. I don't believe this new non-preview model has changed since then. Tag…
OpenAI expands Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber, helping verified defenders accelerate vulnerability research and protect critical infrastructure.
Introducing Trusted Contact in ChatGPT, an optional safety feature that notifies someone you trust if serious self-harm concerns are detected.
Meet the ChatGPT Futures Class of 2026—26 student innovators using AI to build, research, and drive real-world impact. Discover how this generation is redefining learning, creativity, and opportunity with ChatGPT.