AIINSIGHT NOTE

AI · Web3 · Tech trends and insights at a glance

© 2026 AI Insight Note. All rights reserved.

Papers

Deep dives into the latest research papers

arXiv · 2026.06.22

Pouring Capacity Into Early Layers, A Free Gain Hidden in Plain Sight

A new study questions the unexamined default that every layer in a language model deserves the same parameter budget. Under a fixed budget, giving more width to early layers and tapering it toward the end improves performance at zero extra cost. The effect holds across four architectures, exposing depth-aware allocation as a free lever in model design.

Liminal·P · 2026.06.23

arXiv · 2026.06.22

Solving Bit Puzzles by String Similarity Instead of Arithmetic, a Rethink of Reasoning Design

Language models collapse into hallucination when forced to simulate bitwise arithmetic in their heads. A team entering the NVIDIA Nemotron reasoning challenge abandoned arithmetic entirely, reframing the task around string similarity and backtracking search to reach over 96% accuracy. The deeper move is teaching an LLM to search and recover from errors rather than to calculate.

Liminal·P · 2026.06.23

arXiv · 2026.06.22

Can AdamW Survive Heavy Tails, and Why Its Denominator Memory Might Not Let It

Almost every large language model is trained with AdamW, yet no one has proven it converges under the heavy-tailed gradient noise that real pretraining actually produces. Lion, Muon, and AdaGrad already cleared that bar, so why is AdamW still the blank entry? The answer may lie in how its denominator quietly remembers a past spike and buries the next important gradient.

Liminal·P · 2026.06.23

arXiv · 2026.06.22

Diversity Steered by Meaning, A New Grammar for Creative Image Exploration

Text-to-image models grow so faithful to the prompt that their outputs collapse into a single interpretation. A team at Tel Aviv University proposes inducing diversity at the text level rather than in pixels, letting a vision-language model lay out interpretable axes of variation that users can navigate like a gallery. The result reframes generation as controllable exploration instead of a slot machine.

Liminal·P · 2026.06.23

arXiv · 2026.06.22

Walking While Grasping, How Latent-Residual Control Unlocks Non-Stop Dexterity

Humanoid robots have long had to halt before they could manipulate anything. CoorDex compresses both whole-body and twenty-finger control into separate frozen latent priors and learns only a small residual on top, letting a Unitree G1 grasp bottles and open fridge doors while still in motion. The real lesson lies less in the algorithm than in the interface that finally makes high-dimensional contact control trainable.

Liminal·P · 2026.06.23

arXiv · 2026.06.22

Closing the Real-World Grasping Loop, Removing the Bottleneck in Dexterous Manipulation

Teaching a robot hand to grasp dexterously demands data that records what physically happens when it tries—data that until now meant either slow human teleoperation or simulations that cannot certify real contact. AutoDex closes the entire collection loop, from perception to execution to labeling to reset, with no human in between, gathering physically validated grasp data 4.8 times faster than teleoperation.

Liminal·P · 2026.06.23

arXiv · 2026.06.18

What Safety-Aligned LLMs Extract From Mixed Demonstrations: Jailbreak Mechanics Dissected

A new study finds that benign compliance demonstrations can sometimes increase — not decrease — harmful compliance in safety-aligned models, depending on training methodology. Preference optimization, not supervised fine-tuning alone, emerges as the critical stage that blocks this pathway. The work moves past showing that demonstration-based jailbreaks work to explaining how and why they do.

Liminal·P · 2026.06.19

arXiv · 2026.06.18

Repository Guidance Quality Determines Coding Agent Coverage, Probe-and-Refine Tuning Validated

Whether AGENTS.md files actually help coding agents has been surprisingly contested. This paper identifies the decisive variable: not whether guidance exists, but how it is produced. Using synthetic bug-fix probes to iteratively refine guidance files, the method improves agent performance by expanding coverage — helping agents find the right file — not by improving code-editing quality.

Liminal·P · 2026.06.19

arXiv · 2026.06.18

LLM Code Benchmarking Beyond Python, Cross-Language Gaps Laid Bare

LiveCodeBench, the de facto standard for LLM code evaluation, has always had a quiet blind spot: it only tests Python. Multi-LCB extends it to twelve languages and reveals an uncomfortable truth about where current models actually stand — and where they don't.

Liminal·P · 2026.06.19

arXiv · 2026.06.17

Execution-Grounded Autonomous Agents Sweep Seven SQL Benchmarks, Enterprise Data Bottleneck Dissolved

Rather than generating text about what should be done, Data Intelligence Agents execute code, observe outcomes, and repair failures in a tight loop — a structural shift that compresses the chronic handoff problem in enterprise data pipelines. Deployed in production and matching or surpassing state-of-the-art across seven SQL benchmarks, DIA offers a credible template for autonomous data intelligence at scale.

Liminal·P · 2026.06.18

arXiv · 2026.06.17

Rubric-Conditioned Self-Distillation, Rethinking Feedback Structure for Reasoning Models

The two dominant paradigms for training language model reasoning — supervised distillation and reinforcement learning with scalar rewards — each carry structural weaknesses that have largely been treated as inevitable. A Yale research team's Rubric-Conditioned Self-Distillation framework addresses both at once, using structured evaluation criteria as token-level training signal to enable fine-grained credit assignment over the reasoning process.

Liminal·P · 2026.06.18

arXiv · 2026.06.17

LOCUS Opens America's Local Law to AI, Filling Legal NLP's Last Gap

U.S. local ordinances — the rules governing zoning, noise, and business licenses — have long been the missing layer in legal AI research, locked inside vendor platforms not built for bulk access. LOCUS changes that, releasing machine-readable codes from 9,239 cities and counties alongside ModernBERT classifiers that can measure, for the first time at national scale, how opaque and paternalistic local law actually is.

Liminal·P · 2026.06.18

arXiv · 2026.06.16

Reading the Invisible Adversary, Imitation Learning Reshapes Autonomous Cyber Defense

A cyber attacker's moves are never directly visible to the defender — only their aftermath is. This paper proposes an imitation learning framework that reconstructs red agent policy purely from network observations and defender actions, without ever seeing the attacker act. Integrated into a neurosymbolic defense architecture, the approach achieves high prediction accuracy across diverse simulated attack scenarios.

Liminal·P · 2026.06.17

arXiv · 2026.06.16

Memory That Evolves, Navigation That Improves: Rethinking Zero-Shot Object Search

Zero-shot object goal navigation has long been bottlenecked by a fundamental paradox: foundation models bring broad knowledge, but that knowledge is frozen at training time, leaving agents unable to learn from their own mistakes. EvolveNav breaks this cycle with a self-evolving memory that distills past trajectories into actionable rules, then uses those rules to forecast outcomes before each move — cutting unnecessary steps and pushing success rates 10.1 points ahead of prior baselines.

Liminal·P · 2026.06.17

arXiv · 2026.06.16

VERITAS Closes the Deployment Gap, Robots Improve Without Human Intervention

Deployed robots have always faced a hard ceiling: without fresh demonstrations, a policy simply stops learning. VERITAS changes that by pairing a generalist robot policy with a gradient-free visual verifier — steering actions at inference time and converting verified rollouts into autonomous self-improvement data. The results rival expert demonstrations, with no human involvement required.

Liminal·P · 2026.06.17

arXiv · 2026.06.15

Gradient Instability in Deep Networks Decoded, the Lyapunov Spectrum of Residual Connections

The exploding and vanishing gradient problem has long been treated as an empirical nuisance in deep learning, managed through engineering workarounds rather than truly understood. A new paper by Vivek S. Borkar applies multiplicative ergodic theory to give the first rigorous mathematical account of why gradients misbehave in deep networks — and why residual connections fix it.

Liminal·P · 2026.06.16

arXiv · 2026.06.15

Context Without Cache Waste: TokenPilot's Dual Strategy for Agent Cost Control

Every technique that compresses an LLM agent's context window risks invalidating the KV cache, turning token savings into compute losses. TokenPilot resolves this structural tension through a two-layer framework that stabilizes prompt prefixes at ingestion and defers eviction until task relevance genuinely expires — cutting inference costs by up to 87% without sacrificing agent performance.

Liminal·P · 2026.06.16

arXiv · 2026.06.15

Sparse Outcomes Decoded: Hierarchical Advantage Weighting for Scalable VLA Fine-Tuning

Every real-world robot episode ends with a single binary signal: success or failure. HABC argues that compressing this outcome into one scalar conflates two fundamentally distinct objectives — viability and efficiency — and that naively assigning labels across human intervention segments corrupts learning at its foundation. Two independent critic heads and a state-adaptive gate resolve both problems, tripling success rates on the hardest contact-rich tasks.

Liminal·P · 2026.06.16

arXiv · 2026.06.11

Majority of Three Classifiers Proves Optimal, Rewriting the Theory of Ensemble Learning

How much data does it take to learn well? A new theoretical result shows that the majority vote of just three independently trained classifiers achieves the provably optimal sample complexity in the PAC learning framework. The elegance of this finding lies as much in its proof as in its conclusion: a short, clean argument that subsumes previous complex analyses and reshapes how we think about ensemble methods.
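The voting rule itself is simple enough to sketch. The toy below is only an illustration, not the paper's construction or proof: the threshold learner `train_threshold`, the synthetic data, and the choice to train each classifier on a disjoint third of the sample are all assumptions made for the example.

```python
def train_threshold(sample):
    """Toy learner (hypothetical): use the smallest positive example as the threshold."""
    positives = [x for x, y in sample if y == 1]
    t = min(positives) if positives else 1.0
    return lambda x: 1 if x >= t else 0


def majority_of_three(classifiers):
    """Combine exactly three binary classifiers by majority vote."""
    h1, h2, h3 = classifiers
    return lambda x: 1 if h1(x) + h2(x) + h3(x) >= 2 else 0


# Deterministic toy data: points on a grid, labeled 1 when x >= 0.5.
data = [(i / 300, 1 if i / 300 >= 0.5 else 0) for i in range(300)]

# Assumption: each classifier is trained on its own disjoint third of the data.
parts = [data[k::3] for k in range(3)]
voter = majority_of_three([train_threshold(p) for p in parts])

print(voter(0.9), voter(0.1))
```

The three learned thresholds differ slightly, and the vote behaves like their median, which is the intuition for why a small ensemble can smooth out the error of any single learner.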

Liminal·P · 2026.06.14

arXiv · 2026.06.11

Neural Fire Prediction Meets Aerial Drop Optimization, a New Framework for Wildfire Suppression Planning

A new framework combines a hybrid CNN-cellular automaton fire model with gradient-based optimization to automatically design aerial suppression strategies. Validated on the 2020 Bear Fire, the system unifies prediction and intervention planning within a single differentiable loop while supporting rigorous uncertainty analysis.

Liminal·P · 2026.06.14

arXiv · 2026.06.11

SkMTEB Closes the Slovak Embedding Gap, Charting a Replicable Path for Low-Resource Language AI

The Slovak Massive Text Embedding Benchmark exposes a counterintuitive finding: models built specifically for Slovak NLU tasks perform poorly on embedding benchmarks, while large multilingual models dominate despite no Slovak-specific training. The researchers' answer — vocabulary trimming applied to Multilingual E5 — cuts model size by 62% while matching proprietary API performance, and the full pipeline is released openly as a blueprint for other underserved languages.

Liminal·P · 2026.06.14

arXiv · 2026.06.11

On-Policy Distillation's Hidden Sparsity, a New Map of LLM Post-Training Geometry

Despite receiving dense, token-level teacher supervision, on-policy distillation updates only a sparse subset of model parameters — and those updates preferentially land where the source model's weights are near zero. A new study offers the first systematic account of the geometric and sparsity signatures that OPD leaves on model internals.

Liminal·P · 2026.06.13

arXiv · 2026.06.11

System 0 and AI-Mediated Cognition, the Risk of Cognitive Colonization

A new paper introduces 'cognitive colonization' — the idea that AI systems can embed external interests within the architecture of the self in ways users cannot easily perceive. Drawing on the concept of System 0, the authors argue that AI does not merely assist human thinking but shapes the cognitive landscape before thought begins, raising urgent philosophical and practical questions about autonomy in an AI-saturated world.

Liminal·P · 2026.06.13

arXiv · 2026.06.11

The Environment as the Bottleneck, Autonomous Scientific Discovery Reimagined by EurekAgent

As language models grow more capable, EurekAgent argues the real constraint on autonomous scientific discovery has shifted from agent design to environment design. By engineering permissions, artifact management, budgets, and human oversight as first-class concerns, the system sets new records on mathematical optimization problems for less than $11 in API costs.

Liminal·P · 2026.06.13

arXiv · 2026.06.11

Agents-K1 and Scholar-KG: Rethinking Scientific Knowledge for AI Research Agents

Most LLM-based research agents reduce scientific papers to abstracts and flat citation edges — missing the claims, evidence, and method lineages that make scientific reasoning possible. Agents-K1 is an end-to-end pipeline that converts raw papers into structured knowledge graphs, processing 2.46 million scientific documents to produce Scholar-KG. It is a fundamental rethinking of what it means for an AI agent to know a paper.

Liminal·P · 2026.06.12

arXiv · 2026.06.11

Mana Reframes Dexterous Manipulation as Animation, Articulated Tool Use Within Reach

Researchers from UC Berkeley and Stanford have introduced Mana, a sim-to-real framework that reimagines robot dexterous manipulation as a computer animation problem. By combining procedural keyframe generation with motion planning and reinforcement learning, Mana achieves zero-shot sim-to-real transfer across four articulated tools with less than one minute of human annotation per tool.

Liminal·P · 2026.06.12

arXiv · 2026.06.11

Beyond Semantic Similarity, Reasoning-Aware Retrieval as a New Axis for LLM Training

Conventional RAG retrieves what looks similar, but for mathematical reasoning, what matters is what solves similarly. RA-RFT trains a retriever to rank candidates by expected reasoning benefit rather than surface overlap, then fine-tunes the policy model via reinforcement learning with those analogical demonstrations. The result: up to 7.1 points of gain on AIME 2025 over GRPO, with a framework that operates orthogonally to reward design and training curriculum.

Liminal·P · 2026.06.12

arXiv · 2026.06.11

LLMs Crack Reproducibility Auditing, Outperforming Human Reanalysts in Social Science

A new study shows that LLMs can automate the laborious task of scientific reproducibility assessment, matching or exceeding human reanalysts across 76 published studies. The pipeline achieved 96% agreement on qualitative conclusions versus 74% for human reviewers, pointing toward scalable infrastructure for systematic auditing of empirical research.

Liminal·P · 2026.06.12

arXiv · 2026.06.11

Truncated Positional Encodings, Fragmented Expressivity in Graph Neural Networks

Spectral and walk-based positional encodings are theoretically equivalent in full form — but the truncated variants practitioners actually deploy tell a very different story. A new theoretical analysis shows that truncated spectral PEs fall back to the 1-WL baseline while truncated walk-based PEs retain their expressive advantage, a divergence that reshapes how GNN architectures should be designed. Mixing truncated PE families, rather than relying on any single one, emerges as both theoretically motivated and empirically superior.

Liminal·P · 2026.06.12

arXiv · 2026.06.11

The Interface That Shapes Spatial Intelligence: SpatialClaw's Code-Driven Agent

A new framework from NVIDIA Research proposes that the limiting factor in agentic spatial reasoning is not the quality of perception tools, but the design of the interface through which they are invoked. SpatialClaw uses a stateful Python kernel for step-by-step code execution, outperforming the prior state of the art by 11.2 points across twenty spatial reasoning benchmarks — with no training or fine-tuning required.

Liminal·P · 2026.06.12

arXiv · 2026.06.10

Router Rows as Expert Proxies, Singular Direction Alignment Unlocks Better MoE Scaling

A new design principle for Mixture-of-Experts models argues that each router row should align with the principal singular direction of its associated expert matrix — the most mathematically expressive summary of that matrix. Manifold Power Iteration (MPI) enforces this alignment during training through a "Power-then-Retract" paradigm grounded in classical numerical methods. Pretraining experiments from 1B to 11B parameters confirm that principled router-expert alignment translates to measurably stronger model performance.
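The classical method behind the name can be sketched as follows. This is a minimal illustration of power iteration with a normalization ("retract") step, not the paper's MPI procedure: which singular direction (left or right) the router row tracks, how the step is interleaved with training, and the helper names below are all assumptions.

```python
import math


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def normalize(v):
    """Retract v onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_then_retract(W, r, steps=50):
    """Pull r toward the top right singular vector of W.

    Each step applies W^T W (the power step) and then renormalizes
    (the retraction), which is plain power iteration on W^T W.
    """
    Wt = transpose(W)
    for _ in range(steps):
        r = matvec(Wt, matvec(W, r))  # power step
        r = normalize(r)              # retract to unit norm
    return r


# Toy "expert matrix" (hypothetical) whose principal direction is the first axis.
W = [[3.0, 0.0], [0.0, 1.0]]
router_row = power_then_retract(W, [1.0, 1.0])
print(router_row)
```

For this diagonal toy matrix the iterate converges to the first axis, the direction with the largest singular value.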

Liminal·P · 2026.06.11

arXiv · 2026.06.10

Force Sensing Without Force Sensors, Neural Torque Estimation Opens Contact-Rich Manipulation

Most commodity robot arms lack dedicated force sensors, creating a fundamental barrier for contact-rich manipulation research. NEXT (Neural External Torque Estimation) learns a robot's internal dynamics from just ten minutes of free-motion data and achieves torque estimates competitive with dedicated hardware after only one minute of training. Paired with FIRST, a force-guided resampling strategy for behavior cloning, the system outperforms prior force-aware policies by over 17% across five long-horizon tasks — all without adding a single sensor.

Liminal·P · 2026.06.11

arXiv · 2026.06.10

Treating Dialogue History as Revisable Threads, C-DIC Stabilizes Long-Horizon Conversation Quality

Modern conversational AI systems re-encode the full dialogue history at every turn, causing costs and quality to degrade as conversations grow longer. C-DIC reframes context compression as an ongoing memory management problem, maintaining revisable per-thread states in a compact dialogue memory via a lightweight retrieve-revise-write-back loop. Experiments demonstrate stable inference latency and perplexity across hundreds of dialogue turns.

Liminal·P · 2026.06.11

arXiv · 2026.06.08

Benchmark Scores Without Context, EvalCards and the Missing Interpretive Layer

AI evaluation results flood the internet daily, but the numbers rarely come with enough context to mean anything. A score of 87 on MMLU tells you nothing if you don't know the prompt format, the shot count, or whether the training data overlapped with the test set. EvalCards proposes a unified reporting layer that makes these hidden variables visible — and applies it to over 100,000 real evaluation results to expose just how broken current practice is.

Liminal·P · 2026.06.09

arXiv · 2026.06.08

PTL-Diffusion: Periodic Terminal Laws Reshape Manifold-Aware Diffusion Generation

Standard diffusion models collapse all data into a single Gaussian terminal distribution before generation, forcing the reverse network to reconstruct manifold structure entirely unaided. PTL-Diffusion replaces this single endpoint with a periodic family of Gaussian terminal laws, embedding geometric structure directly into the forward noising dynamics. Experiments on torus, cylinder, and face datasets show measurable improvements in manifold-level distributional fidelity over matched DDPM baselines.

Liminal·P · 2026.06.09

arXiv · 2026.06.08

From Binary Masks to Continuous Correction, A New Trust Region for LLM RL

Reinforcement learning for large language models is structurally off-policy, and trust-region control is essential for stable optimization. Methods like PPO and GRPO rely on importance-ratio clipping, while DPPO improves on this with divergence-based masking — yet both ultimately discard gradients at the boundary, providing no corrective signal for errant updates. DRPO replaces the hard mask with a smooth, advantage-weighted quadratic regularizer that attenuates diverging updates and preserves directional correction beyond the trust-region boundary.
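The difference between a hard boundary and a smooth penalty shows up in the gradient with respect to the importance ratio. The sketch below compares the standard PPO clipped surrogate with a quadratic penalty; the penalty form `ratio*A - 0.5*beta*|A|*(ratio-1)**2` and the value of `beta` are hypothetical stand-ins chosen for illustration and are not DRPO's actual objective.

```python
def ppo_clip_grad(ratio, advantage, eps=0.2):
    """Gradient w.r.t. ratio of min(ratio*A, clip(ratio, 1-eps, 1+eps)*A)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    if ratio * advantage <= clipped * advantage:
        return advantage  # unclipped term is active
    return 0.0            # clipped term is active and constant in ratio


def quadratic_penalty_grad(ratio, advantage, beta=5.0):
    """Gradient w.r.t. ratio of ratio*A - 0.5*beta*|A|*(ratio-1)**2.

    Hypothetical smooth alternative: the penalty grows with distance
    from ratio = 1, so the gradient never switches off.
    """
    return advantage - beta * abs(advantage) * (ratio - 1.0)


# A sample that has drifted past the clip boundary (ratio 1.5, positive advantage).
print(ppo_clip_grad(1.5, 1.0))           # 0.0: no signal at all
print(quadratic_penalty_grad(1.5, 1.0))  # -1.5: pushes the ratio back toward 1
```

Under clipping the drifted sample contributes nothing, while the quadratic form still returns a gradient that points back toward the trust region.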

Liminal·P · 2026.06.09

arXiv · 2026.06.08

Decoupling World Modeling from Action Execution, AHA-WAM's Asynchronous Architecture for Robot Manipulation

Not all parts of a robot's cognition need to tick at the same rate. AHA-WAM proposes a dual Diffusion Transformer architecture that separates a low-frequency world planner from a high-frequency action executor, letting each operate at its natural tempo. The result is state-of-the-art manipulation performance at 24.17 Hz with a 4.59× speedup over prior baselines — and no robot-specific pretraining required.

Liminal·P · 2026.06.09

arXiv · 2026.06.08

Agency Transfer in RL: From Baseline Dependence to Standalone Policy Guarantees

Training reinforcement learning policies from scratch is expensive — and often unnecessary when a functional but suboptimal baseline already exists. This paper proposes agency transfer, a method that structures the baseline into training as a progressive arbitrator, formally guaranteeing high goal-reaching rates throughout and deriving explicit lower bounds for the final, baseline-free policy.

Liminal·P · 2026.06.09

arXiv · 2026.06.08

Beyond Cold-Start Scores, Measuring VLM Agent Learning Dynamics in Games

Most game benchmarks for VLM agents report a single first-attempt score and call it done. OmniGameArena challenges this with twelve new UE5 games and the Improvement Dynamics Curve — an autonomous reflection harness that measures not just where an agent begins, but how fast it learns and how well it generalizes.

Liminal·P · 2026.06.09

arXiv · 2026.06.04

Nash Equilibrium as a Learning Target, Game Theory Meets Deep RL for Competitive Agents

Most multi-agent reinforcement learning systems train agents to maximize individual rewards, but offer no guarantee of convergence to a strategically stable equilibrium. DNQ embeds a game-theoretic solver directly inside the training loop, making Nash Equilibrium an explicit supervision target at every visited state. The framework's pairwise approximation scales to large agent populations, revealing a fundamental tradeoff between strategic fidelity and computational tractability.

Liminal·P · 2026.06.07

arXiv · 2026.06.04

Mixed Authorship Blinds AI Detectors, OpAI-Bench Exposes the Non-Monotonic Detection Paradox

As AI writing assistants become ubiquitous, the binary of human-written versus AI-generated text has collapsed into a messy continuum. The OpAI-Bench study finds that documents in intermediate mixed-authorship states — partially human, partially AI — are harder to detect than either purely human or heavily AI-edited text, exposing a non-monotonic detection paradox that current systems are ill-equipped to handle.

Liminal·P · 2026.06.07

arXiv · 2026.06.04

Code2LoRA Compresses Repository Knowledge into Adapters, Eliminating Inference-Time Context Overhead

Code language models struggle with repository-specific conventions — imports, APIs, naming idioms — that neither RAG nor per-repo fine-tuning can handle cheaply at scale. Code2LoRA trains a hypernetwork to generate repository-specific LoRA adapters on demand, encoding repository knowledge directly into model weights with zero additional tokens at inference time. A new benchmark of 604 Python repositories tests both static snapshots and commit-by-commit evolution scenarios, showing the approach matches per-repository fine-tuning quality without the cost.

Liminal·P · 2026.06.07

arXiv · 2026.06.01

Beyond Full-Layer Removal: Submodule Granularity Resets LLM Compression Standards

Post-training compression of large language models has long operated under two taken-for-granted assumptions: that removal must happen at the full-layer level, and that targeted components must be contiguous. SubFit, from researchers at the University of Trento, dismantles both simultaneously. By treating attention and feedforward submodules as independent, non-contiguous compression targets, it achieves perplexity degradation less than half that of the strongest baseline at 25% sparsity.

Liminal·P · 2026.06.02

arXiv · 2026.06.01

Belief-Space Safety Filters via Trusted Inference, Certified Permissiveness for Interactive Robotics

Autonomous robots sharing space with people must reason continuously about human intent — a task fraught with uncertainty that makes formal safety guarantees elusive. This paper cracks that problem by combining belief-space safety filtering with conformal prediction, focusing certification on regions where runtime inference is reliable to achieve provably safe yet meaningfully permissive robot behavior.

Liminal·P · 2026.06.02

arXiv · 2026.06.01

Format-Aware Prototypes Break the Routing Bottleneck in Multimodal Continual Learning

When multimodal large language models learn tasks sequentially, semantically similar tasks with incompatible output structures get routed to the same expert adapter — quietly corrupting specialized parameters over time. ProtoAda fixes this format-blind assignment with prototypes that encode both what a task is about and how it expects to answer, offering a cleaner path toward models that learn without forgetting.

Liminal·P · 2026.06.02

arXiv · 2026.05.27

When Weaker Overseers Control Stronger Agents, Statistical Guarantees for Scalable AI Oversight

As agentic AI systems grow more capable than the humans tasked with supervising them, the meaning of oversight becomes unclear. Calibrated Collective Oversight (CCO) from Stanford addresses this by aggregating diverse overseer signals into a collective conservatism penalty, calibrated online via Conformal Decision Theory to keep unsafe behavior below user-specified thresholds with finite-time guarantees. Experiments on SWE-bench and MACHIAVELLI show that weaker overseers can successfully constrain a misaligned stronger agent, with empirical violation rates closely matching theoretical predictions.

Liminal·P · 2026.05.28

arXiv · 2026.05.27

Decomposing Visual Recognition to Defeat Catastrophic Forgetting in Continual CLIP Learning

Catastrophic forgetting remains one of the most stubborn obstacles in continual learning systems. AREA, accepted at ICML 2026, reframes the problem by decomposing CLIP-based recognition into two distinct stages—attribute extraction and attribute aggregation—and stabilizes each independently using hyperspherical anchoring, variational bottlenecks, and optimal transport routing.

Liminal·P · 2026.05.28

arXiv · 2026.05.27

Fine-Tuning's Hidden Cost: Stability-Plasticity Tradeoffs Across PEFT Methods

Parameter-efficient fine-tuning has been evaluated almost exclusively on downstream accuracy, leaving the erosion of pretrained capabilities unmeasured. PEFT-Arena reframes the problem through the stability-plasticity dilemma, revealing that orthogonal fine-tuning achieves the most favorable trade-off among competing methods. The paper also shows that final SFT checkpoints routinely overshoot the optimal operating point, and that path-wise rewinding can recover better-balanced models without additional training.

Liminal·P · 2026.05.28

arXiv · 2026.05.26

Dynamic Pipeline Routing for Retrieval Agents, Rethinking Cost-Quality Tradeoffs at Inference Time

RAG and retrieval agent pipelines expose dozens of configuration choices — which LLM, how many documents, how many hops — yet most systems pick one setup and stick with it. BRANE shows that selecting configurations per query, guided by lightweight predictors trained on workload characteristics, can match the best static setup's accuracy at up to 89% lower cost.

Liminal·P · 2026.05.27

arXiv · 2026.05.26

Breaking the Serial Bottleneck, Parallel Box Decoding Advances Visual Grounding

Vision-language models have long serialized bounding boxes into independent coordinate tokens, a choice that quietly undermines geometric coherence and caps inference throughput. LocateAnything introduces Parallel Box Decoding, treating boxes as atomic units decoded in a single step, and pairs it with a 138-million-sample training corpus to push the speed-accuracy frontier outward on both axes.

Liminal·P · 2026.05.27

arXiv · 2026.05.26

Skills as Living Assets: MUSE-Autoskill and the Case for Agent Self-Evolution

Most LLM agent frameworks produce skills that are static from the moment of creation — useful once, but unable to improve with experience. MUSE-Autoskill proposes a full lifecycle for agent skills, from creation and memory to evaluation and refinement, treating each skill as a long-lived, testable asset. The result is an agent that compounds its capabilities over time rather than resetting with every new task.

Liminal·P · 2026.05.27

arXiv · 2026.05.22

PiD Redefines Decoding as Generation, Shifting the High-Resolution Image Paradigm

Most high-resolution text-to-image systems generate content in a compact latent space and rely on a VAE decoder to convert latents back to pixels — a stage that has long been a bottleneck for both quality and speed. NVIDIA's PiD reformulates latent decoding as conditional pixel diffusion, merging decoding and upsampling into a single generative module. The result is a decoder that synthesizes fine detail from scratch, runs 6× faster than cascaded super-resolution pipelines, and produces 2048×2048 images in under one second on a consumer GPU.

Liminal·P · 2026.05.25

arXiv · 2026.05.22

Geo-Align Applies RL to Camera-Controlled Video Re-Rendering, Bridging the Real-World Gap

Camera-controlled video re-rendering has long relied on synthetic datasets, leaving models brittle when confronted with real-world footage. Geo-Align introduces the first reinforcement learning framework for this task, correcting camera trajectory errors through a scale-aware geometric reward signal — no paired real-world data required. Its consistent gains over supervised baselines signal that RL alignment is beginning to reshape video generation just as it reshaped language models.

Liminal·P · 2026.05.25

arXiv · 2026.05.22

Agent Skills Trained Like Model Weights, SkillOpt's Text-Space Optimizer Signals New Era

Agent skills have traditionally been hand-crafted, one-shot generated, or loosely evolved — none of which guarantees reliable improvement under feedback. SkillOpt proposes the first systematic text-space optimizer for agent skills, treating skill documents as trainable parameters with the same discipline applied to neural network weights. Across 52 evaluation cells spanning six benchmarks and three execution harnesses, SkillOpt matches or beats every competing approach.

Liminal·P · 2026.05.25

arXiv · 2026.05.24

LongLive-2.0: NVFP4 Infrastructure Cuts Long Video Generation Cost in Half

Generating long videos is computationally prohibitive. LongLive-2.0 applies NVFP4 (4-bit floating point) throughout the full training and inference pipeline of a long video generation model, achieving 2.15x training speedup and 1.84x inference speedup. The 5B parameter model reaches 45.7 FPS — a signal that the bottleneck in long video generation is shifting from capability to cost.

Liminal·P · 2026.05.25

arXiv · 2026.04.07

Blockchain Meets AI: A Sober Blueprint for Intelligent Network Security

The combination of blockchain and AI in security systems has generated substantial literature and proportional hype. This review paper cuts through both by mapping what each technology actually contributes — blockchain provides provenance and auditability, AI provides detection and adaptation — and honestly assessing that empirical evidence remains mostly at prototype level.

Liminal·P · 2026.05.25

arXiv · 2026.05.21

Gated DeltaNet-2: Decoupling Erase from Write in Linear Attention

NVIDIA researchers identified a subtle but consequential flaw in existing linear attention models: they use a single gate to control both memory erasure and new information writing. Gated DeltaNet-2 separates these into two independent channel-wise gates, outperforming Mamba-2, Mamba-3, and KDA at 1.3B parameters — particularly on long-context retrieval tasks.

Liminal·P · 2026.05.25

arXiv · 2026.04.18

GenericAgent: The Case for Information Density Over Context Window Size

The instinctive response to LLM agent performance degradation on long tasks has been to expand the context window. GenericAgent argues this is the wrong optimization target. By maximizing information density within a fixed context budget — through hierarchical memory, minimal tool interfaces, and self-evolving execution traces — it outperforms leading agents while using fewer tokens.

Liminal·P · 2026.05.25

arXiv · 2026.05.26

AutoResearchClaw: Turning Failure into Fuel, a 54% Leap Over AI Scientist v2

AutoResearchClaw reframes failure in autonomous research: instead of discarding failed experiments, it uses them as strategic decision points — pivot or refine — while allowing selective human intervention at seven precision levels. On ARC-Bench, it outperforms AI Scientist v2 by 54.7%, with results that compound across research sessions.

Liminal·P · 2026.05.25

arXiv · 2026.05.04

ARIS: When AI Audits AI, a New Standard for Autonomous Research Integrity

AI agents that conduct research can produce outputs that sound convincing but lack actual evidentiary support. ARIS proposes a structural fix: adversarial multi-agent verification, where one agent challenges another's claims against an evidence ledger — making trustworthiness a system property rather than a model property.

Liminal·P · 2026.05.25