The Weekly Inference #013

»This Week

Anthropic’s near-trillion-dollar valuation landing the same week AI cracked two multi-decade mathematical problems captures the defining tension of this moment: the technology is outrunning every prior assumption about its ceiling, while simultaneously failing to agree on basic facts, resisting correction, and generating enterprise bills that outpace measurable returns. What connects Gemini Spark’s bid for permanent residence on personal devices, Nvidia’s $150 billion Taiwan commitment, and GitHub Copilot’s unpopular token-billing pivot is a single underlying dynamic — the race to lock in infrastructure, platforms, and pricing power before the market fully understands what it’s actually buying. The capability frontier and the reliability floor are moving in opposite directions, and the capital flooding in this week is betting the former wins before the latter becomes disqualifying.


»Top Stories

»AI Model Training & Infrastructure

270 articles

Anthropic published the Claude Opus 4.8 system card detailing containment strategies across products, alongside guidance on building robust model organisms for training safety research [1] [2]
Researchers and practitioners are developing shared evaluation playbooks for third-party AI assessments, while PyTorch profiling tools offer infrastructure-level visibility into model training performance [3] [4]
AI search agents show a documented tendency to confirm prior beliefs rather than conduct genuine web research, raising reliability concerns for training data pipelines and inference applications [5]

Why it matters: As AI systems grow more capable and widely deployed, the gap between rigorous evaluation standards and actual model behavior in production represents a concrete risk that safety researchers and infrastructure engineers are racing to close simultaneously.

Cited sources:

[1] Claude Opus 4.8: The System Card thezvi.substack.com
[2] Advice for making robust-to-training model organisms alignmentforum.org
[3] A shared playbook for trustworthy third party evaluations openai.com
[4] Profiling in PyTorch (Part 1): A Beginner’s Guide to torch.profiler huggingface.co
[5] AI search agents often confirm what they already know instead of actually researching the web the-decoder.com

»AI Robotics and Chip Systems

96 articles

Nvidia committed $150B to Taiwan chip manufacturing [1] while SoftBank pledged up to €75B to build French data centers [2], reflecting massive geographic diversification of AI infrastructure investment outside the US.
Mistral’s CEO confirmed the company is exploring designing its own chips as it scales infrastructure [3], mirroring a broader industry push toward custom silicon, where chip designers are increasingly finding career success [4].
Amazon solved a key technical bottleneck enabling next-generation data center architecture [5], as Irish households absorb millions in hidden costs from existing data center energy demand [6].

Why it matters: The race to control AI compute is fragmenting across continents and company types — from hyperscalers to AI startups — meaning chip supply, energy policy, and national investment strategies are now direct determinants of who leads in AI capability.

Cited sources:

[1] Nvidia bets $150B on Taiwan as Trump’s plan to make US an AI hub backfires arstechnica.com
[2] SoftBank says it will invest up to €75 billion to build French data centers techcrunch.com
[3] Mistral to explore designing own chips, CEO says, as it ramps up infrastructure build cnbc.com
[4] Finding Success in Industry as a Chip Designer spectrum.ieee.org
[5] Amazon Thinks the Future of Data Centers Depends on a Technical Problem It Just Solved wired.com
[6] ‘Hidden datacentre tax’ costing Irish households millions, report says theguardian.com

»AI Startup Funding Rounds

74 articles

Anthropic finalized a $65 billion funding round at a $961–965 billion valuation, surpassing OpenAI to become the world’s most valuable AI company [1] [2] [3]
The deal’s cap table reflects a structure closer to industrial policy than traditional venture capital, with major sovereign and strategic backers dominating [3] [4]
Smaller AI infrastructure deals also closed the same week, including Oslo-based Cloudgeni raising €858k for secure cloud AI agents [5] and Focused Energy raising $240M [6]

Why it matters: Anthropic’s near-trillion-dollar valuation compresses the window for competitors and reframes AI startup funding as a geopolitical instrument, not just a technology bet.

Cited sources:

[1] Anthropic finalises $65bn funding deal to surpass OpenAI’s valuation ft.com
[2] Anthropic reaches valuation of $965bn, beating OpenAI to become world’s most valuable AI firm theguardian.com
[3] Anthropic just closed a $65B round at a $965B valuation, and the cap table reveals something closer to industrial policy than a venture deal siliconcanals.com
[4] The Week’s 10 Biggest Funding Rounds: Anthropic Dominates In An Otherwise Slower Week For Megarounds news.crunchbase.com
[5] Oslo-based Cloudgeni raises €858k to build reliable AI agents for secure cloud infrastructure eu-startups.com
[6] Focused Energy raises $240M, TrueLayer acquires In3, and London regains top spot tech.eu

»AI Mathematical Research Breakthroughs

28 articles

AI systems recently cracked an 80-year-old mathematical problem and a separate 50-year-old problem, with start-ups racing to deploy similar tools across pure and applied mathematics [1] [2] [3]
Terence Tao argues AI could introduce a division of labor to mathematics for the first time in history, allowing researchers to offload routine verification and pattern-searching while focusing on creative leaps [4]
OpenAI’s math breakthroughs leverage AI’s core strengths in exhaustive search and pattern recognition, with researchers noting full automation of R&D could substantially accelerate scientific progress [5] [6]

Why it matters: Mathematics has historically been a domain where AI showed the least traction — these results suggest the field is now a frontier rather than a ceiling, with implications for every science that depends on unsolved math.

Cited sources:

[1] An AI Solution to an 80‑Year‑Old Problem Has Shocked Mathematicians singularityhub.com
[2] Mathematical AI helps researchers crack 50-year-old problem newscientist.com
[3] Start-ups are racing to revolutionise mathematics with AI newscientist.com
[4] Terence Tao argues AI could bring division of labor to math for the first time in history the-decoder.com
[5] OpenAI’s math breakthrough played to AI’s strengths understandingai.org
[6] Full automation of AI R&D probably yields a large speed up even without a software-only singularity alignmentforum.org

»Gemini Spark AI Assistant Launch

27 articles

Google launched Gemini Spark, a 24/7 AI assistant designed for persistent, personal access to users’ daily lives and devices [1] [2]
Reviewers found Gemini Spark broadly useful in real-world tasks [1], though hands-on testing revealed quirks in how it interprets personal relationships and context [2]; separately, Google has been addressing quota-burning bugs in Gemini’s usage limits [3]
Apple is reportedly working to integrate a large Gemini model directly into the iPhone to power a rebuilt Siri [4], while iOS 27 leaks indicate Siri is being redesigned to behave more like ChatGPT [5]

Why it matters: Gemini Spark’s launch puts a persistent, life-integrated AI assistant in direct competition with a rapidly evolving Siri — the battle for the default AI layer on personal devices is no longer hypothetical.

Cited sources:

[1] I put Google’s 24/7 AI assistant Gemini Spark to work, and it’s actually pretty useful techcrunch.com
[2] Hands-On With Gemini Spark: I Gave It Access to My Life and It Friend-Zoned My Boyfriend wired.com
[3] Google fixes several bugs in Gemini usage limits that burned through quotas too fast the-decoder.com
[4] Apple working to cram massive Gemini model into iPhone to power new Siri arstechnica.com
[5] Apple iOS 27 Leaks: Siri Is Being Remade to Be More Like ChatGPT decrypt.co

»LLM Reliability and Failure Modes

13 articles

Researchers warn that all major LLMs are vulnerable to multi-turn manipulation attacks [1], and a separate study finds AI models cannot agree on basic facts the majority of the time [2], while LLMs persist in believing false statements even after receiving explicit warnings that those statements are false [3].
Ensemble monitoring using diverse signals outperforms throwing more compute at single-model oversight [4], and most AI agents fail in production because they are architected incorrectly from the start [5].
AI-generated or AI-manipulated content has made crowd size imagery untrustworthy [6], and human–AI interactions are actively reshaping personal identity and social networks [7].

Why it matters: The cumulative picture is one of compounding unreliability — models hallucinate, disagree, resist correction, and can be manipulated conversationally, meaning organizations deploying LLMs without layered, diverse monitoring are exposed to failure modes that no single safeguard can catch.

Cited sources:

[1] All Major LLMs Exposed to Multi-Turn Manipulation, Warn Researchers infosecurity-magazine.com
[2] AI Models Can’t Agree on Basic Facts Most of the Time, Study Shows decrypt.co
[3] LLMs believe false statements even after explicit warnings that they’re false arstechnica.com
[4] Ensemble monitoring for AI control: diverse signals outweigh more compute lesswrong.com
[5] Most AI Agents Fail in Production Because They’re Built Backwards towardsdatascience.com
[6] Don’t believe crowd sizes anymore reddit.com
[7] Human–AI interactions reshape the self and our social networks nature.com

»AI Coding Agents and Workflows

12 articles

Salesforce reported that AI agents compressed a 231-day migration project to just 13 days with fewer incidents [2], while Cognition’s Scott Wu argues coding agents should augment rather than replace human developers [1]
Warp is making a large bet on building open source tooling powered by GPT-5.5, reflecting a broader industry push to embed frontier models directly into developer workflows [3]
Organizational and economic adoption lags persist — agentic AI demands structural redesign of teams and processes, and many companies have yet to see productivity gains reach their bottom lines [4] [5] [6]

Why it matters: The gap between AI coding agents’ headline performance numbers and real-world organizational payoff reveals that the bottleneck is no longer model capability — it’s whether companies can restructure workflows fast enough to capture the value.

Cited sources:

[1] Cognition’s Scott Wu says AI coding agents shouldn’t replace humans techcrunch.com
[2] Salesforce claims AI agents cut a 231-day migration to 13 days with fewer incidents the-decoder.com
[3] Warp’s big bet on building open source with GPT-5.5 openai.com
[4] Rethinking organizational design in the age of agentic AI technologyreview.com
[5] 🔮 Why AI isn’t showing up on your bottom line exponentialview.co
[6] The agentic divide: Why “good enough” AI isn’t enough to survive the new economy restofworld.org

»AI API Pricing & Token Economics

11 articles

GitHub Copilot’s shift to token-based billing drew sharp criticism from developers, with users calling it “a joke” as unpredictable usage costs replace flat subscription fees [1], while firms more broadly report AI spending outpacing measurable business value [2].
“Tokenmaxxing” (the practice of padding prompts to maximize model output) is declining as companies scrutinize agentic coding costs, pushing developers toward leaner prompt strategies [3], and some organizations are now running self-hosted open-weight models to escape per-token API charges from Anthropic and OpenAI [4].
AI bills continue rising for enterprise customers even as underlying inference costs fall, because heavier usage and more capable (costlier) models offset efficiency gains [5], while Anthropic and OpenAI show strong product-market fit that sustains pricing power [6].

Why it matters: The gap between falling AI infrastructure costs and rising enterprise AI bills reveals that vendors are successfully capturing efficiency gains as margin rather than passing savings to customers — making self-hosting and prompt discipline increasingly rational economic choices for high-volume users.
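To make the arithmetic behind "bills rise as costs fall" concrete, here is a minimal sketch. Every number in it is hypothetical and chosen only to illustrate the mechanism; none comes from the cited articles. The function name `monthly_bill` and the flat per-million-token price are also assumptions; real vendor pricing is more involved.

```python
def monthly_bill(tokens_millions: float, price_per_million: float) -> float:
    """Bill in dollars = tokens used (in millions) x price per million tokens."""
    return tokens_millions * price_per_million


# Hypothetical "before" state: modest usage at a higher unit price.
old_price, old_usage = 10.0, 100.0   # $ per million tokens, millions of tokens
# Hypothetical "after" state: unit price falls 60%, usage grows fivefold.
new_price, new_usage = 4.0, 500.0

before = monthly_bill(old_usage, old_price)
after = monthly_bill(new_usage, new_price)

price_change = (new_price - old_price) / old_price   # relative change in unit price
bill_change = (after - before) / before              # relative change in total bill

print(f"unit price change: {price_change:+.0%}")
print(f"total bill change: {bill_change:+.0%}")
```

Under these made-up inputs the unit price drops while the total bill doubles, because usage growth outweighs the per-token saving. Swapping in your own usage and price figures shows where the break-even sits for your workload.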

Cited sources:

[1] ‘What a joke’: Github Copilot’s new token-based billing spurs consternation among devs techcrunch.com
[2] Firms spent heavily on AI. Now rising costs are outpacing its value scmp.com
[3] ‘Tokenmaxxing’ Starts to Fade as Companies Eye Agentic Coding Costs newcomer.co
[4] Anybody ran the numbers and decided self hosting open weight models for your employees makes more sense than your company paying Anthropic/OpenAI/etc? reddit.com
[5] 📈 Why AI bills rise as costs fall exponentialview.co
[6] I think Anthropic and OpenAI have found product-market fit simonwillison.net

»Codex Enterprise Deployments

6 articles

Cisco, Endava, and Braintrust have each deployed OpenAI’s Codex to automate enterprise software engineering tasks, with Endava building a fully agentic engineering organization and Cisco redefining internal development workflows using Codex agents [1] [2] [3]
Codex can now operate Windows PCs autonomously — hunting bugs, running tests, and executing multi-step coding tasks without human intervention — extending its reach beyond text generation into active system control [4]
Specialized deployments include Codex-powered self-improving tax agents that refine their own logic over time, demonstrating the platform’s expansion into domain-specific, high-stakes professional workflows [5]

Why it matters: With major enterprises across consulting, networking, and finance embedding Codex into core engineering pipelines, AI-assisted development is shifting from a productivity tool to an autonomous operational layer — raising the stakes for organizations that have not yet built governance frameworks around agentic code execution.

Cited sources:

[1] How Endava builds an agentic organization with Codex openai.com
[2] Cisco and OpenAI redefine enterprise engineering with Codex openai.com
[3] How Braintrust turns customer requests into code with Codex openai.com
[4] OpenAI’s Codex can now operate your Windows PC autonomously, hunting bugs and testing apps on its own the-decoder.com
[5] Building self-improving tax agents with Codex openai.com

»AI in Advertising and Media Platforms

26 articles

Google folded Display & Video 360 display ads into its AI-first Demand Gen platform [1] while simultaneously testing new branded search controls inside AI Max campaigns [2], consolidating its advertising stack around AI-native tooling.
OpenAI secured a strategic content partnership with Brazil’s Grupo Folha and Grupo UOL [3] and is working with retail media platform Skai to bring commerce advertisers directly into ChatGPT [4], while publishers are quietly signing “six-figure” AI content licensing deals through Snowflake’s platform [5].
Indian founders are leveraging a recent court ruling to revive antitrust criticism of Google’s ad business [6], adding legal pressure as AI search erodes traditional click-based traffic [7] and TikTok advances its super app ambitions [8].

Why it matters: The advertising industry’s core revenue logic — clicks, display impressions, and open-web publisher deals — is being restructured simultaneously from multiple directions, meaning brands, publishers, and platforms are all being forced to renegotiate their relationships with AI at the same time.

Cited sources:

[1] Google folds Display Ads into AI-first Demand Gen platform artificialintelligence-news.com
[2] Google appears to be testing new branded search controls in AI Max campaigns searchengineland.com
[3] OpenAI, Grupo Folha and Grupo UOL announce strategic content partnership openai.com
[4] Future of Marketing Briefing: OpenAI is working with Skai to bring retail and commerce advertisers into ChatGPT digiday.com
[5] Publishers quietly cut ‘six-figure’ deals via Snowflake’s AI licensing platform digiday.com
[6] Founders seize on Indian court ruling to revive criticism of Google’s ad business techcrunch.com
[7] AI search may kill the click. But users still need to trust the answers fastcompany.com
[8] TikTok’s road to becoming a super app techcrunch.com

»Autonomous AI Systems & Applications

10 articles

The UK military is examining protocols that would allow lethal strikes to be executed without human approval [1], while the NBA moves to deploy an AI system for automatic out-of-bounds calls [2], reflecting AI autonomy expanding across both high-stakes and commercial domains.
BMW declared humanoid robots “the future” of car manufacturing [3], and MISUMI Group committed $1B to AI and digital manufacturing expansion across the Americas [4], marking major industrial bets on autonomous physical systems.
Boston Children’s Hospital deployed AI to surface previously undetected diagnoses [5], and broader governance concerns around autonomous AI operating in physical environments are drawing scrutiny [6].

Why it matters: As autonomous AI systems move from software into physical and lethal domains simultaneously, the gap between deployment speed and governance frameworks is becoming a concrete policy and safety liability, not a theoretical one.

Cited sources:

[1] UK military looks at allowing lethal strikes without human approval ft.com
[2] NBA plans AI system for automatic out-of-bounds calls artificialintelligence-news.com
[3] Humanoid robots ‘the future’ of car making, says BMW bbc.com
[4] MISUMI Group invests $1B in Americas, global AI and digital manufacturing therobotreport.com
[5] Boston Children’s uses AI to unlock new diagnoses openai.com
[6] Autonomous AI systems test governance in physical environments artificialintelligence-news.com

»AI Deployment in Banking

6 articles

TD Bank’s new AI agent cut mortgage decision time by 15 hours [1], while MUFG is pursuing a full AI-native transformation through a partnership with OpenAI [2], marking a concrete operational shift across major global banks.
Banks are increasing AI investment primarily to reduce costs [3], with AI also being deployed to convert website visitors into customers and close revenue gaps [4] — extending AI’s role beyond back-office efficiency into active revenue generation.
74% of professionals consider AI essential, yet report their organizations lag in implementation [5], a gap that financial services regulators and practitioners are actively working to navigate [6].

Why it matters: Banks are moving past pilot programs into production-scale AI deployment across lending, customer acquisition, and core operations — institutions that close the implementation gap fastest stand to gain durable competitive advantages in cost structure and customer conversion.

Cited sources:

[1] TD’s new AI agent shaves 15 hours off mortgage decisions americanbanker.com
[2] MUFG aims to become AI-native with OpenAI openai.com
[3] Exclusive research: Banks up AI investment to cut costs americanbanker.com
[4] From visitors to customers: How AI Is closing the revenue gap for banks americanbanker.com
[5] 74% of Professionals Call AI Essential But Their Companies Lag Behind marketingaiinstitute.com
[6] Navigating AI in financial services americanbanker.com

»AI Search and Browser Alternatives

8 articles

Users and creators alike are pushing back against Google’s AI Overviews and AI Mode, with viral posts mocking AI-generated search results for mundane queries like looking up a Family Guy clip [1] [2] [3] [4]
At least 9 Google Search alternatives exist that restore traditional link-based results for users frustrated with AI-dominated answers [5]
Browser alternatives to Chrome and Safari are gaining attention in 2026 as dissatisfaction with dominant platforms grows alongside search discontent [6]

Why it matters: User frustration with AI-first search is creating real market opportunity for alternative search engines and browsers — a pressure Google hasn’t faced at this scale in over a decade.

Cited sources:

[1] Google’s AI mode is threatening me… i was just trying to look up a family guy clip… reddit.com
[2] let me ask Google what am I allowed to search! reddit.com
[3] “Groogle” reddit.com
[4] New google search useless query just dropped reddit.com
[5] Tired of AI Overviews? I found 9 Google Search alternatives that showed me links again zdnet.com
[6] As the browser wars heat up, here are the hottest alternatives to Chrome and Safari in 2026 techcrunch.com

»AI Booed at Graduation Ceremonies

6 articles

Comedian Ronny Chieng urged Harvard graduates to “Destroy AI” to loud cheers, in a commencement season when pro-AI remarks at other ceremonies drew audible booing from student audiences [1] [2]
Students explained their reactions by saying pro-AI speakers were “not reading the room,” reflecting frustration that commencement addresses promoted technology many graduates view as a threat to their futures [3]
Critics warn that “AI successionists” — those who openly welcome AI replacing human roles — are gaining influence, prompting calls to build a new humanism as a cultural counterweight [4]

Why it matters: Graduation ceremonies are a cultural barometer of generational values, and the consistent booing signals that the next wave of degree-holders entering the workforce is skeptical of — or actively hostile to — the AI-optimist consensus dominant in tech and business leadership.

Cited sources:

[1] Ronny Chieng Tells Harvard to ‘Destroy AI’ as Graduates Cheer reddit.com
[2] The AI Hype Index: AI gets booed in graduation season technologyreview.com
[3] US students on why they booed their pro-AI graduation speakers: ‘They’re not reading the room’ theguardian.com
[4] The people who actually want AI to replace humanity - We need to create a new humanism before the “AI successionists” win. reddit.com
