WhyOps

Blog

Insights on AI agent observability, debugging, and building reliable autonomous systems.

Kimi K2.7-Code is Out. The Open-Source Coding Model That Thinks Less

A post in r/singularity with the title "Kimi 2.7 code is released & open-sourced" hit 100 upvotes fast. Over on X, the announcement from @Kimi_Moonshot crossed 2,600 likes while I was refreshing the p

Jun 12, 2026 · 7 min read

Claude Fable 5

Inside Claude Fable 5: The Beast, the Limiter, the Fallout

Simon Willison spent 5.5 hours throwing everything he had at it. His verdict: "it's a beast." Across the same hours, another phrase was circulating in the same threads: "a Ferrari with a 30mph limiter

Jun 11, 2026 · 7 min read

Grok Build

Grok Build Hype vs Reality: A Look at Real User Reactions

"Grok Build feels like the previous generation of coding models." That's from a Reddit post on r/grok. From someone who actually paid $99 to try SuperGrok Heavy. Another commenter chimed in with their

May 31, 2026 · 7 min read

open-slide

open-slide Hit 4k Stars. The Reactions Told a Real Story

Yiwei Ho posted that every slide at the Rayboba event was built with open-slide. Not a demo. Not a mockup. Decks for a community event hosted by Raycast. And the slides looked clean. This matters because s

May 30, 2026 · 7 min read

Claude Design alternative

Two Prompts Was All It Took to Ditch Claude Design for Open Source

Someone on Reddit signed up for a Claude subscription to try Claude Design. They hit their weekly quota after two prompts. Two prompts. The thread was full of people nodding along. The hype was enormo

May 30, 2026 · 7 min read

oh-my-pi coding agent

The oh-my-pi Hype: Hashline, Reactions, and What Everyone Missed

There's a number floating around that I cannot stop thinking about. Grok 4 Fast went from 6.7% to 68.3% on coding edits. No new model release. No training data dump. Just a better tool format. The cre

May 30, 2026 · 7 min read

Step 3.7 Flash

Step 3.7 Flash: The 198B MoE Model Everyone Is Actually Running

someone on X posted a photo of a DGX Spark sitting on a regular desk with a terminal window running step 3.7 flash. a 198 billion parameter vision model. on a box that fits next to a monitor. and it w

May 30, 2026 · 7 min read

Claude Opus 4-8

Anthropic Called Opus 4.8 Modest. The Internet Disagreed.

Anthropic called Claude Opus 4.8 "a modest but tangible improvement" in their own launch post. That one line tells you more about where AI stands right now than any benchmark score on a spreadsheet. T

May 30, 2026 · 7 min read

GPT-Realtime-2

The Hype Around GPT-Realtime-2 and What Actually Landed

Thirty-four points and five Hacker News comments is not what a voice breakthrough is supposed to look like. That was one of the first weird signals around OpenAI’s May 7 release of GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. On ...

May 8, 2026 · 7 min read