Friday, July 3, 2026

Artificial Necessity Manifesto

For twenty years I've dreamed of building software where users truly control their data and how it shows up in their lives. I never could; the scope was too vast. Two more public efforts, Mitch Kapor's Chandler and Google Wave, were ambitious projects with real funding that couldn't pull it off either.

This is the year AI changes that. I'm building a Collaborative Conducting Environment — a new kind of tool where you and AI work as partners, complementing each other's strengths and covering for each other's weaknesses. It's not about sending the AI off to do your work. It's about working together.

I'm already using my CCE for 100% of my AI software development. Not Cursor, not Claude Code: my tool, because for my workflow it's already radically more productive. I could explain what it *is*, but those would be wasted words, because it changes every day. And I can't explain what it will become, because it's a new kind of thing. Every explanation rests on expectations, and that makes it fall short. You have to use it to understand it.

What I can explain is what I believe and why I'm building it.

Memory Safe Software. Now.

Every week, the software you rely on (your browser, your PDF reader, your phone) gets hacked because it's built with tools that let attackers reach into your computer's memory and take control. These aren't rare events. Chrome alone patches hundreds of these vulnerabilities a year. Your bank, your photos, your passwords can all be stolen because we keep building software the same way we did in 1972. The answer isn't patching faster. Garbage Collected Memory Safety removes entire classes of vulnerabilities that are present in C, C++, and yes, Rust.

In 2024, our highest office issued a call to action: The White House: Future Software Should Be Memory Safe. I'm answering that call. I am already working in my Collaborative Conducting Environment, well on my way to delivering memory-safe versions of the critical software we all use: a web browser, PDF viewer, image editor, and AI collaboration studio, shipping on all platforms. And I'm doing it by myself. One programmer. Incredible AI leverage. Because I want people to see what AI is truly capable of when you work with it as a partner, instead of sending it off to work alone.

White House: Path Towards Secure and Measurable Software: Final-ONCD-Technical-Report.pdf

AI-Sovereign Software. Yours.

In 1985, Richard Stallman published a radical vision: users should control their software, not the other way around. The GNU GPL and the open source movement that followed reshaped the industry — but forty years later, the software most people actually use is still proprietary. Windows, macOS, iOS, iMessage, Snapchat, Instagram. Open source won the server room and lost the living room.

In the near future, every piece of software and data becomes more powerful when deeply integrated with AI. But that creates a tension. The history of software is locking you in for revenue. As users, none of us want software that captures. We want software that does what we want.

I'm building a configurable AI-embedded platform where you own your data and control your UI — email, messaging, writing, artistic work, coding, everything. Not designed around revenue capture. Designed around you.

The last piece of software you need, because it becomes whatever you need it to be.

http://artificialnecessity.com

Progress.

I use my self-authored AI harness every day. It's the only way I code with AI anymore.

My browser engine is already rendering complex HTML5 with Chrome-reference conformance across dozens of layout cases involving all the major layout features.

And for those of you who want something tangible, below is a picture of what my desktop looks like every day. On the left you can see my memory-safe PDF viewer (written in < 10 hours) alongside FoxitPDF. On the top you can see the current state of my HTML5 conformance test results vs. Chrome for my memory-safe HTML5 engine. On the bottom you can see my Book-View window, used to display collaborative fiction writing sessions with my Story-Writer AI collaboration orchestration. And behind it all you can see a dozen windows of my CCE.

Here are some recent HTML5 conformance results for Chrome vs. my FluidDOM, 100% memory safe down to the GPU pixel drawing.

..and here is a report pulled from my GitHub repos..

..that is 670k lines of code + 400k lines of markdown, produced in <1 year by one person...

...is it good? I sold software at 17 y/o, sold a company to Google at 27 after writing half the code..

...and I retired at 33. I think it's good. You can judge when I release in Q3 2026..

Thursday, July 2, 2026

Rust Is Not Memory Safe

Rust Is Not a Memory-Safe Language

David Jeske (Artificial Necessity LLC), with Claude (Anthropic)

Status: Draft
Last Updated: 2026-07-02

Abstract

Rust is not a memory-safe language. Memory safety, as the term has always meant in language design — the property that programs in the language cannot violate memory integrity — is a compositional property of a language and its runtime: it holds for all programs, over all data shapes, unconditionally. Rust does not have this property. What Rust has is a static checker that verifies an ownership discipline over tree-shaped data only, plus a set of unsafe escape hatches whose use is not optional but structurally mandatory: any cyclic, self-referential, or densely shared data structure — doubly-linked lists, graphs, DOMs, scene graphs, caches, entity systems — is inexpressible under the checker and forces the program into hand-written unsafe, runtime reference counting, or unchecked index schemes. Because these structures dominate real systems software, the unverified surface of a Rust program grows with the program's structural complexity, and at engine scale the observed end state — in Servo, Bevy, and Zed alike — is that every large Rust project builds its own bespoke, unverified object-lifetime runtime. A language whose safety property is conditional on universally quantified, undischarged proof obligations across its entire ecosystem, whose checker cannot express the core data structures of the domain it targets, and whose flagship projects manage memory through hand-audited runtimes, is not a memory-safe language under any definition of the term that does not also admit reference-counted C++. Garbage-collected platforms are memory safe. Rust is a memory-disciplined language with a memory-safe subset for trees — and the difference is not pedantry; it is the difference between a property of the language and a property of small programs.

1. Thesis

Rust is not a memory-safe language. This paper defends that statement literally, under the ordinary meaning "memory-safe language" has carried since the term was coined: a language in which programs cannot commit memory-safety violations — no use-after-free, no dangling dereference, no out-of-bounds access — as a property of the language, holding for all programs a developer can write in it.

The prevailing description of Rust — in its own project materials, in vendor adoption cases, and in US federal cybersecurity guidance, which classify Rust alongside garbage-collected languages under the label "memory-safe" [8][9] — is wrong, and wrong in a way that matters. It equates two safety architectures that differ not in degree but in kind, and it awards the categorical label to a language whose guarantee is conditional, non-compositional, incomplete over data shapes, and — decisively — weakest in exactly the software domains the label is invoked to justify.

Three facts, developed in the sections that follow, establish the thesis:

The checked subset is incomplete. Rust's borrow checker verifies programs whose data ownership forms a tree. Cyclic and self-referential structures are inexpressible under it. (§3)
The guarantee is non-compositional. Code fully accepted by the checker can trigger undefined behavior by calling unsound unsafe code anywhere in its dependency closure — and the ecosystem measurably fails these proof obligations, including in the standard library. (§2, [1])
At scale, Rust converges to C++'s shape. Every flagship large Rust system has been forced to construct its own unverified object-lifetime runtime — arriving at the same safety architecture the equivalent C++ systems use. The claimed categorical difference from C++ disappears precisely in the domains it was claimed for. (§4)

A language of which all three are true does not satisfy the definition. The remainder of the paper is the demonstration.

2. What "Memory Safe" Means, and Why Rust Does Not Meet It

2.1 The definition

A memory-safe language is one in which memory-safety violations are inexpressible by construction for all programs. Garbage-collected languages meet this definition through their runtime: the collector guarantees no reachable object is reclaimed, bounds checks cover access, and consequently:

Completeness: every data shape — cyclic, shared, self-referential — is safe. class Node { Node? Parent; List<Node> Children; } is safe, done.
Compositionality: safe code calling safe code is safe, unconditionally. No library can export an API that safe callers can use to corrupt memory.
Fixed trusted base: the trusted surface is one runtime artifact, institutionally hardened for decades, whose failure modes are uncorrelated with application logic — a JIT bug is triggered by codegen patterns, not by an application "holding an API wrong."

This is the property the term "memory safe" was minted for. C#, Java, Go, and JavaScript have it. (C# undermines its own lexical boundary — IntPtr, Marshal.*, and MemoryMarshal are callable without the unsafe keyword, an indefensible design defect that makes its trusted surface ungreppable — but this is a policing failure over a known API list, not a structural incompleteness. The compositional property over the safe API surface stands.)

2.2 Rust's actual property

Rust's property is different in kind, on every axis:

It is conditional — and the conditions are the wrong kind. The precise statement of what the borrow checker delivers is: code it accepts is violation-free provided the compiler is sound, the standard library's internal unsafe is correct, and every unsafe block in the transitive dependency closure correctly discharges its proof obligations. This paper concedes the first two conditions without argument: compiler soundness is analogous to JIT soundness, a fixed, centralized residual that every safety architecture carries and that §6's definition absorbs. (Open rustc soundness holes exist and are demonstrable [2], as JIT bugs exist for the CLR; neither is load-bearing here.) The condition that has no analogue — the one the thesis rests on — is the third: safety quantified over every unsafe author in the dependency closure, a surface that is distributed across thousands of unaffiliated maintainers rather than centralized, that grows with the program rather than staying fixed, and whose failures are triggered by the application's own usage patterns rather than uncorrelated with them. And it measurably fails: the RUDRA study [1] found 264 new memory-safety bugs across crates.io — including in the standard library — many reachable from callers that never leave the checked subset.

It is non-compositional — because its boundary is a promise, not a mechanism. In a memory-safe language, unsafety is contained at the boundary by machinery: the collector and the bounds checks sit at the line and enforce safety dynamically, no matter what the caller does. Because enforcement is mechanical, the safe side can be offered arbitrary generality — alias anything, mutate any graph — and the boundary holds regardless. In Rust, the boundary around an unsafe internal is enforced by practice: the wrapper is sound only if no possible sequence of safe calls can violate the internal invariants — a proof-shaped obligation — but nothing in the toolchain requires, checks, or even represents that proof. What stands at the line is the author's informal reasoning, review, and convention; the compiler takes the wrapper's signature on faith. (Genuine machine-checked proof exists for Rust — the RustBelt project formally verified a handful of standard-library abstractions — and its rarity is the tell: it is a per-abstraction academic research effort covering a rounding error of the ecosystem, against aliasing rules whose formal statement is itself unfinished. If proof were the operative enforcement, RustBelt would not be a research program; it would be part of the compiler.) The author therefore has exactly one lever for making the promise true: narrow the exposed API until the dangerous usage patterns become inexpressible. Soundness is purchased with generality; even a perfectly implemented boundary is a constraint of use, because rendering behavior consumable by checked contexts means amputating its generality away from the uncheckable. (Type-level techniques — branded lifetimes, GhostCell-style encodings — can push some invariants into the boundary, but the encodable frontier is famously narrow and paid for in exactly this ergonomic and expressive loss.)

This leaves an unsafe wrapper author a trichotomy: amputate the generality, hand-build runtime enforcement (dynamic borrow flags, index validity checks — i.e., locally rebuild the memory-safe language's mechanical fence, per structure, unaudited), or ship the proof obligation to the callers. The RUDRA study [1] is the measured rate of the third option across the ecosystem: obligations shipped and botched, so that code fully accepted by the checker triggers undefined behavior through its own usage patterns. "My code contains no unsafe" bounds authorship, not exposure — safety becomes a universally quantified claim over thousands of unaffiliated maintainers upholding aliasing invariants (Stacked/Tree Borrows) subtler than C++'s, under a memory model that remains unfinalized [4]. A guarantee contingent on an ecosystem-wide universal quantifier is not a language property. It is a hope.
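To make the "narrow the exposed API" lever concrete, here is a minimal sketch of a safe wrapper around unchecked indexing. `SmallStack` and its methods are hypothetical names invented for this illustration, not taken from any crate. The point it tries to show is that the invariant lives in the author's comments and in which methods are left out, while the compiler only sees the signatures.

```rust
/// A fixed-capacity stack whose internals skip bounds checks.
/// Invariant, upheld only by the code in this impl block: `len <= 4`.
pub struct SmallStack {
    data: [u32; 4],
    len: usize,
}

impl SmallStack {
    pub fn new() -> Self {
        SmallStack { data: [0; 4], len: 0 }
    }

    /// Returns false when full. Deleting this check would still compile,
    /// and safe callers could then drive the unsafe write out of bounds.
    pub fn push(&mut self, value: u32) -> bool {
        if self.len == self.data.len() {
            return false;
        }
        // SAFETY: `len < data.len()` was checked just above.
        unsafe { *self.data.get_unchecked_mut(self.len) = value };
        self.len += 1;
        true
    }

    pub fn pop(&mut self) -> Option<u32> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        // SAFETY: `len < data.len()` follows from the invariant.
        Some(unsafe { *self.data.get_unchecked(self.len) })
    }

    // Deliberately NOT exposed: a safe `set_len(&mut self, n: usize)`.
    // Its signature would type-check, yet it would let safe code break
    // the invariant that the unsafe blocks rely on.
}

fn main() {
    let mut stack = SmallStack::new();
    for i in 0..5 {
        println!("push({}) accepted: {}", i, stack.push(i));
    }
    println!("pop() = {:?}", stack.pop());
}
```

In this sketch the wrapper stays sound only because the dangerous operation is absent from the public surface and the bounds check is present, and neither fact is something the toolchain verifies.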

And it is incomplete — which is the structural core of the thesis, and the subject of the next section.

3. The Checker Cannot Express Real Data Structures

The borrow checker's ownership model — one owner per value, borrows forming statically scoped tree regions, aliasing XOR mutation — verifies exactly the programs whose data is tree-shaped. Therefore:

Any structure containing a cycle, a back-edge, or self-reference is inexpressible under the checker.

The excluded set is not exotic. It is: doubly-linked lists, graphs, intrusive lists, trees with parent pointers, DOMs, scene graphs, observer registries, cross-referencing caches, buffer pools, and entity systems — the load-bearing structures of engines, browsers, databases, editors, and servers. The community's own canonical text, Learning Rust With Entirely Too Many Linked Lists [3], is a book-length demonstration that a first-semester data structure requires expert unsafe in Rust.

When the program needs a non-tree shape — and every interesting program does — there are exactly three exits. Each one surrenders the property.

Exit 1: hand-written unsafe. The programmer assumes proof obligations subtler than C++'s, under an aliasing model still being defined [4], with RUDRA [1] as the measured ecosystem failure rate. This is not memory safety; this is manual memory management with stricter, less-specified rules.

Exit 2: Rc<RefCell<T>> / Weak. The defect here is not that reference counting is unsound — it is that in Rust it is elective. Python's reference counting is a language mechanism: universal, embedded in the runtime, and completed by a cycle detector, so its guarantees are properties of the language. Rust's Rc is a library type, applied per-structure by programmer discipline, with cycles broken by manual Weak placement — and pervasive application is impractical for the same reasons it is with shared_ptr in C++: refcount traffic on hot paths, runtime borrow-panic hazards in deep call stacks, and pervasive ergonomic drag. Nobody builds whole systems this way, in either language, for the same reasons. A safety story that holds only where the programmer opted in, cannot practically be opted into everywhere, and leaks cycles wherever Weak placement is missed, is conditional on usage — a property of disciplined programs, not of the language. Had Rust wanted reference counting to ground a language-level claim, it would be embedded and cycle-collected, as Python's is; it is not, deliberately, as a performance trade — a legitimate trade, but one that trades away the label.
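The three hazards named above (manual Weak placement, leaked cycles, and runtime borrow failures) can be sketched in a few lines of safe Rust. The types `Node` and `Strong` and the helper functions are hypothetical names for this illustration; only `Rc`, `Weak`, and `RefCell` come from the standard library.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    name: String,
    parent: RefCell<Weak<Node>>,      // back-edge: the author must choose Weak
    children: RefCell<Vec<Rc<Node>>>, // forward edges: strong
}

fn node(name: &str) -> Rc<Node> {
    Rc::new(Node {
        name: name.to_string(),
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

fn add_child(parent: &Rc<Node>, child: Rc<Node>) {
    *child.parent.borrow_mut() = Rc::downgrade(parent);
    parent.children.borrow_mut().push(child);
}

fn parent_name(n: &Rc<Node>) -> Option<String> {
    let parent = n.parent.borrow().upgrade();
    parent.map(|p| p.name.clone())
}

/// Same shape, but with the back-edge held strongly by mistake.
struct Strong {
    other: RefCell<Option<Rc<Strong>>>,
}

/// Returns true if the two-node cycle is still alive after every
/// handle the program holds has been dropped.
fn leaked_after_drop() -> bool {
    let a = Rc::new(Strong { other: RefCell::new(None) });
    let b = Rc::new(Strong { other: RefCell::new(Some(a.clone())) });
    *a.other.borrow_mut() = Some(b.clone());
    let probe = Rc::downgrade(&a);
    drop(a);
    drop(b);
    let still_alive = probe.upgrade().is_some();
    still_alive
}

/// Returns true if a mutation is rejected at run time because a
/// shared borrow is still held further up the call stack.
fn reentrant_borrow_fails() -> bool {
    let root = node("root");
    let _reader = root.children.borrow();
    let rejected = root.children.try_borrow_mut().is_err();
    rejected
}

fn main() {
    let root = node("root");
    let leaf = node("leaf");
    add_child(&root, leaf.clone());
    println!("leaf's parent: {:?}", parent_name(&leaf));
    println!("cycle leaked: {}", leaked_after_drop());
    println!("nested mutation rejected: {}", reentrant_borrow_fails());
}
```

Everything here compiles without `unsafe`, and nothing is memory-unsafe; what the sketch illustrates is that correctness depends on where the programmer placed `Weak` and on borrow checks that moved from compile time to run time.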

Exit 3: arenas and indices (slotmap, petgraph, generational arenas, ECS). This exit is not memory safe, full stop — and seeing why requires only stating what an arena is. An arena is a heap: a user-space allocator in which slots are allocations, indices are pointers, and slot reuse is free followed by malloc recycling an address. It satisfies the borrow checker because indices are opaque integers — pointers the compiler cannot see — which is the defect, not the feature. A stale index is therefore not analogous to use-after-free; it is use-after-free, executed against the program's operative allocator, which merely happens to be implemented one level above malloc. And it carries use-after-free's signature security payload: the dead handle reads whatever object now occupies the slot — another session's credentials, another player's state, another document's contents — private information disclosed across an object-lifetime boundary, with corruption available on the write side. This is the vulnerability class the term "memory safety" exists to name; disclosure of recycled storage through a dead reference does not require a wild hardware pointer, as the industry's own hardening work acknowledges by treating intra-pool reuse as first-class attack surface [10]. The only way to certify this exit as memory safe is to gerrymander the definition so that the hand-rolled heap does not count as memory. Generational indices mitigate — converting silent disclosure into a lookup panic — but they are optional, conventional, and unverified: a runtime check the author may or may not have installed, enforced by practice, invisible to the checker. Note also that the container-index defense available to memory-safe languages does not apply here: a stale index into a C# list is a bug inside a memory-safe object model; the arena is not a container inside an object model — at scale it is the object model, load-bearing as the allocator, so its use-after-free is allocator use-after-free.
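A minimal sketch of the stale-index scenario described above, written entirely in safe Rust. `Arena`, `GenArena`, and `Handle` are hypothetical types written for this illustration and are not the APIs of slotmap or any other crate. This generational variant returns `None` on a stale handle rather than panicking, which is one of several possible designs.

```rust
/// A deliberately naive arena: slots are recycled and indices are bare integers.
struct Arena<T> {
    slots: Vec<Option<T>>,
    free: Vec<usize>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new(), free: Vec::new() }
    }
    fn insert(&mut self, value: T) -> usize {
        if let Some(i) = self.free.pop() {
            self.slots[i] = Some(value); // recycle a freed slot
            i
        } else {
            self.slots.push(Some(value));
            self.slots.len() - 1
        }
    }
    fn remove(&mut self, i: usize) -> Option<T> {
        let value = self.slots.get_mut(i)?.take();
        if value.is_some() {
            self.free.push(i);
        }
        value
    }
    fn get(&self, i: usize) -> Option<&T> {
        self.slots.get(i)?.as_ref()
    }
}

/// The opt-in mitigation: each handle remembers the slot's generation.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle {
    index: usize,
    generation: u32,
}

struct GenArena<T> {
    slots: Vec<(u32, Option<T>)>,
    free: Vec<usize>,
}

impl<T> GenArena<T> {
    fn new() -> Self {
        GenArena { slots: Vec::new(), free: Vec::new() }
    }
    fn insert(&mut self, value: T) -> Handle {
        if let Some(i) = self.free.pop() {
            let slot = &mut self.slots[i];
            slot.1 = Some(value);
            Handle { index: i, generation: slot.0 }
        } else {
            self.slots.push((0, Some(value)));
            Handle { index: self.slots.len() - 1, generation: 0 }
        }
    }
    fn remove(&mut self, h: Handle) -> Option<T> {
        let slot = self.slots.get_mut(h.index)?;
        if slot.0 != h.generation {
            return None;
        }
        let value = slot.1.take();
        if value.is_some() {
            slot.0 = slot.0.wrapping_add(1); // invalidate outstanding handles
            self.free.push(h.index);
        }
        value
    }
    fn get(&self, h: Handle) -> Option<&T> {
        let slot = self.slots.get(h.index)?;
        if slot.0 == h.generation { slot.1.as_ref() } else { None }
    }
}

/// Reads through a dead index after the slot has been recycled.
fn stale_read_naive() -> Option<String> {
    let mut arena = Arena::new();
    let alice = arena.insert(String::from("alice-session-token"));
    arena.remove(alice);
    let _bob = arena.insert(String::from("bob-session-token"));
    let seen = arena.get(alice).cloned();
    seen
}

/// The same sequence against the generational arena.
fn stale_read_generational() -> Option<String> {
    let mut arena = GenArena::new();
    let alice = arena.insert(String::from("alice-session-token"));
    arena.remove(alice);
    let _bob = arena.insert(String::from("bob-session-token"));
    let seen = arena.get(alice).cloned();
    seen
}

fn main() {
    println!("naive arena, stale index reads: {:?}", stale_read_naive());
    println!("generational arena, stale handle reads: {:?}", stale_read_generational());
}
```

Both arenas pass the borrow checker identically. The difference between them is a runtime check that the author chose to write, which is the sense in which the mitigation is conventional rather than verified.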

The consequence: the unverified surface of a Rust program is not a constant, as a runtime's is. It grows with the structural complexity of the program's data. The more graph-shaped the problem, the less the checker covers — a scaling law pointed directly at systems software.

4. At Scale, Every Project Builds Its Own Runtime

Section 3's scaling law makes a prediction: at the scale of an Unreal, a Unity, a Blender, a MySQL, a BigTable, a League-of-Legends-class server — codebases whose cores are densely cyclic, mutably-shared object graphs mutated under tight budgets from deep call stacks — the three exits individually collapse. Rc<RefCell> becomes refcount traffic on every hot path plus a nested-borrow panic minefield; arenas become a bespoke object model with hand-enforced handle discipline; raw unsafe becomes C++ with harder rules. The prediction is that large Rust projects will be forced to build their own object-lifetime runtimes.

The flagship projects confirm it, unanimously:

Servo — Mozilla's browser engine, the language's original motivating project — and the reframing it forces. The marketed claim is that Rust is categorically different from C++. Compare shapes: Blink, the C++ engine in Chrome, manages its DOM with Oilpan, a tracing garbage collector, surrounded by hand-disciplined C++ [10]. Servo, unable to express the DOM under the checker, manages its DOM with SpiderMonkey's tracing garbage collector, integrated through extensive unsafe FFI, policed by a bespoke custom-lint suite enforcing invariants the borrow checker cannot see [5]. The same shape: cyclic core on a tracing collector, practice-enforced glue at the boundary. This is not evidence that Servo's engineers erred — a DOM interoperating with a JavaScript heap plausibly demands collection in any language, which is itself the §5 point. It is evidence that in the domain that motivated the language's creation, the language and C++ converge to the same safety architecture. A categorical difference that vanishes at the destination was never categorical.
Bevy — the largest Rust-native game engine. Its ECS is presented as a performance architecture; it is equally the arena/index exit imposed engine-wide, because a conventional scene graph is inexpressible under the checker. The ECS core is itself dense with unsafe and carries a multi-year public trail of soundness bugs in precisely that code [6].
Zed — the editor. Its gpui framework implements a custom entity system: reference-counted handles with runtime lease semantics [7] — a hand-built dynamic ownership runtime, constructed because static ownership could not express an editor.
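For readers unfamiliar with the idea of a handle with runtime lease semantics, the following is a rough sketch of the general pattern. It is not gpui's API: `World`, `spawn`, `update`, and `LeaseError` are hypothetical names, and the sketch omits reference counting entirely. It assumes one simple design in which an entity is taken out of its slot while it is being updated and put back afterwards, so a nested update of the same entity is detected at run time.

```rust
#[derive(Debug, PartialEq)]
enum LeaseError {
    Missing,
    AlreadyLeased,
}

/// Hypothetical entity store: a slot holds `None` while its value is on lease.
struct World<T> {
    slots: Vec<Option<T>>,
}

impl<T> World<T> {
    fn new() -> Self {
        World { slots: Vec::new() }
    }

    fn spawn(&mut self, value: T) -> usize {
        self.slots.push(Some(value));
        self.slots.len() - 1
    }

    /// Leases the entity out of the store, runs `f` with mutable access to
    /// both the entity and the rest of the world, then returns the entity.
    fn update<R>(
        &mut self,
        id: usize,
        f: impl FnOnce(&mut T, &mut World<T>) -> R,
    ) -> Result<R, LeaseError> {
        let slot = self.slots.get_mut(id).ok_or(LeaseError::Missing)?;
        let mut value = slot.take().ok_or(LeaseError::AlreadyLeased)?;
        let result = f(&mut value, self);
        self.slots[id] = Some(value); // end of lease
        Ok(result)
    }
}

/// Updates entity `a`, and from inside that update touches `b` (allowed)
/// and `a` again (rejected, because `a` is currently on lease).
fn lease_demo() -> (Result<(), LeaseError>, Result<(), LeaseError>, Vec<Option<i32>>) {
    let mut world = World::new();
    let a = world.spawn(1);
    let b = world.spawn(10);
    let (other, again) = world
        .update(a, |va, w| {
            *va += 1;
            let other = w.update(b, |vb, _| *vb += *va);
            let again = w.update(a, |_, _| ());
            (other, again)
        })
        .expect("entity a exists and is not leased");
    (other, again, world.slots)
}

fn main() {
    let (other, again, slots) = lease_demo();
    println!("update of b inside a: {:?}", other);
    println!("update of a inside a: {:?}", again);
    println!("final values: {:?}", slots);
}
```

The aliasing rule that the borrow checker would enforce statically for a tree is here enforced dynamically by the store, which is the sense in which such a framework is a small ownership runtime.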

The pattern is a memory-management instance of Greenspun's Tenth Rule: every sufficiently large Rust program contains an ad-hoc, informally specified, unverified reimplementation of an object-lifetime runtime — different in every codebase, invariants living in comments, fuzzed by no one. And §2.2 explains why this outcome is forced rather than accidental: mechanism is the only thing that actually contains unsafety at a boundary, and when a system needs generality that cannot be amputated away, its authors have no option left but to build the mechanism themselves.

Now state the comparison honestly. A large C# system and a large Rust system both end up with a trusted lifetime runtime in the load-bearing role — but the two occupants of that role are not the same kind of thing. The C# system's runtime is a mechanism in the §2.2 sense: a precise moving collector for which reachability is ground truth, complete over all data shapes, compositional, its failure modes uncorrelated with application logic, its trusted surface fixed no matter what the application does — and it ships with the platform, hardened by twenty-five years of institutional fuzzing, with the application's million lines entirely on the safe side of it. The Rust system's stand-in is a practice-enforced promise: per-structure wrappers around unsafe internals, covering only the shapes their authors anticipated, sound only against the caller patterns their authors imagined, with a trusted surface that grows with every extension — project-local, young, audited by no one. The role is the same; the occupant differs in kind. The language did not remove the trusted runtime from systems programming. It replaced a mechanism with a promise, and privatized the promise.

This is the final blow to the label. A memory-safe language does not require its largest programs to hand-build the machinery of memory safety. That the machinery must be built is the proof that the language does not contain it.

5. Objections

"Most Rust code never writes unsafe." Authorship is not exposure (§2.2). Non-compositionality means the application is hostage to the unsafe it transitively calls, and §3 shows that volume is set by the program's data shapes, not the author's discipline.

"ECS and arenas are good architecture regardless — data-oriented design wins on cache behavior." Sometimes true on performance, and irrelevant to the safety claim. An architecture mandated by the verifier's expressiveness gap, whose handle discipline the verifier cannot check, is not evidence of the verifier's success. The checker verified the tree-shaped parts and was absent for the hard parts.

"Panics beat UB." Genuinely true, and a different claim. "Fails better than C++" is not "memory safe" — and GC platforms deliver the better failure modes and compositionality and completeness, without per-structure manual schemes.

"The Android data shows Rust reducing vulnerabilities." The cited datasets [8] aggregate Rust with Java and Kotlin as "memory-safe languages" versus C/C++. They prove the value of leaving C++. They say nothing about Rust versus GC platforms — this paper's comparison — on which no isolating data exists.

"GC is unacceptable in systems software." Examine the systems software that actually exists. Chrome ships multiple collectors on its hottest paths: V8's generational GC for the JavaScript heap, and Oilpan — a tracing garbage collector Google built in C++, specifically to manage Blink's DOM — adopted after years of reference-counting that cyclic graph convinced the most performance-scrutinized C++ team in the industry that a tracing collector was the correct mechanism [10]. Firefox pairs SpiderMonkey's GC with a cycle collector. Unreal Engine — the reference C++ game engine — ships a mark-and-sweep collector for its UObject graph; Unity's gameplay layer runs on the .NET GC. The planet's infrastructure layer is collected: Kubernetes, Docker, etcd, CockroachDB on Go's GC; Kafka, Cassandra, Elasticsearch on the JVM's. Garbage collection is not grudgingly tolerated in systems software; it is standard equipment in the flagship systems, adopted independently by C++ teams with every anti-GC incentive, because graph-shaped cores demand mechanism. Meanwhile the inventory of systems software written in Rust consists of excellent components — Stylo and WebRender inside Firefox, Firecracker, Pingora, kernel drivers — which are tree-shaped or embedded in larger hosts, exactly per §6, while the whole-system flagships embed a garbage collector at their graph-shaped core: Servo runs its DOM on SpiderMonkey's collector; Deno wraps V8's. There is no shipped Rust browser, database engine, mainstream operating system, or major game engine. The objection's premise is refuted by every system it gestures at, and §4 already explained why: the large Rust systems did not escape runtime lifetime management — they reimplemented it. At scale the choice is not "GC vs. no GC." It is "a mechanism or a promise" — the platform's fence, or your own.

"So the term is just being used loosely — why does it matter?" Because the label is now written into federal procurement guidance [9] as an undifferentiated category, steering rewrites of exactly the graph-shaped systems where the property does not hold. A category error at policy scale is not pedantry.

6. What Rust Actually Is

Rust is a memory-disciplined language: a static ownership checker of real value, whose verified domain is tree-shaped data, in programs small enough that the transitive unsafe surface stays auditable, conditional on compiler soundness and on ecosystem-wide proof obligations that are measurably not being met. Within that domain — parsers, serializers, codecs, compression kernels, CLI tools, straight-line pipelines — the checker covers essentially the whole program and delivers no-runtime performance with genuine static assurance. ripgrep and serde are honest exemplars, and nothing here diminishes them.

But examine even the granted niche closely, and the advantage — as distinct from the guarantee — narrows toward the vanishing point. The marketed value proposition is "safety without garbage collection, where garbage collection hurts." Now cross the two axes. Where the checker's guarantee holds — batch tools, parsers, pipelines: bounded working sets, reused buffers, no latency contract, process exits — is precisely where a collector costs nothing; a ripgrep-shaped workload is the GC's easiest case, and its actual performance derives from algorithmic engineering (literal-prefix SIMD scanning, lazy-DFA regex, parallel traversal) available in any compiled language. Where a collector genuinely hurts — long-lived, latency-bound, graph-shaped systems — is precisely where §3 and §4 showed the guarantee is gone. The sweet spot the label advertises — safety without GC, where GC hurts — is a near-empty quadrant: where the safety holds, the collector was free; where the collector costs, the safety has left. The honest residual quadrant is hard-real-time, allocation-forbidden, tree-shaped kernels — embedded targets, codecs under latency contracts, kernel modules — real, valuable, and a small fraction of what the label is being applied to.

The comparison lands harder still because the memory-safe platforms did not cede the niche. Modern .NET ships a compiler-enforced static lifetime checker for exactly the hot-window pattern: ref struct semantics (Span<T>, ReadOnlySpan<T>) with escape rules the compiler proves — no boxing, no heap capture, no crossing async or closure boundaries, scoped lifetime narrowing, stackalloc feeding directly into checked stack windows [11]. This is borrow checking deployed at the scope where it is complete: stack-lifetime views are tree-shaped by nature, so the checker in that role is total, and it operates inside a language whose object graph is covered by mechanism. The architecture the evidence in this paper points to is thus not hypothetical — it ships: mechanism for the graph, checked lifetimes for the windows. Rust's core idea is vindicated at window scope and indicted at object-model scope; its error was not the borrow checker but the claim that the borrow checker could be the object model.

But a memory-safe language is one whose property holds for the language — all programs, all data shapes, compositionally — assuming only that the trusted mechanism is itself correct and transmits its guarantees faithfully to the native boundary. Every safety property bottoms out at such an assumption; the honest question is what shape the assumption takes. For a GC platform it is a single one: one runtime artifact, fixed in size regardless of what the application does, uncorrelated with application logic, institutionally hardened. Rust fails the definition even granting it the analogous assumption — grant compiler soundness and a correct standard library, and the property still does not follow, because the remaining conditions are of a different character entirely: distributed across every unsafe author in the dependency closure, growing with the program's structural complexity, and triggered by the application's own usage patterns. Rust's property is conditional in a way the assumption cannot absorb, non-compositional where the definition requires compositional, incomplete where it requires total, and absent at scale where the marketing aimed it. The tree-shaped subset is safe. The language is not. Conflating the two hands the categorical label earned by garbage-collected runtimes to a language that, at the scale of the systems it was built to replace, asks every project to build the runtime itself.

Rust is not a memory-safe language. It is a good language wearing the wrong label — and the systems being rewritten under that label deserve the accurate one.

References

[1] Bae, Y., Kim, Y., Askar, A., Lim, J., Kim, T. RUDRA: Finding Memory Safety Bugs in Rust at the Ecosystem Scale. SOSP 2021. https://dl.acm.org/doi/10.1145/3477132.3483570
[2] cve-rs: memory-safety violations constructed with no unsafe, on stable compilers, via open soundness issues. https://github.com/Speykious/cve-rs ; underlying issue: https://github.com/rust-lang/rust/issues/25860
[3] Beingessner, A. Learning Rust With Entirely Too Many Linked Lists. https://rust-unofficial.github.io/too-many-lists/
[4] Jung, R., et al. Stacked Borrows: An Aliasing Model for Rust (POPL 2020); Tree Borrows successor work; Unsafe Code Guidelines effort (memory model unfinalized). https://plv.mpi-sws.org/rustbelt/stacked-borrows/ ; https://github.com/rust-lang/unsafe-code-guidelines ; RustBelt (machine-checked verification of selected std abstractions): https://plv.mpi-sws.org/rustbelt/
[5] Servo project. DOM design: SpiderMonkey GC integration and custom lint enforcement. https://github.com/servo/servo/blob/main/components/script/dom/mod.rs
[6] Bevy Engine. bevy_ecs internals and public soundness-issue history. https://github.com/bevyengine/bevy/tree/main/crates/bevy_ecs ; https://github.com/bevyengine/bevy/issues?q=label%3AC-Unsound
[7] Zed Industries. gpui entity/ownership model. https://github.com/zed-industries/zed/tree/main/crates/gpui
[8] Google Security Blog. Memory Safe Languages in Android 13. https://security.googleblog.com/2022/12/memory-safe-languages-in-android-13.html
[9] ONCD. Back to the Building Blocks: A Path Toward Secure and Measurable Software (Feb 2024); CISA et al. The Case for Memory Safe Roadmaps (Dec 2023). https://bidenwhitehouse.archives.gov/oncd/briefing-room/2024/02/26/press-release-technical-report/ ; https://www.cisa.gov/resources-tools/resources/case-memory-safe-roadmaps
[10] Chromium. Oilpan: a tracing garbage collector for Blink's DOM objects (C++); MiraclePtr/PartitionAlloc use-after-free hardening (intra-pool reuse as attack surface). https://chromium.googlesource.com/chromium/src/+/main/third_party/blink/renderer/platform/heap/BlinkGCDesign.md ; https://v8.dev/blog/oilpan-library ; https://security.googleblog.com/2022/09/use-after-freedom-miracleptr.html
[11] Microsoft. C# ref struct semantics and ref-safety rules (compiler-enforced escape analysis for stack-referential types); Span<T> and low-level struct improvements. https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/ref-struct ; https://learn.microsoft.com/en-us/dotnet/csharp/advanced-topics/performance/ref-safety

Tuesday, June 16, 2026

Towards Conducting AIs through High-Bandwidth Collaboration

We are all experimenting, feeling out the path to the future of creative AI / human collaboration.

Along the way, I think there is a real danger in getting over-focused on the curation of "preferences" in how working "feels" instead of results.

It's like the coffee aficionado who insists a $3000 coffee grinder measurably improves his experience of the morning drip, when I can scientifically prove he's not getting more caffeine into his bloodstream than I am with a $75 Cuisinart grinder. When we focus on the "how it feels" of methods, instead of the "what comes out" results, then IMO we are talking about detached token snobbery. Not value. Not method innovation.

TLDR - When the LLM talks in a way that's wrong, it will code in a way that's wrong. We can only correct it if we read what it's saying. LLMs process information differently than we do. The more we can get *both* of our capabilities deployed at the token emission site, the better the value velocity is. I have a *strong* preference for Opus 4.6, because it is on the razor's edge of smart enough to do the work and sycophantic enough to listen to me. When I already know the solution or algorithm that will work, I don't have much tolerance for arguing with a word calculator about it.

Jeske's AI Coding Screenshot Diary

------

I admit this is probably something I should make a more public post about... You all are my test audience, to get my words sharp and figure out what I'm saying. I also find the public channels frustratingly unsatisfying.

LLMs work in a next-token probability space that I call the "context manifold", because mathematically it's an N-dimensional shape function that combines with model probabilities to produce token output. My assertions are (a) the more tightly I can keep that next-token probability space clean and aligned, the faster value comes out the other side, (b) I can *feel* or *sense* the context manifold alignment or misalignment by reading it, (c) I can only do this if I can SEE the token stream.

Tools like Claude Code, Cursor, and Antigravity, for me, feel like telling someone how to play baseball without being able to see the game. Logical instructions, thrown over a wall, that I can't learn from or guide. I need to see the token stream, so I can detect drift as early as possible. So I can fix the context shape in this session, and learn to better shape the next 100 sessions.

I call my workflow conducting, not babysitting, because I'm making the orchestra performance possible, not changing diapers. It feels more like juggling 100 balls than queueing a prompt and getting coffee or sitting around reviewing diffs.

...And I took it to an extreme, where I built my own LLM creative collaboration and coding harness, because my goal is not "less annoying"; my goal is increasing the bandwidth of human/LLM communication. More automated tools like Cursor, Claude Code, and Antigravity are not just massively less productive for me, they are actively obstructive to my productivity, by hiding information and getting in the way of me shaping the next-token prediction streams for maximum value creation.

-----

At a higher level, I see an important bifurcation happening, which I break down in this way:

eyes-off agentic - Most tooling I see is trending toward more automated loops and toward hiding and summarizing information. In this model, the human brain provides the high-level goals and lets the AI token stream do what it wants, with lower-frequency check-ins that focus on functional value more than form.
eyes-on agentic - This is what I call it when we *partner* our human brain intelligence deeply into the LLM next-token decisions and the shape of the probability space. It's admitting that code is context, and getting tightly involved in shaping code to shape LLM behavior. This doesn't mean always code-reviewing or always intervening. It means human-brain modeling of what the LLM will do and is doing, in order to catch probability-space drift and code/context shape drift early. The goal is to keep velocity and parallelism highest by minimizing the length of time that nonsense LLM probability-space word salad affects the surviving codebase.

By measurement - I believe I'm conservatively 50x more productive (quality and velocity) using eyes-on agentic.

My productivity value from eyes-on is not abstract. I quantify it, and it's accelerating. (graphs below)

Tom has pointed out many ways that my workflow may be something that most people simply cannot manage, and I accept that possibility. You can't stick a 16-year-old new driver in an Indy car and expect anything other than disaster.

I run and read/skim 2-6 simultaneously running and visible LLM chat sessions. I suspect not a lot of people can keep up with this. However, I assert that the value many are leaving on the table by being eyes-off is, as I stated above, the chance to "get our human brain intelligence deeply woven into the LLM next token decisions" at whatever bandwidth they can handle.

My custom-coded LLM chat interaction is WYSIWYG Markdown with no "eliding" of LLM tokens, and I turn off "thinking" in coding sessions because I find it counterproductive and annoying (I've been told GPT Codex thinking adds value, but I have not tried it enough). I don't use canned "skills"; instead I write, refine, and hone custom implementation specs per task. My system prompt for coding is BARE - tool use definitions and 8 lines of general framing. I don't use system prompt "rules", as I find they pollute the context and do more harm than good in the long haul. (I'm using methods other than prompt "rules" to get adherence to pattern requirements.)

I have a *strong* preference for Opus 4.6, because it is on the razor's edge of smart enough to do the work and sycophantic enough to listen to me. It's easier for me to manage the sycophancy than it is for me to suffer the wasteful, long-winded, opinionated arguing and deviations from my instructions I get from Opus 4.8 and Gemini 3+. I admit I have not used GPT Codex enough.

I have not been able to code with Fable yet. My long two-hour chat with Fable suggests that it will be notably better at eyes-off autonomous coding work, but I did *not* enjoy the design session I had with it. Fable still takes too many turns of argument before it stops answering with model bias and starts reasoning from facts and ground truth. When I already know the solution or algorithm I want to use, I don't have much energy for arguing with a word-calculator about it.

-----

The "eyes-off" coding ceiling is certainly improving...

My non-coder wife built an entire mobile web babysitter organizer app by herself! (And it's good!) She is more of the magic there than she takes credit for (she has a CS degree and did Y2K programming in COBOL in her 20s before she shifted to sales). I'm blown away. It's also a categorically bounded type of work. Also, Opus dumped everything into a 5,500-line JSX file that would likely have eaten itself eventually if I hadn't intervened.

I set up our 12-year-old son Jack with Claude Desktop and a pattern for working on 2D web games. He messed around making his version of a side-scroller category called "gravity flip obstacle course". Then he wanted to do AI Unreal Engine vibing. That is not viable right now. I did some research. I experimented with Claude Code and Godot. GDScript was a full-fail mess, but I was already intending to do Godot C#. I pivoted and got him into a Claude Code chat that converted his 2D gravity flip into Godot C#. It was categorically better at Godot C# than at GDScript, and better than trying to vibe-code Unreal.

I sat him down at it, walked away, and 1.5 hours later he had a 3D viewport with a blocky Godot knight running around an undulating terrain of sand, with a comic-proportioned medieval castle "town" where he could walk up to a vendor, buy a sword, and swing it by clicking the right mouse button. This is a 12-year-old who can't program, and can barely do bounded programming-class "puzzles". That is insane. That is also not becoming a product without skilled intervention, but the *learning* happening there is the closest thing to Diamond Age and the "Young Lady's Illustrated Primer" I've seen.

Part of the magic in both of these cases is putting the AI into a space where it can succeed, and keeping it in that space.

If one doesn't do this, it's more like going to the roulette table and betting on black.

------------

Below is a graph of code+markdown lines contributed to my coding projects since August 2025.

Of course we can all admit that "lines of code+markdown" is a narrow metric. It doesn't tell you anything about work product. And so, in that respect we merely have to decide how much to trust the conversation and the presenter.

I stopped using any other AI coding tools on April 9th, 2026, because I find my harness (code-named AstroNMCL) more productive and more pleasant. I'm not writing a tool to write a tool. I'm writing a tool to produce software and creative output. And I'm producing it.

10 mos - August 2025 to June 2026 - 1.1M lines (110k/mo)
6 mos - December 2025 to June 2026 - 750k lines (125k/mo)
4 mos - Feb 2026 to June 2026 - 550k lines (137k/mo)
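
The per-month figures are plain averages over each window. A minimal sketch of that arithmetic, assuming the rounded totals listed above (the `windows` list and `lines_per_month` helper are illustrative names, not part of my harness):

```python
# Rounded totals from the list above: (window label, months, total lines).
# These are the rounded figures quoted in the post, not raw repository counts.
windows = [
    ("Aug 2025 to Jun 2026", 10, 1_100_000),
    ("Dec 2025 to Jun 2026", 6, 750_000),
    ("Feb 2026 to Jun 2026", 4, 550_000),
]


def lines_per_month(total_lines, months):
    """Average lines contributed per month over a window."""
    return total_lines / months


rates = [(label, lines_per_month(total, months)) for label, months, total in windows]

for label, rate in rates:
    print(f"{label}: {rate / 1000:.1f}k lines/mo")
```

Each shorter, more recent window has a higher average than the longer one that contains it, which is what I mean by "accelerating".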

Below is a similar graph of "Story Fiction" Prose Lines I've conducted / co-written over the same timeframe.

It sits at about 2.7M words as of 6/15.

What is the quality? I like to say better than Twilight, worse than Hemingway. The key thing here is that this isn't random chunky paragraphs out of an LLM, and it isn't "write this chapter for me". This is collaboratively constructed long-form novel fiction as a constructed artifact - almost like software, produced from world+character+goal+harness design. My prompts and the harness itself are designed to scaffold the LLM's needs when constructing fiction, and I'm deeply involved in the next-token prediction.

If you made it this far, thank you. I appreciate it.

What my 3 monitor setup looks like during a typical session: