NURL: A Peer-Review-Style Technical Evaluation of a Language Designed for Language Models

A long-form external review of the NURL project (nurl-lang.org / github.com/nurl-lang/nurl), as of May 2026.


1. Introduction and stance

NURL — "Neural Unified Representation Language", or, as the README mischievously offers as a backronym, "Non-hUman Readable Language" — is one of those projects that sits awkwardly between three readings. Read one way it is a serious systems language with a self-hosting LLVM-backed compiler, a multi-target build pipeline, single-owner memory management with compiler-inserted auto-drop, and an unusually ambitious HTTP and MCP stack for a solo project. Read another way, it is a polemical art piece: a language whose stated thesis is that existing languages were optimised for humans and that this is a misallocation of bits in an era where most code is increasingly produced and consumed by language models. Read a third way — and this is the part that gives the project its particular flavour — it began life as a fictional language: a Finnish-language piece of speculative fiction in April 2025 imagined what would happen if LLMs invented their own programming language, and the real implementation that exists today closely tracks that fiction.

This review is interested in the first reading. The origin story is worth mentioning once because it explains the unusual coherence of the project's design (it was in some sense already finished as a concept before any line of code was written) and because it warns the reader that part of NURL's appeal is rhetorical. But the question I want to answer is whether the artifact stands on its own as a programming language and as a compiler engineering effort. I will treat NURL as a serious artifact worthy of serious engagement, while being honest about gaps and risks.

Two caveats up front. First, NURL is essentially a solo project. As of May 18, 2026, a direct fetch of github.com/nurl-lang/nurl reports exactly 4 stars, 2 forks, and 2 commits; a directed web search for third-party discussion of the language — Hacker News, Lobsters, /r/ProgrammingLanguages, Mastodon, blogs — turns up effectively nothing. The reader should know that nothing in this review is corroborated by an existing community of practitioners; everything here is fresh-eyes evaluation against primary sources (the repo, the playground, the live MCP endpoint). Second, the project's roadmap explicitly treats May 2026 as a moving frontier: items shipped in the last week before I write this — TLS, handler panic recovery, fixed-size integer types, variadic FFI, pub visibility — would have been roadmap entries a fortnight earlier. I will try to be careful to distinguish shipped from planned, but the snapshot is unavoidably narrow.

With those caveats, here is the headline: NURL is more language than I expected to find behind such an obscure project. It is also, in places, more idiosyncratic than I expected. Whether it is more language than gimmick depends on a small number of design bets I am not entirely sure how to evaluate. I will lay them out below.

2. A surface tour

The clearest way to communicate the shape of NURL is to read a complete small program. Here is fizzbuzz.nu from the examples directory, reproduced verbatim:

@ fizzbuzz i n → v {
    : ~ i i 1
    ~ <= i n {
        : b div3 == 0 % i 3
        : b div5 == 0 % i 5

        ? & div3 div5
        { ( nurl_print `FizzBuzz\n` ) }
        ? div3
        { ( nurl_print `Fizz\n` ) }
        ? div5
        { ( nurl_print `Buzz\n` ) }
        { ( nurl_print ( nurl_str_cat ( nurl_str_int i ) `\n` ) ) }

        = i + i 1
    }
}

@ main → i {
    ( fizzbuzz 30 )
    ^ 0
}

If you have never seen NURL before, this is probably the moment you decide whether to keep reading. The language is uniformly prefix (OP ARG ARG, with ( fn args ) as the unambiguous call form) and uses single-character sigils for nearly every grammatical role. @ introduces a function definition (and aggregate constructors, and the closure type form); : introduces a binding (with optional ~ for mutability); = assigns; ? is the ternary; ~ at statement position is while or for-each; ^ is return; % is modulo and also trait/impl; & and | are binary logical/bitwise; backticks delimit strings; → is the return arrow. The type names are single letters: i for i64, u for u8, f for f64, b for i1, s for i8* (a borrowed C-style string), v for void. Generic instantiation uses [T] at declaration site and [i] at call site.

The aesthetic is roughly Lisp's prefix discipline meets APL's character density meets a Hare/Zig-style commitment to a single page of grammar. The grammar file spec/grammar.ebnf confirms this: NURL is nominally LL(1) but needs up to four tokens of lookahead in places (strictly, that makes it LL(4)), and the EBNF really does fit on one printed page if you elide the comments. There are no commas anywhere in the language; arguments, fields, params, and enum variants are separated by whitespace.

Once you accept the surface, the rest of the language is more conventional than it looks. Here is a piece of the bundled showcase that exercises generics, traits, enums, pattern matching, closures, and \ for try-propagation of Option:

: | Ast {
    Num i
    Neg * Ast
    Bin i * Ast * Ast
}

@ eval * Ast e → ?i {
    ?? . e 0 {
        Num n     → @ ?i { T n }
        Neg inner → {
            : vv \ ( eval inner )
            ^ @ ?i { T - 0 vv }
        }
        Bin op l r → {
            : va \ ( eval l )
            : vb \ ( eval r )
            ? == op 0 @ ?i { T + va vb }
            ...
        }
    }
}

Anyone who has written code in an ML-family language such as OCaml, or in Rust, will recognise the shape: tagged sum, pattern match with binding, optional return via the ?T shorthand, and \ as a Rust-?-like try operator that unwraps Some/Ok and short-circuits None/Err to the enclosing function's return type. The unconventional bits are the single-character keywords and the prefix call syntax, not the semantics underneath.
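For readers who do not read NURL, the short-circuit behaviour of \ can be mimicked in Python. This is a minimal sketch, not a translation of the showcase: tagged tuples stand in for the Ast variants, None stands in for the None arm of ?i, and the rule that unknown op codes fail is an assumption made here only to have a failing branch (the excerpt elides those arms).

```python
from typing import Optional

# Tagged tuples stand in for the NURL enum variants:
#   ("Num", n), ("Neg", ast), ("Bin", op, left, right)

def eval_ast(e: tuple) -> Optional[int]:
    """Evaluate an Ast; None plays the role of the None arm of ?i."""
    tag = e[0]
    if tag == "Num":
        return e[1]
    if tag == "Neg":
        vv = eval_ast(e[1])
        if vv is None:      # the early return that `\` performs implicitly
            return None
        return 0 - vv
    if tag == "Bin":
        _, op, left, right = e
        va = eval_ast(left)
        if va is None:
            return None
        vb = eval_ast(right)
        if vb is None:
            return None
        if op == 0:         # op code 0 is addition in the excerpt
            return va + vb
        return None         # assumption: other op codes fail (arms elided in the excerpt)
    return None

ok = eval_ast(("Bin", 0, ("Num", 2), ("Neg", ("Num", 5))))    # 2 + (-5)
bad = eval_ast(("Neg", ("Bin", 9, ("Num", 1), ("Num", 1))))   # inner failure propagates
print(ok, bad)  # prints: -3 None
```

Each explicit "if ... is None: return None" is what a single \ replaces in the NURL version.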

3. Language design

3.1 Grammar

The grammar is genuinely small. The top-level production has eight declaration kinds (import, ffi, trait, impl, function, struct, enum, const), each typically prefixed by an optional pub visibility marker that was introduced in grammar v2.0 (May 14, 2026). Expressions cover the conventional set: literals, identifiers, binary ops, not, return, complement, try, closure, sizeof, aggregate, slice literal, cond, block, call, member, cast, match. Types cover base types, pointer, option (? T), slice ([ T), result (! T E), function type ((@ R P*)), and a ( Name T1 T2 ) form for generic instantiation. That's it.

This minimalism comes at a cost which the README and docs/GOTCHAS.md are commendably honest about. The most painful one is that the parser treats & and | as strictly binary; & a b c is not a three-way AND but a parse-arity error whose diagnostic, because prefix notation has no closing token, surfaces several lines later as "unexpected token". This is item §1 in GOTCHAS, and the author tried to fix it as part of a "shrink the gotcha doc" cleanup but rolled back the change: "Found that no whitelist of operand-start tokens disambiguates safely from prefix-notation ternary cond / then-arm boundaries." The trade-off is real. Pure prefix notation with no closers buys you grammatical regularity but pays in error-recovery quality and in the cognitive load of counting operands.
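The arity problem is easy to reproduce with a toy parser. The sketch below is written in Python and is not NURL's parser: it assumes a whitespace tokeniser and a hypothetical set of strictly binary operators, and shows that a third operand is simply left over for whatever gets parsed next, so nothing looks wrong at the point of the mistake.

```python
BINARY = {"&", "|", "+", "=="}   # hypothetical operator set, all strictly binary

def parse_expr(tokens, pos=0):
    """Parse one prefix expression starting at tokens[pos]; return (tree, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok in BINARY:
        left, pos = parse_expr(tokens, pos + 1)
        right, pos = parse_expr(tokens, pos)
        return (tok, left, right), pos
    return tok, pos + 1          # anything else is treated as an atom

def parse_all(src):
    """Parse a whitespace-separated source string into a list of expressions."""
    tokens = src.split()
    out, pos = [], 0
    while pos < len(tokens):
        tree, pos = parse_expr(tokens, pos)
        out.append(tree)
    return out

print(parse_all("& a b c"))      # [('&', 'a', 'b'), 'c']
print(parse_all("& & a b c"))    # [('&', ('&', 'a', 'b'), 'c')]
```

In the first call the stray c becomes its own expression; in a full grammar it would be handed to the next production, which is why the diagnostic lands lines away from the cause. The second call shows the nested form that a three-way AND has to take.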

Where on the C / Rust / Go / ML / Hare / Odin / Zig / Nim axis does this sit? I would place NURL closest to a minimalist Rust in semantics (Option, Result, sum types, pattern match, generics, traits with default methods, closures, owned values, no GC) but with a syntactic surface that resembles no other production language. The Lisp family is the closest precedent for prefix-everything, and Forth for the closer-free, fixed-arity discipline (albeit in postfix form); APL is the closest precedent for single-character operator density. The combination is genuinely novel; whether it is desirable depends on who or what is reading.

3.2 Type system

The type system combines monomorphised generics with structural pattern matching on sum types and trait-based dispatch. Generic functions and generic structs both monomorphise at every distinct instantiation: ( Vec i ) and ( Vec s ) produce two distinct named LLVM types, %Vec__i64 and %Vec__str. Comments in the stdlib note that the type-variable name is conventionally [A] (not [T]) because the boolean literal T collides with whole-identifier substitution during monomorphisation, a small wart that hints at a more general issue with how generics are implemented (text-level substitution rather than first-class IR types).
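The collision is easy to picture with a toy substitution pass. The Python sketch below is an illustration of the failure mode only, not the compiler's actual mechanism: the monomorphise helper and the NURL-like body strings are hypothetical, with T used as the tag literal in the same way as in the showcase excerpt.

```python
import re

def monomorphise(body: str, type_var: str, concrete: str) -> str:
    """Replace every whole-identifier occurrence of type_var with concrete."""
    return re.sub(rf"\b{re.escape(type_var)}\b", concrete, body)

# Type variable named T: the literal T inside the braces is rewritten too.
broken = monomorphise("@ ?T { T x }", "T", "i")

# Type variable named A: only the type position changes.
fine = monomorphise("@ ?A { T x }", "A", "i")

print(broken)  # @ ?i { i x }
print(fine)    # @ ?i { T x }
```

A substitution pass that worked on typed IR nodes rather than identifier text would not need the naming convention at all.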

There are traits with default-method bodies, and impls that monomorphise the trait body per impl type, dispatching by mangling the first argument's type. A Drop trait is recognised by convention — the compiler inserts a call to drop__<T-mangle>(self) at scope exit for owned bindings of type T if an impl Drop exists. This is a charming touch and aligns with how Rust handles Drop, but the convention-not-magic part means that misnaming drop silently loses the behaviour.

Option[A] is ?A and lowers to { i1, A }. Result[T, E] is ! T E and lowers to { i1, i64 }, where the success and error payloads are both stuffed into a single i64 slot via integer/pointer/extractvalue tricks at construction and destructure time. This is one of the more interesting language-level choices in NURL: it keeps the result type cheap and uniform but forces a heap box for multi-field success payloads. As of May 14–15, 2026, the compiler now boxes multi-field T on construction and unboxes on ?? match and \ try-propagate; before that, multi-field Result Ok arms were a documented compiler bug (docs/GOTCHAS.md §6, now closed). Multi-field Option, which uses { i1, %T } directly rather than the uniform-slot trick, was symmetrically fixed via gen_cast returning zeroinitializer for the # T 0 dummy-zero idiom when T's first field is a non-pointer named type. Both fixes ship with regression tests.

Sized integer types arrived earlier in May 2026 as multi-character TYPE_KW tokens: i8, i16, i32, u16, u32, u64, f32. They map to LLVM iN / float, with signedness carried in a per-binding side-channel __unsigned flag that drives sext vs zext at casts and stores, and udiv/urem/lshr/icmp u* at arithmetic. Variadic FFI with C default argument promotion (f32 → double, narrow ints → i32) shipped the same week, unlocking direct calls to printf and its relatives. Before sized integers, the language was essentially i64/u8/f64-only, which is workable for application code but makes systems-programming workloads (binary protocols, struct-of-bytes parsers, embedded) significantly harder. Their arrival is recent enough that the stdlib only sparsely uses them; the bytes-endianness module (std/bytes.nu) added u16/u32/u64 readers and writers a few days later, motivated by gzip/CRC and MessagePack pathways.
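Why a signedness flag has to accompany every binding becomes clear once the same bit pattern is pushed through both instruction families. The Python sketch below models the LLVM operations on raw bit patterns; the helper names are mine and division by zero is not handled.

```python
def mask(width: int) -> int:
    return (1 << width) - 1

def zext(bits: int, from_w: int, to_w: int) -> int:
    """Zero-extend: the new upper bits are zeros, so to_w does not affect the value."""
    return bits & mask(from_w)

def sext(bits: int, from_w: int, to_w: int) -> int:
    """Sign-extend: the new upper bits copy the sign bit of the narrow value."""
    bits &= mask(from_w)
    if bits >> (from_w - 1):
        bits |= mask(to_w) ^ mask(from_w)
    return bits

def as_signed(bits: int, width: int) -> int:
    """Interpret a bit pattern as a two's-complement signed integer."""
    bits &= mask(width)
    return bits - (1 << width) if bits >> (width - 1) else bits

def udiv(a: int, b: int, width: int) -> int:
    return (a & mask(width)) // (b & mask(width))

def sdiv(a: int, b: int, width: int) -> int:
    """Signed division truncating toward zero, returned as a bit pattern."""
    x, y = as_signed(a, width), as_signed(b, width)
    q = abs(x) // abs(y)
    if (x < 0) != (y < 0):
        q = -q
    return q & mask(width)

byte = 0xF0                                # 240 as u8, -16 as i8
print(hex(zext(byte, 8, 32)))              # 0xf0
print(hex(sext(byte, 8, 32)))              # 0xfffffff0
print(udiv(byte, 4, 8), hex(sdiv(byte, 4, 8)))   # 60 0xfc
```

Because LLVM's iN types carry no signedness of their own, picking the wrong variant silently produces the other answer, which is the job the per-binding flag described above has to do.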

Type inference is local: an annotation is required when the type isn't derivable from the right-hand side. There is no subtyping and no implicit conversion. There are also no proper namespaces: $-imports have inline-include semantics (think #include), with an optional alias that rewrites top-level functions, struct/enum types, enum variants, and global constants to alias__name. FFI declarations and trait/impl methods are not renamed by aliasing because FFI symbols resolve at the linker by literal ABI name and trait dispatch is type-mangled, not name-routed. This is a defensible choice that keeps the model simple; it is also an obvious place where the language will need to grow if anyone seriously tries to scale a project beyond a few thousand lines.

3.3 Memory model

This is the part that warrants the most careful evaluation, because the project's marketing language ("single-owner with auto-drop") is the kind of phrase that means very different things to different readers.

What NURL actually enforces is best described as single ownership with compiler-inserted auto-drop, no borrow checker, no aliasing analysis, no lifetime parameters, and no general-purpose move semantics. Bindings are default-immutable; : ~ i x 0 opts in to mutation. Owned heap allocations (slice literals, string concat/slice/format outputs, allocating calls listed in runtime.c) are tracked by the compiler and a nurl_free is emitted at scope exit. Reassignment of an owned binding frees the previous value first. A small number of explicit ownership-transfer rules apply to function returns (returning an owned value transfers ownership to the caller) and to struct-literal construction (if a field is populated from a fresh allocation directly on the spot, an arm-local drop is registered against the binding). Closures capture by value by default and by pointer if the captured binding is a : ~-mutable multi-field struct, a recent fix that enables Metrics accumulators and recover-with-typed-result patterns. foreach iteration borrows elements without transferring ownership.

What NURL does not enforce, and the GOTCHAS document is straightforward about this: there is no use-after-free detection, no double-free check, no aliasing analysis, no lifetime tracking across function boundaries. A closure that captures a : ~-mutable struct by pointer and then escapes its caller's scope is a use-after-free in waiting; the compiler emits a non-fatal warning: for the ^-return shape but not for vec_push / thread_spawn escapes. There is no vec_clone because a bitwise clone of a Vec[String] would alias the inner buffers and break the single-owner invariant; the stdlib documents the manual vec_each + vec_push + element-clone pattern. The Phase 2C struct-field auto-drop is "conservative by design — only fields populated from a fresh allocation on the spot get a drop, so copying an already-owned binding into a struct does not cause a double-free. Nested owned-struct fields and arm-local struct bindings that fall through (no ^) still leak, same as the existing arm-scoped string behaviour."
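The rules in the last two paragraphs can be played out on a toy heap. The Python sketch below is a simulation under stated assumptions, not NURL's runtime: ToyHeap and Scope are hypothetical names, and the toy detects use-after-free at run time precisely so that the hazard becomes visible, which, per the review above, the real compiler does not do.

```python
class ToyHeap:
    """Handle-based heap that remembers which handles are live."""
    def __init__(self):
        self.live = {}
        self.next_handle = 1

    def alloc(self, value):
        handle = self.next_handle
        self.next_handle += 1
        self.live[handle] = value
        return handle

    def free(self, handle):
        if handle not in self.live:
            raise RuntimeError("double free")
        del self.live[handle]

    def read(self, handle):
        if handle not in self.live:
            raise RuntimeError("use after free")
        return self.live[handle]

class Scope:
    """Owns bindings; frees on reassignment and at scope exit."""
    def __init__(self, heap):
        self.heap = heap
        self.owned = {}

    def bind(self, name, value):
        if name in self.owned:
            self.heap.free(self.owned[name])   # reassignment frees the previous value first
        self.owned[name] = self.heap.alloc(value)
        return self.owned[name]

    def move_out(self, name):
        return self.owned.pop(name)            # returning transfers ownership to the caller

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        for handle in self.owned.values():     # the compiler-inserted drops
            self.heap.free(handle)
        self.owned.clear()
        return False

def demo():
    heap = ToyHeap()
    with Scope(heap) as s:
        s.bind("msg", "hello")
        s.bind("msg", "world")                 # frees "hello"
        s.bind("out", "result")
        kept = s.move_out("out")               # survives the scope
        acc = s.bind("acc", {"hits": 0})
        escaped = lambda: heap.read(acc)       # by-pointer capture that outlives the scope
    return heap, kept, escaped

heap, kept, escaped = demo()
```

After demo() returns, only the moved-out allocation is still live, and calling escaped() raises the toy's "use after free" error; in the real language the equivalent read would just touch freed memory.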

Compared to alternatives:

  • Rust's borrow checker is strictly more powerful. It tracks aliasing, lifetimes, mutability exclusivity, and statically prevents use-after-free and data races. NURL trades that for radically simpler compiler internals and a model that is easier for an LLM to generate code against.
  • C++ unique_ptr/RAII is similar in spirit: the owner frees at scope exit, and assigning to a unique_ptr likewise releases the previously owned object. The real difference is that C++ destructors compose by language design (members are destroyed automatically), whereas NURL's Drop is convention-driven and its per-field auto-drop is deliberately conservative.
  • Swift's ARC is reference-counted, with cycle breaking (weak and unowned references) left to the programmer; NURL's closures capture environment via RC, so part of the system is ARC-shaped, but the dominant allocation pattern is single-owner.
  • Vale's regions and Hylo's mutable value semantics are research designs that aim for memory safety without a borrow checker; they enforce stronger invariants than NURL but at the cost of language complexity NURL deliberately avoids.
  • A traditional GC would eliminate the whole class of bugs NURL is exposed to here, but it would also break the LLVM-direct, deterministic-pause-free story the project sells.

The honest summary is that NURL's memory model is "C++ unique_ptr discipline, but with the compiler putting most of the delete calls in for you, and you still personally responsible for not creating dangling references across function boundaries". For LLM-generated code this is plausibly the right operating point — large models are quite good at local lifetime reasoning when the rules are crisp, and the borrow checker's complaints are among the harder things to satisfy under iterated generation. It is genuinely less safe than Rust. It is somewhat safer than C. Whether that's the right trade-off depends entirely on what you are trying to do with the language.

3.4 Visibility, modules, TCO, async

pub arrived in May 2026 and is the first concession to genuine module discipline; per-file strict-vis mode is opt-in (a file enters strict mode the first time any of its decls carries pub), and as of May 15, 2026, enforcement covers @-functions, struct/enum types, enum variants (which inherit their parent enum's visibility), and global constants. FFI and trait/impl decls accept pub for forward compatibility but do not enforce it, with the documented reason that FFI symbols are linker-level ABI globals and trait methods are mangled by impl-target type.

There is no namespace mechanism beyond import-aliasing. There is no tail-call optimisation (recursion is bounded by the stack; the workaround is to write loops). There is no async/await; the roadmap marks this as "Design and implement an asynchronous programming model (Coroutines vs. Async/Await)" with no concrete commit. There is, however, a real stdlib/std/thread.nu with mutexes, condition variables, and (as of May 17, 2026) a generic Channel[A] enabled by a recent fix to generic-struct propagation through nested generic structs. There is also a panic/recover model layered on POSIX setjmp/longjmp with thread-local frame stacks, which the author is explicit about: "Setjmp/longjmp-based — does NOT run destructors during unwind. Owned heap allocations made inside a recover scope leak if their auto-drop didn't fire. Recover is crash mitigation, NOT routine error handling. Always prefer ! T E + \ for expected errors."

In total, the language is much more complete than I expected — it has generics, traits, sum types, pattern match, closures, optional/result types, try-propagation, mutexes, channels, panic/recover — but the long tail of "things production languages have" is sparse: no TCO, no async, only a recently-shipped LSP (whose feature surface is documented but unverified by this reviewer), no debug info beyond what LLVM produces by default, no proper module system, no package registry, only the recent local-path package manager.

4. Compiler engineering

4.1 Bootstrap pipeline

The bootstrap story is genuinely impressive, and to my mind the strongest single piece of engineering in the project. The pipeline is:

  1. The Python reference compiler compiler/nurlc.py (a minimal subset implementation) compiles compiler/nurlc.nu to LLVM IR, which clang lowers to build/nurlc_py.
  2. build/nurlc_py compiles nurlc.nu to build/nurlc_self (stage 1).
  3. build/nurlc_self compiles nurlc.nu to build/nurlc_self2 (stage 2).
  4. The build script requires byte-identical LLVM IR between stages 1 and 2 before accepting the build. The current fixed-point size is approximately 1.16–1.19 MB and is reported on every successful bootstrap run.
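The gate in step 4 amounts to a byte comparison plus a size report. A minimal Python sketch, assuming the two IR files have already been read into memory; the function names are hypothetical and this is not the project's build script.

```python
import hashlib

def ir_digest(ir: bytes) -> str:
    """Short fingerprint of an IR blob, handy for log lines."""
    return hashlib.sha256(ir).hexdigest()[:12]

def first_diff_line(a: bytes, b: bytes):
    """Return the 1-based number of the first differing line, or None if identical."""
    lines_a, lines_b = a.splitlines(), b.splitlines()
    for number, (x, y) in enumerate(zip(lines_a, lines_b), start=1):
        if x != y:
            return number
    if len(lines_a) != len(lines_b):
        return min(len(lines_a), len(lines_b)) + 1
    return None

def check_fixed_point(stage1_ir: bytes, stage2_ir: bytes) -> int:
    """Return the fixed-point size in bytes, or raise if the stages differ."""
    if stage1_ir != stage2_ir:
        line = first_diff_line(stage1_ir, stage2_ir)
        raise RuntimeError(
            f"bootstrap mismatch at line {line}: "
            f"{ir_digest(stage1_ir)} != {ir_digest(stage2_ir)}"
        )
    return len(stage1_ir)

sample = b"define i64 @main() {\n  ret i64 0\n}\n"
size = check_fixed_point(sample, sample)
```

Reporting the first differing line is only possible because the compared artefact is text, which is part of the appeal of comparing IR rather than binaries.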

This sits in the same tradition as several production self-hosting compilers' fixed-point checks. GCC's own install docs explicitly require this discipline: "Perform a 3-stage bootstrap of the compiler … Perform a comparison test of the stage2 and stage3 compilers … If the comparison of stage2 and stage3 fails, this normally indicates that the stage2 compiler has compiled GCC incorrectly, and is therefore a potentially serious bug which you should investigate and report." Go's toolchain (per Russ Cox's "Perfectly Reproducible, Verified Go Toolchains" on go.dev/blog) builds toolchain1 → toolchain2 → toolchain3 and requires byte-for-byte equality between toolchain2 and toolchain3. OCaml has had a "fixpoint reached" gate in its bootstrap since PR #11149 ("Make the bootstrap process repeatable" by David Allsopp / dra27), whose description tells reviewers: "Each worker has successfully bootstrapped (search for 'fixpoint reached' in the logs)." Rust's rustc-dev-guide documents that "stage3 is byte-for-byte identical with stage2, only useful for verifying reproducible builds", and PR #144669 is in the process of formalising the CI gate. Zig has the same multi-stage pipeline. The unusual thing about NURL is that the compared artefact is LLVM IR text, not the final native binary. That's a slightly weaker check than a binary compare (it is blind to differences in clang's optimisation or linker behaviour between runs) but a more convenient one, in that an IR-level diff is easier for a human to interpret when investigating a regression.

The self-hosted compiler nurlc.nu is approximately 5,340 lines of NURL — large enough to be a non-trivial program, small enough to be auditable. The Python compiler is explicitly described as "exist[ing] solely to bootstrap the self-hosting compiler" and implements only the subset of grammar v1.1 that nurlc.nu itself uses (no FFI, no enums, no defer, no try, no slice literals, no for-each, no generics). The asymmetry between the Python bootstrap subset and the full self-hosted language is documented honestly.

What is impressive is not that NURL self-hosts — many small languages do — but that the fixed-point gate is part of the regular build and that the project takes it seriously enough to ship daily roadmap entries reporting the new fixed-point byte size after each compiler change. That is exactly the discipline you want from a compiler that aspires to determinism.

4.2 Codegen and targets

The compiler emits LLVM IR and delegates native code generation to clang. Targets in the build scripts are:

  • Linux x86_64 — the primary dev target.
  • Windows x86_64 — fully supported via the same bootstrap on Windows; the runtime is pre-built with static libcurl using Schannel for TLS.
  • macOS x86_64 — cross-compiled from the API container via zig cc --target=x86_64-macos-none, linking only libSystem (no Apple SDK redistribution), with the binary explicitly unsigned and the caller responsible for clearing Gatekeeper quarantine. Runs on Apple Silicon via Rosetta 2. canvas/audio FFIs and libcurl-HTTP are not supported on this target — HTTP routes through stubs that return HttpErr::Other.
  • wasm32-wasi — via the WASI SDK 24.0 bundled into the API image.

The decision to use zig cc as a cross-toolchain for macOS deserves comment. It is the right decision. Zig's contribution to the cross-compilation ecosystem is exactly that it ships libc source for every supported target and builds it on demand, and its custom Mach-O linker is currently one of the few options for cross-compiling and cross-signing for macOS from Linux without a paid Apple toolchain. Using zig cc as a backend lets the NURL author ship a Linux container that produces Mach-O binaries without requiring contributors to own a Mac. The downside, as is documented in Zig's own issue tracker, is that zig cc does have rough edges on the Mach-O linker (segfaults under certain library combinations, weak-vs-needed linkage policy quirks for libSystem) — but for a project that is not yet relying on a full Apple SDK, the trade-off is sound.

4.3 The WebAssembly story

This is the most unusual piece of compiler engineering in NURL, and the part that I find most editorially interesting.

The same POST /build_wasm endpoint that the browser playground uses to compile user programs to WASI can be pointed at compiler/nurlc.nu itself. The result is nurlc.wasm, advertised at approximately 390 KB, which runs anywhere a WASI host runs — wasmtime, wasmer, Node's WASI, or a browser shim such as browser_wasi_shim. The README's buildwasm.sh / wasmnurl.sh scripts use this wasm-hosted compiler to compile arbitrary NURL programs, including its own source, and the bootstrap fixed point holds: nurlc.wasm recompiling its own source produces byte-identical IR to the native nurlc.

How does that compare?

Toolchain                          | Wasm artefact                      | Size
-----------------------------------+------------------------------------+-------------------------------------------------------------
NURL (self-hosted nurlc.wasm)      | full compiler                      | ~390 KB
Zig zig1.wasm                      | bootstrap shim (not full compiler) | 2.6 MiB pre-opt, 2.4 MiB after wasm-opt -Oz
TinyGo (Fermyon favicon service)   | small microservice                 | 1.1 MB default; 396 KB with -no-debug; 377 KB after wasm-opt
AssemblyScript (typical demo)      | small function                     | 1.1–3.4 KB
Rust → wasm32-wasi (empty program) | hello world                        | ~64 KB stripped
Rust rustc to wasm                 | full compiler                      | not publicly published

The Zig figures are from Andrew Kelley's December 7, 2022 post "Goodbye to the C++ Implementation of Zig" on ziglang.org, which describes the bootstrap pipeline directly: "This produces a 2.6 MiB file. It is then further optimized with wasm-opt -Oz --enable-bulk-memory bringing the total down to 2.4 MiB." The TinyGo figures are from Fermyon's "Shrink Your TinyGo WebAssembly Modules by 60%" on fermyon.com: "We started with a 1.1M Wasm file from a Go source program. And we ended with a 377k version." Note that zig1.wasm is only a translated subset used to bootstrap the full self-hosted Zig compiler, not the full Zig compiler, and it is still around six times the size of NURL's full self-hosted compiler. TinyGo's "favicon" microservice (a non-trivial Go program with HTTP, MIME, and routing) sits in the same ballpark as NURL's nurlc.wasm only after aggressive stripping. For a complete compiler shipped as a WASI module, ~390 KB is genuinely small, and the existence of a working end-to-end "the compiler runs in the browser playground in your tab and can compile its own source" path is, to my knowledge, unique among small languages.
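The "around six times" figure can be checked directly from the quoted sizes. In the Python sketch below the only assumption is the unit: the advertised ~390 KB does not say whether it means 1000-byte or 1024-byte kilobytes, so both readings are computed.

```python
MIB = 1024 * 1024

def size_ratio(zig_mib: float, nurl_kb: float, kb_bytes: int) -> float:
    """Ratio of zig1.wasm (in MiB) to nurlc.wasm (in KB of kb_bytes each)."""
    return (zig_mib * MIB) / (nurl_kb * kb_bytes)

decimal_kb = size_ratio(2.4, 390, 1000)   # if "KB" means 1000 bytes
binary_kb = size_ratio(2.4, 390, 1024)    # if "KB" means 1024 bytes
print(f"{decimal_kb:.2f}x / {binary_kb:.2f}x")
```

Either reading lands between six and seven, so the comparison does not hinge on the unit.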

This matters for two reasons. First, embeddability: a 390 KB compiler can plausibly be bundled with an editor, a sandbox, or an agent runtime as a library, not a separately shipped binary per OS. Second, supply-chain auditability: a wasm compiler that produces byte-identical IR to the native build of itself is a much simpler thing to reason about than a chain of OS-specific binaries.

4.4 Compiler quirks (the GOTCHAS)

docs/GOTCHAS.md lists the active compiler quirks newcomers will hit. As of the most recent roadmap entries the list has been pruned to five entries — several originally documented bugs (multi-field struct mutation through closures, mutable enum binding miscompile, multi-field Result Ok arm width, variadic FFI promotion) have been fixed and removed from the list — but the remaining items are non-trivial:

  1. & and | are strictly binary. & A B C is a parse-arity error whose diagnostic surfaces several lines later because prefix notation has no closing token. The author attempted an n-ary fix and rolled it back as "no whitelist of operand-start tokens disambiguates safely from prefix-notation ternary cond / then-arm boundaries".
  2. Bare @-fn names do not auto-coerce to a (@ R P*) closure parameter; you must wrap them in \ ... { ( fn args ) }. This is a real ergonomic cost in higher-order code; the stdlib has several eq_int/eq_string/cmp_int helpers that all need closure-wrapping when handed to vec_contains and friends.
  3. Same-line shadowing of parameters: : i z + z 7 inside a function whose parameter is z silently rebinds z. The compiler now emits a non-fatal warning:, scoped to parameter shadowing only.
  4. Ternary arity errors cascade because prefix notation has no closing token. The diagnostic always points at the wrong line.
  5. : ~-mutable multi-field structs captured by closures are captured by pointer, not by value. This is exactly the right thing for in-place accumulators (Phase 8 metrics) but a use-after-free hazard when the closure escapes its caller's scope. The compiler now warns for ^-return escapes; it does not yet warn for vec_push/vec_insert/thread_spawn escape paths.

These are honest compiler quirks. Three of the five are design choices rather than bugs (binary-only &/|, bare-fn no-auto-coerce, closure-borrow lifetime). Two (parameter shadowing and closure escapes) are partial-help-from-compiler quirks where a non-fatal warning fires today and full enforcement is still in flight. The discipline of maintaining this document and pruning it as the compiler improves is itself a good signal about the project; its contents nonetheless reveal that the language is not production-grade in the sense people normally use the phrase. It is genuinely usable for the kinds of programs the bundled examples exercise (a static file server, a Claude agent, a Wordcount), but a new contributor will hit at least items 1 and 4 within their first hour.

4.5 What is missing

The roadmap is honest. As of May 2026:

  • LSP: the README describes a build/nurl-lsp stdio JSON-RPC server with diagnostics, go-to-definition, hover, document symbols, completion, formatting, workspace symbol search, and folding ranges, wired to the bundled VS Code / Windsurf extension. The roadmap still lists "Full LSP support" as an open item, so the precise coverage of the shipped server versus a hypothetically more complete one is worth verifying in practice.
  • Formatter: nurlfmt shipped May 14, 2026 as a deterministic opinionated formatter (~750 LOC of NURL) with idempotence and IR-equivalence checks across 263 files. This is a real, modern, properly-tested formatter and a credit to the project.
  • Package manager: nurlpkg shipped May 16, 2026 in a Cargo-shaped six-phase implementation covering init/info/deps/add/remove/install/lock/verify. Local path-based deps only; registry-hosted dependencies are explicitly out of v1 scope.
  • CI: GitHub Actions are on the roadmap, not yet wired.
  • DWARF debug info: roadmap, not present.
  • Mobile / embedded targets: roadmap.
  • Async/await: design phase.

The fixed-size integer types and variadic FFI items, which sat on the roadmap a fortnight ago, are now shipped. The pace of shipping is unusually fast for a solo project; whether it remains so over twelve more months is a question the project's bus factor of one cannot answer.

5. Stdlib, runtime, and idioms

The standard library is organised in three layers:

  • stdlib/core/: errors, io, mem, option, pair, result, slice, string, vec. The foundation.
  • stdlib/std/: arena, bytes, channel, cmp, encode, float, fmt, fs, hash, hashmap, int, iter, log, net, panic, path, process, random, set, signal, sort, thread, time. The standard library proper.
  • stdlib/ext/: anthropic, compress, csv, env, http* (eleven modules), json, manifest, mcp* (four modules), postgres, regex, sqlite, toml, uuid. Third-party-protocol-shaped bindings and a couple of FFI bridges.

The split is principled. core is the minimum surface the compiler itself needs; std adds platform abstractions; ext is the deliberately optional layer where heavier dependencies (libcurl, OpenSSL, libpq, libsqlite3, libz, libzstd) and protocol-specific code live. The pattern is reminiscent of Rust's core/std/external-crate split. The code is unusually well-commented for a solo project: most stdlib files open with a 30-to-100-line header explaining ownership invariants, memory model expectations, and API contract. The core/vec.nu file's header walks the reader through control-block layout, phantom-type semantics, the lack of a vec_clone, and per-element drop closures, in a tone that reads like a senior engineer leaving notes for the next maintainer.

Idiomatic API style is reminiscent of Rust without the trait soup. Errors are sum types (! T E), unwrapped with \ for propagation or ?? for pattern-match handling. Ownership is documented per-call ("BORROWED" vs "CONSUMED") in the doc comments. Generic instantiation is verbose at the call site (vec_push [Json] msgs ...) but never hidden. There is no method-call syntax; every call is ( fn args ), which means longer call chains than Rust's .foo().bar().baz(), but the prefix discipline is consistent.

The HTTP server module is unusually well-designed for a solo project. Phase 5.4 keep-alive measured a 38× speedup against the canonical examples/static_server.nu on 100 sequential /api/health requests (5152 ms → 136 ms). Phase 5.4.1 added pipelining correctness with an explicit regression test for a two-request sendall() packet. Phase 7 closed out the static-file lifecycle. Phase 8 production hardening added access log, Prometheus metrics, idle timeout, graceful shutdown via SIGINT/SIGTERM and CTRL+C/BREAK/CLOSE on Windows, per-request total timeout, configurable parser limits, and handler panic recovery built on the new stdlib/std/panic.nu model. Phase 9 added multipart/form-data parsing, reverse-proxy streaming pass-through (the AI-gateway use case), and server-side TLS via libssl. The honest gaps remaining in HTTP are HTTP/2 (deferred as a separate design doc) and WebSocket (estimated ~400 LOC, reuses Phase 1 sockets). Server-side TLS is single-cert per listener (no SNI), no ALPN, no client-cert auth, no live cert reload. These are documented v2 follow-ups.
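The keep-alive figure is straightforward to sanity-check from the two totals quoted above. A small Python sketch of the arithmetic; the helper names are mine.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the second measurement is."""
    return before_ms / after_ms

def per_request_ms(total_ms: float, requests: int) -> float:
    """Average latency per request for a sequential run."""
    return total_ms / requests

factor = speedup(5152, 136)              # just under 38
before = per_request_ms(5152, 100)       # about 51.5 ms per request
after = per_request_ms(136, 100)         # about 1.4 ms per request
print(f"{factor:.1f}x, {before:.2f} ms -> {after:.2f} ms per request")
```

Roughly 50 ms saved per request is in the range one would expect from skipping connection setup on every call, though the review's sources do not break the saving down further.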

The Anthropic SDK (ext/anthropic.nu) is genuinely surprising for a side project. The claude_messages_full_ex surface covers multi-turn conversations, tool definitions with tool_choice (auto/any/none/tool:NAME), prompt caching for both system prompt and trailing tool definition with separate cache_creation_input_tokens and cache_read_input_tokens accessors, extended thinking with a budget knob, multimodal blocks for image/document with both URL and base64 variants, streaming SSE with per-event extractors for text_delta, input_json_delta (tool-args streaming), content_block_start, content_block_stop, message_delta.stop_reason, and error.type/error.message. The 600-second default timeout (with ANTHROPIC_TIMEOUT_MS env override and a 15-second connect timeout) is the right value; Anthropic's own SDKs default to ten minutes for the same reason. The completeness is on par with — and in places more explicit about ownership than — Anthropic's official Python SDK at the same revision.
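To make the per-event extractors concrete, here is a small Python sketch of what pulling text_delta, input_json_delta, and message_delta.stop_reason out of an SSE stream involves. It is not NURL code and does not mirror the signatures in ext/anthropic.nu; the helper names iter_sse_events and collect_deltas are hypothetical, and the sample frames are hand-written for illustration.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, parsed_json_data) pairs from SSE text lines."""
    event, data_parts = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates one event
            if data_parts:
                yield event, json.loads("\n".join(data_parts))
            event, data_parts = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].lstrip())
    if data_parts:  # tolerate a stream that ends without a trailing blank line
        yield event, json.loads("\n".join(data_parts))


def collect_deltas(lines: Iterable[str]) -> Tuple[str, str, object]:
    """Accumulate text, tool-argument JSON fragments, and the stop reason."""
    text, tool_json, stop_reason = [], [], None
    for event, data in iter_sse_events(lines):
        if event == "content_block_delta":
            delta = data.get("delta", {})
            if delta.get("type") == "text_delta":
                text.append(delta.get("text", ""))
            elif delta.get("type") == "input_json_delta":
                tool_json.append(delta.get("partial_json", ""))
        elif event == "message_delta":
            stop_reason = data.get("delta", {}).get("stop_reason", stop_reason)
    return "".join(text), "".join(tool_json), stop_reason


# Hand-written sample frames, for illustration only.
SAMPLE = [
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hel"}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"lo"}}',
    "",
    "event: message_delta",
    'data: {"type":"message_delta","delta":{"stop_reason":"end_turn"}}',
    "",
]
```

Tool arguments arrive as partial JSON fragments that are only parseable once concatenated, which is why the sketch joins them and leaves decoding to the caller.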

The MCP stack is the editorially significant piece. NURL ships:

  • ext/mcp.nu — server-side JSON-RPC envelope helpers, line-delimited JSON I/O on stdio, tool/prompt/resource shape builders, JSON-RPC error codes.
  • ext/mcp_http.nu — server-side HTTP transport with JSON-RPC batch requests, GET-SSE stub, Mcp-Session-Id echo (forward-compat).
  • ext/mcp_client.nu — HTTP-transport client.
  • ext/mcp_stdio.nu — stdio-transport client. This required a duplex-process runtime (nurl_proc_spawn + 12 ABI calls in runtime.c §16b: POSIX fork+pipe+execvp, CLOEXEC sideband for exec-error reporting, non-blocking poll-driven line reader, scratch-buffer line accumulator that survives chunk boundaries; Win32 and WASI return stubs).
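For readers unfamiliar with what "JSON-RPC envelope helpers" and "batch requests" entail, the following Python sketch shows the general JSON-RPC 2.0 behaviour: standard error codes, no reply for notifications, and an array reply for a batch. It is an illustration of the protocol, not a port of ext/mcp.nu or ext/mcp_http.nu, and the function names are hypothetical.

```python
import json

# Standard JSON-RPC 2.0 error codes.
PARSE_ERROR = -32700
INVALID_REQUEST = -32600
METHOD_NOT_FOUND = -32601


def _error(req_id, code, message):
    """Build a JSON-RPC error envelope."""
    return {"jsonrpc": "2.0", "id": req_id,
            "error": {"code": code, "message": message}}


def handle_one(msg, methods):
    """Return a response dict, or None when msg is a notification."""
    if (not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0"
            or "method" not in msg):
        return _error(None, INVALID_REQUEST, "Invalid Request")
    is_notification = "id" not in msg
    handler = methods.get(msg["method"])
    if handler is None:
        if is_notification:
            return None
        return _error(msg["id"], METHOD_NOT_FOUND, "Method not found")
    result = handler(msg.get("params", {}))
    if is_notification:
        return None
    return {"jsonrpc": "2.0", "id": msg["id"], "result": result}


def handle_payload(body, methods):
    """Handle a single request or a batch; return a JSON string or None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return json.dumps(_error(None, PARSE_ERROR, "Parse error"))
    if isinstance(payload, list):
        if not payload:  # an empty batch is itself an invalid request
            return json.dumps(_error(None, INVALID_REQUEST, "Invalid Request"))
        replies = [r for r in (handle_one(m, methods) for m in payload)
                   if r is not None]
        return json.dumps(replies) if replies else None
    reply = handle_one(payload, methods)
    return json.dumps(reply) if reply is not None else None
```

A caller would pass a dictionary of handlers, for example `handle_payload(body, {"ping": lambda params: {}})`, and write the returned string (if any) back on whichever transport is in use.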

Shipping a stdio client alongside the HTTP one is exactly what the MCP ecosystem needs. Most MCP servers in May 2026 ship as command-line programs (filesystem, git, Postgres, …) that speak newline-delimited JSON-RPC on stdin/stdout; an HTTP-only client would not be able to consume them. The stdio implementation is genuinely careful: the mcp_stdio_call read loop matches JSON-RPC ids so server-initiated notifications (id-less frames) are auto-skipped without surfacing as the wrong response, and proc_free reaps via SIGTERM-with-timeout-then-SIGKILL.
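The id-matching behaviour described here can be sketched in a few lines of Python. This is an illustration of the technique under the assumption of one JSON frame per line; read_response is a hypothetical name and does not reproduce mcp_stdio_call.

```python
import io
import json


def read_response(stream, expected_id):
    """Read newline-delimited JSON-RPC frames until one matches expected_id.

    Frames without an "id" (notifications) and frames carrying a different
    id are skipped rather than returned as the wrong response.
    """
    for line in stream:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between frames
        frame = json.loads(line)
        if "id" not in frame:
            continue  # server-initiated notification
        if frame["id"] != expected_id:
            continue  # reply to some other request
        return frame
    raise EOFError(f"stream closed before a response with id {expected_id!r}")


if __name__ == "__main__":
    demo = io.StringIO(
        '{"jsonrpc":"2.0","method":"notifications/progress","params":{}}\n'
        '{"jsonrpc":"2.0","id":7,"result":{"ok":true}}\n'
    )
    print(read_response(demo, 7))
```

A production reader would also need a timeout and a policy for server-initiated requests (frames that carry both an id and a method), which this sketch does not attempt.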

As far as I can determine through directed search, no other small systems language ships a built-in bidirectional MCP client and server in its standard library as of May 2026. The official MCP organisation maintains SDKs for TypeScript, Python, C#, Java, Kotlin, PHP, Ruby, Swift, and Go as separate repositories, not bundled into language standard libraries. Gleam has third-party Hex packages (mcp_toolkit, aide) but they are community libraries. Hare, Odin, Zig, Vale, Inko, Roc, V, Crystal, and Nim show essentially no first-class MCP integration in their own stdlibs. This makes NURL distinctive, though being alone here cuts both ways: there are reasonable arguments for treating MCP as an application-layer concern rather than a stdlib one. Whether NURL's choice is "first to do the obvious thing" or "scope creep into a protocol that will move underneath you" is genuinely an open question. (The official Model Context Protocol blog's March 9, 2026 roadmap post by lead maintainer David Soria Parra reframed governance around priority areas rather than dates: "The new document is organized around priority areas, rather than around dates. Working Groups drive the timeline for their deliverables." No firm release calendar is therefore in public view, and NURL's MCP code commits to a still-evolving target whose next milestone is now WG-driven rather than schedule-driven.)

6. The compiler as an MCP server

The piece of the project that most clearly distinguishes it from any small language I am familiar with is that the NURL compiler toolchain is itself exposed as a public, unauthenticated MCP server at https://play.nurl-lang.org/mcp.

What's exposed:

  • Build tools (4): nurl_build_native (Linux x86_64 ELF), nurl_build_windows (mingw-w64), nurl_build_macos (zig cc), nurl_build_wasm (WASI SDK).
  • Browse tools (3): nurl_list_examples, nurl_list_stdlib, nurl_list_tests.
  • Read tools (7): nurl_read_example, nurl_read_stdlib, nurl_read_test, nurl_read_grammar, nurl_read_readme, nurl_read_roadmap, nurl_read_gotchas.
  • Resources (7) mirror the read-tools as nurl:// URIs for clients that prefer resource semantics.
  • Prompts (1): nurl_coding_assistant, a compact grammar + canonical example primer.

The endpoint accepts the Streamable HTTP transport. The README explicitly suggests claude mcp add --transport http nurl https://play.nurl-lang.org/mcp to add the server to Claude Code, and provides the JSON config block for Claude Desktop, Cursor, Windsurf, and Zed. The caveats are documented: open, unauthenticated, container-sandboxed, source assumed logged, binaries returned but not executed.
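For a client that does not use one of the listed editors, talking to the endpoint comes down to POSTing JSON-RPC bodies. The Python sketch below only builds the requests and does not send them. The URL is the one quoted above and the protocolVersion value is the one this review reports NURL pinning; the clientInfo values and function names are hypothetical, and whether the live server accepts exactly this handshake is an assumption.

```python
import json
import urllib.request

MCP_URL = "https://play.nurl-lang.org/mcp"  # endpoint quoted in the review


def build_mcp_request(url, method, params, req_id):
    """Build (but do not send) a JSON-RPC POST for an MCP HTTP endpoint."""
    body = json.dumps(
        {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The server may answer with plain JSON or with an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )


def initialize_request(url=MCP_URL):
    """First message of an MCP session."""
    params = {
        "protocolVersion": "2024-11-05",  # version the review says NURL pins
        "capabilities": {},
        "clientInfo": {"name": "review-sketch", "version": "0.0.1"},  # hypothetical
    }
    return build_mcp_request(url, "initialize", params, req_id=1)


def list_tools_request(url=MCP_URL):
    """Ask the server to enumerate its tools."""
    return build_mcp_request(url, "tools/list", {}, req_id=2)
```

Passing one of these objects to `urllib.request.urlopen` would perform the call. A complete client would also send the `notifications/initialized` notification after the handshake, echo any Mcp-Session-Id header it receives, and handle both JSON and SSE response bodies.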

Is this a real innovation or a gimmick?

The honest answer is "both, in ways worth disentangling". As a gimmick, the move is obviously rhetorically aligned with the project's pitch: a language built for language models has a compiler that language models can drive directly. The marketing energy is real. But the actual technical content is non-trivial. By exposing the compiler as a tool surface, the author is making three editorially significant choices:

  1. Toolchain as ABI. The compiler's command-line surface (build target, source string, return format) becomes a stable contract that the language designer commits to. Changes to tool signatures are announced in CHANGELOG.md. This is unusual: most compilers do not treat their CLI surface as an externally-versioned API in this sense.
  2. Documentation as data. The grammar, README, roadmap, GOTCHAS, stdlib modules, examples, and tests are all exposed as named, addressable resources. An agent can ground itself in the language by reading these directly rather than relying on training-time priors. This is the same logic that motivates the prompt-primer (nurl_coding_assistant) — a deliberate grounding pathway rather than a hope that the model has memorised enough NURL.
  3. Sandboxed execution as a service. The container-level isolation gives outside agents a "compile and inspect logs" loop that they could not get from a code-suggestion model alone.

Whether the pattern here generalises is the more interesting question. I think the answer is that something close to this pattern is going to become a baseline expectation for new languages over the next two to three years, and NURL is genuinely early. The combination of (a) MCP as a standardised tool-and-resource transport, (b) wasm as a portable sandbox, and (c) a small enough compiler to ship as wasm makes the cost of "expose your toolchain to an agent" small enough that the balance tilts in favour of doing it for any new language whose audience includes LLM users. NURL is several steps along this curve. The choice is not a gimmick; the framing might be.

Two cautions. First, an unauthenticated public endpoint that runs clang on attacker-supplied source is a non-trivial security posture. The author acknowledges this and limits exposure through container sandboxing and by returning binaries rather than executing them. A serious production deployment will need rate limiting, source-origin attestation, and probably authentication. Second, the value of "agent reads the docs through MCP" depends on the docs actually being current; this is true for NURL today because the project is small and the author is shipping daily, but it is the kind of property that decays in larger projects without a strong CI-driven freshness gate.
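As an indication of how little code the rate-limiting part requires, here is a per-client token bucket in Python. This is a generic technique chosen for illustration, not something the NURL playground is known to use; the class names, the choice of client key, and the rate and capacity values are all assumptions.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate_per_sec` tokens/second."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1.0):
        """Spend `cost` tokens if available; return whether the call may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerClientLimiter:
    """One bucket per client key (for example, a remote address)."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client_key, cost=1.0):
        bucket = self.buckets.get(client_key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_key] = bucket
        return bucket.allow(cost)
```

A build endpoint could charge a higher `cost` for expensive targets than for read-only tools. Keying on remote address alone is weak against distributed abuse, which is one reason authentication appears in the same sentence as rate limiting above.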

7. Positioning against other small languages

Let me say briefly where I would situate NURL on the landscape.

Hare and NURL share the minimalism ethos, but Hare actively rejects generics-in-stdlib (the well-known critique that even hash tables are left to the user), where NURL embraces monomorphising generics with Vec[A] and HashMap[K,V] in the core. Hare's syntax is a thoughtful descendant of C; NURL's is unrelated to any production language. They are minimalist for different reasons.

Odin is the language I would most readily recommend to someone who wanted "C with modern ergonomics and no GC"; Odin is human-ergonomic in a way NURL emphatically is not. The Odin FAQ explicitly notes that the language is not (yet) self-hosted. NURL is.

Zig is the closest neighbour on the cross-compilation / direct-LLVM / explicit-allocator axis. Zig has more polish, more contributors, a more mature comptime story, and a much larger stdlib. NURL leans on Zig as a cross-toolchain. NURL is much smaller as a language and much less mature as an implementation. NURL's WASI compiler-in-wasm story is, surprisingly, more end-to-end than Zig's; Zig's zig1.wasm is a 2.4–2.6 MiB bootstrap shim, not a full compiler.

V has been criticised for over-promising and under-delivering on safety and concurrency guarantees; NURL is more honest in its scope claims. NURL is also smaller and less promoted.

Roc and Gleam are functional languages with strong type systems and different runtime stories (Roc's platform-as-host model; Gleam's BEAM target). They are not really comparable to NURL at the language-design level beyond "all three have nice Result/Option types".

Vale and Hylo are the research neighbours on memory-model invariants; both are more ambitious than NURL on safety without GC, but neither is anywhere near NURL's current level of usable toolchain.

Inko is in some ways a kindred spirit — a small language with an opinionated memory model (capabilities + isolation), single primary maintainer, well-thought-out design. Inko prioritises memory safety more than NURL does and is less concerned with LLM-targeted token efficiency.

If you wanted to do real systems work today and were not constrained by the LLM angle, I would point you at Zig or Odin first, and at Rust if you were willing to pay the learning cost. NURL is differently shaped: it is a language whose primary thesis is about the generator and consumer of source code, not about safety or runtime semantics. That thesis happens to be backed by a more competent compiler than is usual for a manifesto-driven project, which is what makes it worth reviewing.

8. Strengths

In order of how unusual I find them in projects of this size:

  1. A working byte-identical-IR fixed-point bootstrap, executed daily and reported in the roadmap. This is the property I most associate with serious compiler projects. NURL has it.
  2. A self-hosting compiler that fits in approximately 390 KB of WASI and recompiles itself under wasmtime. This is rare. It opens up embeddability and supply-chain auditability stories the language could exploit.
  3. An HTTP server that actually understands keep-alive, pipelining, multipart, TLS, panic recovery, and Prometheus metrics, with regression tests for each. This is not a toy.
  4. A complete Anthropic SDK with multimodal blocks, prompt caching, extended thinking, streaming with tool-use partial-JSON deltas — on par with first-party SDKs at similar revisions.
  5. Bidirectional MCP: server, HTTP client, stdio client. A duplex-process runtime exists in runtime.c to make the stdio client work. As far as I can determine, NURL is the only small language whose stdlib ships this.
  6. Honest internal documentation. The README, ROADMAP, GOTCHAS, and stdlib headers are written in the tone of someone who expects another engineer to inherit the codebase and wants that engineer to succeed.
  7. A coherent rhetorical thesis (language-for-LLMs) that is consistently expressed in the language design, the compiler's MCP surface, and the toolchain's wasm embeddability.

9. Weaknesses and risks

Also in order, the highest-impact risks first:

  1. Bus factor of one. The project has effectively one developer. The pace of shipping is impressive, but the project cannot survive that person stepping away without a deliberate handoff.
  2. The memory model is honestly less safe than the marketing implies. Single-owner with auto-drop is not a borrow checker. The remaining GOTCHAS — particularly closure-borrow lifetime escapes via vec_push and thread_spawn, where the compiler does not yet warn — are real bugs waiting to bite. Anyone using NURL for systems work should plan for ASan/UBSan CI from day one, which the project does not yet have.
  3. Syntax is genuinely costly for humans. Single-character keywords, prefix-everything, no operator precedence, mandatory parens around every call: every one of these is defensible in isolation, but the combination guarantees a steep onboarding curve. Whether LLMs find this easier than humans is asserted but not measured.
  4. The MCP commitment moves underneath you. The MCP spec governance shifted to a Working-Group-driven, priority-area model (as documented on blog.modelcontextprotocol.io, March 9, 2026), and NURL pins protocolVersion: "2024-11-05". Continuous integration against the moving spec is not yet automated; this is a maintenance debt that will compound.
  5. No third-party adoption. This is unsurprising for a project of this age (4 stars, 2 forks, 2 commits as of May 18, 2026), but it means the language has not yet been tested by anyone other than the author. Most of the "fitness for purpose" claims are unfalsified.
  6. Several compiler quirks remain. Ternary arity cascading and binary-only &/| are particularly painful for diagnostic clarity. They are partly tooling problems (better error recovery would help), but the prefix-notation grammar makes them harder to solve than they would be in a delimited language.
  7. No proper module system. $-imports are inline-include with optional name-rewriting. This will become a real scaling problem above some line-count threshold.
  8. The "language built for LLMs" thesis is not yet empirically supported. The README's token-count comparison (NURL ~13 vs Python ~46 for sum-to-N) is suggestive but rests on hand-counted tokens for cherry-picked examples. There is no published benchmark on, say, model success rates at writing NURL versus writing Python or Rust. This is the central empirical claim of the project, and it is unmeasured.
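The missing measurement in item 8 is not hard to define. The Python sketch below shows one possible shape for it: pass rate and passes per thousand generated tokens, grouped by target language. The Attempt record and the summarize function are hypothetical, and no data is included because none has been published.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Attempt:
    """One model attempt at one benchmark task (hypothetical record shape)."""
    language: str
    passed: bool
    generated_tokens: int


def summarize(attempts: List[Attempt]) -> Dict[str, Dict[str, float]]:
    """Per-language pass rate and passes per 1,000 generated tokens."""
    out: Dict[str, Dict[str, float]] = {}
    for lang in sorted({a.language for a in attempts}):
        rows = [a for a in attempts if a.language == lang]
        passes = sum(1 for a in rows if a.passed)
        tokens = sum(a.generated_tokens for a in rows)
        out[lang] = {
            "pass_rate": passes / len(rows),
            "passes_per_1k_tokens": 1000 * passes / tokens if tokens else 0.0,
        }
    return out
```

Reporting both numbers matters: a terse language could win on tokens per attempt while losing on pass rate, and only the combined metric would show whether the trade is worthwhile. Token counts would need to come from the same tokenizer the model under test uses, not from hand counting.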

10. Open questions

A handful of questions I would put to the author and to the wider PL community:

  1. Is single-owner + auto-drop the right operating point for LLM-generated systems code? I find the argument plausible but not proved. A small empirical study — measure model pass-rates on a fixed benchmark, comparing NURL against C, Rust, and Zig with the same model and the same prompt budget — would be the single most important piece of evidence this project could produce.
  2. What happens to the prefix grammar at scale? The bundled examples are at most a few hundred lines. claude_agent.nu is genuinely substantial at 13 kB. The self-hosted compiler is large. Are there human maintainers who can read 50,000 lines of NURL fluently? Are there models?
  3. Should the compiler's MCP surface become the canonical "language addressable by agents" pattern? I think yes, with caveats. The pattern is clearly directional. Other language designers should consider whether their compilers should ship an MCP surface, and what the security boundary is.
  4. Should the bootstrap fixed-point compare binary artefacts, not LLVM IR text? GCC, Go, Rust, OCaml, and Zig all compare binaries. NURL's IR-level compare is interpretable but weaker. Adding a binary-level compare on top of the IR-level one would be cheap and worth doing.
  5. How does the project intend to manage MCP-version drift? The protocol is explicitly evolving under a Working-Group-driven roadmap. Pinning protocolVersion: "2024-11-05" is a temporary measure. A per-method capability negotiation strategy is on the roadmap as "MCP HTTP enhancements — partial" and is the right shape; the question is whether the maintenance burden grows faster than the maintainer can keep up.
  6. Should the language move toward Rust-style affine types and a real borrow check? This would make NURL significantly safer but would also break the "easy for LLMs to generate" thesis if borrow-checker errors become a frequent generation failure mode. The tension is real and unresolved.
  7. Is pub enough, or does NURL need namespaces? $-import-with-alias gets you part of the way; real namespaces would get you the rest. The package manager exists. The next obvious step is to graduate path-based deps into a registry-shaped resolution model and pair that with a more substantive module system.
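Question 4 is cheap to act on. As a rough Python sketch, a fixed-point gate that checks both the IR text and the final binary is just a pair of hash comparisons between two bootstrap stages. The function names and the artefact kinds ("ir", "binary") are assumptions for illustration; NURL's actual bootstrap script is not reproduced here.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 16):
    """Hash a file in chunks so large artefacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixed_point_report(stage_a, stage_b):
    """Compare artefacts of two bootstrap stages.

    Each argument maps an artefact kind (e.g. "ir", "binary") to a file
    path. The result maps each shared kind to True when the files are
    byte-identical.
    """
    shared = sorted(set(stage_a) & set(stage_b))
    return {
        kind: sha256_file(stage_a[kind]) == sha256_file(stage_b[kind])
        for kind in shared
    }


def is_fixed_point(stage_a, stage_b):
    """True only when every shared artefact kind matches."""
    report = fixed_point_report(stage_a, stage_b)
    return bool(report) and all(report.values())
```

An IR match with a binary mismatch would point at nondeterminism in the backend or linker (timestamps, path embedding, ordering) rather than in the NURL compiler itself, which is exactly the distinction the IR-only comparison cannot make.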

11. Conclusion: what would I actually do with NURL today?

I would not use NURL in production. I would use it to think with.

If you are building a programming language right now and you want to engage seriously with the question of what code looks like when most of it is generated by language models, NURL is the most concretely-built artefact I have found that takes the question seriously. The language designer has committed to choices most projects are unwilling to commit to — single-character keywords, prefix notation, no operator precedence, monomorphising generics with manual instantiation, single-owner ownership without a borrow checker — and is honest about the consequences. The compiler is good enough to bootstrap with byte-identical IR, ship as a 390 KB WASI module, cross-compile to three native targets via clang and zig cc, and host an HTTP server with multipart, TLS, and reverse-proxy streaming. The MCP integration is editorially significant.

What I find unconvincing is the empirical case that LLMs actually generate NURL better than they generate, say, Rust. The token-count table in the README is suggestive but not measured. The fact that the language is genuinely unfamiliar to today's frontier models — none of them have seen meaningful amounts of NURL during pretraining as of mid-2026 — cuts both ways: the regular grammar makes in-context grounding work well, but the lack of training-time priors means models will need explicit grounding (via nurl_coding_assistant or the documentation MCP resources) on every fresh context. Whether that grounding scales better than asking the same model to write Rust depends on questions that have not been answered.

The thing I would adopt today is the pattern, not the language. Specifically:

  • A small, self-hosted compiler that bootstraps to byte-identical IR and ships as a WASI module is a good idea for any new language project. The cost is modest. The supply-chain auditability is high. NURL demonstrates it can be done.
  • Exposing the compiler toolchain as an MCP server — with build tools, documentation as named resources, and a coding-assistant primer prompt — is a good idea for any new language whose audience includes LLM users. The cost is again modest if you have a wasm build of your compiler. NURL demonstrates this too.
  • Single-owner-plus-auto-drop is a defensible memory model for a particular class of program if you are honest about its limits. NURL is honest about its limits. It is also a more usable model than C without the conceptual overhead of a borrow checker, which makes it appealing for prototyping.

What would change my recommendation? Three things, in order:

  1. A published empirical study showing that current frontier models actually pass more correctness benchmarks per generated token in NURL than in equivalent existing languages, with code available for replication.
  2. A larger community — even a half-dozen independent contributors and a working CI gate — that would raise the bus factor above one.
  3. Either a real borrow check or aggressive runtime instrumentation (ASan-by-default) that closes the use-after-free hole the current closure-borrow semantics leave open.

Until then, I will keep NURL bookmarked. It is the most interesting "post-LLM-era programming language" I have encountered, partly because it actually exists as a working compiler and not as a manifesto, and partly because the design choices feel like they were made by someone who had thought through the implications, not just the marketing. The fictional origin makes that biographically appropriate; the technical artefact makes it real. Both of those facts together are unusual, and they are why this language deserves the attention I have given it here.


Disclosures: This review was conducted from primary sources (the NURL GitHub repository, the live playground and MCP endpoint, and the project website) over several days in May 2026. The reviewer is not affiliated with the project, has no commercial interest in NURL, and is publishing as an independent technical commentator.
