Architecture
This page is the canonical reference for how WeaveFFI works internally. It is the document new generator authors and contributors should read before making non-trivial changes; all other documentation is consumer- or library-author-facing.
High-level pipeline
Every weaveffi generate invocation flows through the same five
stages, in this order:
IDL file (YAML/JSON/TOML)
│
▼
Parse ── weaveffi-ir::parse: produces an `Api` IR
│
▼
Validate ── weaveffi-core::validate: rejects errors, collects warnings
│
▼
Resolve ── weaveffi-cli `CliConfig`: merges --config TOML and the
│ inline generators: section into each target's typed config
▼
Generate ── weaveffi-core::codegen::Orchestrator: dispatches every
│ selected target generator in parallel via rayon
▼
Output ── Each generator writes its files under {out_dir}/{target}/
and updates {out_dir}/.weaveffi-cache/{target}.hash
Subcommands like validate, lint, diff, format, and watch re-use
the parse and validate stages; generate, diff, and watch
additionally exercise resolve and generate.
Crate layout
The workspace is structured as a small set of stable, focused crates. The dependency graph is acyclic and shallow:
weaveffi-cli ──► weaveffi-core ──► weaveffi-ir
│
├──► weaveffi-gen-c
├──► weaveffi-gen-cpp
├──► weaveffi-gen-swift
├──► weaveffi-gen-android
├──► weaveffi-gen-node
├──► weaveffi-gen-wasm
├──► weaveffi-gen-python
├──► weaveffi-gen-dotnet
├──► weaveffi-gen-dart
├──► weaveffi-gen-go
└──► weaveffi-gen-ruby
weaveffi-abi ──► (stand-alone, linked at run time by every cdylib that
exposes the WeaveFFI C ABI)
weaveffi-fuzz ──► weaveffi-ir, weaveffi-core (workspace-private; unpublished)
| Crate | What it owns |
|---|---|
| weaveffi-ir | The IR types (Api, Module, Function, TypeRef, …), the parse_api_str parser, the parse_type_ref mini-grammar, and CURRENT_SCHEMA_VERSION. |
| weaveffi-abi | Stable C ABI runtime symbols: weaveffi_error, weaveffi_error_clear, weaveffi_free_string, weaveffi_free_bytes, the arena, cancel tokens. |
| weaveffi-core | The Generator trait, the LanguageBackend framework + driver, the Orchestrator, the abi C-ABI lowering model, the BindingModel, validation rules, generator config resolution, and the per-generator hash cache. |
| weaveffi-gen-* | Eleven generator crates. Each implements LanguageBackend (bridged to Generator by impl_generator_via_backend!) and produces target-specific output (header, wrapper, package metadata). |
| weaveffi-cli | The weaveffi binary. Parses the IDL, applies validation, instantiates every generator (via the cli_targets! registry), and dispatches the Orchestrator. Self-contained subcommands live in their own modules (doctor.rs, extract.rs, scaffold.rs). |
| weaveffi-fuzz | cargo-fuzz harnesses for the parsers, the validator, and parse_type_ref. Workspace-private (not published to crates.io). |
Crates that contain unsafe code (weaveffi-abi, every samples/*
cdylib, weaveffi-fuzz, and the scaffold output emitted by
weaveffi generate --scaffold) opt in with
#![allow(unsafe_code)] at the top of their main source file. The
workspace-wide unsafe_code = deny lint forbids it everywhere else.
CLI internals
weaveffi-cli is split so that main.rs holds only argument parsing and
command dispatch; each self-contained subcommand group lives in its own
module:
| Module | Responsibility |
|---|---|
| main.rs | clap definitions, the cli_targets! registry, and dispatch. |
| doctor.rs | weaveffi doctor: probes host toolchains per target. |
| extract.rs | weaveffi extract: derives an IDL from annotated Rust source. |
| scaffold.rs | the Rust producer stubs emitted by weaveffi generate --scaffold. |
The cli_targets! registry
The 11 language targets used to be spelled out a dozen times (config
struct fields, the --target parser, inline-generator merging, and the
Orchestrator wiring). They now live in one declarative macro,
cli_targets!, invoked once near the top of main.rs:
cli_targets! {
    "c" => c: CConfig via CGenerator,
    "cpp" => cpp: CppConfig via CppGenerator,
    "swift" => swift: SwiftConfig via SwiftGenerator, strip,
    // … one line per target …
    "ruby" => ruby: RubyConfig via RubyGenerator,
}
That single invocation expands to the CliConfig struct (one typed
field per target), build_generators, apply_inline_target, and the
strip_module_prefix/input-stamping fan-out. Adding a language is a
one-line change here; see Adding a new generator.
Format canonicalization
weaveffi format (and format --check) round-trips an IDL through the
IR and re-serializes it, so the on-disk form is canonical. For the
check to be a no-op on an already-formatted file, serialization must omit
every field that is at its default; otherwise serde would inject
null, [], and false noise that the parser then drops on the next
read, making format non-idempotent. The IR types therefore tag their
optional/defaulted fields with #[serde(skip_serializing_if = …)]
(Option::is_none, Vec::is_empty, and a local is_false for booleans
that default to false). This keeps canonical IDLs terse and makes
format idempotent; it also removes the now-meaningless default
annotations from the generated weaveffi.schema.json.
The IR
weaveffi_ir::ir defines a small algebraic type system. The shapes
that matter most:
- Api { version, modules, generators }: root node.
- Module { name, functions, structs, enums, callbacks, listeners, errors, modules }: modules can nest.
- Function { name, params, returns, doc, async, cancellable, deprecated, since }.
- TypeRef enumerates every supported type reference: primitives (I32, U32, I64, F64, Bool, StringUtf8, Bytes, Handle, BorrowedStr, BorrowedBytes), user types (Struct(String), Enum(String), TypedHandle(String)), and the four composite shapes (Optional, List, Map, Iterator).
Every IR type derives Debug, Clone, PartialEq, Serialize, and
Deserialize. Eq is derived where possible; a few types (Api,
Module, StructDef, StructField) intentionally omit Eq because
they transitively contain f64 (in default values) or
serde_yaml::Value.
TypeRef (de)serializes as a string with custom syntax (i32,
handle<T>, [T], {K:V}, T?, &str, &[u8]). The parser is
weaveffi_ir::ir::parse_type_ref; both human-written IDL and the
JSON Schema export rely on it.
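To make the string syntax concrete, here is a minimal sketch of a recursive parser for a subset of the forms listed above. It is not the real weaveffi_ir::ir::parse_type_ref: the reduced TypeRef enum, the Named variant for not-yet-resolved user types, and the identifier rule in is_ident are assumptions made for illustration.

```rust
/// Hypothetical, reduced mirror of the IR's `TypeRef` (the real enum has more variants).
#[derive(Debug, Clone, PartialEq)]
pub enum TypeRef {
    I32,
    BorrowedStr,
    BorrowedBytes,
    /// Assumed placeholder for a user type that has not been resolved yet.
    Named(String),
    TypedHandle(String),
    Optional(Box<TypeRef>),
    List(Box<TypeRef>),
    Map(Box<TypeRef>, Box<TypeRef>),
}

/// Assumed identifier rule: ASCII letter or `_`, then letters, digits, or `_`.
fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Parse one type reference; returns `None` on malformed input.
pub fn parse_type_ref(s: &str) -> Option<TypeRef> {
    let s = s.trim();
    // A trailing `?` wraps everything to its left, so peel it off first.
    if let Some(inner) = s.strip_suffix('?') {
        return Some(TypeRef::Optional(Box::new(parse_type_ref(inner)?)));
    }
    match s {
        "i32" => return Some(TypeRef::I32),
        "&str" => return Some(TypeRef::BorrowedStr),
        "&[u8]" => return Some(TypeRef::BorrowedBytes),
        _ => {}
    }
    if let Some(inner) = s.strip_prefix('[').and_then(|r| r.strip_suffix(']')) {
        return Some(TypeRef::List(Box::new(parse_type_ref(inner)?)));
    }
    if let Some(inner) = s.strip_prefix('{').and_then(|r| r.strip_suffix('}')) {
        // Keys are never composite, so the first `:` separates key from value.
        let (key, value) = inner.split_once(':')?;
        return Some(TypeRef::Map(
            Box::new(parse_type_ref(key)?),
            Box::new(parse_type_ref(value)?),
        ));
    }
    if let Some(inner) = s.strip_prefix("handle<").and_then(|r| r.strip_suffix('>')) {
        return is_ident(inner).then(|| TypeRef::TypedHandle(inner.to_string()));
    }
    is_ident(s).then(|| TypeRef::Named(s.to_string()))
}
```

In this sketch `[i32]?` parses as an optional list while `[i32?]` parses as a list of optionals, which is the distinction the postfix `?` has to preserve.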
Schema versioning
CURRENT_SCHEMA_VERSION (currently "0.4.0") lives in
crates/weaveffi-ir/src/ir.rs. Pre-1.0, SUPPORTED_VERSIONS
contains exactly the current version; older schema revisions are rejected
by validation with an actionable error. When you change the schema:
- Bump CURRENT_SCHEMA_VERSION (and the weaveffi-ir minor version).
- Document the changes in CHANGELOG.md under a “Migration” section.
- Update every sample IDL, the weaveffi new template, the README quickstart, and the Getting Started doc.
The stability page is the external contract; this section is the implementation note.
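The pre-1.0 version gate described in this section can be sketched as follows. The constant names come from the text; the check_schema_version function and its error wording are hypothetical and only illustrate what an actionable rejection could look like.

```rust
pub const CURRENT_SCHEMA_VERSION: &str = "0.4.0";

/// Pre-1.0 the supported set contains exactly the current version.
pub const SUPPORTED_VERSIONS: &[&str] = &[CURRENT_SCHEMA_VERSION];

/// Hypothetical helper: accept a supported version or explain what is accepted.
pub fn check_schema_version(found: &str) -> Result<(), String> {
    if SUPPORTED_VERSIONS.contains(&found) {
        Ok(())
    } else {
        Err(format!(
            "unsupported schema version {found:?}; supported: {}",
            SUPPORTED_VERSIONS.join(", ")
        ))
    }
}
```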
Validation
weaveffi_core::validate::validate_api is the single entry point.
It returns a Vec<ValidationError> (errors that must be fixed before
generation) and a separate Vec<ValidationWarning> (advisory; the
lint subcommand surfaces these).
Errors enforced today:
- Identifier well-formedness (is_valid_identifier).
- Reserved keyword rejection (if, else, for, while, loop, match, type, return, async, await, break, continue, fn, struct, enum, mod, use).
- Uniqueness of module/function/parameter/struct/enum/field/variant names within their respective scopes.
- Structs must have at least one field; enums at least one variant.
- Enum discriminant uniqueness within an enum.
- Type references resolve within the enclosing module chain (cross-sibling references are rejected; see Cross-module references).
- Iterator return types are valid in return position only.
- Map keys must be a primitive or enum type.
- event_callback on a listener must reference a callback in the same module.
- Error domain name must not collide with a function name in the same module; codes must be non-zero and unique.
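As an illustration of the first three rules, the sketch below checks one scope's names for well-formedness, reserved keywords, and duplicates. The keyword list is the one given above; the exact identifier grammar, the NameError type, and check_names are assumptions, not the validator's real API.

```rust
use std::collections::BTreeSet;

/// Reserved keywords as listed in the validation rules.
const RESERVED: &[&str] = &[
    "if", "else", "for", "while", "loop", "match", "type", "return", "async",
    "await", "break", "continue", "fn", "struct", "enum", "mod", "use",
];

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum NameError {
    Malformed(String),
    Reserved(String),
    Duplicate(String),
}

/// Assumed rule: ASCII letter or `_` first, then letters, digits, or `_`.
pub fn is_valid_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Check every name declared in one scope and collect all problems
/// instead of stopping at the first one.
pub fn check_names(names: &[&str]) -> Vec<NameError> {
    let mut seen = BTreeSet::new();
    let mut errors = Vec::new();
    for &name in names {
        if !is_valid_identifier(name) {
            errors.push(NameError::Malformed(name.to_string()));
        } else if RESERVED.contains(&name) {
            errors.push(NameError::Reserved(name.to_string()));
        }
        // `insert` returns false when the name was already present.
        if !seen.insert(name) {
            errors.push(NameError::Duplicate(name.to_string()));
        }
    }
    errors
}
```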
Warnings emitted today:
- LargeEnumVariantCount (>100 variants).
- DeepNesting (composite types nested deeper than 3 levels).
- EmptyModuleDoc (no doc: on any function in the module).
- AsyncVoidFunction (async without a return type).
- MutableOnValueType (mutable: true on a non-pointer parameter).
- DeprecatedFunction (informational).
Async functions, cancellable functions, listeners, callbacks,
iterators (iter<T>), typed handles (handle<T>), borrowed types
(&str, &[u8]), nested modules, and cross-module type references are
all first-class. They pass validation and every generator handles
them. Do not re-add validator rejections for these features.
The one exception is per-target capability gating: each generator
declares a TargetCapabilities (async, callbacks, listeners,
iterators), and the orchestrator fails generation (listing the
offending IDL definitions) when a selected target cannot deliver a
used feature. Today only WASM declares gaps (callbacks and listeners);
its allow_unsupported = true config opts into generating the rest of
the surface with explicit throwing stubs in place of the unsupported
entry points. Capability failures must stay loud: never skip a
definition silently.
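The gating rule can be sketched as a small pure function. The four capability names follow the text (the field is spelled async_fns here because async is a Rust keyword); the Feature enum, the gate function, its return shape, and the definition names used in the usage note are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Feature {
    Async,
    Callbacks,
    Listeners,
    Iterators,
}

/// What one target can deliver.
pub struct TargetCapabilities {
    pub async_fns: bool,
    pub callbacks: bool,
    pub listeners: bool,
    pub iterators: bool,
}

impl TargetCapabilities {
    pub fn supports(&self, feature: Feature) -> bool {
        match feature {
            Feature::Async => self.async_fns,
            Feature::Callbacks => self.callbacks,
            Feature::Listeners => self.listeners,
            Feature::Iterators => self.iterators,
        }
    }
}

/// `used` pairs each feature with the IDL definition that needs it.
/// Returns the definitions to stub when `allow_unsupported` is set,
/// or an error naming every offending definition otherwise.
pub fn gate(
    target: &str,
    caps: &TargetCapabilities,
    used: &[(Feature, &str)],
    allow_unsupported: bool,
) -> Result<Vec<String>, String> {
    let offending: Vec<String> = used
        .iter()
        .filter(|(feature, _)| !caps.supports(*feature))
        .map(|(_, def)| def.to_string())
        .collect();
    if offending.is_empty() || allow_unsupported {
        Ok(offending)
    } else {
        Err(format!(
            "target `{target}` cannot generate: {}",
            offending.join(", ")
        ))
    }
}
```

With callbacks and listeners disabled, a call such as `gate("wasm", &caps, &used, false)` fails and names the callback definition, while passing `true` returns the same definition as one to stub. Either way the gap is reported rather than dropped.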
Generator configuration resolution
There is no single global config object. Each generator owns its own
typed Generator::Config (CConfig, SwiftConfig, PythonConfig, …),
so adding a knob to one target only touches that target’s crate. The CLI
gathers all of them into one CliConfig struct (generated by the
cli_targets! macro, one field per target) and resolves it from three
sources (later wins):
- Defaults baked into each Config::default().
- The --config <file.toml> external file passed to generate.
- The inline generators: section of the IDL.
The IDL section is the project-local source of truth and overrides any
machine-local TOML; see the
Generator Configuration guide.
Each resolved config is hashed (via serde_json) into the per-generator
cache key, so a config-only change re-runs just that target.
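A minimal sketch of the "later wins" layering for one target follows. SwiftConfig and strip_module_prefix are names from this page; the module_name field, its default, the SwiftOverlay type, and the resolve function are assumptions for illustration, and the real resolution goes through the generated CliConfig.

```rust
/// Fully resolved config for one target (fields partly hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct SwiftConfig {
    pub module_name: String,
    pub strip_module_prefix: bool,
}

impl Default for SwiftConfig {
    fn default() -> Self {
        // Assumed defaults, chosen only for this example.
        Self { module_name: "WeaveFFI".to_string(), strip_module_prefix: false }
    }
}

/// One partial layer: `None` means "this source did not set the option".
#[derive(Debug, Default, Clone)]
pub struct SwiftOverlay {
    pub module_name: Option<String>,
    pub strip_module_prefix: Option<bool>,
}

impl SwiftConfig {
    /// Overwrite only the options the layer actually sets.
    fn apply(mut self, layer: &SwiftOverlay) -> Self {
        if let Some(name) = &layer.module_name {
            self.module_name = name.clone();
        }
        if let Some(strip) = layer.strip_module_prefix {
            self.strip_module_prefix = strip;
        }
        self
    }
}

/// Defaults, then the external TOML, then the inline IDL section.
pub fn resolve(external_toml: &SwiftOverlay, inline_idl: &SwiftOverlay) -> SwiftConfig {
    SwiftConfig::default().apply(external_toml).apply(inline_idl)
}
```

Modelling each layer with Option fields keeps "unset" distinct from "set to the default value", which is what lets an inline section override one key without resetting the others.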
Orchestrator
weaveffi_core::codegen::Orchestrator coordinates the generator stage:
- If --force is set, every cache entry under {out_dir}/.weaveffi-cache/{target}.hash is invalidated.
- For each registered generator, the orchestrator hashes (api, generator.name(), config) and compares against the persisted hash, so an IR or config change re-runs just the affected target.
- If a pre_generate hook is configured (OrchestratorHooks), the orchestrator shells out to it (cmd on Windows, sh elsewhere) and aborts on non-zero exit.
- The pending generators run in parallel via rayon::par_iter. Generators must therefore be Send + Sync.
- post_generate runs once after every generator has succeeded.
- Each successful generator’s hash is persisted.
This per-generator caching is what lets weaveffi generate skip every
target whose IR and resolved config have not changed since the last run; see the
Generator Configuration guide.
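The cache-and-dispatch part of that sequence can be sketched with the standard library alone. This is a simplified model under stated assumptions: the IR and config are plain strings, an in-memory map stands in for the .hash files, std::thread::scope stands in for rayon, DefaultHasher stands in for the serde_json-based key (its output is not guaranteed stable across Rust releases, so it would not suit a persisted cache), and hooks are omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::thread;

/// Hypothetical stand-in for a configured generator.
pub struct Target {
    pub name: &'static str,
    pub config: String,
}

/// Key over (api, generator name, config), as in the steps above.
fn cache_key(api: &str, target: &Target) -> u64 {
    let mut hasher = DefaultHasher::new();
    (api, target.name, &target.config).hash(&mut hasher);
    hasher.finish()
}

/// Placeholder for real code generation; returns the target that ran.
fn generate(_api: &str, target: &Target) -> &'static str {
    target.name
}

/// Runs every stale target in parallel and returns the names that ran.
pub fn run(
    api: &str,
    targets: &[Target],
    cache: &mut BTreeMap<String, u64>,
    force: bool,
) -> Vec<&'static str> {
    if force {
        cache.clear(); // invalidate every entry
    }
    // Keep only targets whose key differs from the stored one.
    let pending: Vec<(&Target, u64)> = targets
        .iter()
        .map(|t| (t, cache_key(api, t)))
        .filter(|(t, key)| cache.get(t.name) != Some(key))
        .collect();
    // One scoped thread per pending target; joined in submission order.
    let ran: Vec<&'static str> = thread::scope(|s| {
        let handles: Vec<_> = pending
            .iter()
            .map(|(t, _)| s.spawn(move || generate(api, t)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("generator panicked"))
            .collect()
    });
    // Persist keys only after every generator has finished.
    for (t, key) in &pending {
        cache.insert(t.name.to_string(), *key);
    }
    ran
}
```

A second call with the same inputs runs nothing, and changing one target's config re-runs only that target.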
The Generator trait and the language-backend framework
The orchestrator drives generators through the Generator trait
(weaveffi_core::codegen::Generator). Each generator owns a typed,
serializable Config; because those Config types differ per target, the
orchestrator stays config-agnostic by working through the object-safe
DynGenerator view:
pub trait Generator: Send + Sync {
    /// Per-target options. Must round-trip through `serde_json` so the
    /// orchestrator can fold the config into the cache key.
    type Config: Serialize + Default + Clone + Send + Sync;

    /// Stable short name (`"swift"`, `"c"`, …): the `--target` token and
    /// the per-generator cache-file basename.
    fn name(&self) -> &'static str;

    /// Render the bindings under `out_dir`.
    fn generate(&self, api: &Api, out_dir: &Utf8Path, config: &Self::Config) -> Result<()>;

    /// Files `generate` would write (used by `--dry-run` and `diff`).
    fn output_files(&self, api: &Api, out_dir: &Utf8Path, config: &Self::Config) -> Vec<String>;
}
To erase the associated Config, a typed generator is paired with a
concrete config value via ConfiguredGenerator::new(gen, config), which
implements the object-safe DynGenerator trait the Orchestrator
stores. The CLI builds one ConfiguredGenerator per selected target
from the resolved CliConfig.
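The erasure pattern can be sketched in isolation. The trait and type names mirror the text, but the signatures are simplified (no Api, Utf8Path, or serde bound), and the two example generators, their config fields, and the file names they report are hypothetical.

```rust
/// Simplified typed trait: each generator has its own `Config`.
pub trait Generator: Send + Sync {
    type Config: Default + Clone + Send + Sync;
    fn name(&self) -> &'static str;
    fn output_files(&self, out_dir: &str, config: &Self::Config) -> Vec<String>;
}

/// Config-free view that can be stored as `Box<dyn DynGenerator>`.
pub trait DynGenerator: Send + Sync {
    fn name(&self) -> &'static str;
    fn output_files(&self, out_dir: &str) -> Vec<String>;
}

/// Pairs a typed generator with one concrete config value.
pub struct ConfiguredGenerator<G: Generator> {
    generator: G,
    config: G::Config,
}

impl<G: Generator> ConfiguredGenerator<G> {
    pub fn new(generator: G, config: G::Config) -> Self {
        Self { generator, config }
    }
}

impl<G: Generator> DynGenerator for ConfiguredGenerator<G> {
    fn name(&self) -> &'static str {
        self.generator.name()
    }
    fn output_files(&self, out_dir: &str) -> Vec<String> {
        // The stored config is supplied here, so callers never see its type.
        self.generator.output_files(out_dir, &self.config)
    }
}

// Two hypothetical generators with unrelated config types.
#[derive(Clone)]
pub struct CConfig { pub header_name: String }
impl Default for CConfig {
    fn default() -> Self { Self { header_name: "weaveffi.h".to_string() } }
}
pub struct CGenerator;
impl Generator for CGenerator {
    type Config = CConfig;
    fn name(&self) -> &'static str { "c" }
    fn output_files(&self, out_dir: &str, config: &CConfig) -> Vec<String> {
        vec![format!("{out_dir}/c/{}", config.header_name)]
    }
}

#[derive(Clone, Default)]
pub struct SwiftConfig { pub module_name: String }
pub struct SwiftGenerator;
impl Generator for SwiftGenerator {
    type Config = SwiftConfig;
    fn name(&self) -> &'static str { "swift" }
    fn output_files(&self, out_dir: &str, config: &SwiftConfig) -> Vec<String> {
        vec![format!("{out_dir}/swift/{}.swift", config.module_name)]
    }
}
```

Both wrappers fit in a single `Vec<Box<dyn DynGenerator>>`, which is the shape an orchestrator needs in order to iterate over targets without knowing any config type.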
LanguageBackend and the shared driver
Generators are not written against Generator directly. Each target
implements weaveffi_core::backend::LanguageBackend and is bridged to
Generator by the impl_generator_via_backend! macro, so the model
construction, the file I/O, and the output_files derivation live in one
place instead of being re-implemented eleven times:
pub trait LanguageBackend: Send + Sync {
    type Config: Serialize + Default + Clone + Send + Sync;

    fn name(&self) -> &'static str;

    /// C ABI symbol prefix; the driver builds the `BindingModel` with it.
    fn prefix<'a>(&self, config: &'a Self::Config) -> &'a str { "weaveffi" }

    /// The single required hook: assemble every output file. Rendering is
    /// pure; the driver performs the actual writes.
    fn files(&self, api: &Api, model: &BindingModel,
             out_dir: &Utf8Path, config: &Self::Config) -> Vec<OutputFile>;

    /// Canonical per-module walk (enums → structs → callbacks → listeners
    /// → functions) with call-shape dispatch. Single-pass backends override
    /// the `render_enum`/`render_struct`/`render_function` hooks and call
    /// this; multi-pass backends build their layout in `files` directly.
    fn emit_members(&self, out: &mut String, module: &ModuleBinding, config: &Self::Config) { /* … */ }

    // render_enum / render_struct / render_callback / render_listener /
    // render_function: all default to no-op.
}
The free backend::run builds the BindingModel once (with the
backend’s prefix), calls files, and writes each OutputFile
(creating parent directories). backend::output_files calls the same
files and returns the sorted path list, so generate and
output_files are derived from a single source and cannot drift.
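The "single source" property is easy to see in a reduced model. In this sketch the trait is cut down to name and files, the API is just a name string, and an in-memory map stands in for the filesystem; DemoBackend and its file names are hypothetical.

```rust
use std::collections::BTreeMap;

pub struct OutputFile {
    pub path: String,
    pub contents: String,
}

/// Reduced backend trait: rendering is pure and returns every file.
pub trait LanguageBackend {
    fn name(&self) -> &'static str;
    fn files(&self, api_name: &str, out_dir: &str) -> Vec<OutputFile>;
}

/// Derives the path list from the same `files` call that `run` uses.
pub fn output_files(backend: &dyn LanguageBackend, api_name: &str, out_dir: &str) -> Vec<String> {
    let mut paths: Vec<String> = backend
        .files(api_name, out_dir)
        .into_iter()
        .map(|f| f.path)
        .collect();
    paths.sort();
    paths
}

/// "Writes" each file; the map is a stand-in for real file I/O.
pub fn run(
    backend: &dyn LanguageBackend,
    api_name: &str,
    out_dir: &str,
    fs: &mut BTreeMap<String, String>,
) {
    for file in backend.files(api_name, out_dir) {
        fs.insert(file.path, file.contents);
    }
}

/// Hypothetical backend emitting two files under {out_dir}/{target}/.
pub struct DemoBackend;

impl LanguageBackend for DemoBackend {
    fn name(&self) -> &'static str {
        "demo"
    }
    fn files(&self, api_name: &str, out_dir: &str) -> Vec<OutputFile> {
        let dir = format!("{out_dir}/{}", self.name());
        vec![
            OutputFile {
                path: format!("{dir}/{api_name}.txt"),
                contents: format!("bindings for {api_name}\n"),
            },
            OutputFile {
                path: format!("{dir}/README.md"),
                contents: String::from("generated\n"),
            },
        ]
    }
}
```

Because both functions call files, a backend cannot report a path it does not write or write a path it does not report.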
Python is the reference single-pass backend (it overrides the per-entity
hooks and composes emit_members); Ruby, .NET, Node, and Android are
multi-pass (their FFI declarations, wrapper classes, and secondary
surfaces such as the JNI C shim are emitted in their own passes inside
files).
Generators emit code by direct string construction; there is no
template-engine layer (an early Tera prototype intended for user
template overrides was removed in 0.4.0 because nothing read from it).
Shared rendering infrastructure lives in weaveffi_core:
- backend: the LanguageBackend trait, the run/output_files driver, the OutputFile type, and the impl_generator_via_backend! bridge macro.
- model::BindingModel: the normalized, fully-lowered view every backend renders from (precomputed C symbol names and ABI signatures).
- codegen::common: module-tree traversal (walk_modules, walk_modules_with_path), the is_c_pointer_type ABI classifier, doc-comment emission (emit_doc), and pascal_case naming.
The signatures above use Result<T> from anyhow and IR types from
weaveffi_ir; consult those crates for the precise import set.
Implementation notes:
- Implement name() (the --target flag value, e.g. "swift"), the associated Config type, and files(); override prefix() when the config carries a configurable c_prefix.
- Return every emitted file from files(); --dry-run and weaveffi diff read the derived output_files, so there is no separate list to keep in sync.
- All paths are joined under out_dir; do not write outside the passed directory or you will break the per-generator cache.
- Generators run in parallel; share no mutable state across calls.
C ABI naming convention
Every emitted C symbol follows
{c_prefix}_{module}_{function} (default c_prefix = "weaveffi").
The c_prefix configuration is honored end-to-end: when set, the
generated C output uses it consistently, including references to
weaveffi-abi runtime symbols ({c_prefix}_error,
{c_prefix}_error_clear, {c_prefix}_free_string,
{c_prefix}_free_bytes).
Struct lifecycle, enum constants, and getter symbols follow the patterns in the C generator reference.
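The naming rule for functions and runtime symbols is mechanical enough to sketch directly. The two helper functions below are hypothetical (the real names are precomputed in BindingModel), and the sketch does not cover nested modules or the struct, enum, and getter patterns.

```rust
/// `{c_prefix}_{module}_{function}` for a top-level module.
pub fn function_symbol(c_prefix: &str, module: &str, function: &str) -> String {
    format!("{c_prefix}_{module}_{function}")
}

/// The four runtime symbols named above, rewritten for a custom prefix.
pub fn runtime_symbols(c_prefix: &str) -> [String; 4] {
    ["error", "error_clear", "free_string", "free_bytes"]
        .map(|suffix| format!("{c_prefix}_{suffix}"))
}
```

For example, `function_symbol("weaveffi", "calculator", "add")` yields `weaveffi_calculator_add`, and with `c_prefix = "acme"` the error type is spelled `acme_error`.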
The ABI lowering model
The C ABI is the foundation every binding sits on: a flat, C-callable
surface where each IDL type lowers to a fixed sequence of C parameters.
A string becomes one const char*; bytes becomes
const uint8_t* {name}_ptr, size_t {name}_len; a map<K,V> becomes
parallel {name}_keys / {name}_values / {name}_len slots;
collection and out-of-band returns append out_* pointers; and every
fallible call ends with a trailing {prefix}_error*.
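A small sketch of that expansion, limited to the three parameter shapes spelled out above. It is not the weaveffi_core::abi API: ParamType, Slot, and both functions are hypothetical, the map slots carry no C type because the text does not state one, and the name out_err for the trailing error slot is an assumption.

```rust
/// Reduced, hypothetical stand-in for the IDL parameter types.
pub enum ParamType {
    Str,
    Bytes,
    Map,
}

#[derive(Debug, PartialEq)]
pub struct Slot {
    /// C spelling where the text states it; `None` where it does not.
    pub c_type: Option<String>,
    pub name: String,
}

fn slot(c_type: Option<&str>, name: String) -> Slot {
    Slot { c_type: c_type.map(str::to_string), name }
}

/// Expand one IDL parameter into its ordered C slots.
pub fn lower_param(name: &str, ty: &ParamType) -> Vec<Slot> {
    match ty {
        ParamType::Str => vec![slot(Some("const char*"), name.to_string())],
        ParamType::Bytes => vec![
            slot(Some("const uint8_t*"), format!("{name}_ptr")),
            slot(Some("size_t"), format!("{name}_len")),
        ],
        ParamType::Map => vec![
            slot(None, format!("{name}_keys")),
            slot(None, format!("{name}_values")),
            slot(None, format!("{name}_len")),
        ],
    }
}

/// Lower every parameter in order, then append the trailing error slot.
pub fn lower_fallible_call(prefix: &str, params: &[(&str, ParamType)]) -> Vec<Slot> {
    let mut slots: Vec<Slot> = params
        .iter()
        .flat_map(|(name, ty)| lower_param(name, ty))
        .collect();
    let error_type = format!("{prefix}_error*");
    // `out_err` is an assumed name; only the type is given in the text.
    slots.push(slot(Some(error_type.as_str()), "out_err".to_string()));
    slots
}
```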
That calling convention is defined once, in
weaveffi_core::abi, rather than re-derived inside each
generator:
- CType: a prefix-agnostic algebra of C types (Int32, Size, Ptr { pointee, const_pos }, StructTag { module, name }, …) with a single render_c(prefix) method that produces canonical C spelling.
- element_ctype(ty, module): the C type of a single element.
- lower_param(name, ty, module, mutable): expands one IDL parameter into its ordered AbiParam slots.
- lower_return(ty, module): the return CType plus any trailing out_* AbiParams.
- callback_result_params(ty, module): the trailing slots an async callback receives after (context, err).
The C and C++ generators render these slots straight to C
declarations, so their headers are the model by construction. The
declarative consumer generators (Python, Ruby, .NET) call the same
lower_* functions and map each CType onto their own FFI vocabulary
(ctypes.c_*, Ruby FFI symbols, P/Invoke IntPtr/UIntPtr). This is
what guarantees the producer header and every consumer agree on the
parameter arity and order of a symbol: the class of drift that
previously hid in a dozen hand-written copies of the lowering.
A few conventions are genuinely language-specific and stay local to their generator rather than leaking into the shared model:
- Iterator returns. The C ABI returns an opaque iterator handle ({prefix}_{module}_{Iter}*) while other backends model the same slot differently, so lower_return refuses an Iterator and each caller lowers it explicitly.
- byref out-params. ctypes (Python) and P/Invoke (.NET) express a map return’s out_keys/out_values with an extra pointer level or the C# out keyword; those renderings stay in the respective generator.
Imperative generators (Go cgo, Node, Dart, Swift) build their FFI
signatures inline with marshalling code and share the single
is_c_pointer_type classifier in weaveffi_core::codegen::common. The
Android (JNI) and WASM backends target different ABIs entirely and do
not consume the C lowering.
When you add a parameter shape or change how a type crosses the
boundary, change weaveffi_core::abi and let the consumers inherit it;
the snapshot suite will show every generator the edit touches.
Determinism
Regenerating with the same WeaveFFI version on the same IDL produces byte-identical output.
The contract is enforced by determinism tests in the snapshot suite.
Internally, every HashMap iteration that contributes to generated
output has been replaced with BTreeMap or an explicit sort, and the
serde_json-backed cache key uses canonical ordering.
If you need to iterate a map inside a generator, use BTreeMap or
collect to a Vec and sort_by_key. Never rely on HashMap
iteration order for output; CI snapshot tests will fail
non-deterministically on different platforms or insta orderings.
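Both recommended patterns are shown below on a hypothetical rendering helper; the `name = value` output format is invented for the example and is not what any generator emits.

```rust
use std::collections::{BTreeMap, HashMap};

/// Pattern 1: collect the entries and sort them before rendering.
pub fn render_sorted(values: &HashMap<String, i32>) -> String {
    let mut entries: Vec<(&String, &i32)> = values.iter().collect();
    entries.sort_by_key(|&(name, _)| name);
    entries
        .iter()
        .map(|(name, value)| format!("{name} = {value}\n"))
        .collect()
}

/// Pattern 2: keep the data in a `BTreeMap`, which iterates in key order.
pub fn render_btree(values: &BTreeMap<String, i32>) -> String {
    values
        .iter()
        .map(|(name, value)| format!("{name} = {value}\n"))
        .collect()
}
```

Either function returns the same string no matter what order the entries were inserted in, which is the property the snapshot tests depend on.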
Snapshot tests
crates/weaveffi-cli/tests/snapshots.rs runs every generator across a
nine-fixture corpus (tests/fixtures/01_calculator … 09_nested_modules:
calculator, contacts, inventory, async-demo, events, kitchen-sink,
docs-everywhere, kvstore, and nested-modules). Output is diffed via
cargo-insta. When a snapshot diff is intentional:
cargo install cargo-insta --locked
cargo test -p weaveffi-cli --test snapshots
cargo insta review
Press a to accept, r to reject, s to skip. Commit accepted
.snap files in the same commit as the code change that produced
them; never commit .snap.new. CI rejects pending snapshots.
The harness redacts the WeaveFFI version in each file’s generated-by
prelude to [VERSION] before snapshotting (and separately asserts the
real prelude is present), so a routine version bump does not invalidate
every snapshot in the corpus.
Adding a new generator
A condensed checklist (the long version lives in
CONTRIBUTING.md):
- Create crates/weaveffi-gen-<lang>/ mirroring the layout of weaveffi-gen-c. Add it to members in the root Cargo.toml and depend on weaveffi-core and weaveffi-ir.
- Implement weaveffi_core::backend::LanguageBackend: define the associated Config type, then name, prefix (if the config carries a c_prefix), and files (returning every OutputFile). For a single-pass layout, override the render_enum/render_struct/render_function hooks and compose emit_members; otherwise build the layout directly in files. Then add weaveffi_core::impl_generator_via_backend!(<Generator>); to bridge it to Generator (this derives generate and output_files). Reuse BindingModel and weaveffi_core::codegen::common instead of re-deriving traversal or ABI classification.
- Wire the generator into the cli_targets! registry macro in crates/weaveffi-cli/src/main.rs: add one line ("<name>" => <field>: <Config> via <Generator>, plus strip if the generator honors strip_module_prefix). That single entry is the source of truth: it expands to the CliConfig field, the --target <name> parser entry, inline-config merging, and the Orchestrator registration. No other CLI edits are required.
- Add snapshot fixtures in crates/weaveffi-cli/tests/snapshots.rs covering at minimum the calculator, contacts, inventory, async-demo, and events sample IDLs.
- Document the generator under docs/src/generators/<lang>.md and link it from docs/src/SUMMARY.md.
- Add a consumer example under examples/<lang>/ and wire it into examples/run_all.sh.
- Add the crate to the dependency-ordered publish list in scripts/publish-crates.sh (only when the crate is ready to be released).
Where to read next
- IDL Schema: the type system and validation rules from a user’s perspective.
- Generator Configuration: every option a consumer can set.
- Stability and Versioning: what counts as a breaking change once we hit 1.0.
- Memory Ownership: the per-target memory rules every generator must enforce.
- Async Functions: the per-target async invariants every async-capable generator implements.