Add persistent_term support by petermm · Pull Request #2323 · atomvm/AtomVM

petermm · 2026-05-31T10:59:57Z

Note the lack of GC in this implementation for erased/replaced persistent_terms, that is for followup..

Implement a VM-level persistent_term table with NIFs for get/0,1,2, put/2, put_new/2, erase/1, and info/0.

Persistent terms are copied into VM-owned heaps on write and returned without copying on read. Replaced and erased entries are retained until VM shutdown so previously returned terms remain valid without an OTP-style global GC pass, and info/0 reports retained memory.

Also share the ETS term hashing helper for persistent_term lookup and add coverage for put/get/erase behavior, retained values, info/0, complex keys, and function keys.

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

petermm · 2026-06-01T19:48:24Z

PR Review: persistent_term + term_hash extraction

Scope: last two commits on this branch.

51ab669e — Add persistent_term support
67707b01 — Fix CodeQL warning in 64-bit term hash folding

Verdict

No blocking soundness issues found.

The headline design choice — skipping OTP's global GC pass on
replace/erase — is safe specifically because retired/erased entries
are retained until VM shutdown. A previously returned raw term can
never become a use-after-free, because nothing ever frees the heap it
lives in until the GlobalContext is destroyed. This trade-off is
documented in libs/estdlib/src/persistent_term.erl.

The CodeQL fix in src/libAtomVM/term_hash.c hash_uint64() is correct:
promoting h to uint64_t before the multiply silences the warning
while preserving identical mod‑2³² results after the cast back to
uint32_t.

Findings below are prioritized.

P1 — Document the retention invariant near `retire_entry()`

The "no global GC" design is only sound because retired/erased
entries are kept alive until VM shutdown. This invariant lives only in
the Erlang wrapper's edoc today. A future cleanup PR could easily
"optimize" retire_entry() into an immediate entry_destroy() and
silently introduce a UAF: process heaps may still contain raw term
pointers into the retired heap.

Add a comment at the C call site so the constraint is visible to
anyone touching this code.

--- a/src/libAtomVM/persistent_term.c
+++ b/src/libAtomVM/persistent_term.c
@@ -341,6 +341,14 @@ static void entry_destroy(struct PersistentTermEntry *entry, GlobalContext *glob
     free(entry);
 }

+/*
+ * Retire an entry instead of freeing it.
+ *
+ * persistent_term:get/1 returns the value term without copying, so process
+ * heaps may contain raw pointers into entry->heap after a subsequent
+ * put/2 or erase/1. AtomVM does not implement OTP's global GC pass, so
+ * retired entries must remain allocated until globalcontext destruction.
+ */
 static void retire_entry(PersistentTerm *persistent_term, struct PersistentTermEntry *entry)
 {
     entry->next = persistent_term->retired_entries;

Effort: trivial.

P1 — `persistent_term:info/0` doesn't check the `count` atom allocation

src/libAtomVM/nifs.c nif_persistent_term_info() calls
globalcontext_make_atom(...) and feeds the result straight into
term_set_map_assoc(). On atom-table OOM that returns
term_invalid_term() and the resulting map is malformed.

--- a/src/libAtomVM/nifs.c
+++ b/src/libAtomVM/nifs.c
@@ -4636,12 +4636,16 @@ static term nif_persistent_term_info(Context *ctx, int argc, term argv[])
         RAISE_ERROR(OUT_OF_MEMORY_ATOM);
     }

+    term count_atom = globalcontext_make_atom(ctx->global, ATOM_STR("\x5", "count"));
+    if (UNLIKELY(term_is_invalid_term(count_atom))) {
+        RAISE_ERROR(OUT_OF_MEMORY_ATOM);
+    }
+
     term map = term_alloc_map(2, &ctx->heap);
     term_set_map_assoc(
         map,
         0,
-        globalcontext_make_atom(ctx->global, ATOM_STR("\x5", "count")),
+        count_atom,
         term_make_maybe_boxed_int64(count_i, &ctx->heap));
     term_set_map_assoc(
         map,

(The memory key already uses the pre-defined MEMORY_ATOM, so only
count is affected.)

Effort: trivial.

P2 — Retention test should force `erlang:garbage_collect()`

The retention check in tests/erlang_tests/test_persistent_term.erl
verifies that an old binding survives a subsequent put/2 and
erase/1, but it never forces a process GC between the bindings.
That's the one path most likely to break if someone "improves"
retire_entry(). Forcing GC makes the test exercise the actual
soundness invariant.

--- a/tests/erlang_tests/test_persistent_term.erl
+++ b/tests/erlang_tests/test_persistent_term.erl
@@ -45,11 +45,16 @@ test_retained_old_values() ->
     Key = {retained, key},
     ok = persistent_term:put(Key, {old, value, [1, 2, 3]}),
     OldValue = persistent_term:get(Key),
+    %% Force a GC while OldValue still references the retired heap.
+    erlang:garbage_collect(),
+    {old, value, [1, 2, 3]} = OldValue,

     ok = persistent_term:put(Key, {new, value}),
+    erlang:garbage_collect(),
     {old, value, [1, 2, 3]} = OldValue,
     {new, value} = persistent_term:get(Key),

     true = persistent_term:erase(Key),
+    erlang:garbage_collect(),
     {old, value, [1, 2, 3]} = OldValue,
     ok.

A second, smaller follow-up would be a refc-binary value to cover the
MSO/refcount retention path; defer unless we want it now.

Effort: small.

P2 — `info().memory` accounting: clarify or switch to allocated capacity

src/libAtomVM/persistent_term.c entry_new() sizes the per-entry
heap from memory_estimate_usage(key) + memory_estimate_usage(value)
but accounts memory using the post-copy heap pointer delta:

size_t size = memory_estimate_usage(key) + memory_estimate_usage(value);
if (UNLIKELY(memory_init_heap(heap, size) != MEMORY_GC_OK)) { ... }

entry->key   = memory_copy_term_tree(heap, key);
entry->value = memory_copy_term_tree(heap, value);
entry->memory = sizeof(struct PersistentTermEntry) + sizeof(Heap)
              + sizeof(HeapFragment)
              + ((size_t) (heap->heap_ptr - heap->heap_start) * sizeof(term));

If memory_estimate_usage() ever overestimates (canonical/shared
sub-terms), info().memory reports terms actually used rather than
heap allocated for them. Two reasonable options:

Option A — document as "logical copied-term usage" in
libs/estdlib/src/persistent_term.erl.

Option B — switch accounting to allocated capacity (closer to OTP's
intent):

--- a/src/libAtomVM/persistent_term.c
+++ b/src/libAtomVM/persistent_term.c
@@ -327,8 +327,9 @@ static struct PersistentTermEntry *entry_new(term key, term value)
     entry->key = memory_copy_term_tree(heap, key);
     entry->value = memory_copy_term_tree(heap, value);
     entry->heap = heap;
-    entry->memory = sizeof(struct PersistentTermEntry) + sizeof(Heap) + sizeof(HeapFragment)
-        + ((size_t) (heap->heap_ptr - heap->heap_start) * sizeof(term));
+    /* Report allocated heap capacity, not just the live copied portion. */
+    entry->memory = sizeof(struct PersistentTermEntry) + sizeof(Heap)
+        + sizeof(HeapFragment) + size * sizeof(term);
     entry->next = NULL;

Either is acceptable; just pick one and make it explicit.

Effort: small.

P3 — Concurrency / `put_new/2` test gaps

Current tests cover behavior on a single scheduler. Worth adding under
SMP:

concurrent put_new/2 from N processes for the same missing key —
exactly one should win, all others should observe the winning value;
concurrent get/1 interleaved with put/2/erase/1 of the same
key — readers must never observe a half-replaced or freed value;
repeated replace/erase cycles asserting info().count stabilizes
while info().memory grows monotonically (consistent with the
retention design).

These don't catch a known bug; they protect the spinlock + retention
invariants from future refactors.

Effort: medium.

Other notes (no action required)

get/0 holds the spinlock through GC + key copies
(persistent_term.c persistent_term_get_all_maybe_gc()).
Correct, and simple. Only worth changing if get/0 shows up in a
profile. Any future "snapshot under lock, copy outside lock"
refactor still relies on the retention invariant above.
term_hash extraction. src/libAtomVM/ets_multimap.c call
sites all switched to the shared term_hash(); float -0.0 is
normalized to +0.0 to stay consistent with term_compare(). Looks
good.
hash_uint64 CodeQL fix (term_hash.c). The cast through
uint64_t before truncating back to uint32_t produces the same
bits as the original uint32_t arithmetic, so existing hashes are
preserved.
Unsupported-term fprintf(stderr, ...) fallback in term_hash.c.
Confirm all user-reachable term kinds are covered (current set
looks complete); if any path can reach it from user input it will
spam stderr.

Recommended minimum follow-up patch

Apply the P1 comment on retire_entry().
Apply the P1 count atom check in nif_persistent_term_info.
Apply the P2 GC-forcing retention test.
Pick a side on info().memory semantics and either doc or
change.

Concurrency stress and a refc-binary retention case can land as a
follow-up.

pguyot

I'm not sure we should ship a r/w persistent term feature that leaks memory. I would suggest a first version with everything except erase and replace.

I want to challenge usage of locking.

I am not sure I understand why we copy keys with get_all. It seems inherited from ets, but maybe I didn't spend enough time yet on this review.

Implement a VM-level persistent_term table with NIFs for get/0,1,2, put/2, put_new/2, erase/1, and info/0. Persistent terms are copied into VM-owned heaps on write and returned without copying on read. Replaced and erased entries are retained until VM shutdown so previously returned terms remain valid without an OTP-style global GC pass, and info/0 reports retained memory. Also share the ETS term hashing helper for persistent_term lookup and add coverage for put/get/erase behavior, retained values, info/0, complex keys, and function keys. Signed-off-by: Peter M <petermm@gmail.com>

Use an explicit 64-bit intermediate when folding bytes from uint64_t values into the 32-bit term hash accumulator. This avoids CodeQL's "multiplication result converted to larger type" warning while preserving the intended uint32_t wraparound behavior of the hash. Signed-off-by: Peter M <petermm@gmail.com>

Use the project RWLock API directly for persistent_term instead of wrapping spinlock calls in local read/write lock macros. Also reduce the persistent_term critical sections by precomputing the bucket hash and allocating replacement entries before taking the write lock. Finally, stop copying keys in persistent_term:get/0. The returned list now allocates only the outer list and tuple cells on the caller heap, while both keys and values reference the persistent-term entry heap directly. This matches get/1 semantics, since retired entries are retained so previously returned persistent terms remain valid. Add a regression assertion that a persistent_term:get/0 result remains valid after the corresponding entries are erased. Signed-off-by: Peter M <petermm@gmail.com>

petermm force-pushed the codex/add-persistentterm-support branch from 97739cb to 51ab669 Compare May 31, 2026 11:16

github-advanced-security AI found potential problems May 31, 2026

View reviewed changes

Comment thread src/libAtomVM/term_hash.c Fixed

pguyot reviewed Jun 2, 2026

View reviewed changes

Comment thread src/libAtomVM/persistent_term.c Outdated

Comment thread src/libAtomVM/persistent_term.c Outdated

Comment thread src/libAtomVM/persistent_term.c

Comment thread src/libAtomVM/persistent_term.c Outdated

petermm added 3 commits June 2, 2026 08:05

petermm force-pushed the codex/add-persistentterm-support branch from 7af87d4 to fda1420 Compare June 2, 2026 06:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add persistent_term support#2323

Add persistent_term support#2323
petermm wants to merge 3 commits into
atomvm:release-0.7from
petermm:codex/add-persistentterm-support

petermm commented May 31, 2026

Uh oh!

Uh oh!

petermm commented Jun 1, 2026

Uh oh!

pguyot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

petermm commented May 31, 2026

Uh oh!

Uh oh!

petermm commented Jun 1, 2026

PR Review: persistent_term + term_hash extraction

Verdict

P1 — Document the retention invariant near retire_entry()

P1 — persistent_term:info/0 doesn't check the count atom allocation

P2 — Retention test should force erlang:garbage_collect()

P2 — info().memory accounting: clarify or switch to allocated capacity

P3 — Concurrency / put_new/2 test gaps

Other notes (no action required)

Recommended minimum follow-up patch

Uh oh!

pguyot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

P1 — Document the retention invariant near `retire_entry()`

P1 — `persistent_term:info/0` doesn't check the `count` atom allocation

P2 — Retention test should force `erlang:garbage_collect()`

P2 — `info().memory` accounting: clarify or switch to allocated capacity

P3 — Concurrency / `put_new/2` test gaps