Translate char as core::ffi::c_char instead of u8 by lucic71 · Pull Request #194 · Cpp2Rust/cpp2rust

lucic71 · 2026-06-13T19:02:32Z

Translates char as c_char instead of u8 in both unsafe and refcount. It uses core:ffi::c_char instead of libc::c_char so that refcount is not dependent on libc.

This contains:

change all rules and libcc2rs to use c_char instead of u8
use c_char instead of u8 in mapper and VisitBuiltinType
use c_char in argv for both unsafe and refcount
delete the IsCharPointerFieldFromLibc and IsCharArrayFieldFromLibc hacks
use the edition 2024 c"" syntax to define c_char compatible string literals
use libcc2rs::char_array to define c_char array of characters
use getTypedLiteral to write 0 as c_char (correct) instead of 0_c_char (incorrect)
use reinterpret_cast between char and unsigned char in VisitExplicitCastExpr and BuildFnAdapter
reinterpret all remaining bytes instead of one object in PtrKind::Reinterpreted

I wanted to split each of them in a separate PR, however in isolation they don't work. So I had to pack them together.

9 is a bug that I encountered during the u8 -> c_char change. brunsli does reinterpret_cast<uint8_t>(string.data()). string.data() returns a char (c_char) and uint8_t is translated as u8. This means that the following fast-path in Ptr::reinterprete_cast is skipped and a real reinterpret takes place:

https://github.com/Cpp2Rust/cpp2rust/blob/master/libcc2rs/src/rc.rs#L389-L392

Later, brunsli passes the reinterpreted pointer to the unsafe brotli ffi. In the previous version of StrongPtr::deref/PtrKind::with_mut/PtrKind::with we only materialized one u8 from c_char, meaning that the brotli ffi received a single u8 and a size > 1, resulting in OOB. The current code materializes the entire remaining bytes of the source pointer.

This problem only appears because we interact with the unsafe ffi. In safe code, it's fine to materialize only one element because all our Ptr APIs work on element boundary, e.g. write/read. I expect 9 to be a temporary solution until we replace the unsafe brotli ffi with a safe brotli crate.

…nal alloc

lucic71 · 2026-06-22T17:45:19Z

@nunoplopes this is ready for review

nunoplopes · 2026-06-24T08:02:21Z

What's the advantage of using c_char vs u8? The code seems a lot more verbose and less idiomatic. Refcount should not use c_char.

lucic71 · 2026-06-24T08:32:44Z

What's the advantage of using c_char vs u8? The code seems a lot more verbose and less idiomatic. Refcount should not use c_char.

u8 is incompatible with char* or char[] from libc or FFI. We use u8 while libc and FFI use c_char (typedef over i8). The job of IsCharPointerFieldFromLibc and IsCharArrayFieldFromLibc was to workaround this discrepancy. I deleted both functions now.

We can use c_char instead of core:ffi::c_char to make the code less verbose and more idiomatic.

It's ok for refcount to use c_char. core::ffi::c_char does not depend on libc and c_char is the correct and portable solution, consider the following code:

#include <cassert>
int main() {
    char a = -2;
    char b = 1;
    assert(a + b == -1);
    return 0;
}

With the u8 translation this fails. With the c_char translation, this is ok.

nunoplopes · 2026-06-24T09:40:30Z

I disagree. I don't think core:ffi::c_char is more idiomatic. Native Rust code does not use that.
The sign of char is platform dependent. It's true that in x86 it's signed (so equivalent to i8). On ARM it's often unsigned (equivalent to u8).

I don't know why the translation of that example fails, but c_char is not the answer, for recount at least.

lucic71 · 2026-06-24T11:27:20Z

I disagree. I don't think core:ffi::c_char is more idiomatic. Native Rust code does not use that. The sign of char is platform dependent. It's true that in x86 it's signed (so equivalent to i8). On ARM it's often unsigned (equivalent to u8).

I don't know why the translation of that example fails, but c_char is not the answer, for recount at least.

Then for unsafe I will continue with c_char so that I can get rid of IsCharPointerFieldFromLibc and IsCharArrayFieldFromLibc. For refcount we keep u8 instead of c_char because we don't expect to interact with libc.

I don't know why the translation of that example fails

On a platform where char is signed, in a + b both arguments are promoted to signed int, -2 + 1 becomes -1 and the result is downgraded back to char which results in (char) -1. But becaue we translate char as u8, the result becomes: -2i32 as u8 + 1_u8 = 254 + 1 = 255

fn main_0() -> i32 {
    let a: Value<u8> = Rc::new(RefCell::new((-2_i32 as u8)));
    let b: Value<u8> = Rc::new(RefCell::new(1_u8));
    assert!(((((*a.borrow()) as i32) + ((*b.borrow()) as i32)) == -1_i32));
    return 0;
}

lucic71 added 16 commits June 13, 2026 14:04

Update tests

88205c8

Use c_char instead of u8

88bb282

Use c_char in libcc2rs mapped rules

33a26cf

Use c_char for cstrings in libcc2rs

d5220ea

Use edition 2024 for rustfmt

1c63cde

Use c_char instead of u8 for char translation

8cd9189

Delete IsCharPointerFieldFromLibc and IsCharArrayFieldFromLibc

caa8217

Fix building of argv

5b2a10c

Fix 0 initialization for c_char

90da3da

Fix implicit and explicit casts between u8/c_char

aecefd9

Fix definition of main to use c_char

b786182

Convert Vec<c_char> to Vec<u8> when printing

aee4da6

Fix char array initialization to use c_char

3284cba

Add libcc2rs::char_array to initialize fixed sized arrays

f0e889e

cargo gmt

b79bf93

Snapshot/write-back the entire region between offset and end of origi…

93bdb87

…nal alloc

lucic71 marked this pull request as draft June 13, 2026 19:02

Replace ::core::ffi::c_char with core::ffi::c_char

43865ca

lucic71 marked this pull request as ready for review June 22, 2026 17:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Translate char as core::ffi::c_char instead of u8#194

Translate char as core::ffi::c_char instead of u8#194
lucic71 wants to merge 17 commits into
Cpp2Rust:masterfrom
lucic71:u8-as-c_char

lucic71 commented Jun 13, 2026 •

edited

Loading

Uh oh!

lucic71 commented Jun 22, 2026

Uh oh!

nunoplopes commented Jun 24, 2026

Uh oh!

lucic71 commented Jun 24, 2026

Uh oh!

nunoplopes commented Jun 24, 2026

Uh oh!

lucic71 commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

lucic71 commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lucic71 commented Jun 22, 2026

Uh oh!

nunoplopes commented Jun 24, 2026

Uh oh!

lucic71 commented Jun 24, 2026

Uh oh!

nunoplopes commented Jun 24, 2026

Uh oh!

lucic71 commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lucic71 commented Jun 13, 2026 •

edited

Loading