encode: prevent appending .0 to scientific notation to le bucket labels#261
encode: prevent appending .0 to scientific notation to le bucket labels#261Pigueiras wants to merge 1 commit intofluent:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes Prometheus exposition encoding for histogram bucket (le) label values by preventing the encoder from producing malformed scientific-notation strings (e.g., 1e+06.0) that Prometheus rejects.
Changes:
- Update
bucket_value_to_string()to append.0only when the formatted value is neither decimal nor scientific notation. - Add clarifying comment documenting the
.0-append rule.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/cmt_encode_prometheus.c
Outdated
| * Append .0 only when there is no decimal point and the number | ||
| * is not in scientific notation. | ||
| */ | ||
| if (!strchr(str, '.') && !strchr(str, 'e')) { |
There was a problem hiding this comment.
The check for scientific notation only looks for lowercase 'e'. While %g typically uses 'e', this becomes fragile if the formatting ever changes (e.g., %G) or if a platform emits an uppercase exponent. Consider using a character-class check (e.g., treating either 'e' or 'E' as scientific notation) so .0 is never appended to exponent-form values.
| if (!strchr(str, '.') && !strchr(str, 'e')) { | |
| if (!strchr(str, '.') && !strchr(str, 'e') && !strchr(str, 'E')) { |
| len = snprintf(str, 64, "%g", val); | ||
| cfl_sds_len_set(str, len); | ||
|
|
||
| if (!strchr(str, '.')) { | ||
| /* | ||
| * Append .0 only when there is no decimal point and the number | ||
| * is not in scientific notation. | ||
| */ | ||
| if (!strchr(str, '.') && !strchr(str, 'e')) { | ||
| cfl_sds_cat_safe(&str, ".0", 2); | ||
| } |
There was a problem hiding this comment.
bucket_value_to_string() will still append .0 to non-finite printf outputs like inf/nan (which contain neither . nor e), producing values like inf.0/nan.0 that are not parseable as floats in Prometheus label values. Since histogram buckets can be user-provided without an isfinite validation, consider explicitly handling non-finite val (e.g., reject, or map infinities to +Inf/-Inf, and avoid appending .0 for nan/inf).
| /* | ||
| * Append .0 only when there is no decimal point and the number | ||
| * is not in scientific notation. | ||
| */ | ||
| if (!strchr(str, '.') && !strchr(str, 'e')) { | ||
| cfl_sds_cat_safe(&str, ".0", 2); |
There was a problem hiding this comment.
This change fixes a Prometheus parsing failure for large bucket boundaries, but there doesn’t appear to be a regression test covering scientific-notation le (e.g., a histogram bucket at >= 1e6 that should encode as 1e+06 and remain parseable). Adding a test that encodes and then decodes Prometheus text (or validates the encoded string) would prevent this from reoccurring.
cosmo0920
left a comment
There was a problem hiding this comment.
How about adding unit test to confirm this fix?
It's not mandatory for every change but we encourage to write it.
I've added one but tbh I'm not sure if it's the correct place for it, if you think this should go somewhere else I would appreciate some guidance on it 😄 |
|
The fix looks good and the new test seems fine where it is. One remaining edge case: bucket_value_to_string() still appends .0 to %g outputs like inf/nan, and histogram buckets do not seem to reject non-finite bounds, so this can still generate invalid le labels. |
18409fb to
4f19cb2
Compare
bucket_value_to_string uses %g to format histogram bucket boundaries. That output can legitimately be scientific notation for large finite values, or inf/nan for non-finite bucket bounds. The previous logic appended .0 whenever the formatted string had no decimal point, which may produce malformed Prometheus le labels such as 1e+08.0, inf.0, and nan.0. Only append .0 for finite values that are not already written in scientific notation, and cover these special cases in the histogram bucket label regression test. Signed-off-by: Luis Pigueiras <luis.pigueiras@cern.ch>
I've added an |
bucket_value_to_stringused %g to format histogram bucket boundaries, which switches to scientific notation for values >= 1e6. The subsequent check for a missing decimal point would then append ".0", producing malformedlelabels like "1e+06.0" that Prometheus rejects with:Skip the ".0" suffix when the string already contains 'e', since scientific notation is valid in Prometheus
lelabels without it.This PR also handles non finite values as
nan,-infandinf.