git-mirror/parse.c
Jeff King b5b6c11a70 parse: add functions for parsing from non-string buffers
If you have a buffer that is not NUL-terminated but want to parse an
integer, there aren't many good options. If you use strtol() and
friends, you risk running off the end of the buffer if there is no
non-digit terminating character. And even if you carefully make sure
that there is such a character, ASan's strict-string-check mode will
still complain.

You can copy bytes into a temporary buffer, terminate it, and then call
strtol(), but doing so adds some pitfalls (like making sure you soak up
whitespace and leading +/- signs, and reporting overflow for overly long
input). Or you can hand-parse the digits, but then you need to take some
care to handle overflow (and again, whitespace and +/- signs).

These things aren't impossible to do right, but it's error-prone to have
to do them in every spot that wants to do such parsing. So let's add
some functions which can be used across the code base.
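
To make the pitfalls of the copy-and-terminate approach concrete, here is a minimal sketch of it. The helper name parse_long_via_copy(), the 64-byte scratch buffer, and the "consumed" out-parameter are all assumptions for illustration; none of this is code from git.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): parse a long from the
 * unterminated range buf[0..len) by copying it into a NUL-terminated
 * scratch buffer and handing that to strtol().
 */
static bool parse_long_via_copy(const char *buf, size_t len,
				long *out, size_t *consumed)
{
	char tmp[64]; /* arbitrary size chosen for this sketch */
	char *end;
	long val;

	/*
	 * Input that does not fit cannot be parsed faithfully; report an
	 * error rather than silently truncating it.
	 */
	if (len >= sizeof(tmp))
		return false;
	memcpy(tmp, buf, len);
	tmp[len] = '\0';

	errno = 0;
	val = strtol(tmp, &end, 10);
	if (end == tmp || errno == ERANGE)
		return false;

	*out = val;
	*consumed = (size_t)(end - tmp);
	return true;
}
```

Even this small version has to pick a scratch-buffer size and decide what to do with input that exceeds it, which is the kind of per-call-site decision a shared helper avoids.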

There are a few choices regarding the interface and the implementation.

First, the implementation:

  - I went with parsing the digits (rather than buffering and
    passing to libc functions). It ends up being a similar amount of
    code because we have to do some parsing either way. And likewise
    overflow detection depends on the exact type the caller wants, so we
    either have to do it by hand or write a separate wrapper for
    strtol(), strtoumax(), and so on.

  - Unsigned overflow detection is done using the same techniques as in
    unsigned_add_overflows(), etc. We can't use those macros directly
    because our core function is type-agnostic (so the caller passes in
    the max value, rather than us deriving it on the fly). This is
    similar to how git_parse_int(), etc, work.

  - Signed overflow detection assumes that we can express a negative
    value with magnitude one larger than our maximum positive value
    (e.g., -128..127 for a signed 8-bit value). I doubt this is
    guaranteed by the standard, but it should hold in practice, and we
    make the same assumption in git_parse_int(), etc. The nice thing
    about this is that we can derive the range from the number of bits
    in the type. For ints, you obviously could use INT_MIN..INT_MAX, but
    for an arbitrary type, we can use maximum_signed_value_of_type().

  - I didn't bother with handling bases other than 10. It would
    complicate the code, and I suspect it won't be needed. We could
    probably retro-fit it later without too much work, if need be.
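
The overflow checks described in the list above can be sketched on their own. This is a simplified illustration, not the code in this file: the names accumulate_digits() and signed_from_digits() are hypothetical, sign and whitespace handling are left to the caller, and the addition check is written as "digit > max - val" so that it cannot wrap even for a very small "max".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: accumulate decimal digits from buf[0..len) into
 * *ret, failing if the value would exceed the caller-supplied "max".
 * The caller passes the limit because this function does not know the
 * final type.
 */
static bool accumulate_digits(const char *buf, size_t len,
			      uintmax_t *ret, uintmax_t max)
{
	uintmax_t val = 0;
	size_t i;

	if (!len)
		return false;
	for (i = 0; i < len; i++) {
		uintmax_t digit;

		if (buf[i] < '0' || buf[i] > '9')
			return false;
		digit = (uintmax_t)(buf[i] - '0');

		/* Would "val * 10" exceed max? Check before multiplying. */
		if (val > max / 10)
			return false;
		val *= 10;
		/* Would "val + digit" exceed max? Here val <= max holds. */
		if (digit > max - val)
			return false;
		val += digit;
	}
	*ret = val;
	return true;
}

/*
 * Signed variant: assume the negative range is one larger than the
 * positive one (e.g., -128..127), so a negative input may have
 * magnitude max + 1.
 */
static bool signed_from_digits(const char *buf, size_t len, bool negative,
			       intmax_t *ret, intmax_t max)
{
	uintmax_t limit = (uintmax_t)max + (negative ? 1 : 0);
	uintmax_t mag;

	if (!accumulate_digits(buf, len, &mag, limit))
		return false;
	if (negative)
		/* Avoid converting a magnitude of max + 1 to a signed type. */
		*ret = mag ? -(intmax_t)(mag - 1) - 1 : 0;
	else
		*ret = (intmax_t)mag;
	return true;
}
```

With max set to 127, the digits "128" are rejected as a positive value but accepted as a negative one, which matches the -128..127 example.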

For the interface:

  - What do we call it? We have git_parse_int() and friends, which aim
    to make parsing less error-prone. And in some ways, these are just
    buffer (rather than string) versions of those functions. But not
    entirely. Those functions are aimed at parsing a single user-facing
    value. So they accept a unit suffix (e.g., "10k"), which we won't
    always want. And they insist that the whole string is consumed
    (rather than passing back an "end" pointer).

    We also have strtol_i() and strtoul_ui() wrappers, which try to make
    error handling simpler (especially around overflow), but mostly
    behave like their libc counterparts. These also don't pass out an
    end pointer, though.

    So I started a new namespace, "parse_<type>_from_buf".

  - Like those other functions above, we use an out-parameter to store
    the result, which lets us return an error code directly. This avoids
    the complicated errno dance for detecting overflow that you get with
    strtol().

    What should the error code look like? git_parse_int() uses a bool
    for success/failure. But strtoul_ui() uses the syscall-like "0 is
    success, -1 is error" convention.

    I went with the bool approach here. Since the names are closest to
    those functions, I thought it would cause the least confusion.

  - Unlike git_parse_signed() and friends, we do not insist that the
    entire buffer be consumed. For parsing a specific standalone string
    that makes sense, but within an unterminated buffer you are much
    more likely to be parsing multiple fields from a larger data set.

    We pass out an "end" pointer the same way strtol() does. Another
    option is to accept the input as an in-out parameter and advance the
    pointer ourselves (and likewise shrink the length pointer). That
    would let you do something like:

       if (!parse_int_from_buf(&p, &len, &out))
               return error(...);
       /* "p" and "len" were adjusted automatically */
       if (!len || *p++ != ' ')
               return error(...);

    That saves a few lines of code in some spots, but requires a few
    more in others (depending on whether the caller has a length in the
    first place or is using an end pointer). Of the two callers I intend
    to immediately convert, we have one of each type!

    I went with the strtol() approach as flexible and time-tested.

  - We could likewise take the input buffer as two pointers (start and
    end) rather than a pointer and a length. That again makes life
    easier for some callers and harder for others. I stuck with pointer
    and length as the more usual interface.

  - What happens when a caller passes in a NULL end pointer? This is
    allowed by strtol(). But I think it's often a sign of a lurking bug,
    because there's no way to know how much was consumed (and even if a
    caller wants to assume everything is consumed, you have no way to
    verify it). So it is simply an error in this interface (you'd get a
    segfault).

    I am tempted to say that if the end pointer is NULL the functions
    could confirm that the entire buffer was consumed, as a convenience.
    But that felt a bit magical and surprising.
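
As an illustration of the end-pointer style chosen above, here is a sketch of a caller that reads two space-separated integers from an unterminated buffer. To keep it self-contained it uses sketch_parse_int(), a hypothetical stand-in with the same argument shape as parse_int_from_buf(); the stand-in is simplified (it does not skip leading whitespace, and it assumes unsigned int can hold INT_MAX + 1). parse_pair() is likewise an invented name.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in shaped like parse_int_from_buf(): parse an
 * optionally signed decimal int from buf[0..len), store it in *ret, and
 * point *ep just past the last digit. Not git's implementation.
 */
static bool sketch_parse_int(const char *buf, size_t len,
			     const char **ep, int *ret)
{
	const char *end = buf + len;
	bool negative = false;
	unsigned int limit = INT_MAX;
	unsigned int val = 0;

	if (buf < end && (*buf == '-' || *buf == '+')) {
		negative = (*buf == '-');
		buf++;
	}
	if (negative)
		limit++; /* assumes the range is -(INT_MAX + 1)..INT_MAX */
	if (buf == end || *buf < '0' || *buf > '9')
		return false;
	while (buf < end && *buf >= '0' && *buf <= '9') {
		unsigned int digit = (unsigned int)(*buf - '0');

		if (val > limit / 10)
			return false;
		val *= 10;
		if (digit > limit - val)
			return false;
		val += digit;
		buf++;
	}
	*ep = buf;
	if (negative)
		*ret = val ? -(int)(val - 1) - 1 : 0;
	else
		*ret = (int)val;
	return true;
}

/*
 * Parse "<int> <int>" from buf[0..len). The caller decides what may
 * follow each number: a single space after the first, nothing after
 * the second.
 */
static bool parse_pair(const char *buf, size_t len, int *a, int *b)
{
	const char *end = buf + len;
	const char *p;

	if (!sketch_parse_int(buf, len, &p, a))
		return false;
	if (p == end || *p++ != ' ')
		return false;
	if (!sketch_parse_int(p, (size_t)(end - p), &p, b))
		return false;
	return p == end;
}
```

Because the end pointer is always passed back, the "was everything consumed?" check is an explicit comparison in the caller rather than implicit behavior of the parser.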

Like git_parse_*(), there is a generic signed/unsigned helper, and then
we can add type-specific helpers on top. I've added an int helper here
to start, and we'll add more as we convert callers.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-30 10:03:43 -08:00

#include "git-compat-util.h"
#include "gettext.h"
#include "parse.h"

static uintmax_t get_unit_factor(const char *end)
{
	if (!*end)
		return 1;
	else if (!strcasecmp(end, "k"))
		return 1024;
	else if (!strcasecmp(end, "m"))
		return 1024 * 1024;
	else if (!strcasecmp(end, "g"))
		return 1024 * 1024 * 1024;
	return 0;
}

bool git_parse_signed(const char *value, intmax_t *ret, intmax_t max)
{
	if (value && *value) {
		char *end;
		intmax_t val;
		intmax_t factor;

		if (max < 0)
			BUG("max must be a positive integer");

		errno = 0;
		val = strtoimax(value, &end, 0);
		if (errno == ERANGE)
			return false;
		if (end == value) {
			errno = EINVAL;
			return false;
		}
		factor = get_unit_factor(end);
		if (!factor) {
			errno = EINVAL;
			return false;
		}
		if ((val < 0 && (-max - 1) / factor > val) ||
		    (val > 0 && max / factor < val)) {
			errno = ERANGE;
			return false;
		}
		val *= factor;
		*ret = val;
		return true;
	}
	errno = EINVAL;
	return false;
}

bool git_parse_unsigned(const char *value, uintmax_t *ret, uintmax_t max)
{
	if (value && *value) {
		char *end;
		uintmax_t val;
		uintmax_t factor;

		/* negative values would be accepted by strtoumax */
		if (strchr(value, '-')) {
			errno = EINVAL;
			return false;
		}
		errno = 0;
		val = strtoumax(value, &end, 0);
		if (errno == ERANGE)
			return false;
		if (end == value) {
			errno = EINVAL;
			return false;
		}
		factor = get_unit_factor(end);
		if (!factor) {
			errno = EINVAL;
			return false;
		}
		if (unsigned_mult_overflows(factor, val) ||
		    factor * val > max) {
			errno = ERANGE;
			return false;
		}
		val *= factor;
		*ret = val;
		return true;
	}
	errno = EINVAL;
	return false;
}

bool git_parse_int(const char *value, int *ret)
{
	intmax_t tmp;
	if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(int)))
		return false;
	*ret = tmp;
	return true;
}

bool git_parse_int64(const char *value, int64_t *ret)
{
	intmax_t tmp;
	if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(int64_t)))
		return false;
	*ret = tmp;
	return true;
}

bool git_parse_ulong(const char *value, unsigned long *ret)
{
	uintmax_t tmp;
	if (!git_parse_unsigned(value, &tmp, maximum_unsigned_value_of_type(long)))
		return false;
	*ret = tmp;
	return true;
}

bool git_parse_ssize_t(const char *value, ssize_t *ret)
{
	intmax_t tmp;
	if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(ssize_t)))
		return false;
	*ret = tmp;
	return true;
}

bool git_parse_double(const char *value, double *ret)
{
	char *end;
	double val;
	uintmax_t factor;

	if (!value || !*value) {
		errno = EINVAL;
		return false;
	}

	errno = 0;
	val = strtod(value, &end);
	if (errno == ERANGE)
		return false;
	if (end == value) {
		errno = EINVAL;
		return false;
	}
	factor = get_unit_factor(end);
	if (!factor) {
		errno = EINVAL;
		return false;
	}
	val *= factor;
	*ret = val;
	return true;
}

int git_parse_maybe_bool_text(const char *value)
{
	if (!value)
		return 1;
	if (!*value)
		return 0;
	if (!strcasecmp(value, "true")
	    || !strcasecmp(value, "yes")
	    || !strcasecmp(value, "on"))
		return 1;
	if (!strcasecmp(value, "false")
	    || !strcasecmp(value, "no")
	    || !strcasecmp(value, "off"))
		return 0;
	return -1;
}

int git_parse_maybe_bool(const char *value)
{
	int v = git_parse_maybe_bool_text(value);
	if (0 <= v)
		return v;
	if (git_parse_int(value, &v))
		return !!v;
	return -1;
}

/*
 * Parse environment variable 'k' as a boolean (in various
 * possible spellings); if missing, use the default value 'def'.
 */
int git_env_bool(const char *k, int def)
{
	const char *v = getenv(k);
	int val;
	if (!v)
		return def;
	val = git_parse_maybe_bool(v);
	if (val < 0)
		die(_("bad boolean environment value '%s' for '%s'"),
		    v, k);
	return val;
}

/*
 * Parse environment variable 'k' as ulong with possibly a unit
 * suffix; if missing, use the default value 'val'.
 */
unsigned long git_env_ulong(const char *k, unsigned long val)
{
	const char *v = getenv(k);
	if (v && !git_parse_ulong(v, &val))
		die(_("failed to parse %s"), k);
	return val;
}

/*
 * Helper that handles both signed/unsigned cases. If "negate" is NULL,
 * negative values are disallowed. If not NULL and the input is negative,
 * the value is range-checked but the caller is responsible for actually doing
 * the negation. You probably don't want to use this! Use one of
 * parse_signed_from_buf() or parse_unsigned_from_buf() below.
 */
static bool parse_from_buf_internal(const char *buf, size_t len,
				    const char **ep, bool *negate,
				    uintmax_t *ret, uintmax_t max)
{
	const char *end = buf + len;
	uintmax_t val = 0;

	while (buf < end && isspace(*buf))
		buf++;

	if (negate)
		*negate = false;
	if (buf < end && *buf == '-') {
		if (!negate) {
			errno = EINVAL;
			return false;
		}
		buf++;
		*negate = true;
		/* Assume negative range is always one larger than positive. */
		max = max + 1;
	} else if (buf < end && *buf == '+') {
		buf++;
	}

	if (buf == end || !isdigit(*buf)) {
		errno = EINVAL;
		return false;
	}

	while (buf < end && isdigit(*buf)) {
		int digit = *buf - '0';

		if (val > max / 10) {
			errno = ERANGE;
			return false;
		}
		val *= 10;
		if (val > max - digit) {
			errno = ERANGE;
			return false;
		}
		val += digit;

		buf++;
	}

	*ep = buf;
	*ret = val;
	return true;
}

bool parse_unsigned_from_buf(const char *buf, size_t len, const char **ep,
			     uintmax_t *ret, uintmax_t max)
{
	return parse_from_buf_internal(buf, len, ep, NULL, ret, max);
}

bool parse_signed_from_buf(const char *buf, size_t len, const char **ep,
			   intmax_t *ret, intmax_t max)
{
	uintmax_t u_ret;
	bool negate;

	if (!parse_from_buf_internal(buf, len, ep, &negate, &u_ret, max))
		return false;
	/*
	 * Range already checked internally, but we must apply negation
	 * ourselves since only we have the signed integer type.
	 */
	if (negate) {
		*ret = u_ret;
		*ret = -*ret;
	} else {
		*ret = u_ret;
	}
	return true;
}

bool parse_int_from_buf(const char *buf, size_t len, const char **ep, int *ret)
{
	intmax_t tmp;
	if (!parse_signed_from_buf(buf, len, ep, &tmp,
				   maximum_signed_value_of_type(int)))
		return false;
	*ret = tmp;
	return true;
}