git-mirror

mirror of https://github.com/git/git.git synced 2025-12-12 20:36:24 +01:00

Author	SHA1	Message	Date
Jeff King	b5b6c11a70	parse: add functions for parsing from non-string buffers If you have a buffer that is not NUL-terminated but want to parse an integer, there aren't many good options. If you use strtol() and friends, you risk running off the end of the buffer if there is no non-digit terminating character. And even if you carefully make sure that there is such a character, ASan's strict-string-check mode will still complain. You can copy bytes into a temporary buffer, terminate it, and then call strtol(), but doing so adds some pitfalls (like making sure you soak up whitespace and leading +/- signs, and reporting overflow for overly long input). Or you can hand-parse the digits, but then you need to take some care to handle overflow (and again, whitespace and +/- signs). These things aren't impossible to do right, but it's error-prone to have to do them in every spot that wants to do such parsing. So let's add some functions which can be used across the code base. There are a few choices regarding the interface and the implementation. First, the implementation: - I went with with parsing the digits (rather than buffering and passing to libc functions). It ends up being a similar amount of code because we have to do some parsing either way. And likewise overflow detection depends on the exact type the caller wants, so we either have to do it by hand or write a separate wrapper for strtol(), strtoumax(), and so on. - Unsigned overflow detection is done using the same techniques as in unsigned_add_overflows(), etc. We can't use those macros directly because our core function is type-agnostic (so the caller passes in the max value, rather than us deriving it on the fly). This is similar to how git_parse_int(), etc, work. - Signed overflow detection assumes that we can express a negative value with magnitude one larger than our maximum positive value (e.g., -128..127 for a signed 8-bit value). I doubt this is guaranteed by the standard, but it should hold in practice, and we make the same assumption in git_parse_int(), etc. The nice thing about this is that we can derive the range from the number of bits in the type. For ints, you obviously could use INT_MIN..INT_MAX, but for an arbitrary type, we can use maximum_signed_value_of_type(). - I didn't bother with handling bases other than 10. It would complicate the code, and I suspect it won't be needed. We could probably retro-fit it later without too much work, if need be. For the interface: - What do we call it? We have git_parse_int() and friends, which aim to make parsing less error-prone. And in some ways, these are just buffer (rather than string) versions of those functions. But not entirely. Those functions are aimed at parsing a single user-facing value. So they accept a unit prefix (e.g., "10k"), which we won't always want. And they insist that the whole string is consumed (rather than passing back an "end" pointer). We also have strtol_i() and strtoul_ui() wrappers, which try to make error handling simpler (especially around overflow), but mostly behave like their libc counterparts. These also don't pass out an end pointer, though. So I started a new namespace, "parse_<type>_from_buf". - Like those other functions above, we use an out-parameter to store the result, which lets us return an error code directly. This avoids the complicated errno dance for detecting overflow that you get with strtol(). What should the error code look like? git_parse_int() uses a bool for success/failure. But strtol_ui() uses the syscall-like "0 is success, -1 is error" convention. I went with the bool approach here. Since the names are closest to those functions, I thought it would cause the least confusion. - Unlike git_parse_signed() and friends, we do not insist that the entire buffer be consumed. For parsing a specific standalone string that makes sense, but within an unterminated buffer you are much more likely to be parsing multiple fields from a larger data set. We pass out an "end" pointer the same way strtol() does. Another option is to accept the input as an in-out parameter and advance the pointer ourselves (and likewise shrink the length pointer). That would let you do something like: if (!parse_int_from_buf(&p, &len, &out)) return error(...); /* "p" and "len" were adjusted automatically / if (!len \|\| p++ != ' ') return error(...); That saves a few lines of code in some spots, but requires a few more in others (depending on whether the caller has a length in the first place or is using an end pointer). Of the two callers I intend to immediately convert, we have one of each type! I went with the strtol() approach as flexible and time-tested. - We could likewise take the input buffer as two pointers (start and end) rather than a pointer and a length. That again makes life easier for some callers and harder for others. I stuck with pointer and length as the more usual interface. - What happens when a caller passes in a NULL end pointer? This is allowed by strtol(). But I think it's often a sign of a lurking bug, because there's no way to know how much was consumed (and even if a caller wants to assume everything is consumed, you have no way to verify it). So it is simply an error in this interface (you'd get a segfault). I am tempted to say that if the end pointer is NULL the functions could confirm that the entire buffer was consumed, as a convenience. But that felt a bit magical and surprising. Like git_parse_*(), there is a generic signed/unsigned helper, and then we can add type-specific helpers on top. I've added an int helper here to start, and we'll add more as we convert callers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-30 10:03:43 -08:00
Jeff King	fa4fe12fa0	parse: prefer bool to int for boolean returns All of the integer parsing functions in parse.[ch] return an int that is "0" for failure or "1" for success. Since most of the other functions in Git use "0" for success and "-1" for failure, this can be confusing. Let's switch the return types to bool to make it clear that we are using this other convention. Callers should not need to update at all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-30 10:03:42 -08:00
Patrick Steinhardt	bc288c5929	parse-options: introduce precision handling for `OPTION_UNSIGNED` This commit is the equivalent to the preceding commit, but instead of introducing precision handling for `OPTION_INTEGER` we introduce it for `OPTION_UNSIGNED`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-17 08:15:16 -07:00
Taylor Blau	5831f8ac41	config: introduce `git_config_double()` Future commits will want to parse a double-precision floating point value from configuration, but we have no way to parse such a value prior to this patch. The core of the routine is implemented in git_parse_double(). Unlike git_parse_unsigned() and git_parse_signed(), however, the function implemented here only works on type "double", and not related types like "float", or "long double". This is because "float" and "long double" use different functions to convert from ASCII strings to floating point values (strtof() and strtold(), respectively). Likewise, there is no pointer type that can assign to any of these values (except for "void *"), so the only way to define this trio of functions would be with a macro expansion that is parameterized over the floating point type and conversion function. That is all doable, but likely to be overkill given our current needs, which is only to parse double-precision floats. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-24 11:40:42 -07:00
Calvin Wan	b1bda75173	parse: separate out parsing functions from config.h The files config.{h,c} contain functions that have to do with parsing, but not config. In order to further reduce all-in-one headers, separate out functions in config.c that do not operate on config into its own file, parse.h, and update the include directives in the .c files that need only such functions accordingly. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-29 15:14:57 -07:00

5 Commits