mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
synced 2026-04-03 12:05:13 +02:00
commit b41c1d8d07 upstream.

Make fscrypt no longer use Crypto API drivers for non-inline crypto
engines, even when the Crypto API prioritizes them over CPU-based code
(which unfortunately it often does). These drivers tend to be really
problematic, especially for fscrypt's workload. This commit has no
effect on inline crypto engines, which are different and do work well.

Specifically, exclude drivers that have CRYPTO_ALG_KERN_DRIVER_ONLY or
CRYPTO_ALG_ALLOCATES_MEMORY set. (Later, CRYPTO_ALG_ASYNC should be
excluded too. That's omitted for now to keep this commit backportable,
since until recently some CPU-based code had CRYPTO_ALG_ASYNC set.)

There are two major issues with these drivers: bugs and performance.

First, these drivers tend to be buggy. They're fundamentally much more
error-prone and harder to test than the CPU-based code. They often
don't get tested before kernel releases, and even if they do, the crypto
self-tests don't properly test these drivers. Released drivers have
en/decrypted or hashed data incorrectly. These bugs cause issues for
fscrypt users who often didn't even want to use these drivers, e.g.:

- https://github.com/google/fscryptctl/issues/32
- https://github.com/google/fscryptctl/issues/9
- https://lore.kernel.org/r/PH0PR02MB731916ECDB6C613665863B6CFFAA2@PH0PR02MB7319.namprd02.prod.outlook.com

These drivers have also similarly caused issues for dm-crypt users,
including data corruption and deadlocks. Since Linux v5.10, dm-crypt
has disabled most of them by excluding CRYPTO_ALG_ALLOCATES_MEMORY.

Second, these drivers tend to be *much* slower than the CPU-based code.
This may seem counterintuitive, but benchmarks clearly show it. There's
a *lot* of overhead associated with going to a hardware driver, off the
CPU, and back again. To prove this, I gathered as many systems with
this type of crypto engine as I could, and I measured synchronous
encryption of 4096-byte messages (which matches fscrypt's workload):

Intel Emerald Rapids server:

   AES-256-XTS:
      xts-aes-vaes-avx512                 16171 MB/s  [CPU-based, Vector AES]
      qat_aes_xts                           289 MB/s  [Offload, Intel QuickAssist]

Qualcomm SM8650 HDK:

   AES-256-XTS:
      xts-aes-ce                           4301 MB/s  [CPU-based, ARMv8 Crypto Extensions]
      xts-aes-qce                            73 MB/s  [Offload, Qualcomm Crypto Engine]

i.MX 8M Nano LPDDR4 EVK:

   AES-256-XTS:
      xts-aes-ce                            647 MB/s  [CPU-based, ARMv8 Crypto Extensions]
      xts(ecb-aes-caam)                      20 MB/s  [Offload, CAAM]
   AES-128-CBC-ESSIV:
      essiv(cbc-aes-caam,sha256-lib)         23 MB/s  [Offload, CAAM]

STM32MP157F-DK2:

   AES-256-XTS:
      xts-aes-neonbs                       13.2 MB/s  [CPU-based, ARM NEON]
      xts(stm32-ecb-aes)                    3.1 MB/s  [Offload, STM32 crypto engine]
   AES-128-CBC-ESSIV:
      essiv(cbc-aes-neonbs,sha256-lib)     14.7 MB/s  [CPU-based, ARM NEON]
      essiv(stm32-cbc-aes,sha256-lib)       3.2 MB/s  [Offload, STM32 crypto engine]
   Adiantum:
      adiantum(xchacha12-arm,aes-arm,nhpoly1305-neon)
                                           52.8 MB/s  [CPU-based, ARM scalar + NEON]

So, there was no case in which the crypto engine was even *close* to
being faster. On the first three, which have AES instructions in the
CPU, the CPU was 30 to 55 times faster (!). Even on STM32MP157F-DK2
which has a Cortex-A7 CPU that doesn't have AES instructions, AES was
over 4 times faster on the CPU. And Adiantum encryption, which is what
actually should be used on CPUs like that, was over 17 times faster.

Other justifications that have been given for these non-inline crypto
engines (almost always coming from the hardware vendors, not actual
users) don't seem very plausible either:

  - The crypto engine throughput could be improved by processing
    multiple requests concurrently. Currently irrelevant to fscrypt,
    since it doesn't do that. This would also be complex, and unhelpful
    in many cases. 2 of the 4 engines I tested even had only one queue.

  - Some of the engines, e.g. STM32, support hardware keys. Also
    currently irrelevant to fscrypt, since it doesn't support these.
    Interestingly, the STM32 driver itself doesn't support this either.

  - Free up CPU for other tasks and/or reduce energy usage. Not very
    plausible considering the "short" message length, driver overhead,
    and scheduling overhead. There's just very little time for the CPU
    to do something else like run another task or enter low-power state,
    before the message finishes and it's time to process the next one.

  - Some of these engines resist power analysis and electromagnetic
    attacks, while the CPU-based crypto generally does not. In theory,
    this sounds great. In practice, if this benefit requires the use of
    an off-CPU offload that massively regresses performance and has a
    low-quality, buggy driver, the price for this hardening (which is
    not relevant to most fscrypt users, and tends to be incomplete) is
    just too high. Inline crypto engines are much more promising here,
    as are on-CPU solutions like RISC-V High Assurance Cryptography.

Fixes: b30ab0e034 ("ext4 crypto: add ext4 encryption facilities")
Cc: stable@vger.kernel.org
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250704070322.20692-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
183 lines
5.4 KiB
C
// SPDX-License-Identifier: GPL-2.0
/*
 * Implementation of HKDF ("HMAC-based Extract-and-Expand Key Derivation
 * Function"), aka RFC 5869.  See also the original paper (Krawczyk 2010):
 * "Cryptographic Extraction and Key Derivation: The HKDF Scheme".
 *
 * This is used to derive keys from the fscrypt master keys.
 *
 * Copyright 2019 Google LLC
 */

#include <crypto/hash.h>
#include <crypto/sha2.h>

#include "fscrypt_private.h"

/*
 * HKDF supports any unkeyed cryptographic hash algorithm, but fscrypt uses
 * SHA-512 because it is well-established, secure, and reasonably efficient.
 *
 * HKDF-SHA256 was also considered, as its 256-bit security strength would be
 * sufficient here.  A 512-bit security strength is "nice to have", though.
 * Also, on 64-bit CPUs, SHA-512 is usually just as fast as SHA-256.  In the
 * common case of deriving an AES-256-XTS key (512 bits), that can result in
 * HKDF-SHA512 being much faster than HKDF-SHA256, as the longer digest size of
 * SHA-512 causes HKDF-Expand to only need to do one iteration rather than two.
 */
#define HKDF_HMAC_ALG		"hmac(sha512)"
#define HKDF_HASHLEN		SHA512_DIGEST_SIZE

/*
 * HKDF consists of two steps:
 *
 * 1. HKDF-Extract: extract a pseudorandom key of length HKDF_HASHLEN bytes from
 *    the input keying material and optional salt.
 * 2. HKDF-Expand: expand the pseudorandom key into output keying material of
 *    any length, parameterized by an application-specific info string.
 *
 * HKDF-Extract can be skipped if the input is already a pseudorandom key of
 * length HKDF_HASHLEN bytes.  However, cipher modes other than AES-256-XTS take
 * shorter keys, and we don't want to force users of those modes to provide
 * unnecessarily long master keys.  Thus fscrypt still does HKDF-Extract.  No
 * salt is used, since fscrypt master keys should already be pseudorandom and
 * there's no way to persist a random salt per master key from kernel mode.
 */

/* HKDF-Extract (RFC 5869 section 2.2), unsalted */
static int hkdf_extract(struct crypto_shash *hmac_tfm, const u8 *ikm,
			unsigned int ikmlen, u8 prk[HKDF_HASHLEN])
{
	static const u8 default_salt[HKDF_HASHLEN];
	int err;

	err = crypto_shash_setkey(hmac_tfm, default_salt, HKDF_HASHLEN);
	if (err)
		return err;

	return crypto_shash_tfm_digest(hmac_tfm, ikm, ikmlen, prk);
}

/*
 * Compute HKDF-Extract using the given master key as the input keying material,
 * and prepare an HMAC transform object keyed by the resulting pseudorandom key.
 *
 * Afterwards, the keyed HMAC transform object can be used for HKDF-Expand many
 * times without having to recompute HKDF-Extract each time.
 */
int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, const u8 *master_key,
		      unsigned int master_key_size)
{
	struct crypto_shash *hmac_tfm;
	u8 prk[HKDF_HASHLEN];
	int err;

	hmac_tfm = crypto_alloc_shash(HKDF_HMAC_ALG, 0, FSCRYPT_CRYPTOAPI_MASK);
	if (IS_ERR(hmac_tfm)) {
		fscrypt_err(NULL, "Error allocating " HKDF_HMAC_ALG ": %ld",
			    PTR_ERR(hmac_tfm));
		return PTR_ERR(hmac_tfm);
	}

	if (WARN_ON_ONCE(crypto_shash_digestsize(hmac_tfm) != sizeof(prk))) {
		err = -EINVAL;
		goto err_free_tfm;
	}

	err = hkdf_extract(hmac_tfm, master_key, master_key_size, prk);
	if (err)
		goto err_free_tfm;

	err = crypto_shash_setkey(hmac_tfm, prk, sizeof(prk));
	if (err)
		goto err_free_tfm;

	hkdf->hmac_tfm = hmac_tfm;
	goto out;

err_free_tfm:
	crypto_free_shash(hmac_tfm);
out:
	memzero_explicit(prk, sizeof(prk));
	return err;
}

/*
 * HKDF-Expand (RFC 5869 section 2.3).  This expands the pseudorandom key, which
 * was already keyed into 'hkdf->hmac_tfm' by fscrypt_init_hkdf(), into 'okmlen'
 * bytes of output keying material parameterized by the application-specific
 * 'info' of length 'infolen' bytes, prefixed by "fscrypt\0" and the 'context'
 * byte.  This is thread-safe and may be called by multiple threads in parallel.
 *
 * ('context' isn't part of the HKDF specification; it's just a prefix fscrypt
 * adds to its application-specific info strings to guarantee that it doesn't
 * accidentally repeat an info string when using HKDF for different purposes.)
 */
int fscrypt_hkdf_expand(const struct fscrypt_hkdf *hkdf, u8 context,
			const u8 *info, unsigned int infolen,
			u8 *okm, unsigned int okmlen)
{
	SHASH_DESC_ON_STACK(desc, hkdf->hmac_tfm);
	u8 prefix[9];
	unsigned int i;
	int err;
	const u8 *prev = NULL;
	u8 counter = 1;
	u8 tmp[HKDF_HASHLEN];

	if (WARN_ON_ONCE(okmlen > 255 * HKDF_HASHLEN))
		return -EINVAL;

	desc->tfm = hkdf->hmac_tfm;

	memcpy(prefix, "fscrypt\0", 8);
	prefix[8] = context;

	for (i = 0; i < okmlen; i += HKDF_HASHLEN) {

		err = crypto_shash_init(desc);
		if (err)
			goto out;

		if (prev) {
			err = crypto_shash_update(desc, prev, HKDF_HASHLEN);
			if (err)
				goto out;
		}

		err = crypto_shash_update(desc, prefix, sizeof(prefix));
		if (err)
			goto out;

		err = crypto_shash_update(desc, info, infolen);
		if (err)
			goto out;

		BUILD_BUG_ON(sizeof(counter) != 1);
		if (okmlen - i < HKDF_HASHLEN) {
			err = crypto_shash_finup(desc, &counter, 1, tmp);
			if (err)
				goto out;
			memcpy(&okm[i], tmp, okmlen - i);
			memzero_explicit(tmp, sizeof(tmp));
		} else {
			err = crypto_shash_finup(desc, &counter, 1, &okm[i]);
			if (err)
				goto out;
		}
		counter++;
		prev = &okm[i];
	}
	err = 0;
out:
	if (unlikely(err))
		memzero_explicit(okm, okmlen); /* so caller doesn't need to */
	shash_desc_zero(desc);
	return err;
}

void fscrypt_destroy_hkdf(struct fscrypt_hkdf *hkdf)
{
	crypto_free_shash(hkdf->hmac_tfm);
}
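
The following is a minimal userspace sketch, not part of the kernel file, that illustrates two points made in the comments above: how many HMAC invocations HKDF-Expand needs for a given output length, and how the input to each invocation is laid out in fscrypt_hkdf_expand() (previous block, then "fscrypt\0", then the context byte, then info, then the one-byte counter). The names hkdf_expand_blocks(), hkdf_block_input(), HASHLEN_SHA512 and HASHLEN_SHA256 are hypothetical and chosen only for illustration; no hashing is performed here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Digest sizes in bytes; names are illustrative, not kernel constants. */
#define HASHLEN_SHA512 64u
#define HASHLEN_SHA256 32u

/*
 * Number of HMAC invocations HKDF-Expand performs to produce 'okmlen'
 * bytes with a hash of 'hashlen' bytes: ceil(okmlen / hashlen).
 */
static unsigned int hkdf_expand_blocks(unsigned int okmlen,
				       unsigned int hashlen)
{
	return (okmlen + hashlen - 1) / hashlen;
}

/*
 * Hypothetical helper: write into 'buf' the byte string that is fed to
 * HMAC for one output block, in the same order as the update calls in
 * fscrypt_hkdf_expand():
 *
 *   prev block (empty for the first block) || "fscrypt\0" || context
 *   || info || counter
 *
 * Returns the number of bytes written, or 0 if 'buf' is too small.
 */
static size_t hkdf_block_input(uint8_t *buf, size_t buflen,
			       const uint8_t *prev, size_t prevlen,
			       uint8_t context,
			       const uint8_t *info, size_t infolen,
			       uint8_t counter)
{
	size_t need = prevlen + 8 + 1 + infolen + 1;
	size_t off = 0;

	if (need > buflen)
		return 0;

	if (prevlen) {
		memcpy(buf + off, prev, prevlen);
		off += prevlen;
	}
	memcpy(buf + off, "fscrypt\0", 8);	/* 7 letters plus a NUL byte */
	off += 8;
	buf[off++] = context;
	if (infolen) {
		memcpy(buf + off, info, infolen);
		off += infolen;
	}
	buf[off++] = counter;
	return off;
}
```

For a 64-byte AES-256-XTS key, hkdf_expand_blocks(64, HASHLEN_SHA512) gives one block while hkdf_expand_blocks(64, HASHLEN_SHA256) gives two, which is the iteration difference the HKDF_HMAC_ALG comment refers to. A real derivation would pass each buffer built by hkdf_block_input() through HMAC keyed with the pseudorandom key; that step is deliberately left out of this sketch.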