AMD has issued an advisory indicating that having fTPM enabled in BIOS can cause "stuttering" in the OS. This issue has been fixed in newer versions of the fTPM firmware, but it's up to system designers to decide whether to distribute it.
This issue has existed for a while, but is more prevalent starting with kernel 6.1 because commit b006c439d58db ("hwrng: core - start hwrng kthread also for untrusted sources") started to use the fTPM for hwrng by default. However, all uses of /dev/hwrng result in unacceptable stuttering.
So, simply disable registration of the defective hwrng when detecting these faulty fTPM versions.
Link: https://www.amd.com/en/support/kb/faq/pa-410 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216989 Link: https://lore.kernel.org/all/20230209153120.261904-1-Jason@zx2c4.com/ Fixes: b006c439d58d ("hwrng: core - start hwrng kthread also for untrusted sources") Cc: stable@vger.kernel.org Cc: Jarkko Sakkinen jarkko@kernel.org Cc: Thorsten Leemhuis regressions@leemhuis.info Cc: James Bottomley James.Bottomley@hansenpartnership.com Co-developed-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com --- drivers/char/tpm/tpm-chip.c | 62 ++++++++++++++++++++++++++++++- drivers/char/tpm/tpm.h | 73 +++++++++++++++++++++++++++++++++++++ 2 files changed, 134 insertions(+), 1 deletion(-)
diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c index 741d8f3e8fb3a..348dd5705fbb6 100644 --- a/drivers/char/tpm/tpm-chip.c +++ b/drivers/char/tpm/tpm-chip.c @@ -512,6 +512,65 @@ static int tpm_add_legacy_sysfs(struct tpm_chip *chip) return 0; }
+static bool tpm_is_rng_defective(struct tpm_chip *chip) +{ + int ret; + u64 version; + u32 val1, val2; + + /* No known-broken TPM1 chips. */ + if (!(chip->flags & TPM_CHIP_FLAG_TPM2)) + return false; + + ret = tpm_request_locality(chip); + if (ret) + return false; + + /* Some AMD fTPM versions may cause stutter */ + ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val1, NULL); + if (ret) + goto release; + if (val1 != 0x414D4400U /* AMD */) { + ret = -ENODEV; + goto release; + } + ret = tpm2_get_tpm_pt(chip, TPM2_PT_FIRMWARE_VERSION_1, &val1, NULL); + if (ret) + goto release; + ret = tpm2_get_tpm_pt(chip, TPM2_PT_FIRMWARE_VERSION_2, &val2, NULL); + if (ret) + goto release; + +release: + tpm_relinquish_locality(chip); + + if (ret) + return false; + + version = ((u64)val1 << 32) | val2; + /* + * Fixes for stutter as described in + * https://www.amd.com/en/support/kb/faq/pa-410 + * are available in two series of fTPM firmware: + * 6.x.y.z series: 6.0.18.6 + + * 3.x.y.z series: 3.57.x.5 + + */ + if ((version >> 48) == 6) { + if (version >= 0x0006000000180006ULL) + return false; + } else if ((version >> 48) == 3) { + if (version >= 0x0003005700000005ULL) + return false; + } else { + return false; + } + dev_warn(&chip->dev, + "AMD fTPM version 0x%llx causes system stutter; hwrng disabled\n", + version); + + return true; +} + static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait) { struct tpm_chip *chip = container_of(rng, struct tpm_chip, hwrng); @@ -521,7 +580,8 @@ static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait)
static int tpm_add_hwrng(struct tpm_chip *chip) { - if (!IS_ENABLED(CONFIG_HW_RANDOM_TPM) || tpm_is_firmware_upgrade(chip)) + if (!IS_ENABLED(CONFIG_HW_RANDOM_TPM) || tpm_is_firmware_upgrade(chip) || + tpm_is_rng_defective(chip)) return 0;
snprintf(chip->hwrng_name, sizeof(chip->hwrng_name), diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h index 24ee4e1cc452a..830014a266090 100644 --- a/drivers/char/tpm/tpm.h +++ b/drivers/char/tpm/tpm.h @@ -150,6 +150,79 @@ enum tpm_sub_capabilities { TPM_CAP_PROP_TIS_DURATION = 0x120, };
+enum tpm2_pt_props { + TPM2_PT_NONE = 0x00000000, + TPM2_PT_GROUP = 0x00000100, + TPM2_PT_FIXED = TPM2_PT_GROUP * 1, + TPM2_PT_FAMILY_INDICATOR = TPM2_PT_FIXED + 0, + TPM2_PT_LEVEL = TPM2_PT_FIXED + 1, + TPM2_PT_REVISION = TPM2_PT_FIXED + 2, + TPM2_PT_DAY_OF_YEAR = TPM2_PT_FIXED + 3, + TPM2_PT_YEAR = TPM2_PT_FIXED + 4, + TPM2_PT_MANUFACTURER = TPM2_PT_FIXED + 5, + TPM2_PT_VENDOR_STRING_1 = TPM2_PT_FIXED + 6, + TPM2_PT_VENDOR_STRING_2 = TPM2_PT_FIXED + 7, + TPM2_PT_VENDOR_STRING_3 = TPM2_PT_FIXED + 8, + TPM2_PT_VENDOR_STRING_4 = TPM2_PT_FIXED + 9, + TPM2_PT_VENDOR_TPM_TYPE = TPM2_PT_FIXED + 10, + TPM2_PT_FIRMWARE_VERSION_1 = TPM2_PT_FIXED + 11, + TPM2_PT_FIRMWARE_VERSION_2 = TPM2_PT_FIXED + 12, + TPM2_PT_INPUT_BUFFER = TPM2_PT_FIXED + 13, + TPM2_PT_HR_TRANSIENT_MIN = TPM2_PT_FIXED + 14, + TPM2_PT_HR_PERSISTENT_MIN = TPM2_PT_FIXED + 15, + TPM2_PT_HR_LOADED_MIN = TPM2_PT_FIXED + 16, + TPM2_PT_ACTIVE_SESSIONS_MAX = TPM2_PT_FIXED + 17, + TPM2_PT_PCR_COUNT = TPM2_PT_FIXED + 18, + TPM2_PT_PCR_SELECT_MIN = TPM2_PT_FIXED + 19, + TPM2_PT_CONTEXT_GAP_MAX = TPM2_PT_FIXED + 20, + TPM2_PT_NV_COUNTERS_MAX = TPM2_PT_FIXED + 22, + TPM2_PT_NV_INDEX_MAX = TPM2_PT_FIXED + 23, + TPM2_PT_MEMORY = TPM2_PT_FIXED + 24, + TPM2_PT_CLOCK_UPDATE = TPM2_PT_FIXED + 25, + TPM2_PT_CONTEXT_HASH = TPM2_PT_FIXED + 26, + TPM2_PT_CONTEXT_SYM = TPM2_PT_FIXED + 27, + TPM2_PT_CONTEXT_SYM_SIZE = TPM2_PT_FIXED + 28, + TPM2_PT_ORDERLY_COUNT = TPM2_PT_FIXED + 29, + TPM2_PT_MAX_COMMAND_SIZE = TPM2_PT_FIXED + 30, + TPM2_PT_MAX_RESPONSE_SIZE = TPM2_PT_FIXED + 31, + TPM2_PT_MAX_DIGEST = TPM2_PT_FIXED + 32, + TPM2_PT_MAX_OBJECT_CONTEXT = TPM2_PT_FIXED + 33, + TPM2_PT_MAX_SESSION_CONTEXT = TPM2_PT_FIXED + 34, + TPM2_PT_PS_FAMILY_INDICATOR = TPM2_PT_FIXED + 35, + TPM2_PT_PS_LEVEL = TPM2_PT_FIXED + 36, + TPM2_PT_PS_REVISION = TPM2_PT_FIXED + 37, + TPM2_PT_PS_DAY_OF_YEAR = TPM2_PT_FIXED + 38, + TPM2_PT_PS_YEAR = TPM2_PT_FIXED + 39, + TPM2_PT_SPLIT_MAX = TPM2_PT_FIXED + 40, + TPM2_PT_TOTAL_COMMANDS = TPM2_PT_FIXED + 41, + TPM2_PT_LIBRARY_COMMANDS = TPM2_PT_FIXED + 42, + TPM2_PT_VENDOR_COMMANDS = TPM2_PT_FIXED + 43, + TPM2_PT_NV_BUFFER_MAX = TPM2_PT_FIXED + 44, + TPM2_PT_MODES = TPM2_PT_FIXED + 45, + TPM2_PT_MAX_CAP_BUFFER = TPM2_PT_FIXED + 46, + TPM2_PT_VAR = TPM2_PT_GROUP * 2, + TPM2_PT_PERMANENT = TPM2_PT_VAR + 0, + TPM2_PT_STARTUP_CLEAR = TPM2_PT_VAR + 1, + TPM2_PT_HR_NV_INDEX = TPM2_PT_VAR + 2, + TPM2_PT_HR_LOADED = TPM2_PT_VAR + 3, + TPM2_PT_HR_LOADED_AVAIL = TPM2_PT_VAR + 4, + TPM2_PT_HR_ACTIVE = TPM2_PT_VAR + 5, + TPM2_PT_HR_ACTIVE_AVAIL = TPM2_PT_VAR + 6, + TPM2_PT_HR_TRANSIENT_AVAIL = TPM2_PT_VAR + 7, + TPM2_PT_HR_PERSISTENT = TPM2_PT_VAR + 8, + TPM2_PT_HR_PERSISTENT_AVAIL = TPM2_PT_VAR + 9, + TPM2_PT_NV_COUNTERS = TPM2_PT_VAR + 10, + TPM2_PT_NV_COUNTERS_AVAIL = TPM2_PT_VAR + 11, + TPM2_PT_ALGORITHM_SET = TPM2_PT_VAR + 12, + TPM2_PT_LOADED_CURVES = TPM2_PT_VAR + 13, + TPM2_PT_LOCKOUT_COUNTER = TPM2_PT_VAR + 14, + TPM2_PT_MAX_AUTH_FAIL = TPM2_PT_VAR + 15, + TPM2_PT_LOCKOUT_INTERVAL = TPM2_PT_VAR + 16, + TPM2_PT_LOCKOUT_RECOVERY = TPM2_PT_VAR + 17, + TPM2_PT_NV_WRITE_RECOVERY = TPM2_PT_VAR + 18, + TPM2_PT_AUDIT_COUNTER_0 = TPM2_PT_VAR + 19, + TPM2_PT_AUDIT_COUNTER_1 = TPM2_PT_VAR + 20, +};
/* 128 bytes is an arbitrary cap. This could be as large as TPM_BUFSIZE - 18 * bytes, but 128 is still a relatively large number of random bytes and
On 14.02.23 21:19, Mario Limonciello wrote:
AMD has issued an advisory indicating that having fTPM enabled in BIOS can cause "stuttering" in the OS. This issue has been fixed in newer versions of the fTPM firmware, but it's up to system designers to decide whether to distribute it.
This issue has existed for a while, but is more prevalent starting with kernel 6.1 because commit b006c439d58db ("hwrng: core - start hwrng kthread also for untrusted sources") started to use the fTPM for hwrng by default. However, all uses of /dev/hwrng result in unacceptable stuttering.
So, simply disable registration of the defective hwrng when detecting these faulty fTPM versions.
Hmm, no reply since Mario posted this.
Jarkko, James, what's your stance on this? Does the patch look fine from your point of view? And does the situation justify merging this on the last minute for 6.2? Or should we merge it early for 6.3 and then backport to stable?
Ciao, Thorsten
Link: https://www.amd.com/en/support/kb/faq/pa-410 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216989 Link: https://lore.kernel.org/all/20230209153120.261904-1-Jason@zx2c4.com/ Fixes: b006c439d58d ("hwrng: core - start hwrng kthread also for untrusted sources") Cc: stable@vger.kernel.org Cc: Jarkko Sakkinen jarkko@kernel.org Cc: Thorsten Leemhuis regressions@leemhuis.info Cc: James Bottomley James.Bottomley@hansenpartnership.com Co-developed-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com
drivers/char/tpm/tpm-chip.c | 62 ++++++++++++++++++++++++++++++- drivers/char/tpm/tpm.h | 73 +++++++++++++++++++++++++++++++++++++ 2 files changed, 134 insertions(+), 1 deletion(-)
diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c index 741d8f3e8fb3a..348dd5705fbb6 100644 --- a/drivers/char/tpm/tpm-chip.c +++ b/drivers/char/tpm/tpm-chip.c @@ -512,6 +512,65 @@ static int tpm_add_legacy_sysfs(struct tpm_chip *chip) return 0; } +static bool tpm_is_rng_defective(struct tpm_chip *chip) +{
- int ret;
- u64 version;
- u32 val1, val2;
- /* No known-broken TPM1 chips. */
- if (!(chip->flags & TPM_CHIP_FLAG_TPM2))
return false;
- ret = tpm_request_locality(chip);
- if (ret)
return false;
- /* Some AMD fTPM versions may cause stutter */
- ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val1, NULL);
- if (ret)
goto release;
- if (val1 != 0x414D4400U /* AMD */) {
ret = -ENODEV;
goto release;
- }
- ret = tpm2_get_tpm_pt(chip, TPM2_PT_FIRMWARE_VERSION_1, &val1, NULL);
- if (ret)
goto release;
- ret = tpm2_get_tpm_pt(chip, TPM2_PT_FIRMWARE_VERSION_2, &val2, NULL);
- if (ret)
goto release;
+release:
- tpm_relinquish_locality(chip);
- if (ret)
return false;
- version = ((u64)val1 << 32) | val2;
- /*
* Fixes for stutter as described in
* https://www.amd.com/en/support/kb/faq/pa-410
* are available in two series of fTPM firmware:
* 6.x.y.z series: 6.0.18.6 +
* 3.x.y.z series: 3.57.x.5 +
*/
- if ((version >> 48) == 6) {
if (version >= 0x0006000000180006ULL)
return false;
- } else if ((version >> 48) == 3) {
if (version >= 0x0003005700000005ULL)
return false;
- } else {
return false;
- }
- dev_warn(&chip->dev,
"AMD fTPM version 0x%llx causes system stutter; hwrng disabled\n",
version);
- return true;
+}
static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait) { struct tpm_chip *chip = container_of(rng, struct tpm_chip, hwrng); @@ -521,7 +580,8 @@ static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait) static int tpm_add_hwrng(struct tpm_chip *chip) {
- if (!IS_ENABLED(CONFIG_HW_RANDOM_TPM) || tpm_is_firmware_upgrade(chip))
- if (!IS_ENABLED(CONFIG_HW_RANDOM_TPM) || tpm_is_firmware_upgrade(chip) ||
return 0;tpm_is_rng_defective(chip))
snprintf(chip->hwrng_name, sizeof(chip->hwrng_name), diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h index 24ee4e1cc452a..830014a266090 100644 --- a/drivers/char/tpm/tpm.h +++ b/drivers/char/tpm/tpm.h @@ -150,6 +150,79 @@ enum tpm_sub_capabilities { TPM_CAP_PROP_TIS_DURATION = 0x120, }; +enum tpm2_pt_props {
- TPM2_PT_NONE = 0x00000000,
- TPM2_PT_GROUP = 0x00000100,
- TPM2_PT_FIXED = TPM2_PT_GROUP * 1,
- TPM2_PT_FAMILY_INDICATOR = TPM2_PT_FIXED + 0,
- TPM2_PT_LEVEL = TPM2_PT_FIXED + 1,
- TPM2_PT_REVISION = TPM2_PT_FIXED + 2,
- TPM2_PT_DAY_OF_YEAR = TPM2_PT_FIXED + 3,
- TPM2_PT_YEAR = TPM2_PT_FIXED + 4,
- TPM2_PT_MANUFACTURER = TPM2_PT_FIXED + 5,
- TPM2_PT_VENDOR_STRING_1 = TPM2_PT_FIXED + 6,
- TPM2_PT_VENDOR_STRING_2 = TPM2_PT_FIXED + 7,
- TPM2_PT_VENDOR_STRING_3 = TPM2_PT_FIXED + 8,
- TPM2_PT_VENDOR_STRING_4 = TPM2_PT_FIXED + 9,
- TPM2_PT_VENDOR_TPM_TYPE = TPM2_PT_FIXED + 10,
- TPM2_PT_FIRMWARE_VERSION_1 = TPM2_PT_FIXED + 11,
- TPM2_PT_FIRMWARE_VERSION_2 = TPM2_PT_FIXED + 12,
- TPM2_PT_INPUT_BUFFER = TPM2_PT_FIXED + 13,
- TPM2_PT_HR_TRANSIENT_MIN = TPM2_PT_FIXED + 14,
- TPM2_PT_HR_PERSISTENT_MIN = TPM2_PT_FIXED + 15,
- TPM2_PT_HR_LOADED_MIN = TPM2_PT_FIXED + 16,
- TPM2_PT_ACTIVE_SESSIONS_MAX = TPM2_PT_FIXED + 17,
- TPM2_PT_PCR_COUNT = TPM2_PT_FIXED + 18,
- TPM2_PT_PCR_SELECT_MIN = TPM2_PT_FIXED + 19,
- TPM2_PT_CONTEXT_GAP_MAX = TPM2_PT_FIXED + 20,
- TPM2_PT_NV_COUNTERS_MAX = TPM2_PT_FIXED + 22,
- TPM2_PT_NV_INDEX_MAX = TPM2_PT_FIXED + 23,
- TPM2_PT_MEMORY = TPM2_PT_FIXED + 24,
- TPM2_PT_CLOCK_UPDATE = TPM2_PT_FIXED + 25,
- TPM2_PT_CONTEXT_HASH = TPM2_PT_FIXED + 26,
- TPM2_PT_CONTEXT_SYM = TPM2_PT_FIXED + 27,
- TPM2_PT_CONTEXT_SYM_SIZE = TPM2_PT_FIXED + 28,
- TPM2_PT_ORDERLY_COUNT = TPM2_PT_FIXED + 29,
- TPM2_PT_MAX_COMMAND_SIZE = TPM2_PT_FIXED + 30,
- TPM2_PT_MAX_RESPONSE_SIZE = TPM2_PT_FIXED + 31,
- TPM2_PT_MAX_DIGEST = TPM2_PT_FIXED + 32,
- TPM2_PT_MAX_OBJECT_CONTEXT = TPM2_PT_FIXED + 33,
- TPM2_PT_MAX_SESSION_CONTEXT = TPM2_PT_FIXED + 34,
- TPM2_PT_PS_FAMILY_INDICATOR = TPM2_PT_FIXED + 35,
- TPM2_PT_PS_LEVEL = TPM2_PT_FIXED + 36,
- TPM2_PT_PS_REVISION = TPM2_PT_FIXED + 37,
- TPM2_PT_PS_DAY_OF_YEAR = TPM2_PT_FIXED + 38,
- TPM2_PT_PS_YEAR = TPM2_PT_FIXED + 39,
- TPM2_PT_SPLIT_MAX = TPM2_PT_FIXED + 40,
- TPM2_PT_TOTAL_COMMANDS = TPM2_PT_FIXED + 41,
- TPM2_PT_LIBRARY_COMMANDS = TPM2_PT_FIXED + 42,
- TPM2_PT_VENDOR_COMMANDS = TPM2_PT_FIXED + 43,
- TPM2_PT_NV_BUFFER_MAX = TPM2_PT_FIXED + 44,
- TPM2_PT_MODES = TPM2_PT_FIXED + 45,
- TPM2_PT_MAX_CAP_BUFFER = TPM2_PT_FIXED + 46,
- TPM2_PT_VAR = TPM2_PT_GROUP * 2,
- TPM2_PT_PERMANENT = TPM2_PT_VAR + 0,
- TPM2_PT_STARTUP_CLEAR = TPM2_PT_VAR + 1,
- TPM2_PT_HR_NV_INDEX = TPM2_PT_VAR + 2,
- TPM2_PT_HR_LOADED = TPM2_PT_VAR + 3,
- TPM2_PT_HR_LOADED_AVAIL = TPM2_PT_VAR + 4,
- TPM2_PT_HR_ACTIVE = TPM2_PT_VAR + 5,
- TPM2_PT_HR_ACTIVE_AVAIL = TPM2_PT_VAR + 6,
- TPM2_PT_HR_TRANSIENT_AVAIL = TPM2_PT_VAR + 7,
- TPM2_PT_HR_PERSISTENT = TPM2_PT_VAR + 8,
- TPM2_PT_HR_PERSISTENT_AVAIL = TPM2_PT_VAR + 9,
- TPM2_PT_NV_COUNTERS = TPM2_PT_VAR + 10,
- TPM2_PT_NV_COUNTERS_AVAIL = TPM2_PT_VAR + 11,
- TPM2_PT_ALGORITHM_SET = TPM2_PT_VAR + 12,
- TPM2_PT_LOADED_CURVES = TPM2_PT_VAR + 13,
- TPM2_PT_LOCKOUT_COUNTER = TPM2_PT_VAR + 14,
- TPM2_PT_MAX_AUTH_FAIL = TPM2_PT_VAR + 15,
- TPM2_PT_LOCKOUT_INTERVAL = TPM2_PT_VAR + 16,
- TPM2_PT_LOCKOUT_RECOVERY = TPM2_PT_VAR + 17,
- TPM2_PT_NV_WRITE_RECOVERY = TPM2_PT_VAR + 18,
- TPM2_PT_AUDIT_COUNTER_0 = TPM2_PT_VAR + 19,
- TPM2_PT_AUDIT_COUNTER_1 = TPM2_PT_VAR + 20,
+}; /* 128 bytes is an arbitrary cap. This could be as large as TPM_BUFSIZE - 18
- bytes, but 128 is still a relatively large number of random bytes and
On Fri, Feb 17, 2023 at 04:18:39PM +0100, Thorsten Leemhuis wrote:
On 14.02.23 21:19, Mario Limonciello wrote:
AMD has issued an advisory indicating that having fTPM enabled in BIOS can cause "stuttering" in the OS. This issue has been fixed in newer versions of the fTPM firmware, but it's up to system designers to decide whether to distribute it.
This issue has existed for a while, but is more prevalent starting with kernel 6.1 because commit b006c439d58db ("hwrng: core - start hwrng kthread also for untrusted sources") started to use the fTPM for hwrng by default. However, all uses of /dev/hwrng result in unacceptable stuttering.
So, simply disable registration of the defective hwrng when detecting these faulty fTPM versions.
Hmm, no reply since Mario posted this.
Jarkko, James, what's your stance on this? Does the patch look fine from your point of view? And does the situation justify merging this on the last minute for 6.2? Or should we merge it early for 6.3 and then backport to stable?
Ciao, Thorsten
As I stated in earlier response: do we want to forbid tpm_crb in this case or do we want to pass-through with a faulty firmware?
Not weighting either choice here I just don't see any motivating points in the commit message to pick either, that's all.
BR, Jarkko
On 2/17/2023 16:05, Jarkko Sakkinen wrote:
Perhaps tpm_amd_* ?
When Jason first proposed this patch I feel the intent was it could cover multiple deficiencies. But as this is the only one for now, sure re-naming it is fine.
Also, just a question: is there any legit use for fTPM's, which are not updated? I.e. why would want tpm_crb to initialize with a dysfunctional firmware?> I.e. the existential question is: is it better to workaround the
issue and
let pass through, or make the user aware that the firmware would really need an update.
On 2/17/2023 16:35, Jarkko Sakkinen wrote:
Hmm, no reply since Mario posted this.
Jarkko, James, what's your stance on this? Does the patch look fine from your point of view? And does the situation justify merging this on the last minute for 6.2? Or should we merge it early for 6.3 and then backport to stable?
Ciao, Thorsten
As I stated in earlier response: do we want to forbid tpm_crb in this case or do we want to pass-through with a faulty firmware?
Not weighting either choice here I just don't see any motivating points in the commit message to pick either, that's all.
BR, Jarkko
Even if you're not using RNG functionality you can still do plenty of other things with the TPM. The RNG functionality is what tripped up this issue though. All of these issues were only raised because the kernel started using it by default for RNG and userspace wants random numbers all the time.
If the firmware was easily updatable from all the OEMs I would lean on trying to encourage people to update. But alas this has been available for over a year and a sizable number of OEMs haven't distributed a fix.
The major issue I see with forbidding tpm_crb is that users may have been using the fTPM for something and taking it away in an update could lead to a no-boot scenario if they're (for example) tying a policy to PCR values and can no longer access those.
If the consensus were to go that direction instead I would want to see a module parameter that lets users turn on the fTPM even knowing this problem exists so they could recover. That all seems pretty expensive to me for this problem.
On Fri, Feb 17, 2023 at 08:25:56PM -0600, Limonciello, Mario wrote:
On 2/17/2023 16:05, Jarkko Sakkinen wrote:
Perhaps tpm_amd_* ?
When Jason first proposed this patch I feel the intent was it could cover multiple deficiencies. But as this is the only one for now, sure re-naming it is fine.
Also, just a question: is there any legit use for fTPM's, which are not updated? I.e. why would want tpm_crb to initialize with a dysfunctional firmware?> I.e. the existential question is: is it better to workaround the issue and let pass through, or make the user aware that the firmware would really need an update.
On 2/17/2023 16:35, Jarkko Sakkinen wrote:
Hmm, no reply since Mario posted this.
Jarkko, James, what's your stance on this? Does the patch look fine from your point of view? And does the situation justify merging this on the last minute for 6.2? Or should we merge it early for 6.3 and then backport to stable?
Ciao, Thorsten
As I stated in earlier response: do we want to forbid tpm_crb in this case or do we want to pass-through with a faulty firmware?
Not weighting either choice here I just don't see any motivating points in the commit message to pick either, that's all.
BR, Jarkko
Even if you're not using RNG functionality you can still do plenty of other things with the TPM. The RNG functionality is what tripped up this issue though. All of these issues were only raised because the kernel started using it by default for RNG and userspace wants random numbers all the time.
If the firmware was easily updatable from all the OEMs I would lean on trying to encourage people to update. But alas this has been available for over a year and a sizable number of OEMs haven't distributed a fix.
The major issue I see with forbidding tpm_crb is that users may have been using the fTPM for something and taking it away in an update could lead to a no-boot scenario if they're (for example) tying a policy to PCR values and can no longer access those.
If the consensus were to go that direction instead I would want to see a module parameter that lets users turn on the fTPM even knowing this problem exists so they could recover. That all seems pretty expensive to me for this problem.
I agree with the last argument.
I re-read the commit message and https://www.amd.com/en/support/kb/faq/pa-410.
Why this scopes down to only rng? Should TPM2_CC_GET_RANDOM also blocked from /dev/tpm0?
BR, Jarkko
[Public]
-----Original Message----- From: Jarkko Sakkinen jarkko@kernel.org Sent: Tuesday, February 21, 2023 16:53 To: Limonciello, Mario Mario.Limonciello@amd.com Cc: Thorsten Leemhuis regressions@leemhuis.info; James Bottomley James.Bottomley@hansenpartnership.com; Jason@zx2c4.com; linux- integrity@vger.kernel.org; linux-kernel@vger.kernel.org; stable@vger.kernel.org; Linus Torvalds torvalds@linux-foundation.org; Linux kernel regressions list regressions@lists.linux.dev Subject: Re: [PATCH 1/1] tpm: disable hwrng for fTPM on some AMD designs
On Fri, Feb 17, 2023 at 08:25:56PM -0600, Limonciello, Mario wrote:
On 2/17/2023 16:05, Jarkko Sakkinen wrote:
Perhaps tpm_amd_* ?
When Jason first proposed this patch I feel the intent was it could cover multiple deficiencies. But as this is the only one for now, sure re-naming it is fine.
Also, just a question: is there any legit use for fTPM's, which are not updated? I.e. why would want tpm_crb to initialize with a dysfunctional firmware?> I.e. the existential question is: is it better to workaround the issue and let pass through, or make the user aware that the firmware would really need an update.
On 2/17/2023 16:35, Jarkko Sakkinen wrote:
Hmm, no reply since Mario posted this.
Jarkko, James, what's your stance on this? Does the patch look fine
from
your point of view? And does the situation justify merging this on the last minute for 6.2? Or should we merge it early for 6.3 and then backport to stable?
Ciao, Thorsten
As I stated in earlier response: do we want to forbid tpm_crb in this case or do we want to pass-through with a faulty firmware?
Not weighting either choice here I just don't see any motivating points in the commit message to pick either, that's all.
BR, Jarkko
Even if you're not using RNG functionality you can still do plenty of other things with the TPM. The RNG functionality is what tripped up this issue though. All of these issues were only raised because the kernel started using it by default for RNG and userspace wants random numbers all the
time.
If the firmware was easily updatable from all the OEMs I would lean on trying to encourage people to update. But alas this has been available for over a year and a sizable number of OEMs haven't distributed a fix.
The major issue I see with forbidding tpm_crb is that users may have been using the fTPM for something and taking it away in an update could lead to
a
no-boot scenario if they're (for example) tying a policy to PCR values and can no longer access those.
If the consensus were to go that direction instead I would want to see a module parameter that lets users turn on the fTPM even knowing this
problem
exists so they could recover. That all seems pretty expensive to me for this problem.
I agree with the last argument.
FYI, I did send out a v2 and folded in this argument to the commit message and adjusted for your feedback. You might not have found it in your inbox yet.
I re-read the commit message and https://www.amd.com/en/support/kb/faq/pa-410.
Why this scopes down to only rng? Should TPM2_CC_GET_RANDOM also blocked from /dev/tpm0?
The only reason that this commit was created is because the kernel utilized the fTPM for hwrng which triggered the problem. If that never happened this probably wouldn't have been exposed either.
Yes; I would agree that if someone was to do other fTPM operations over an extended period of time it's plausible they can cause the problem too.
But picking and choosing functionality to block seems quite arbitrary to me.
Hi, this is your Linux kernel regression tracker. Top-posting for once, to make this easily accessible to everyone.
Jarkko (or James), what is needed to get this regression resolved? More people showed up that are apparently affected by this. Sure, 6.2 is out, but it's a regression in 6.1 it thus would be good to fix rather sooner than later. Ideally this week, if you ask me.
FWIW, latest version of this patch is here, but it didn't get any replies since it was posted last Tuesday (and the mail quoted below is just one day younger):
https://lore.kernel.org/all/20230220180729.23862-1-mario.limonciello@amd.com...
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page.
#regzbot poke
On 22.02.23 00:10, Limonciello, Mario wrote:
[Public]
-----Original Message----- From: Jarkko Sakkinen jarkko@kernel.org Sent: Tuesday, February 21, 2023 16:53 To: Limonciello, Mario Mario.Limonciello@amd.com Cc: Thorsten Leemhuis regressions@leemhuis.info; James Bottomley James.Bottomley@hansenpartnership.com; Jason@zx2c4.com; linux- integrity@vger.kernel.org; linux-kernel@vger.kernel.org; stable@vger.kernel.org; Linus Torvalds torvalds@linux-foundation.org; Linux kernel regressions list regressions@lists.linux.dev Subject: Re: [PATCH 1/1] tpm: disable hwrng for fTPM on some AMD designs
On Fri, Feb 17, 2023 at 08:25:56PM -0600, Limonciello, Mario wrote:
On 2/17/2023 16:05, Jarkko Sakkinen wrote:
Perhaps tpm_amd_* ?
When Jason first proposed this patch I feel the intent was it could cover multiple deficiencies. But as this is the only one for now, sure re-naming it is fine.
Also, just a question: is there any legit use for fTPM's, which are not updated? I.e. why would want tpm_crb to initialize with a dysfunctional firmware?> I.e. the existential question is: is it better to workaround the issue and let pass through, or make the user aware that the firmware would really need an update.
On 2/17/2023 16:35, Jarkko Sakkinen wrote:
Hmm, no reply since Mario posted this.
Jarkko, James, what's your stance on this? Does the patch look fine
from
your point of view? And does the situation justify merging this on the last minute for 6.2? Or should we merge it early for 6.3 and then backport to stable?
Ciao, Thorsten
As I stated in earlier response: do we want to forbid tpm_crb in this case or do we want to pass-through with a faulty firmware?
Not weighting either choice here I just don't see any motivating points in the commit message to pick either, that's all.
BR, Jarkko
Even if you're not using RNG functionality you can still do plenty of other things with the TPM. The RNG functionality is what tripped up this issue though. All of these issues were only raised because the kernel started using it by default for RNG and userspace wants random numbers all the
time.
If the firmware was easily updatable from all the OEMs I would lean on trying to encourage people to update. But alas this has been available for over a year and a sizable number of OEMs haven't distributed a fix.
The major issue I see with forbidding tpm_crb is that users may have been using the fTPM for something and taking it away in an update could lead to
a
no-boot scenario if they're (for example) tying a policy to PCR values and can no longer access those.
If the consensus were to go that direction instead I would want to see a module parameter that lets users turn on the fTPM even knowing this
problem
exists so they could recover. That all seems pretty expensive to me for this problem.
I agree with the last argument.
FYI, I did send out a v2 and folded in this argument to the commit message and adjusted for your feedback. You might not have found it in your inbox yet.
I re-read the commit message and https://www.amd.com/en/support/kb/faq/pa-410.
Why this scopes down to only rng? Should TPM2_CC_GET_RANDOM also blocked from /dev/tpm0?
The only reason that this commit was created is because the kernel utilized the fTPM for hwrng which triggered the problem. If that never happened this probably wouldn't have been exposed either.
Yes; I would agree that if someone was to do other fTPM operations over an extended period of time it's plausible they can cause the problem too.
But picking and choosing functionality to block seems quite arbitrary to me.
On Mon, Feb 27, 2023 at 11:57:15AM +0100, Thorsten Leemhuis wrote:
Hi, this is your Linux kernel regression tracker. Top-posting for once, to make this easily accessible to everyone.
Jarkko (or James), what is needed to get this regression resolved? More people showed up that are apparently affected by this. Sure, 6.2 is out, but it's a regression in 6.1 it thus would be good to fix rather sooner than later. Ideally this week, if you ask me.
I do not see any tested-by's responded to v2 patch. I.e. we have an unverified solution, which cannot be applied.
BR, Jarkko
On Mon, Feb 27, 2023 at 01:14:46PM +0200, Jarkko Sakkinen wrote:
On Mon, Feb 27, 2023 at 11:57:15AM +0100, Thorsten Leemhuis wrote:
Hi, this is your Linux kernel regression tracker. Top-posting for once, to make this easily accessible to everyone.
Jarkko (or James), what is needed to get this regression resolved? More people showed up that are apparently affected by this. Sure, 6.2 is out, but it's a regression in 6.1 it thus would be good to fix rather sooner than later. Ideally this week, if you ask me.
I do not see any tested-by's responded to v2 patch. I.e. we have an unverified solution, which cannot be applied.
v2 is good enough as far as I'm concerned as long as we know it is good to go. Please do not respond tested-by to his thread. Test it and respond to the corresponding thread so that all tags can be picked by b4.
BR, Jarkko
On Tue, Feb 14, 2023 at 02:19:55PM -0600, Mario Limonciello wrote:
AMD has issued an advisory indicating that having fTPM enabled in BIOS can cause "stuttering" in the OS. This issue has been fixed in newer versions of the fTPM firmware, but it's up to system designers to decide whether to distribute it.
This issue has existed for a while, but is more prevalent starting with kernel 6.1 because commit b006c439d58db ("hwrng: core - start hwrng kthread also for untrusted sources") started to use the fTPM for hwrng by default. However, all uses of /dev/hwrng result in unacceptable stuttering.
So, simply disable registration of the defective hwrng when detecting these faulty fTPM versions.
Link: https://www.amd.com/en/support/kb/faq/pa-410 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216989 Link: https://lore.kernel.org/all/20230209153120.261904-1-Jason@zx2c4.com/ Fixes: b006c439d58d ("hwrng: core - start hwrng kthread also for untrusted sources") Cc: stable@vger.kernel.org Cc: Jarkko Sakkinen jarkko@kernel.org Cc: Thorsten Leemhuis regressions@leemhuis.info Cc: James Bottomley James.Bottomley@hansenpartnership.com Co-developed-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com
drivers/char/tpm/tpm-chip.c | 62 ++++++++++++++++++++++++++++++- drivers/char/tpm/tpm.h | 73 +++++++++++++++++++++++++++++++++++++ 2 files changed, 134 insertions(+), 1 deletion(-)
diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c index 741d8f3e8fb3a..348dd5705fbb6 100644 --- a/drivers/char/tpm/tpm-chip.c +++ b/drivers/char/tpm/tpm-chip.c @@ -512,6 +512,65 @@ static int tpm_add_legacy_sysfs(struct tpm_chip *chip) return 0; } +static bool tpm_is_rng_defective(struct tpm_chip *chip)
Perhaps tpm_amd_* ?
Also, just a question: is there any legit use for fTPM's, which are not updated? I.e. why would want tpm_crb to initialize with a dysfunctional firmware?
I.e. the existential question is: is it better to workaround the issue and let pass through, or make the user aware that the firmware would really need an update.
+{
- int ret;
- u64 version;
- u32 val1, val2;
I'd use reverse christmas tree order here.
- /* No known-broken TPM1 chips. */
- if (!(chip->flags & TPM_CHIP_FLAG_TPM2))
return false;
- ret = tpm_request_locality(chip);
- if (ret)
return false;
- /* Some AMD fTPM versions may cause stutter */
- ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val1, NULL);
- if (ret)
goto release;
- if (val1 != 0x414D4400U /* AMD */) {
ret = -ENODEV;
goto release;
- }
- ret = tpm2_get_tpm_pt(chip, TPM2_PT_FIRMWARE_VERSION_1, &val1, NULL);
- if (ret)
goto release;
- ret = tpm2_get_tpm_pt(chip, TPM2_PT_FIRMWARE_VERSION_2, &val2, NULL);
- if (ret)
goto release;
+release:
- tpm_relinquish_locality(chip);
- if (ret)
return false;
- version = ((u64)val1 << 32) | val2;
- /*
* Fixes for stutter as described in
* https://www.amd.com/en/support/kb/faq/pa-410
* are available in two series of fTPM firmware:
* 6.x.y.z series: 6.0.18.6 +
* 3.x.y.z series: 3.57.x.5 +
*/
- if ((version >> 48) == 6) {
if (version >= 0x0006000000180006ULL)
return false;
- } else if ((version >> 48) == 3) {
if (version >= 0x0003005700000005ULL)
return false;
- } else {
return false;
- }
You can drop the curly braces here.
- dev_warn(&chip->dev,
"AMD fTPM version 0x%llx causes system stutter; hwrng disabled\n",
version);
- return true;
+}
static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait) { struct tpm_chip *chip = container_of(rng, struct tpm_chip, hwrng); @@ -521,7 +580,8 @@ static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait) static int tpm_add_hwrng(struct tpm_chip *chip) {
- if (!IS_ENABLED(CONFIG_HW_RANDOM_TPM) || tpm_is_firmware_upgrade(chip))
- if (!IS_ENABLED(CONFIG_HW_RANDOM_TPM) || tpm_is_firmware_upgrade(chip) ||
return 0;tpm_is_rng_defective(chip))
snprintf(chip->hwrng_name, sizeof(chip->hwrng_name), diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h index 24ee4e1cc452a..830014a266090 100644 --- a/drivers/char/tpm/tpm.h +++ b/drivers/char/tpm/tpm.h @@ -150,6 +150,79 @@ enum tpm_sub_capabilities { TPM_CAP_PROP_TIS_DURATION = 0x120, }; +enum tpm2_pt_props {
- TPM2_PT_NONE = 0x00000000,
- TPM2_PT_GROUP = 0x00000100,
- TPM2_PT_FIXED = TPM2_PT_GROUP * 1,
- TPM2_PT_FAMILY_INDICATOR = TPM2_PT_FIXED + 0,
- TPM2_PT_LEVEL = TPM2_PT_FIXED + 1,
- TPM2_PT_REVISION = TPM2_PT_FIXED + 2,
- TPM2_PT_DAY_OF_YEAR = TPM2_PT_FIXED + 3,
- TPM2_PT_YEAR = TPM2_PT_FIXED + 4,
- TPM2_PT_MANUFACTURER = TPM2_PT_FIXED + 5,
- TPM2_PT_VENDOR_STRING_1 = TPM2_PT_FIXED + 6,
- TPM2_PT_VENDOR_STRING_2 = TPM2_PT_FIXED + 7,
- TPM2_PT_VENDOR_STRING_3 = TPM2_PT_FIXED + 8,
- TPM2_PT_VENDOR_STRING_4 = TPM2_PT_FIXED + 9,
- TPM2_PT_VENDOR_TPM_TYPE = TPM2_PT_FIXED + 10,
- TPM2_PT_FIRMWARE_VERSION_1 = TPM2_PT_FIXED + 11,
- TPM2_PT_FIRMWARE_VERSION_2 = TPM2_PT_FIXED + 12,
- TPM2_PT_INPUT_BUFFER = TPM2_PT_FIXED + 13,
- TPM2_PT_HR_TRANSIENT_MIN = TPM2_PT_FIXED + 14,
- TPM2_PT_HR_PERSISTENT_MIN = TPM2_PT_FIXED + 15,
- TPM2_PT_HR_LOADED_MIN = TPM2_PT_FIXED + 16,
- TPM2_PT_ACTIVE_SESSIONS_MAX = TPM2_PT_FIXED + 17,
- TPM2_PT_PCR_COUNT = TPM2_PT_FIXED + 18,
- TPM2_PT_PCR_SELECT_MIN = TPM2_PT_FIXED + 19,
- TPM2_PT_CONTEXT_GAP_MAX = TPM2_PT_FIXED + 20,
- TPM2_PT_NV_COUNTERS_MAX = TPM2_PT_FIXED + 22,
- TPM2_PT_NV_INDEX_MAX = TPM2_PT_FIXED + 23,
- TPM2_PT_MEMORY = TPM2_PT_FIXED + 24,
- TPM2_PT_CLOCK_UPDATE = TPM2_PT_FIXED + 25,
- TPM2_PT_CONTEXT_HASH = TPM2_PT_FIXED + 26,
- TPM2_PT_CONTEXT_SYM = TPM2_PT_FIXED + 27,
- TPM2_PT_CONTEXT_SYM_SIZE = TPM2_PT_FIXED + 28,
- TPM2_PT_ORDERLY_COUNT = TPM2_PT_FIXED + 29,
- TPM2_PT_MAX_COMMAND_SIZE = TPM2_PT_FIXED + 30,
- TPM2_PT_MAX_RESPONSE_SIZE = TPM2_PT_FIXED + 31,
- TPM2_PT_MAX_DIGEST = TPM2_PT_FIXED + 32,
- TPM2_PT_MAX_OBJECT_CONTEXT = TPM2_PT_FIXED + 33,
- TPM2_PT_MAX_SESSION_CONTEXT = TPM2_PT_FIXED + 34,
- TPM2_PT_PS_FAMILY_INDICATOR = TPM2_PT_FIXED + 35,
- TPM2_PT_PS_LEVEL = TPM2_PT_FIXED + 36,
- TPM2_PT_PS_REVISION = TPM2_PT_FIXED + 37,
- TPM2_PT_PS_DAY_OF_YEAR = TPM2_PT_FIXED + 38,
- TPM2_PT_PS_YEAR = TPM2_PT_FIXED + 39,
- TPM2_PT_SPLIT_MAX = TPM2_PT_FIXED + 40,
- TPM2_PT_TOTAL_COMMANDS = TPM2_PT_FIXED + 41,
- TPM2_PT_LIBRARY_COMMANDS = TPM2_PT_FIXED + 42,
- TPM2_PT_VENDOR_COMMANDS = TPM2_PT_FIXED + 43,
- TPM2_PT_NV_BUFFER_MAX = TPM2_PT_FIXED + 44,
- TPM2_PT_MODES = TPM2_PT_FIXED + 45,
- TPM2_PT_MAX_CAP_BUFFER = TPM2_PT_FIXED + 46,
- TPM2_PT_VAR = TPM2_PT_GROUP * 2,
- TPM2_PT_PERMANENT = TPM2_PT_VAR + 0,
- TPM2_PT_STARTUP_CLEAR = TPM2_PT_VAR + 1,
- TPM2_PT_HR_NV_INDEX = TPM2_PT_VAR + 2,
- TPM2_PT_HR_LOADED = TPM2_PT_VAR + 3,
- TPM2_PT_HR_LOADED_AVAIL = TPM2_PT_VAR + 4,
- TPM2_PT_HR_ACTIVE = TPM2_PT_VAR + 5,
- TPM2_PT_HR_ACTIVE_AVAIL = TPM2_PT_VAR + 6,
- TPM2_PT_HR_TRANSIENT_AVAIL = TPM2_PT_VAR + 7,
- TPM2_PT_HR_PERSISTENT = TPM2_PT_VAR + 8,
- TPM2_PT_HR_PERSISTENT_AVAIL = TPM2_PT_VAR + 9,
- TPM2_PT_NV_COUNTERS = TPM2_PT_VAR + 10,
- TPM2_PT_NV_COUNTERS_AVAIL = TPM2_PT_VAR + 11,
- TPM2_PT_ALGORITHM_SET = TPM2_PT_VAR + 12,
- TPM2_PT_LOADED_CURVES = TPM2_PT_VAR + 13,
- TPM2_PT_LOCKOUT_COUNTER = TPM2_PT_VAR + 14,
- TPM2_PT_MAX_AUTH_FAIL = TPM2_PT_VAR + 15,
- TPM2_PT_LOCKOUT_INTERVAL = TPM2_PT_VAR + 16,
- TPM2_PT_LOCKOUT_RECOVERY = TPM2_PT_VAR + 17,
- TPM2_PT_NV_WRITE_RECOVERY = TPM2_PT_VAR + 18,
- TPM2_PT_AUDIT_COUNTER_0 = TPM2_PT_VAR + 19,
- TPM2_PT_AUDIT_COUNTER_1 = TPM2_PT_VAR + 20,
+}; /* 128 bytes is an arbitrary cap. This could be as large as TPM_BUFSIZE - 18
- bytes, but 128 is still a relatively large number of random bytes and
-- 2.25.1
BR, Jarkko
Hi, I am still getting fTPM stutters with 6.4.3 kernel on Asus GA402RJ laptop. Compiling kernel without TPM support makes the stutters go away. The fTPM firmware version is 0x3005700020005 on my machine.
On 7/27/2023 10:38, Daniil Stas wrote:
Hi, I am still getting fTPM stutters with 6.4.3 kernel on Asus GA402RJ laptop. Compiling kernel without TPM support makes the stutters go away. The fTPM firmware version is 0x3005700020005 on my machine.
Can you please open up a kernel bugzilla and attach your dmesg to it both with TPM enabled and disabled?
You can CC me on it directly.
On Thu, 27 Jul 2023 10:42:33 -0500 Mario Limonciello mario.limonciello@amd.com wrote:
On 7/27/2023 10:38, Daniil Stas wrote:
Hi, I am still getting fTPM stutters with 6.4.3 kernel on Asus GA402RJ laptop. Compiling kernel without TPM support makes the stutters go away. The fTPM firmware version is 0x3005700020005 on my machine.
Can you please open up a kernel bugzilla and attach your dmesg to it both with TPM enabled and disabled?
You can CC me on it directly.
There are several bug categories to choose in the bugzilla. Which one should I use? I never used bugzilla before...
On 7/27/2023 11:39, Daniil Stas wrote:
On Thu, 27 Jul 2023 10:42:33 -0500 Mario Limonciello mario.limonciello@amd.com wrote:
On 7/27/2023 10:38, Daniil Stas wrote:
Hi, I am still getting fTPM stutters with 6.4.3 kernel on Asus GA402RJ laptop. Compiling kernel without TPM support makes the stutters go away. The fTPM firmware version is 0x3005700020005 on my machine.
Can you please open up a kernel bugzilla and attach your dmesg to it both with TPM enabled and disabled?
You can CC me on it directly.
There are several bug categories to choose in the bugzilla. Which one should I use? I never used bugzilla before...
drivers/other is fine. If there is a better category we can move it later.
On Thu, 27 Jul 2023 11:41:55 -0500 Mario Limonciello mario.limonciello@amd.com wrote:
On 7/27/2023 11:39, Daniil Stas wrote:
On Thu, 27 Jul 2023 10:42:33 -0500 Mario Limonciello mario.limonciello@amd.com wrote:
On 7/27/2023 10:38, Daniil Stas wrote:
[...]
Can you please open up a kernel bugzilla and attach your dmesg to it both with TPM enabled and disabled?
You can CC me on it directly.
There are several bug categories to choose in the bugzilla. Which one should I use? I never used bugzilla before...
drivers/other is fine. If there is a better category we can move it later.
I see there are already several bug reports similar to mine. This one for example: https://bugzilla.kernel.org/show_bug.cgi?id=217212 Should I still make a new one?
On 7/27/2023 11:50, Daniil Stas wrote:
On Thu, 27 Jul 2023 11:41:55 -0500 Mario Limonciello mario.limonciello@amd.com wrote:
On 7/27/2023 11:39, Daniil Stas wrote:
On Thu, 27 Jul 2023 10:42:33 -0500 Mario Limonciello mario.limonciello@amd.com wrote:
On 7/27/2023 10:38, Daniil Stas wrote:
[...]
Can you please open up a kernel bugzilla and attach your dmesg to it both with TPM enabled and disabled?
You can CC me on it directly.
There are several bug categories to choose in the bugzilla. Which one should I use? I never used bugzilla before...
drivers/other is fine. If there is a better category we can move it later.
I see there are already several bug reports similar to mine. This one for example: https://bugzilla.kernel.org/show_bug.cgi?id=217212 Should I still make a new one?
Yes; please make a separate one. If we decide it's the same we can always mark as a duplicate.
On Thu, 27 Jul 2023 11:51:21 -0500 Mario Limonciello mario.limonciello@amd.com wrote:
On 7/27/2023 11:50, Daniil Stas wrote:
On Thu, 27 Jul 2023 11:41:55 -0500 Mario Limonciello mario.limonciello@amd.com wrote:
On 7/27/2023 11:39, Daniil Stas wrote:
[...] [...]
[...]
[...] [...]
drivers/other is fine. If there is a better category we can move it later.
I see there are already several bug reports similar to mine. This one for example: https://bugzilla.kernel.org/show_bug.cgi?id=217212 Should I still make a new one?
Yes; please make a separate one. If we decide it's the same we can always mark as a duplicate.
Here is the bug report I created: https://bugzilla.kernel.org/show_bug.cgi?id=217719
On Thu, 27 Jul 2023 at 10:05, Daniil Stas daniil.stas@posteo.net wrote:
Here is the bug report I created: https://bugzilla.kernel.org/show_bug.cgi?id=217719
Let's just disable the stupid fTPM hwrnd thing.
Maybe use it for the boot-time "gather entropy from different sources", but clearly it should *not* be used at runtime.
Why would anybody use that crud when any machine that has it supposedly fixed (which apparently didn't turn out to be true after all) would also have the CPU rdrand instruction that doesn't have the problem?
If you don't trust the CPU rdrand implementation (and that has had bugs too - see clear_rdrand_cpuid_bit() and x86_init_rdrand()), why would you trust the fTPM version that has caused even *more* problems?
So I don't see any downside to just saying "that fTPM thing is not working". Even if it ends up working in the future, there are alternatives that aren't any worse.
Linus
On 7/28/2023 3:41 PM, Linus Torvalds wrote:
On Thu, 27 Jul 2023 at 10:05, Daniil Stas daniil.stas@posteo.net wrote:
Here is the bug report I created: https://bugzilla.kernel.org/show_bug.cgi?id=217719
Let's just disable the stupid fTPM hwrnd thing.
Maybe use it for the boot-time "gather entropy from different sources", but clearly it should *not* be used at runtime.
Why would anybody use that crud when any machine that has it supposedly fixed (which apparently didn't turn out to be true after all) would also have the CPU rdrand instruction that doesn't have the problem?
It /seems/ to be a separate problem, but yes I agree with your point.
If you don't trust the CPU rdrand implementation (and that has had bugs too - see clear_rdrand_cpuid_bit() and x86_init_rdrand()), why would you trust the fTPM version that has caused even *more* problems?
That's exactly why I was asking in the kernel bugzilla if something similar gets tripped up by RDRAND.
I've got a patch that tears it out entirely for AMD fTPMs in the bugzilla, but I would prefer to discuss this with BIOS people before going that direction.
So I don't see any downside to just saying "that fTPM thing is not working". Even if it ends up working in the future, there are alternatives that aren't any worse.
Linus
On Fri, 28 Jul 2023 at 14:01, Limonciello, Mario mario.limonciello@amd.com wrote:
That's exactly why I was asking in the kernel bugzilla if something similar gets tripped up by RDRAND.
So that would sound very unlikely, but who knows... Microcode can obviously do pretty much anything at all, but at least the original fTPM issues _seemed_ to be about BIOS doing truly crazy things like SPI flash accesses.
I can easily imagine a BIOS fTPM code using some absolutely horrid global "EFI synchronization" lock or whatever, which could then cause random problems just based on some entirely unrelated activity.
I would not be surprised, for example, if wasn't the fTPM hwrnd code itself that decided to read some random number from SPI, but that it simply got serialized with something else that the BIOS was involved with. It's not like BIOS people are famous for their scalable code that is entirely parallel...
And I'd be _very_ surprised if CPU microcode does anything even remotely like that. Not impossible - HP famously screwed with the time stamp counter with SMIs, and I could imagine them - or others - doing the same with rdrand.
But it does sound pretty damn unlikely, compared to "EFI BIOS uses a one big lock approach".
So rdrand (and rdseed in particular) can be rather slow, but I think we're talking hundreds of CPU cycles (maybe low thousands). Nothing like the stuttering reports we've seen from fTPM.
Linus
On 7/28/2023 4:38 PM, Linus Torvalds wrote:
On Fri, 28 Jul 2023 at 14:01, Limonciello, Mario mario.limonciello@amd.com wrote:
That's exactly why I was asking in the kernel bugzilla if something similar gets tripped up by RDRAND.
So that would sound very unlikely, but who knows... Microcode can obviously do pretty much anything at all, but at least the original fTPM issues _seemed_ to be about BIOS doing truly crazy things like SPI flash accesses.
I can easily imagine a BIOS fTPM code using some absolutely horrid global "EFI synchronization" lock or whatever, which could then cause random problems just based on some entirely unrelated activity.
I would not be surprised, for example, if wasn't the fTPM hwrnd code itself that decided to read some random number from SPI, but that it simply got serialized with something else that the BIOS was involved with. It's not like BIOS people are famous for their scalable code that is entirely parallel...
And I'd be _very_ surprised if CPU microcode does anything even remotely like that. Not impossible - HP famously screwed with the time stamp counter with SMIs, and I could imagine them - or others - doing the same with rdrand.
But it does sound pretty damn unlikely, compared to "EFI BIOS uses a one big lock approach".
So rdrand (and rdseed in particular) can be rather slow, but I think we're talking hundreds of CPU cycles (maybe low thousands). Nothing like the stuttering reports we've seen from fTPM.
Linus
Your theory sounds totally plausible and it would explain why even though this system has the fixes from the original issue it's tripping a similar behavior.
Based on the argument of RDRAND being on the same SOC I think it's a pretty good argument to drop contributing to the hwrng entropy *anything* that's not a dTPM.
On Thu Jul 27, 2023 at 3:38 PM UTC, Daniil Stas wrote:
Hi, I am still getting fTPM stutters with 6.4.3 kernel on Asus GA402RJ laptop. Compiling kernel without TPM support makes the stutters go away. The fTPM firmware version is 0x3005700020005 on my machine.
This is needs a bit more elaboration in order to be comprehended.
Do you mean by "stutter" unexpected delays and when do they happen?
BR, Jarkko
On Fri, 28 Jul 2023 19:30:18 +0000 "Jarkko Sakkinen" jarkko@kernel.org wrote:
On Thu Jul 27, 2023 at 3:38 PM UTC, Daniil Stas wrote:
Hi, I am still getting fTPM stutters with 6.4.3 kernel on Asus GA402RJ laptop. Compiling kernel without TPM support makes the stutters go away. The fTPM firmware version is 0x3005700020005 on my machine.
This is needs a bit more elaboration in order to be comprehended.
Do you mean by "stutter" unexpected delays and when do they happen?
BR, Jarkko
Yes, unexpected delays. They just happen randomly. You can google "AMD fTPM stuttering", there are a lot of examples. Here is one: https://www.youtube.com/watch?v=TYnRL-x6DVI
On Fri Jul 28, 2023 at 8:18 PM UTC, Daniil Stas wrote:
On Fri, 28 Jul 2023 19:30:18 +0000 "Jarkko Sakkinen" jarkko@kernel.org wrote:
On Thu Jul 27, 2023 at 3:38 PM UTC, Daniil Stas wrote:
Hi, I am still getting fTPM stutters with 6.4.3 kernel on Asus GA402RJ laptop. Compiling kernel without TPM support makes the stutters go away. The fTPM firmware version is 0x3005700020005 on my machine.
This is needs a bit more elaboration in order to be comprehended.
Do you mean by "stutter" unexpected delays and when do they happen?
BR, Jarkko
Yes, unexpected delays. They just happen randomly. You can google "AMD fTPM stuttering", there are a lot of examples. Here is one: https://www.youtube.com/watch?v=TYnRL-x6DVI
What if you make tpm_amd_is_rng_defective() to unconditonally return true? Does this make the problem dissappear, or not?
BR, Jarkko
On Mon, 31 Jul 2023 10:14:05 +0000 "Jarkko Sakkinen" jarkko@kernel.org wrote:
On Fri Jul 28, 2023 at 8:18 PM UTC, Daniil Stas wrote:
On Fri, 28 Jul 2023 19:30:18 +0000 "Jarkko Sakkinen" jarkko@kernel.org wrote:
On Thu Jul 27, 2023 at 3:38 PM UTC, Daniil Stas wrote:
[...]
This is needs a bit more elaboration in order to be comprehended.
Do you mean by "stutter" unexpected delays and when do they happen?
BR, Jarkko
Yes, unexpected delays. They just happen randomly. You can google "AMD fTPM stuttering", there are a lot of examples. Here is one: https://www.youtube.com/watch?v=TYnRL-x6DVI
What if you make tpm_amd_is_rng_defective() to unconditonally return true? Does this make the problem dissappear, or not?
BR, Jarkko
I already tried compiling kernel without CONFIG_HW_RANDOM_TPM enabled, which does the same. Yes, it removes the issue.
On Mon Jul 31, 2023 at 10:28 AM UTC, Daniil Stas wrote:
On Mon, 31 Jul 2023 10:14:05 +0000 "Jarkko Sakkinen" jarkko@kernel.org wrote:
On Fri Jul 28, 2023 at 8:18 PM UTC, Daniil Stas wrote:
On Fri, 28 Jul 2023 19:30:18 +0000 "Jarkko Sakkinen" jarkko@kernel.org wrote:
On Thu Jul 27, 2023 at 3:38 PM UTC, Daniil Stas wrote:
[...]
This is needs a bit more elaboration in order to be comprehended.
Do you mean by "stutter" unexpected delays and when do they happen?
BR, Jarkko
Yes, unexpected delays. They just happen randomly. You can google "AMD fTPM stuttering", there are a lot of examples. Here is one: https://www.youtube.com/watch?v=TYnRL-x6DVI
What if you make tpm_amd_is_rng_defective() to unconditonally return true? Does this make the problem dissappear, or not?
BR, Jarkko
I already tried compiling kernel without CONFIG_HW_RANDOM_TPM enabled, which does the same. Yes, it removes the issue.
Thank you, just wanted sanity check that I exactly know what is going on.
BR, Jarkko
linux-stable-mirror@lists.linaro.org