lists.linaro.org
acc@lists.linaro.org
126 discussions
[PATCH v2 0/4] Fixup compile warning
by Zhiqi Song
This version fixes some compile warnings and adds a sanity test for SM2.

Zhiqi Song (4):
  uadk_provider: add der encode and packet for SM2
  uadk_provider: support sm2 hardware acceleration
  uadk_provider: add sanity test command for SM2
  uadk_engine/digest: cleanup pointer type

 src/Makefile.am              |    4 +-
 src/uadk_digest.c            |    2 +-
 src/uadk_prov.h              |   15 +-
 src/uadk_prov_bio.c          |    3 +-
 src/uadk_prov_der_writer.c   |  236 +++
 src/uadk_prov_der_writer.h   |  129 ++
 src/uadk_prov_init.c         |   14 +-
 src/uadk_prov_packet.c       |  514 ++++++
 src/uadk_prov_packet.h       |  959 +++++++++++
 src/uadk_prov_pkey.c         |  770 +++++++++
 src/uadk_prov_pkey.h         |  389 +++++
 src/uadk_prov_sm2.c          | 3127 ++++++++++++++++++++++++++++++++++
 test/sanity_test_provider.sh |   18 +
 13 files changed, 6172 insertions(+), 8 deletions(-)
 create mode 100644 src/uadk_prov_der_writer.c
 create mode 100644 src/uadk_prov_der_writer.h
 create mode 100644 src/uadk_prov_packet.c
 create mode 100644 src/uadk_prov_packet.h
 create mode 100644 src/uadk_prov_pkey.c
 create mode 100644 src/uadk_prov_pkey.h
 create mode 100644 src/uadk_prov_sm2.c

--
2.33.0
1 year, 1 month; 1 participant, 4 comments
[RFC PATCH 0/3] Debugging uadk heterogeneous scheduling function
by Longfang Liu
Based on the SEC and ZIP modules, this series debugs the uadk heterogeneous scheduling function so that tasks can use hardware acceleration and software instruction acceleration at the same time.

Longfang Liu (3):
  uadk: add heterogeneous scheduling solutions
  uadk: update uadk tool
  uadk: update uadk scheduler processing

 Makefile.am                              | 124 +++---
 drv/hisi_sec.c                           |  20 +-
 drv/isa_ce_sm4.c                         |   9 +
 drv/isa_ce_sm4.h                         |   5 +
 include/wd_alg.h                         |   1 +
 include/wd_alg_common.h                  |  13 +
 include/wd_sched.h                       |   1 +
 include/wd_util.h                        |  15 +-
 module.mk                                |  51 +--
 test/Makefile.am                         |  11 +-
 uadk_tool/Makefile.am                    |  20 +-
 uadk_tool/benchmark/sec_uadk_benchmark.c | 527 +---------------------
 uadk_tool/benchmark/uadk_benchmark.c     |  22 +-
 uadk_tool/uadk_tool.c                    |   2 +-
 wd.c                                     |   1 -
 wd_cipher.c                              | 144 +++---
 wd_comp.c                                |  85 ++--
 wd_sched.c                               | 203 +++++++--
 wd_util.c                                | 529 +++++++++++++++++------
 19 files changed, 849 insertions(+), 934 deletions(-)

--
2.33.0
1 year, 1 month; 1 participant, 3 comments
[PATCH 00/12] uadk_provider: add sm2 combined digest support
by Zhiqi Song
From: JiangShui Yang <yangjiangshui(a)h-partners.com>

Chenghai Huang (9):
  uadk_provider: define the err and success return name
  uadk_provider: add digest single block function for provider
  uadk_provider: add input pointer check
  uadk_provider: code cleanup for provider digest
  uadk_provider: extract Digest info table check function
  uadk_provider: optimized provider update performance
  uadk_provider: add sha512_XXX algorithm name processing
  uadk_provider: fix the uadk_provider digest ctx copy function
  uadk_engine: optimized engine update process

Zhiqi Song (3):
  uadk_provider: fixup bio problem in provider
  uadk_provider: add der encode and packet for SM2
  uadk_provider: support sm2 hardware acceleration

 src/Makefile.am            |    5 +-
 src/uadk_digest.c          |   44 +-
 src/uadk_prov.h            |   54 +-
 src/uadk_prov_bio.c        |  265 +++
 src/uadk_prov_bio.h        |   34 +
 src/uadk_prov_der_writer.c |  236 +++
 src/uadk_prov_der_writer.h |  129 ++
 src/uadk_prov_digest.c     |  352 ++--
 src/uadk_prov_init.c       |   84 +-
 src/uadk_prov_packet.c     |  514 ++++++
 src/uadk_prov_packet.h     |  959 +++++++++++
 src/uadk_prov_pkey.c       |  770 +++++++++
 src/uadk_prov_pkey.h       |  429 +++++
 src/uadk_prov_sm2.c        | 3146 ++++++++++++++++++++++++++++++++++++
 14 files changed, 6864 insertions(+), 157 deletions(-)
 create mode 100644 src/uadk_prov_bio.c
 create mode 100644 src/uadk_prov_bio.h
 create mode 100644 src/uadk_prov_der_writer.c
 create mode 100644 src/uadk_prov_der_writer.h
 create mode 100644 src/uadk_prov_packet.c
 create mode 100644 src/uadk_prov_packet.h
 create mode 100644 src/uadk_prov_pkey.c
 create mode 100644 src/uadk_prov_pkey.h
 create mode 100644 src/uadk_prov_sm2.c

--
2.33.0
1 year, 2 months; 1 participant, 12 comments
[PATCH 01/15] cipher: add ctrl function, used by ASF set numa affinity
by Zhiqi Song
From: wangzengliang <wangzengliang2(a)huawei.com>

Signed-off-by: wangzengliang <wangzengliang2(a)huawei.com>
Signed-off-by: JiangShui Yang <yangjiangshui(a)h-partners.com>
---
 src/uadk_cipher.c | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/src/uadk_cipher.c b/src/uadk_cipher.c
index b506c22..adcde01 100644
--- a/src/uadk_cipher.c
+++ b/src/uadk_cipher.c
@@ -67,6 +67,7 @@ struct cipher_priv_ctx {
 	/* Crypto small packet offload threshold */
 	size_t switch_threshold;
 	bool update_iv;
+	struct sched_params sched_param;
 };

 struct cipher_info {
@@ -690,11 +691,26 @@ static int do_cipher_async(struct cipher_priv_ctx *priv, struct async_op *op)
 	return 1;
 }

+static int uadk_e_cipher_ctrl(EVP_CIPHER_CTX *ctx, int type, int numa_node, void *ptr)
+{
+	struct cipher_priv_ctx *priv =
+		(struct cipher_priv_ctx *)EVP_CIPHER_CTX_get_cipher_data(ctx);
+
+	if (unlikely(!priv)) {
+		fprintf(stderr, "cipher priv ctx is NULL!\n");
+		return 0;
+	}
+
+	priv->sched_param.numa_id = numa_node;
+	priv->setup.sched_param = (void *)&(priv->sched_param);
+	return 1;
+}
+
 static void uadk_e_ctx_init(EVP_CIPHER_CTX *ctx, struct cipher_priv_ctx *priv)
 {
 	__u32 cipher_counts = ARRAY_SIZE(cipher_info_table);
-	struct sched_params params = {0};
-	int nid, ret;
+	struct sched_params *para;
+	int nid, ret, type;
 	__u32 i;

 	priv->req.iv_bytes = EVP_CIPHER_CTX_iv_length(ctx);
@@ -715,14 +731,17 @@ static void uadk_e_ctx_init(EVP_CIPHER_CTX *ctx, struct cipher_priv_ctx *priv)
 	 * the cipher algorithm does not distinguish between
 	 * encryption and decryption queues
 	 */
-	params.type = priv->req.op_type;
+	type = priv->req.op_type;
 	ret = uadk_e_is_env_enabled("cipher");
 	if (ret)
-		params.type = 0;
+		type = 0;

 	/* Use the default numa parameters */
-	params.numa_id = -1;
-	priv->setup.sched_param = &params;
+	if (priv->setup.sched_param != &priv->sched_param)
+		uadk_e_cipher_ctrl(ctx, 0, -1, NULL);
+
+	para = (struct sched_params *)priv->setup.sched_param;
+	para->type = type;

 	if (!priv->sess) {
 		nid = EVP_CIPHER_CTX_nid(ctx);
@@ -820,6 +839,7 @@ do { \
 	!EVP_CIPHER_meth_set_init(uadk_##name, uadk_e_cipher_init) || \
 	!EVP_CIPHER_meth_set_do_cipher(uadk_##name, uadk_e_do_cipher) || \
 	!EVP_CIPHER_meth_set_cleanup(uadk_##name, uadk_e_cipher_cleanup) || \
+	!EVP_CIPHER_meth_set_ctrl(uadk_##name, uadk_e_cipher_ctrl) || \
 	!EVP_CIPHER_meth_set_set_asn1_params(uadk_##name, EVP_CIPHER_set_asn1_iv) || \
 	!EVP_CIPHER_meth_set_get_asn1_params(uadk_##name, EVP_CIPHER_get_asn1_iv)) \
 	return 0; \
--
2.33.0
1 year, 4 months; 1 participant, 14 comments
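The cipher patch above moves the scheduler parameters from a stack-local variable into the private context, so the pointer kept in `setup.sched_param` remains valid after init returns, and it lets a ctrl call choose the NUMA node before init runs. The following is only a minimal sketch of that pattern under simplified assumptions: every `demo_*` type and function is a hypothetical stand-in, not the uadk_engine or OpenSSL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the structures in the patch. */
struct demo_sched_params {
	int numa_id;
	int type;
};

struct demo_setup {
	void *sched_param;
};

struct demo_priv_ctx {
	struct demo_setup setup;
	/* Lives in the context, so the pointer in setup outlives init. */
	struct demo_sched_params sched_param;
};

/* Mirrors the ctrl hook: record the NUMA node and point setup at it. */
int demo_cipher_ctrl(struct demo_priv_ctx *priv, int numa_node)
{
	if (!priv)
		return 0;

	priv->sched_param.numa_id = numa_node;
	priv->setup.sched_param = &priv->sched_param;
	return 1;
}

/* Mirrors the init path: use numa_id = -1 only if ctrl was never called. */
void demo_ctx_init(struct demo_priv_ctx *priv, int op_type)
{
	struct demo_sched_params *para;

	if (priv->setup.sched_param != &priv->sched_param)
		demo_cipher_ctrl(priv, -1);

	para = priv->setup.sched_param;
	para->type = op_type;
}
```

A caller that wants a specific node would call `demo_cipher_ctrl(&ctx, node)` before `demo_ctx_init()`; a caller that skips the ctrl step ends up with the default `numa_id` of -1.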
[PATCH 0/4] uadk_engine: some cleanup
by Zhiqi Song
This series of patches is mainly for cleanup.

Zhiqi Song (4):
  uadk_engine: cleanup code style of async functions
  cipher: cleanup repeated function invoking
  digest: add ctx allocation check
  sm2: add ctx allocation check

 src/uadk_async.c  | 126 ++++++++++++++++++++++------------------------
 src/uadk_async.h  |   3 ++
 src/uadk_cipher.c |   7 ++-
 src/uadk_digest.c |   2 +
 src/uadk_sm2.c    |   6 ++-
 5 files changed, 74 insertions(+), 70 deletions(-)

--
2.33.0
1 year, 6 months; 1 participant, 4 comments
[PATCH 1/2] uadk/digest: modify spelling errors
by Zhiqi Song
Modify spelling errors related to digest stream mode.

Signed-off-by: Zhiqi Song <songzhiqi1(a)huawei.com>
---
 drv/hash_mb/hash_mb.c                         | 4 ++--
 drv/hisi_sec.c                                | 8 ++++----
 drv/isa_ce_sm3.c                              | 4 ++--
 include/drv/wd_digest_drv.h                   | 6 +++---
 v1/test/hisi_sec_test/test_hisi_sec.c         | 4 ++--
 v1/test/hisi_sec_test_sgl/test_hisi_sec_sgl.c | 4 ++--
 wd_digest.c                                   | 2 +-
 7 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drv/hash_mb/hash_mb.c b/drv/hash_mb/hash_mb.c
index 0f27c11..4750062 100644
--- a/drv/hash_mb/hash_mb.c
+++ b/drv/hash_mb/hash_mb.c
@@ -406,7 +406,7 @@ static int hash_do_partial(struct hash_mb_poll_queue *poll_queue,
 	int ret = WD_SUCCESS;

 	switch (bd_type) {
-	case HASH_FRIST_BLOCK:
+	case HASH_FIRST_BLOCK:
 		ret = hash_first_block_process(d_msg, job, poll_queue->ops->iv_bytes);
 		break;
 	case HASH_MIDDLE_BLOCK:
@@ -434,7 +434,7 @@ static void hash_mb_init_iv(struct hash_mb_poll_queue *poll_queue,
 	job->opad.opad_size = 0;

 	switch (bd_type) {
-	case HASH_FRIST_BLOCK:
+	case HASH_FIRST_BLOCK:
 		memcpy(job->result_digest, poll_queue->ops->iv_data, poll_queue->ops->iv_bytes);
 		if (d_msg->mode != WD_DIGEST_HMAC)
 			return;
diff --git a/drv/hisi_sec.c b/drv/hisi_sec.c
index 2eaac51..e66fb00 100644
--- a/drv/hisi_sec.c
+++ b/drv/hisi_sec.c
@@ -1536,7 +1536,7 @@ static int fill_digest_long_hash(handle_t h_qp, struct wd_digest_msg *msg,
 	if (ret)
 		return ret;

-	if (block_type == HASH_FRIST_BLOCK) {
+	if (block_type == HASH_FIRST_BLOCK) {
 		/* Long hash first */
 		sqe->ai_apd_cs = AI_GEN_INNER;
 		sqe->ai_apd_cs |= AUTHPAD_NOPAD << AUTHPAD_OFFSET;
@@ -1618,7 +1618,7 @@ static int digest_bd2_type_check(struct wd_digest_msg *msg)
 	enum hash_block_type type = get_hash_block_type(msg);

 	/* Long hash first and middle bd */
-	if (type == HASH_FRIST_BLOCK || type == HASH_MIDDLE_BLOCK) {
+	if (type == HASH_FIRST_BLOCK || type == HASH_MIDDLE_BLOCK) {
 		WD_ERR("hardware v2 not supports 0 size in long hash!\n");
 		return -WD_EINVAL;
 	}
@@ -1636,7 +1636,7 @@ static int digest_bd3_type_check(struct wd_digest_msg *msg)
 {
 	enum hash_block_type type = get_hash_block_type(msg);
 	/* Long hash first and middle bd */
-	if (type == HASH_FRIST_BLOCK || type == HASH_MIDDLE_BLOCK) {
+	if (type == HASH_FIRST_BLOCK || type == HASH_MIDDLE_BLOCK) {
 		WD_ERR("invalid: hardware v3 not supports 0 size in long hash!\n");
 		return -WD_EINVAL;
 	}
@@ -1889,7 +1889,7 @@ static int fill_digest_long_hash3(handle_t h_qp, struct wd_digest_msg *msg,
 	if (ret)
 		return ret;

-	if (block_type == HASH_FRIST_BLOCK) {
+	if (block_type == HASH_FIRST_BLOCK) {
 		/* Long hash first */
 		sqe->auth_mac_key |= AI_GEN_INNER << SEC_AI_GEN_OFFSET_V3;
 		sqe->stream_scene.stream_auth_pad = AUTHPAD_NOPAD;
diff --git a/drv/isa_ce_sm3.c b/drv/isa_ce_sm3.c
index 2789a08..5fc7acc 100644
--- a/drv/isa_ce_sm3.c
+++ b/drv/isa_ce_sm3.c
@@ -188,7 +188,7 @@ static int do_sm3_ce(struct wd_digest_msg *msg, __u8 *out_digest)
 		sm3_ce_update(&sctx, data, data_len, sm3_ce_block_compress);
 		sm3_ce_final(&sctx, out_digest, sm3_ce_block_compress);
 		break;
-	case HASH_FRIST_BLOCK:
+	case HASH_FIRST_BLOCK:
 		sm3_ce_init(&sctx);
 		sm3_ce_update(&sctx, data, data_len, sm3_ce_block_compress);
 		trans_output_result(out_digest, sctx.word_reg);
@@ -306,7 +306,7 @@ static int do_hmac_sm3_ce(struct wd_digest_msg *msg, __u8 *out_hmac)
 		sm3_ce_hmac_update(&hctx, data, data_len);
 		sm3_ce_hmac_final(&hctx, out_hmac);
 		break;
-	case HASH_FRIST_BLOCK:
+	case HASH_FIRST_BLOCK:
 		sm3_ce_hmac_init(&hctx, key, key_len);
 		sm3_ce_hmac_update(&hctx, data, data_len);
 		trans_output_result(out_hmac, hctx.sctx.word_reg);
diff --git a/include/drv/wd_digest_drv.h b/include/drv/wd_digest_drv.h
index 7d86b65..5e3e821 100644
--- a/include/drv/wd_digest_drv.h
+++ b/include/drv/wd_digest_drv.h
@@ -12,7 +12,7 @@ extern "C" {
 #endif

 enum hash_block_type {
-	HASH_FRIST_BLOCK,
+	HASH_FIRST_BLOCK,
 	HASH_MIDDLE_BLOCK,
 	HASH_END_BLOCK,
 	HASH_SINGLE_BLOCK,
@@ -66,13 +66,13 @@ static inline enum hash_block_type get_hash_block_type(struct wd_digest_msg *msg
 {
 	/*
 	 * [has_next , iv_bytes]
-	 * [ 1 , 0 ] = long hash(frist bd)
+	 * [ 1 , 0 ] = long hash(first bd)
 	 * [ 1 , 1 ] = long hash(middle bd)
 	 * [ 0 , 1 ] = long hash(end bd)
 	 * [ 0 , 0 ] = block hash(single bd)
 	 */
 	if (msg->has_next && !msg->iv_bytes)
-		return HASH_FRIST_BLOCK;
+		return HASH_FIRST_BLOCK;
 	else if (msg->has_next && msg->iv_bytes)
 		return HASH_MIDDLE_BLOCK;
 	else if (!msg->has_next && msg->iv_bytes)
diff --git a/v1/test/hisi_sec_test/test_hisi_sec.c b/v1/test/hisi_sec_test/test_hisi_sec.c
index be4ee9d..05c91ad 100644
--- a/v1/test/hisi_sec_test/test_hisi_sec.c
+++ b/v1/test/hisi_sec_test/test_hisi_sec.c
@@ -1463,7 +1463,7 @@ static int sec_cipher_async_test(int thread_num, __u64 lcore_mask,
 		SEC_TST_PRT("%s(): create pool fail!\n", __func__);
 		return -ENOMEM;
 	}
-	/* frist create the async poll thread! */
+	/* first create the async poll thread! */
 	test_thrds_data[0].pool = pool;
 	test_thrds_data[0].q = &q;
 	test_thrds_data[0].thread_num = 1;
@@ -2070,7 +2070,7 @@ static int sec_aead_async_test(int thd_num, __u64 lcore_mask,
 		SEC_TST_PRT("%s(): create pool fail!\n", __func__);
 		return -ENOMEM;
 	}
-	/* frist create the async poll thread! */
+	/* first create the async poll thread! */
 	test_thrds_data[0].pool = pool;
 	test_thrds_data[0].q = &q;
 	test_thrds_data[0].thread_num = 1;
diff --git a/v1/test/hisi_sec_test_sgl/test_hisi_sec_sgl.c b/v1/test/hisi_sec_test_sgl/test_hisi_sec_sgl.c
index b7513d1..ba5cdfa 100644
--- a/v1/test/hisi_sec_test_sgl/test_hisi_sec_sgl.c
+++ b/v1/test/hisi_sec_test_sgl/test_hisi_sec_sgl.c
@@ -1733,7 +1733,7 @@ static int sec_cipher_async_test(int thread_num, __u64 lcore_mask,
 		SEC_TST_PRT("%s(): create pool fail!\n", __func__);
 		return -ENOMEM;
 	}
-	/* frist create the async poll thread! */
+	/* first create the async poll thread! */
 	test_thrds_data[0].pool = pool;
 	test_thrds_data[0].q = &q;
 	test_thrds_data[0].thread_num = 1;
@@ -2640,7 +2640,7 @@ static int sec_aead_async_test(int thd_num, __u64 lcore_mask,
 		return -ENOMEM;
 	}

-	/* frist create the async poll thread! */
+	/* first create the async poll thread! */
 	test_thrds_data[0].pool = pool;
 	test_thrds_data[0].q = &q;
 	test_thrds_data[0].thread_num = 1;
diff --git a/wd_digest.c b/wd_digest.c
index dfe709b..4832de8 100644
--- a/wd_digest.c
+++ b/wd_digest.c
@@ -55,7 +55,7 @@ struct wd_digest_stream_data {
 	/* Total data length for stream mode */
 	__u64 long_data_len;
 	/*
-	 * Notify the stream message state, zero is frist message,
+	 * Notify the stream message state, zero is first message,
 	 * non-zero is middle or final message.
 	 */
 	int msg_state;
--
2.30.0
1 year, 7 months; 1 participant, 0 comments
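The comment table in `get_hash_block_type()` above maps the pair [has_next, iv_bytes] to one of four block types. As a rough, self-contained illustration of that table, the sketch below uses hypothetical `demo_*` names and plain booleans instead of `struct wd_digest_msg`; the real helper lives in `include/drv/wd_digest_drv.h`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of enum hash_block_type from the patch. */
enum demo_hash_block_type {
	DEMO_HASH_FIRST_BLOCK,
	DEMO_HASH_MIDDLE_BLOCK,
	DEMO_HASH_END_BLOCK,
	DEMO_HASH_SINGLE_BLOCK,
};

/*
 * has_next: more data follows this message.
 * has_iv:   a running digest state from an earlier message is present
 *           (the patch tests msg->iv_bytes for this).
 */
enum demo_hash_block_type demo_get_hash_block_type(bool has_next, bool has_iv)
{
	if (has_next && !has_iv)
		return DEMO_HASH_FIRST_BLOCK;	/* [1, 0] long hash, first bd */
	else if (has_next && has_iv)
		return DEMO_HASH_MIDDLE_BLOCK;	/* [1, 1] long hash, middle bd */
	else if (!has_next && has_iv)
		return DEMO_HASH_END_BLOCK;	/* [0, 1] long hash, end bd */

	return DEMO_HASH_SINGLE_BLOCK;		/* [0, 0] block hash, single bd */
}
```

A three-message stream would therefore be classified as first, middle, end, while a one-shot digest with no carried state is a single block.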
[PATCH 0/4] Support sm3 ce instruction
by Zhiqi Song
1. Support sync sm3 ce instruction.
2. Some cleanup and bugfix.

Zhiqi Song (4):
  uadk: remove redundant header file in makefile
  uadk/isa-ce: support sm3 ce instruction
  uadk: fix control range of environmemt variable
  uadk/util: use default sched_type for instruction task

 Makefile.am               |  20 +-
 configure.ac              |   3 +
 drv/isa_ce_sm3.c          | 401 ++++++++++++++++++++
 drv/isa_ce_sm3.h          |  86 +++++
 drv/isa_ce_sm3_armv8.S    | 765 ++++++++++++++++++++++++++++++++++++++
 include/drv/arm_arch_ce.h | 199 ++++++++++
 include/wd_alg.h          |  43 +++
 include/wd_sched.h        |   2 +-
 wd_alg.c                  |  32 +-
 wd_digest.c               |   2 +-
 wd_sched.c                |   2 +-
 wd_util.c                 |  92 ++++-
 12 files changed, 1619 insertions(+), 28 deletions(-)
 create mode 100644 drv/isa_ce_sm3.c
 create mode 100644 drv/isa_ce_sm3.h
 create mode 100644 drv/isa_ce_sm3_armv8.S
 create mode 100644 include/drv/arm_arch_ce.h

--
2.33.0
1 year, 7 months; 1 participant, 4 comments
[PATCH 0/4] Support sm3 ce instruction
by Zhiqi Song
1. Support sync sm3 ce instruction.
2. Some cleanup and bugfix.

Zhiqi Song (4):
  uadk: remove redundant header file in makefile
  uadk/isa-ce: support sm3 ce instruction
  uadk: fix control range of environmemt variable
  uadk/util: use default sched_type for instruction task

 Makefile.am               |  20 +-
 configure.ac              |   3 +
 drv/isa_ce_sm3.c          | 401 ++++++++++++++++++++
 drv/isa_ce_sm3.h          |  86 +++++
 drv/isa_ce_sm3_armv8.S    | 765 ++++++++++++++++++++++++++++++++++++++
 include/drv/arm_arch_ce.h | 199 ++++++++++
 include/wd_alg.h          |  43 +++
 include/wd_sched.h        |   2 +-
 wd_alg.c                  |  32 +-
 wd_digest.c               |   2 +-
 wd_sched.c                |   2 +-
 wd_util.c                 |  92 ++++-
 12 files changed, 1619 insertions(+), 28 deletions(-)
 create mode 100644 drv/isa_ce_sm3.c
 create mode 100644 drv/isa_ce_sm3.h
 create mode 100644 drv/isa_ce_sm3_armv8.S
 create mode 100644 include/drv/arm_arch_ce.h

--
2.33.0
1 year, 7 months; 1 participant, 4 comments
[PATCH] uadk: sample - add test case for compression
by Yang Shen
1. Support test fork.
2. Support test wd_comp_init2.
3. Support test zlibwrapper.

Signed-off-by: Yang Shen <shenyang39(a)huawei.com>
---
 sample/uadk_comp.c | 364 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 353 insertions(+), 11 deletions(-)

diff --git a/sample/uadk_comp.c b/sample/uadk_comp.c
index 908c7bcb..1d259f0b 100644
--- a/sample/uadk_comp.c
+++ b/sample/uadk_comp.c
@@ -7,6 +7,7 @@
 #include "wd_alg_common.h"
 #include "wd_comp.h"
 #include "wd_sched.h"
+#include "wd_zlibwrapper.h"

 #define SCHED_RR_NAME "sched_rr"

@@ -19,6 +20,8 @@
 #define required_argument 1
 #define optional_argument 2

+#define MAX_THREAD 1024
+
 struct request_config {
 	char algname[MAX_ALG_LEN];
 	enum wd_comp_alg_type alg;
@@ -42,6 +45,11 @@ struct acc_alg_item {
 	int alg;
 };

+struct comp_sample_data {
+	int size;
+	char data[128];
+};
+
 static struct request_config config = {
 	.complv = WD_COMP_L8,
 	.optype = WD_DIR_COMPRESS,
@@ -52,6 +60,8 @@ static struct request_config config = {

 static struct request_data data;

+static pthread_t threads[MAX_THREAD];
+
 static struct acc_alg_item alg_options[] = {
 	{"zlib", WD_ZLIB},
 	{"gzip", WD_GZIP},
@@ -60,6 +70,11 @@ static struct acc_alg_item alg_options[] = {
 	{"", WD_COMP_ALG_MAX}
 };

+static struct comp_sample_data sample_data = {
+	.size = 20,
+	.data = "Welcome to use uadk!",
+};
+
 static void cowfail(char *s)
 {
 	fprintf(stderr, ""
@@ -74,6 +89,313 @@ static void cowfail(char *s)
 		"\n", s);
 }

+static void *initthread(void *data)
+{
+	int ret;
+
+	ret = wd_comp_init2("zlib", 0, TASK_HW);
+	if (ret)
+		fprintf(stderr, "%s: something is wrong, ret = %d!", __func__, ret);
+
+	return NULL;
+}
+
+static int test_fork(void)
+{
+	int ret;
+
+	pthread_create(&threads[0], NULL, initthread, NULL);
+
+	sleep(2);
+	ret = fork();
+	if (ret == 0)
+		ret = wd_comp_init2("zlib", 0, TASK_HW);
+	else
+		ret = pthread_join(threads[0], NULL);
+
+	wd_comp_uninit2();
+
+	return ret;
+}
+
+static int test_uadk_init2(void)
+{
+	struct wd_comp_sess_setup setup[2] = {0};
+	struct sched_params param[2] = {0};
+	struct wd_comp_req req[2] = {0};
+	handle_t h_sess[2];
+	void *src, *dst;
+	int ret;
+
+	ret = wd_comp_init2("zlib", 0, TASK_HW);
+	if (ret)
+		return ret;
+
+	setup[0].alg_type = WD_ZLIB;
+	setup[0].op_type = WD_DIR_COMPRESS;
+	setup[0].comp_lv = 1;
+	setup[0].win_sz = 1;
+	param[0].type = WD_DIR_COMPRESS;
+	setup[0].sched_param = &param[0];
+
+	h_sess[0] = wd_comp_alloc_sess(&setup[0]);
+	if (!h_sess[0]) {
+		fprintf(stderr, "%s fail to alloc comp sess.\n", __func__);
+		ret = -WD_EINVAL;
+		goto out_uninit;
+	}
+
+	setup[1].alg_type = WD_ZLIB;
+	setup[1].op_type = WD_DIR_DECOMPRESS;
+	setup[1].comp_lv = 1;
+	setup[1].win_sz = 1;
+	param[1].type = WD_DIR_DECOMPRESS;
+	setup[1].sched_param = &param[1];
+	h_sess[1] = wd_comp_alloc_sess(&setup[1]);
+	if (!h_sess[1]) {
+		fprintf(stderr, "%s fail to alloc decomp sess.\n", __func__);
+		ret = -WD_EINVAL;
+		goto out_free_comp_sess;
+	}
+
+	src = calloc(1, sizeof(char) * 128);
+	if (!src) {
+		ret = -WD_ENOMEM;
+		goto out_free_decomp_sess;
+	}
+
+	dst = calloc(1, sizeof(char) * 128);
+	if (!dst) {
+		ret = -WD_ENOMEM;
+		goto out_free_src;
+	}
+
+	req[0].src = sample_data.data;
+	req[0].src_len = sample_data.size;
+	req[0].op_type = WD_DIR_COMPRESS;
+	req[0].dst = dst;
+	req[0].dst_len = 128;
+
+	ret = wd_do_comp_sync(h_sess[0], &req[0]);
+	if (ret)
+		goto out_free_dst;
+
+	req[1].src = dst;
+	req[1].src_len = req[0].dst_len;
+	req[1].op_type = WD_DIR_DECOMPRESS;
+	req[1].dst = src;
+	req[1].dst_len = 128;
+
+	ret = wd_do_comp_sync(h_sess[1], &req[1]);
+
+	ret = strcmp(sample_data.data, src);
+	if (ret)
+		fprintf(stderr, "decompress fail\n");
+	else
+		fprintf(stderr, "good\n");
+
+out_free_dst:
+	free(dst);
+out_free_src:
+	free(src);
+out_free_decomp_sess:
+	wd_comp_free_sess(h_sess[1]);
+out_free_comp_sess:
+	wd_comp_free_sess(h_sess[0]);
+out_uninit:
+	wd_comp_uninit2();
+	return ret;
+}
+
+static int test_uadk_zlib_deflate(void *src, int src_len, void *dest, int dst_len)
+{
+	__u32 chunk = 128 * 1024;
+	z_stream zstrm = {0};
+	int ret, flush, have;
+
+	ret = wd_comp_init2("zlib", 0, TASK_HW);
+	if (ret) {
+		fprintf(stderr, "%s fail to init wd_comp.\n", __func__);
+		return ret;
+	}
+
+	ret = wd_deflate_init(&zstrm, 0, 15);
+	if (ret) {
+		fprintf(stderr, "%s fail to init deflate.\n", __func__);
+		return ret;
+	}
+
+	zstrm.next_in = src;
+	do {
+		if (src_len > chunk) {
+			zstrm.avail_in = chunk;
+			src_len -= chunk;
+		} else {
+			zstrm.avail_in = src_len;
+			src_len = 0;
+		}
+
+		flush = src_len ? Z_SYNC_FLUSH : Z_FINISH;
+
+		/*
+		 * Run wd_deflate() on input until output buffer not full,
+		 * finish compression if all of source has been read in.
+		 */
+		do {
+			zstrm.avail_out = chunk;
+			zstrm.next_out = dest;
+			ret = wd_deflate(&zstrm, flush);
+			have = chunk - zstrm.avail_out;
+			dest += have;
+		} while (zstrm.avail_in > 0);
+
+		/* done when last data in file processed */
+	} while (flush != Z_FINISH);
+
+	ret = ret == Z_STREAM_END ? zstrm.total_out : ret;
+
+	(void)wd_deflate_end(&zstrm);
+
+	return ret;
+}
+
+static int test_uadk_zlib_inflate(void *src, int src_len, void *dest, int dst_len)
+{
+	__u32 chunk = 128 * 1024;
+	// __u32 chunk = 1024 * 1024 * 2;
+	z_stream zstrm = {0};
+	int ret, have;
+
+	ret = wd_inflate_init(&zstrm, 15);
+	if (ret) {
+		fprintf(stderr, "%s fail to init inflate.\n", __func__);
+		return ret;
+	}
+
+	zstrm.next_in = src;
+	do {
+		if (src_len > chunk) {
+			zstrm.avail_in = chunk;
+			src_len -= chunk;
+		} else {
+			zstrm.avail_in = src_len;
+			src_len = 0;
+		}
+		/*
+		 * Run wd_deflate() on input until output buffer not full,
+		 * finish compression if all of source has been read in.
+		 */
+		do {
+			zstrm.avail_out = chunk;
+			zstrm.next_out = dest;
+			ret = wd_inflate(&zstrm, Z_SYNC_FLUSH);
+			have = chunk - zstrm.avail_out;
+			dest += have;
+		} while (zstrm.avail_in > 0);
+
+		/* done when last data in file processed */
+	} while (ret != Z_STREAM_END);
+
+	ret = ret == Z_STREAM_END ? zstrm.total_out : ret;
+
+	(void)wd_inflate_end(&zstrm);
+
+	return ret;
+}
+
+static int test_uadk_zlib(void)
+{
+	void *src, *dst, *src2;
+	FILE *source = stdin;
+	FILE *dest = stdout;
+	int ret, fd, sz;
+	struct stat s;
+
+	fd = fileno(source);
+	ret = fstat(fd, &s);
+	if (ret < 0) {
+		fprintf(stderr, "%s fstat error!\n", __func__);
+		return ret;
+	}
+
+	src = calloc(1, sizeof(char) * s.st_size);
+	if (!src) {
+		fprintf(stderr, "%s calloc error!\n", __func__);
+		return -WD_ENOMEM;
+	}
+
+	src2 = calloc(1, sizeof(char) * s.st_size * 2);
+	if (!src2) {
+		fprintf(stderr, "%s calloc2 error!\n", __func__);
+		ret = -WD_ENOMEM;
+		goto free_src;
+	}
+
+	dst = calloc(1, sizeof(char) * s.st_size * 2);
+	if (!dst) {
+		fprintf(stderr, "%s calloc error!\n", __func__);
+		ret = -WD_ENOMEM;
+		goto free_src2;
+	}
+
+	sz = fread(src, 1, s.st_size, source);
+	if (sz != s.st_size) {
+		fprintf(stderr, "%s read file sz != file.size!\n", __func__);
+		ret = -WD_EINVAL;
+		goto free_dst;
+	}
+
+	ret = test_uadk_zlib_deflate(src, sz, dst, sz * 2);
+	if (ret < 0) {
+		fprintf(stderr, "%s do deflate fail ret %d\n", __func__, ret);
+		goto free_dst;
+	}
+
+	ret = fwrite(dst, 1, ret, dest);
+	if (ret < 0)
+		fprintf(stderr, "%s file write fail ret %d\n", __func__, ret);
+
+	ret = test_uadk_zlib_inflate(dst, ret, src2, sz * 2);
+	if (ret < 0) {
+		fprintf(stderr, "%s do inflate fail ret %d\n", __func__, ret);
+		goto free_dst;
+	}
+
+	ret = memcmp(src, src2, sz);
+	if (!ret)
+		fprintf(stderr, "%s good!\n", __func__);
+
+free_dst:
+	free(dst);
+free_src2:
+	free(src2);
+free_src:
+	free(src);
+	return ret;
+}
+
+static int test_func(int test_mode)
+{
+	int ret;
+
+	switch (test_mode) {
+	case 0:
+		ret = test_fork();
+		break;
+	case 1:
+		ret = test_uadk_init2();
+		break;
+	case 2:
+		ret = test_uadk_zlib();
+		break;
+	default:
+		ret = -WD_EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
 static struct uacce_dev_list* get_dev_list(char *alg_name)
 {
 	struct uacce_dev_list *list, *p, *head = NULL, *prev = NULL;
@@ -147,7 +469,7 @@ static struct wd_sched *uadk_comp_sched_init(void)

 	sched = wd_sched_rr_alloc(SCHED_POLICY_RR, 2, 2, lib_poll_func);
 	if (!sched) {
-		printf("%s fail to alloc sched.\n", __func__);
+		fprintf(stderr, "%s fail to alloc sched.\n", __func__);
 		return NULL;
 	}
 	sched->name = SCHED_RR_NAME;
@@ -388,31 +710,31 @@ static int operation(FILE *source, FILE *dest)

 	ret = uadk_comp_ctx_init();
 	if (ret) {
-		fprintf(stderr, "%s fail to init ctx!\n", __func__);
+		fprintf(stderr, "%s fail to init ctx! %d\n", __func__, ret);
 		return ret;
 	}

 	ret = uadk_comp_sess_init();
 	if (ret) {
-		fprintf(stderr, "%s fail to init sess!\n", __func__);
+		fprintf(stderr, "%s fail to init sess! %d\n", __func__, ret);
 		goto out_ctx_uninit;
 	}

 	ret = uadk_comp_request_init(source);
 	if (ret) {
-		fprintf(stderr, "%s fail to init request!\n", __func__);
+		fprintf(stderr, "%s fail to init request! %d\n", __func__, ret);
 		goto out_sess_uninit;
 	}

 	ret = uadk_do_comp();
 	if (ret) {
-		fprintf(stderr, "%s fail to do request!\n", __func__);
+		fprintf(stderr, "%s fail to do request! %d\n", __func__, ret);
 		goto out_sess_uninit;
 	}

 	ret = uadk_comp_write_file(dest);
 	if (ret)
-		fprintf(stderr, "%s fail to write result!\n", __func__);
+		fprintf(stderr, "%s fail to write result! %d\n", __func__, ret);

 	uadk_comp_request_uninit();

@@ -445,9 +767,10 @@ static void print_help(void)

 int main(int argc, char *argv[])
 {
+	int ret, c, test_mode;
 	int option_index = 0;
 	int help = 0;
-	int ret, c;
+	int test = 0;

 	static struct option long_options[] = {
 		{"help", no_argument, 0, 0},
@@ -455,6 +778,9 @@ int main(int argc, char *argv[])
 		{"complv", required_argument, 0, 2},
 		{"optype", required_argument, 0, 3},
 		{"winsize", required_argument, 0, 4},
+		{"fork", no_argument, 0, 5},
+		{"init2", no_argument, 0, 6},
+		{"zlib", no_argument, 0, 7},
 		{0, 0, 0, 0}
 	};

@@ -470,8 +796,8 @@ int main(int argc, char *argv[])
 		case 1:
 			config.list = get_dev_list(optarg);
 			if (!config.list) {
-				cowfail("Can't find your algorithm!\n");
 				help = 1;
+				cowfail("Can't find your algorithm!\n");
 			} else {
 				strcpy(config.algname, optarg);
 			}
@@ -485,9 +811,21 @@ int main(int argc, char *argv[])
 		case 4:
 			config.winsize = strtol(optarg, NULL, 0);
 			break;
+		case 5:
+			test = 1;
+			test_mode = 0;
+			break;
+		case 6:
+			test = 1;
+			test_mode = 1;
+			break;
+		case 7:
+			test = 1;
+			test_mode = 2;
+			break;
 		default:
 			help = 1;
-			cowfail("bad input test parameter!\n");
+			cowfail("Bad input test parameter!\n");
 			break;
 		}
 	}
@@ -497,9 +835,13 @@ int main(int argc, char *argv[])
 		exit(-1);
 	}

-	ret = operation(stdin, stdout);
+	if (test == 1)
+		ret = test_func(test_mode);
+	else
+		ret = operation(stdin, stdout);
+
 	if (ret)
-		cowfail("So sad for we do something wrong!\n");
+		cowfail("So sad for someting wrong!\n");

 	return ret;
 }
--
2.33.0
1 year, 7 months; 1 participant, 0 comments
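The deflate and inflate tests in the compression sample above feed the source buffer to the stream in slices of at most 128 KiB, and the deflate path asks for a final flush only on the last slice. The sketch below isolates just that slicing rule; `demo_next_chunk()` and `demo_count_chunks()` are hypothetical helpers written for illustration and do not call zlib or uadk.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CHUNK (128 * 1024)

/*
 * Take the next slice of at most `chunk` bytes from `*remaining`.
 * Returns the slice length and sets *last to 1 when nothing is left
 * afterwards, which is the point where the sample switches from
 * Z_SYNC_FLUSH to Z_FINISH.
 */
size_t demo_next_chunk(size_t *remaining, size_t chunk, int *last)
{
	size_t avail;

	if (*remaining > chunk) {
		avail = chunk;
		*remaining -= chunk;
	} else {
		avail = *remaining;
		*remaining = 0;
	}

	*last = (*remaining == 0);
	return avail;
}

/* Count how many slices a buffer of `len` bytes is split into. */
size_t demo_count_chunks(size_t len, size_t chunk)
{
	size_t n = 0;
	int last = 0;

	do {
		(void)demo_next_chunk(&len, chunk, &last);
		n++;
	} while (!last);

	return n;
}
```

Under this rule a buffer of exactly one chunk is sent as a single, final slice, and an empty buffer still produces one (empty) final slice, matching the do/while shape of the sample's outer loop.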
[PATCH 2/8] uadk/isa-ce: support sm3 ce instruction
by Zhiqi Song
Support sync sm3 ce instruction, users can use ce instruction to accelerate sm3 sync task through init2 related functions. This patch also includes: 1. Add compile parameter and related file to support isa-ce library. 2. Check whether the platform supports the CE instruction in alg driver register process. 3. Make HW driver and INSTR driver of the same alg can be requested at the same time. Signed-off-by: Zhiqi Song <songzhiqi1(a)huawei.com> --- Makefile.am | 18 +- configure.ac | 3 + drv/isa_ce_sm3.c | 249 +++++++++++++ drv/isa_ce_sm3_armv8.S | 765 ++++++++++++++++++++++++++++++++++++++ include/drv/arm_arch_ce.h | 199 ++++++++++ include/drv/isa_ce_sm3.h | 66 ++++ include/wd_alg.h | 43 +++ wd_alg.c | 32 +- wd_digest.c | 2 +- wd_sched.c | 2 +- wd_util.c | 87 ++++- 11 files changed, 1446 insertions(+), 20 deletions(-) create mode 100644 drv/isa_ce_sm3.c create mode 100644 drv/isa_ce_sm3_armv8.S create mode 100644 include/drv/arm_arch_ce.h create mode 100644 include/drv/isa_ce_sm3.h diff --git a/Makefile.am b/Makefile.am index 25853eb..b267e9e 100644 --- a/Makefile.am +++ b/Makefile.am @@ -43,7 +43,8 @@ nobase_pkginclude_HEADERS = v1/wd.h v1/wd_cipher.h v1/wd_aead.h v1/uacce.h v1/wd lib_LTLIBRARIES=libwd.la libwd_comp.la libwd_crypto.la uadk_driversdir=$(libdir)/uadk -uadk_drivers_LTLIBRARIES=libhisi_sec.la libhisi_hpre.la libhisi_zip.la +uadk_drivers_LTLIBRARIES=libhisi_sec.la libhisi_hpre.la libhisi_zip.la \ + libisa_ce.la libwd_la_SOURCES=wd.c wd_mempool.c wd.h wd_alg.c wd_alg.h \ v1/wd.c v1/wd.h v1/wd_adapter.c v1/wd_adapter.h \ @@ -79,7 +80,8 @@ libwd_crypto_la_SOURCES=wd_cipher.c wd_cipher.h wd_cipher_drv.h \ wd_digest.c wd_digest.h wd_digest_drv.h \ wd_util.c wd_util.h \ wd_sched.c wd_sched.h \ - wd.c wd.h + wd.c wd.h \ + arm_arch_ce.h isa_ce_sm3.h libhisi_sec_la_SOURCES=drv/hisi_sec.c drv/hisi_qm_udrv.c \ lib/crypto/aes.c lib/crypto/galois.c \ @@ -87,6 +89,10 @@ libhisi_sec_la_SOURCES=drv/hisi_sec.c drv/hisi_qm_udrv.c \ libhisi_hpre_la_SOURCES=drv/hisi_hpre.c 
drv/hisi_qm_udrv.c \ hisi_qm_udrv.h + +libisa_ce_la_SOURCES=drv/isa_ce_sm3.c drv/isa_ce_sm3_armv8.S arm_arch_ce.h \ + drv/isa_ce_sm3.h + if WD_STATIC_DRV AM_CFLAGS += -DWD_STATIC_DRV -fPIC AM_CFLAGS += -DWD_NO_LOG @@ -106,6 +112,10 @@ libhisi_sec_la_DEPENDENCIES = libwd.la libwd_crypto.la libhisi_hpre_la_LIBADD = $(libwd_la_OBJECTS) $(libwd_crypto_la_OBJECTS) libhisi_hpre_la_DEPENDENCIES = libwd.la libwd_crypto.la + +libisa_ce_la_LIBADD = $(libwd_la_OBJECTS) $(libwd_crypto_la_OBJECTS) +libisa_ce_la_DEPENDENCIES = libwd.la libwd_crypto.la + else UADK_WD_SYMBOL= -Wl,--version-script,$(top_srcdir)/libwd.map UADK_CRYPTO_SYMBOL= -Wl,--version-script,$(top_srcdir)/libwd_crypto.map @@ -134,6 +144,10 @@ libhisi_sec_la_DEPENDENCIES= libwd.la libwd_crypto.la libhisi_hpre_la_LIBADD= -lwd -lwd_crypto libhisi_hpre_la_LDFLAGS=$(UADK_VERSION) libhisi_hpre_la_DEPENDENCIES= libwd.la libwd_crypto.la + +libisa_ce_la_LIBADD= -lwd -lwd_crypto +libisa_ce_la_LDFLAGS=$(UADK_VERSION) +libisa_ce_la_DEPENDENCIES= libwd.la libwd_crypto.la endif # WD_STATIC_DRV pkgconfigdir = $(libdir)/pkgconfig diff --git a/configure.ac b/configure.ac index b198417..4ed111e 100644 --- a/configure.ac +++ b/configure.ac @@ -21,6 +21,9 @@ LT_INIT AC_SUBST([hardcode_into_libs], [no]) AM_PROG_CC_C_O +# Support assembler +AM_PROG_AS + AC_ARG_ENABLE([debug-log], AS_HELP_STRING([--enable-debug-log], [enable debug logging globally]), [ AS_IF([test "x$enable_debug_log" = "xyes"], diff --git a/drv/isa_ce_sm3.c b/drv/isa_ce_sm3.c new file mode 100644 index 0000000..d562730 --- /dev/null +++ b/drv/isa_ce_sm3.c @@ -0,0 +1,249 @@ +// SPDX-License-Identifier: Apache-2.0 +/* + * Copyright 2011-2022 The OpenSSL Project Authors. All Rights Reserved. + * + * Licensed under the Apache License 2.0 (the "License"). You may not use + * this file except in compliance with the License. You can obtain a copy + * in the file LICENSE in the source distribution or at + *
https://www.openssl.org/source/license.html
+ */
+/*
+ * Copyright 2023 Huawei Technologies Co.,Ltd. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <string.h>
+#include <sys/auxv.h>
+#include <pthread.h>
+#include "drv/wd_digest_drv.h"
+#include "drv/isa_ce_sm3.h"
+#include "wd_digest.h"
+#include "wd_util.h"
+
+typedef void (sm3_ce_block_fn)(__u32 word_reg[SM3_STATE_WORDS],
+			       const unsigned char *src, size_t blocks);
+
+static int sm3_ce_drv_init(void *conf, void *priv);
+static void sm3_ce_drv_exit(void *priv);
+static int sm3_ce_drv_send(handle_t ctx, void *digest_msg);
+static int sm3_ce_drv_recv(handle_t ctx, void *digest_msg);
+static int sm3_ce_get_usage(void *param);
+
+static struct wd_alg_driver sm3_ce_alg_driver = {
+	.drv_name = "isa_ce_sm3",
+	.alg_name = "sm3",
+	.calc_type = UADK_ALG_CE_INSTR,
+	.priority = 200,
+	.priv_size = sizeof(struct sm3_ce_drv_ctx),
+	.queue_num = 1,
+	.op_type_num = 1,
+	.fallback = 0,
+	.init = sm3_ce_drv_init,
+	.exit = sm3_ce_drv_exit,
+	.send = sm3_ce_drv_send,
+	.recv = sm3_ce_drv_recv,
+	.get_usage = sm3_ce_get_usage,
+};
+
+static void __attribute__((constructor)) sm3_ce_probe(void)
+{
+	int ret;
+
+	WD_INFO("Info: register SM3 CE alg driver!\n");
+	ret = wd_alg_driver_register(&sm3_ce_alg_driver);
+	if (ret && ret != WD_ENODEV)
+		WD_ERR("Error: register SM3 CE failed!\n");
+}
+
+static void __attribute__((destructor)) sm3_ce_remove(void)
+{
+	wd_alg_driver_unregister(&sm3_ce_alg_driver);
+}
+
+static int sm3_ce_get_usage(void *param)
+{
+	return 0;
+}
+
+static inline void sm3_ce_init(struct sm3_ce_ctx *sctx)
+{
+	memset(sctx, 0, sizeof(*sctx));
+
+	sctx->word_reg[0] = SM3_IVA;
+	sctx->word_reg[1] = SM3_IVB;
+	sctx->word_reg[2] = SM3_IVC;
+	sctx->word_reg[3] = SM3_IVD;
+	sctx->word_reg[4] = SM3_IVE;
+	sctx->word_reg[5] = SM3_IVF;
+	sctx->word_reg[6] = SM3_IVG;
+	sctx->word_reg[7] = SM3_IVH;
+}
+
+static void sm3_ce_update(struct sm3_ce_ctx *sctx, const void *data,
+			  size_t data_len, sm3_ce_block_fn *block_fn)
+{
+	size_t remain_data_len, blk_num;
+
+	/* Complete the partially filled block left by the previous update */
+	if (sctx->num) {
+		remain_data_len = SM3_BLOCK_SIZE - sctx->num;
+		/* If data_len is not enough to fill a block, then leave it to final */
+		if (data_len < remain_data_len) {
+			memcpy(sctx->block + sctx->num, data, data_len);
+			sctx->num += data_len;
+			return;
+		}
+
+		memcpy(sctx->block + sctx->num, data, remain_data_len);
+		block_fn(sctx->word_reg, sctx->block, 1);
+		sctx->nblocks++;
+		data += remain_data_len;
+		data_len -= remain_data_len;
+	}
+
+	/* Process the remaining msg in full 512-bit (64-byte) blocks */
+	blk_num = data_len / SM3_BLOCK_SIZE;
+	if (blk_num) {
+		block_fn(sctx->word_reg, data, blk_num);
+		sctx->nblocks += blk_num;
+		data += SM3_BLOCK_SIZE * blk_num;
+		data_len -= SM3_BLOCK_SIZE * blk_num;
+	}
+
+	sctx->num = data_len;
+	if (data_len)
+		memcpy(sctx->block, data, data_len);
+}
+
+static void sm3_ce_final(struct sm3_ce_ctx *sctx, __u8 *md,
+			 sm3_ce_block_fn *block_fn)
+{
+	int i;
+
+	/* Add padding */
+	sctx->block[sctx->num] = 0x80;
+
+	if (sctx->num <= SM3_BLOCK_SIZE - 9) {
+		memset(sctx->block + sctx->num + 1, 0, SM3_BLOCK_SIZE - sctx->num - 9);
+	} else {
+		memset(sctx->block + sctx->num + 1, 0, SM3_BLOCK_SIZE - sctx->num - 1);
+		block_fn(sctx->word_reg, sctx->block, 1);
+		memset(sctx->block, 0, SM3_BLOCK_SIZE - 8);
+	}
+
+	/*
+	 * Put the length of the message in bits into the last two words. The
+	 * length in bits is nblocks * 512 + num * 8, so the low 32 bits,
+	 * (nblocks << 9) + (num << 3), go in the last word, and the bits shifted
+	 * off the left edge, nblocks >> 23, go in the penultimate word.
+	 */
+	PUTU32(sctx->block + 56, sctx->nblocks >> 23);
+	PUTU32(sctx->block + 60, (sctx->nblocks << 9) + (sctx->num << 3));
+
+	block_fn(sctx->word_reg, sctx->block, 1);
+	for (i = 0; i < 8; i++)
+		PUTU32(md + i * 4, sctx->word_reg[i]);
+}
+
+static int do_sm3_ce(const __u8 *data, size_t len, __u8 *out_digest)
+{
+	struct sm3_ce_ctx sctx = {0};
+
+	if (!out_digest) {
+		WD_ERR("invalid: out_digest is NULL!\n");
+		return -WD_EINVAL;
+	}
+
+	sm3_ce_init(&sctx);
+	sm3_ce_update(&sctx, data, len, sm3_ce_block_compress);
+	sm3_ce_final(&sctx, out_digest, sm3_ce_block_compress);
+
+	memset(&sctx, 0, sizeof(struct sm3_ce_ctx));
+	return WD_SUCCESS;
+}
+
+static int do_hmac_sm3_ce(const __u8 *key, size_t key_len,
+			  const __u8 *data, size_t data_len,
+			  __u8 *out_hmac)
+{
+	unsigned char key_buf[HMAC_BLOCK_SIZE] = {0};
+	unsigned char ipad[HMAC_BLOCK_SIZE] = {0};
+	unsigned char opad[HMAC_BLOCK_SIZE] = {0};
+	unsigned char hash[SM3_DIGEST_SIZE] = {0};
+	struct sm3_ce_ctx sctx = {0};
+	unsigned int i;
+
+	if (!key_len) {
+		WD_ERR("invalid hmac key_len!\n");
+		return -WD_EINVAL;
+	}
+
+	if (key_len > HMAC_BLOCK_SIZE) {
+		do_sm3_ce(key, key_len, key_buf);
+		key_len = SM3_DIGEST_SIZE;
+		key = key_buf;
+	}
+
+	memset(ipad, 0x36, HMAC_BLOCK_SIZE);
+	memset(opad, 0x5c, HMAC_BLOCK_SIZE);
+	for (i = 0; i < key_len; i++) {
+		ipad[i] ^= key[i];
+		opad[i] ^= key[i];
+	}
+
+	sm3_ce_init(&sctx);
+	sm3_ce_update(&sctx, ipad, HMAC_BLOCK_SIZE, sm3_ce_block_compress);
+	sm3_ce_update(&sctx, data, data_len, sm3_ce_block_compress);
+	sm3_ce_final(&sctx, hash, sm3_ce_block_compress);
+
+	sm3_ce_init(&sctx);
+	sm3_ce_update(&sctx, opad, HMAC_BLOCK_SIZE, sm3_ce_block_compress);
+	sm3_ce_update(&sctx, hash, SM3_DIGEST_SIZE, sm3_ce_block_compress);
+	sm3_ce_final(&sctx, out_hmac, sm3_ce_block_compress);
+
+	return WD_SUCCESS;
+}
+
+static int sm3_ce_drv_send(handle_t ctx, void *digest_msg)
+{
+	struct wd_digest_msg *msg = (struct wd_digest_msg *)digest_msg;
+	__u8 *out_digest, *data, *key;
+	size_t data_size, key_len;
+	int ret;
+
+	if (!msg) {
+		WD_ERR("invalid: digest_msg is NULL!\n");
+		return -WD_EINVAL;
+	}
+
+	data_size = msg->in_bytes;
+	out_digest = msg->out;
+	data = msg->in;
+	key = msg->key;
+	key_len = msg->key_bytes;
+
+	if (msg->mode == WD_DIGEST_NORMAL) {
+		ret = do_sm3_ce(data, data_size, out_digest);
+	} else if (msg->mode == WD_DIGEST_HMAC) {
+		ret = do_hmac_sm3_ce(key, key_len, data, data_size, out_digest);
+	} else {
+		WD_ERR("invalid digest mode!\n");
+		ret = -WD_EINVAL;
+	}
+
+	return ret;
+}
+
+static int sm3_ce_drv_recv(handle_t ctx, void *digest_msg)
+{
+	return WD_SUCCESS;
+}
+
+static int sm3_ce_drv_init(void *conf, void *priv)
+{
+	return WD_SUCCESS;
+}
+
+static void sm3_ce_drv_exit(void *priv)
+{
+}
diff --git a/drv/isa_ce_sm3_armv8.S b/drv/isa_ce_sm3_armv8.S
new file mode 100644
index 0000000..3d08e2d
--- /dev/null
+++ b/drv/isa_ce_sm3_armv8.S
@@ -0,0 +1,765 @@
+/* SPDX-License-Identifier: Apache-2.0 */
+/*
+ * Copyright 2011-2022 The OpenSSL Project Authors. All Rights Reserved.
+ *
+ * Licensed under the Apache License 2.0 (the "License"). You may not use
+ * this file except in compliance with the License. You can obtain a copy
+ * in the file LICENSE in the source distribution or at
+ *
https://www.openssl.org/source/license.html
+ */
+
+#include "../include/drv/arm_arch_ce.h"
+
+.arch armv8.2-a
+.text
+.globl sm3_ce_block_compress
+.type sm3_ce_block_compress,%function
+.align 5
+sm3_ce_block_compress:
+	AARCH64_VALID_CALL_TARGET
+/* Loads state */
+	/*
+	 * Loads multiple single-element structures from memory(X0 register) and
+	 * writes result to two SIMD&FP registers(v5.4s and v6.4s).
+	 */
+	ld1 {v5.4s,v6.4s}, [x0] /* 4s -- 4 * 32bit */
+	/*
+	 * Reverses the order of 32-bit(type:s) elements in each doubleword of the
+	 * vector in the src SIMD&FP register(v5), places the result into a vector
+	 * and writes the vector to the dst SIMD&FP register(v5).
+	 */
+	rev64 v5.4s, v5.4s
+	rev64 v6.4s, v6.4s
+	/*
+	 * Extracts the lowest vector elements from the second src SIMD&FP register,
+	 * and highest vector elements from the first src SIMD&FP register,
+	 * concatenates the result into a vector, and writes the vector to the
+	 * dst SIMD&FP register. #8 is the index of the lowest byte element to be extracted.
+	 * Format: ext <dst register>, <first src register>, <second src register>, <index>
+	 * #imm: immediate data.
+	 */
+	ext v5.16b, v5.16b, v5.16b, #8 /* 16b -- 16 * 8bit */
+	ext v6.16b, v6.16b, v6.16b, #8
+	/* adr: adds an immediate value to the PC to form a PC-relative
+	 * address, and writes the result to the dst register.
+	 */
+	adr x8, .Tj /* 'Tj' is the constant defined in SM3 protocol */
+	/* ldp: load pair of registers, calculates an address from a base register
+	 * value and an immediate offset, loads two 32-bit words from memory, and
+	 * writes them to two registers. */
+	ldp s16, s17, [x8] /* 'sn' is the scalar register, 'vn' is the vector register */
+
+.Loop:
+/* Loads input */
+	/*
+	 * Loads multiple single-element structures to four registers.
+	 * #64 is the immediate offset variant, it is the post-index immediate offset.
+	 * Loads the input src data, msg to be hashed.
+	 */
+	ld1 {v0.16b,v1.16b,v2.16b,v3.16b}, [x1], #64
+	/*
+	 * Subtracts an optionally-shifted immediate value from a register value,
+	 * and writes the result to the dst register.
+	 */
+	sub w2, w2, #1
+
+	/* Copies the value in a src register to the dst register. */
+	mov v18.16b, v5.16b
+	mov v19.16b, v6.16b
+
+#ifndef __ARMEB__
+	rev32 v0.16b, v0.16b
+	rev32 v1.16b, v1.16b
+	rev32 v2.16b, v2.16b
+	rev32 v3.16b, v3.16b
+#endif
+
+	ext v20.16b, v16.16b, v16.16b, #4
+	/* s4 = w7 | w8 | w9 | w10 */
+	ext v4.16b, v1.16b, v2.16b, #12
+	/* vtmp1 = w3 | w4 | w5 | w6 */
+	ext v22.16b, v0.16b, v1.16b, #12
+	/* vtmp2 = w10 | w11 | w12 | w13 */
+	ext v23.16b, v2.16b, v3.16b, #8
+	/* sm3partw1 v4.4s, v0.4s, v3.4s */
+.inst 0xce63c004
+	/* sm3partw2 v4.4s, v23.4s, v22.4s */
+.inst 0xce76c6e4
+	eor v22.16b, v0.16b, v1.16b
+	/* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */
+.inst 0xce5418b7
+	shl v21.4s, v20.4s, #1
+	sri v21.4s, v20.4s, #31
+	/* sm3tt1a v5.4s, v23.4s, v22.4s[0] */
+.inst 0xce5682e5
+	/* sm3tt2a v6.4s, v23.4s, v0.4s[0] */
+.inst 0xce408ae6
+	/* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */
+.inst 0xce5518b7
+	shl v20.4s, v21.4s, #1
+	sri v20.4s, v21.4s, #31
+	/* sm3tt1a v5.4s, v23.4s, v22.4s[1] */
+.inst 0xce5692e5
+	/* sm3tt2a v6.4s, v23.4s, v0.4s[1] */
+.inst 0xce409ae6
+	/* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */
+.inst 0xce5418b7
+	shl v21.4s, v20.4s, #1
+	sri v21.4s, v20.4s, #31
+	/* sm3tt1a v5.4s, v23.4s, v22.4s[2] */
+.inst 0xce56a2e5
+	/* sm3tt2a v6.4s, v23.4s, v0.4s[2] */
+.inst 0xce40aae6
+	/* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */
+.inst 0xce5518b7
+	shl v20.4s, v21.4s, #1
+	sri v20.4s, v21.4s, #31
+	/* sm3tt1a v5.4s, v23.4s, v22.4s[3] */
+.inst 0xce56b2e5
+	/* sm3tt2a v6.4s, v23.4s, v0.4s[3] */
+.inst 0xce40bae6
+	/* s4 = w7 | w8 | w9 | w10 */
+	ext v0.16b, v2.16b, v3.16b, #12
+	/* vtmp1 = w3 | w4 | w5 | w6 */
+	ext v22.16b, v1.16b, v2.16b, #12
+	/* vtmp2 = w10 | w11 | w12 | w13 */
+	ext v23.16b, v3.16b, v4.16b, #8
+	/* sm3partw1 v0.4s, v1.4s, v4.4s */
+.inst
0xce64c020 + /* sm3partw2 v0.4s, v23.4s, v22.4s */ +.inst 0xce76c6e0 + eor v22.16b, v1.16b, v2.16b + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1a v5.4s, v23.4s, v22.4s[0] */ +.inst 0xce5682e5 + /* sm3tt2a v6.4s, v23.4s, v1.4s[0] */ +.inst 0xce418ae6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1a v5.4s, v23.4s, v22.4s[1] */ +.inst 0xce5692e5 + /* sm3tt2a v6.4s, v23.4s, v1.4s[1] */ +.inst 0xce419ae6 + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1a v5.4s, v23.4s, v22.4s[2] */ +.inst 0xce56a2e5 + /* sm3tt2a v6.4s, v23.4s, v1.4s[2] */ +.inst 0xce41aae6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1a v5.4s, v23.4s, v22.4s[3] */ +.inst 0xce56b2e5 + /* sm3tt2a v6.4s, v23.4s, v1.4s[3] */ +.inst 0xce41bae6 + /* s4 = w7 | w8 | w9 | w10 */ + ext v1.16b, v3.16b, v4.16b, #12 + /* vtmp1 = w3 | w4 | w5 | w6 */ + ext v22.16b, v2.16b, v3.16b, #12 + /* vtmp2 = w10 | w11 | w12 | w13 */ + ext v23.16b, v4.16b, v0.16b, #8 + /* sm3partw1 v1.4s, v2.4s, v0.4s */ +.inst 0xce60c041 + /* sm3partw2 v1.4s, v23.4s, v22.4s */ +.inst 0xce76c6e1 + eor v22.16b, v2.16b, v3.16b + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1a v5.4s, v23.4s, v22.4s[0] */ +.inst 0xce5682e5 + /* sm3tt2a v6.4s, v23.4s, v2.4s[0] */ +.inst 0xce428ae6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1a v5.4s, v23.4s, v22.4s[1] */ +.inst 0xce5692e5 + /* sm3tt2a v6.4s, v23.4s, v2.4s[1] */ +.inst 0xce429ae6 + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1a v5.4s, v23.4s, v22.4s[2] */ +.inst 
0xce56a2e5 + /* sm3tt2a v6.4s, v23.4s, v2.4s[2] */ +.inst 0xce42aae6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1a v5.4s, v23.4s, v22.4s[3] */ +.inst 0xce56b2e5 + /* sm3tt2a v6.4s, v23.4s, v2.4s[3] */ +.inst 0xce42bae6 + /* s4 = w7 | w8 | w9 | w10 */ + ext v2.16b, v4.16b, v0.16b, #12 + /* vtmp1 = w3 | w4 | w5 | w6 */ + ext v22.16b, v3.16b, v4.16b, #12 + /* vtmp2 = w10 | w11 | w12 | w13 */ + ext v23.16b, v0.16b, v1.16b, #8 + /* sm3partw1 v2.4s, v3.4s, v1.4s */ +.inst 0xce61c062 + /* sm3partw2 v2.4s, v23.4s, v22.4s */ +.inst 0xce76c6e2 + eor v22.16b, v3.16b, v4.16b + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1a v5.4s, v23.4s, v22.4s[0] */ +.inst 0xce5682e5 + /* sm3tt2a v6.4s, v23.4s, v3.4s[0] */ +.inst 0xce438ae6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1a v5.4s, v23.4s, v22.4s[1] */ +.inst 0xce5692e5 + /* sm3tt2a v6.4s, v23.4s, v3.4s[1] */ +.inst 0xce439ae6 + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1a v5.4s, v23.4s, v22.4s[2] */ +.inst 0xce56a2e5 + /* sm3tt2a v6.4s, v23.4s, v3.4s[2] */ +.inst 0xce43aae6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1a v5.4s, v23.4s, v22.4s[3] */ +.inst 0xce56b2e5 + /* sm3tt2a v6.4s, v23.4s, v3.4s[3] */ +.inst 0xce43bae6 + ext v20.16b, v17.16b, v17.16b, #4 + /* s4 = w7 | w8 | w9 | w10 */ + ext v3.16b, v0.16b, v1.16b, #12 + /* vtmp1 = w3 | w4 | w5 | w6 */ + ext v22.16b, v4.16b, v0.16b, #12 + /* vtmp2 = w10 | w11 | w12 | w13 */ + ext v23.16b, v1.16b, v2.16b, #8 + /* sm3partw1 v3.4s, v4.4s, v2.4s */ +.inst 0xce62c083 + /* sm3partw2 v3.4s, v23.4s, v22.4s */ +.inst 0xce76c6e3 + eor v22.16b, v4.16b, v0.16b + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s 
*/ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[0] */ +.inst 0xce5686e5 + /* sm3tt2b v6.4s, v23.4s, v4.4s[0] */ +.inst 0xce448ee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[1] */ +.inst 0xce5696e5 + /* sm3tt2b v6.4s, v23.4s, v4.4s[1] */ +.inst 0xce449ee6 + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[2] */ +.inst 0xce56a6e5 + /* sm3tt2b v6.4s, v23.4s, v4.4s[2] */ +.inst 0xce44aee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[3] */ +.inst 0xce56b6e5 + /* sm3tt2b v6.4s, v23.4s, v4.4s[3] */ +.inst 0xce44bee6 + /* s4 = w7 | w8 | w9 | w10 */ + ext v4.16b, v1.16b, v2.16b, #12 + /* vtmp1 = w3 | w4 | w5 | w6 */ + ext v22.16b, v0.16b, v1.16b, #12 + /* vtmp2 = w10 | w11 | w12 | w13 */ + ext v23.16b, v2.16b, v3.16b, #8 + /* sm3partw1 v4.4s, v0.4s, v3.4s */ +.inst 0xce63c004 + /* sm3partw2 v4.4s, v23.4s, v22.4s */ +.inst 0xce76c6e4 + eor v22.16b, v0.16b, v1.16b + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[0] */ +.inst 0xce5686e5 + /* sm3tt2b v6.4s, v23.4s, v0.4s[0] */ +.inst 0xce408ee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[1] */ +.inst 0xce5696e5 + /* sm3tt2b v6.4s, v23.4s, v0.4s[1] */ +.inst 0xce409ee6 + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[2] */ +.inst 0xce56a6e5 + /* sm3tt2b v6.4s, v23.4s, v0.4s[2] */ +.inst 0xce40aee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, 
v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[3] */ +.inst 0xce56b6e5 + /* sm3tt2b v6.4s, v23.4s, v0.4s[3] */ +.inst 0xce40bee6 + /* s4 = w7 | w8 | w9 | w10 */ + ext v0.16b, v2.16b, v3.16b, #12 + /* vtmp1 = w3 | w4 | w5 | w6 */ + ext v22.16b, v1.16b, v2.16b, #12 + /* vtmp2 = w10 | w11 | w12 | w13 */ + ext v23.16b, v3.16b, v4.16b, #8 + /* sm3partw1 v0.4s, v1.4s, v4.4s */ +.inst 0xce64c020 + /* sm3partw2 v0.4s, v23.4s, v22.4s */ +.inst 0xce76c6e0 + eor v22.16b, v1.16b, v2.16b + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[0] */ +.inst 0xce5686e5 + /* sm3tt2b v6.4s, v23.4s, v1.4s[0] */ +.inst 0xce418ee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[1] */ +.inst 0xce5696e5 + /* sm3tt2b v6.4s, v23.4s, v1.4s[1] */ +.inst 0xce419ee6 + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[2] */ +.inst 0xce56a6e5 + /* sm3tt2b v6.4s, v23.4s, v1.4s[2] */ +.inst 0xce41aee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[3] */ +.inst 0xce56b6e5 + /* sm3tt2b v6.4s, v23.4s, v1.4s[3] */ +.inst 0xce41bee6 + /* s4 = w7 | w8 | w9 | w10 */ + ext v1.16b, v3.16b, v4.16b, #12 + /* vtmp1 = w3 | w4 | w5 | w6 */ + ext v22.16b, v2.16b, v3.16b, #12 + /* vtmp2 = w10 | w11 | w12 | w13 */ + ext v23.16b, v4.16b, v0.16b, #8 + /* sm3partw1 v1.4s, v2.4s, v0.4s */ +.inst 0xce60c041 + /* sm3partw2 v1.4s, v23.4s, v22.4s */ +.inst 0xce76c6e1 + eor v22.16b, v2.16b, v3.16b + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[0] */ +.inst 0xce5686e5 + /* sm3tt2b v6.4s, v23.4s, v2.4s[0] */ +.inst 
0xce428ee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[1] */ +.inst 0xce5696e5 + /* sm3tt2b v6.4s, v23.4s, v2.4s[1] */ +.inst 0xce429ee6 + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[2] */ +.inst 0xce56a6e5 + /* sm3tt2b v6.4s, v23.4s, v2.4s[2] */ +.inst 0xce42aee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[3] */ +.inst 0xce56b6e5 + /* sm3tt2b v6.4s, v23.4s, v2.4s[3] */ +.inst 0xce42bee6 + /* s4 = w7 | w8 | w9 | w10 */ + ext v2.16b, v4.16b, v0.16b, #12 + /* vtmp1 = w3 | w4 | w5 | w6 */ + ext v22.16b, v3.16b, v4.16b, #12 + /* vtmp2 = w10 | w11 | w12 | w13 */ + ext v23.16b, v0.16b, v1.16b, #8 + /* sm3partw1 v2.4s, v3.4s, v1.4s */ +.inst 0xce61c062 + /* sm3partw2 v2.4s, v23.4s, v22.4s */ +.inst 0xce76c6e2 + eor v22.16b, v3.16b, v4.16b + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[0] */ +.inst 0xce5686e5 + /* sm3tt2b v6.4s, v23.4s, v3.4s[0] */ +.inst 0xce438ee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[1] */ +.inst 0xce5696e5 + /* sm3tt2b v6.4s, v23.4s, v3.4s[1] */ +.inst 0xce439ee6 + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[2] */ +.inst 0xce56a6e5 + /* sm3tt2b v6.4s, v23.4s, v3.4s[2] */ +.inst 0xce43aee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[3] */ +.inst 0xce56b6e5 + /* sm3tt2b v6.4s, v23.4s, v3.4s[3] */ +.inst 0xce43bee6 + /* s4 = w7 | w8 | w9 | 
w10 */ + ext v3.16b, v0.16b, v1.16b, #12 + /* vtmp1 = w3 | w4 | w5 | w6 */ + ext v22.16b, v4.16b, v0.16b, #12 + /* vtmp2 = w10 | w11 | w12 | w13 */ + ext v23.16b, v1.16b, v2.16b, #8 + /* sm3partw1 v3.4s, v4.4s, v2.4s */ +.inst 0xce62c083 + /* sm3partw2 v3.4s, v23.4s, v22.4s */ +.inst 0xce76c6e3 + eor v22.16b, v4.16b, v0.16b + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[0] */ +.inst 0xce5686e5 + /* sm3tt2b v6.4s, v23.4s, v4.4s[0] */ +.inst 0xce448ee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[1] */ +.inst 0xce5696e5 + /* sm3tt2b v6.4s, v23.4s, v4.4s[1] */ +.inst 0xce449ee6 + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[2] */ +.inst 0xce56a6e5 + /* sm3tt2b v6.4s, v23.4s, v4.4s[2] */ +.inst 0xce44aee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[3] */ +.inst 0xce56b6e5 + /* sm3tt2b v6.4s, v23.4s, v4.4s[3] */ +.inst 0xce44bee6 + /* s4 = w7 | w8 | w9 | w10 */ + ext v4.16b, v1.16b, v2.16b, #12 + /* vtmp1 = w3 | w4 | w5 | w6 */ + ext v22.16b, v0.16b, v1.16b, #12 + /* vtmp2 = w10 | w11 | w12 | w13 */ + ext v23.16b, v2.16b, v3.16b, #8 + /* sm3partw1 v4.4s, v0.4s, v3.4s */ +.inst 0xce63c004 + /* sm3partw2 v4.4s, v23.4s, v22.4s */ +.inst 0xce76c6e4 + eor v22.16b, v0.16b, v1.16b + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[0] */ +.inst 0xce5686e5 + /* sm3tt2b v6.4s, v23.4s, v0.4s[0] */ +.inst 0xce408ee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[1] */ +.inst 0xce5696e5 
+ /* sm3tt2b v6.4s, v23.4s, v0.4s[1] */ +.inst 0xce409ee6 + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[2] */ +.inst 0xce56a6e5 + /* sm3tt2b v6.4s, v23.4s, v0.4s[2] */ +.inst 0xce40aee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[3] */ +.inst 0xce56b6e5 + /* sm3tt2b v6.4s, v23.4s, v0.4s[3] */ +.inst 0xce40bee6 + /* s4 = w7 | w8 | w9 | w10 */ + ext v0.16b, v2.16b, v3.16b, #12 + /* vtmp1 = w3 | w4 | w5 | w6 */ + ext v22.16b, v1.16b, v2.16b, #12 + /* vtmp2 = w10 | w11 | w12 | w13 */ + ext v23.16b, v3.16b, v4.16b, #8 + /* sm3partw1 v0.4s, v1.4s, v4.4s */ +.inst 0xce64c020 + /* sm3partw2 v0.4s, v23.4s, v22.4s */ +.inst 0xce76c6e0 + eor v22.16b, v1.16b, v2.16b + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[0] */ +.inst 0xce5686e5 + /* sm3tt2b v6.4s, v23.4s, v1.4s[0] */ +.inst 0xce418ee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[1] */ +.inst 0xce5696e5 + /* sm3tt2b v6.4s, v23.4s, v1.4s[1] */ +.inst 0xce419ee6 + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[2] */ +.inst 0xce56a6e5 + /* sm3tt2b v6.4s, v23.4s, v1.4s[2] */ +.inst 0xce41aee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[3] */ +.inst 0xce56b6e5 + /* sm3tt2b v6.4s, v23.4s, v1.4s[3] */ +.inst 0xce41bee6 + /* s4 = w7 | w8 | w9 | w10 */ + ext v1.16b, v3.16b, v4.16b, #12 + /* vtmp1 = w3 | w4 | w5 | w6 */ + ext v22.16b, v2.16b, v3.16b, #12 + /* vtmp2 = w10 | w11 | w12 | w13 */ + ext v23.16b, v4.16b, v0.16b, #8 + 
/* sm3partw1 v1.4s, v2.4s, v0.4s */ +.inst 0xce60c041 + /* sm3partw2 v1.4s, v23.4s, v22.4s */ +.inst 0xce76c6e1 + eor v22.16b, v2.16b, v3.16b + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[0] */ +.inst 0xce5686e5 + /* sm3tt2b v6.4s, v23.4s, v2.4s[0] */ +.inst 0xce428ee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[1] */ +.inst 0xce5696e5 + /* sm3tt2b v6.4s, v23.4s, v2.4s[1] */ +.inst 0xce429ee6 + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[2] */ +.inst 0xce56a6e5 + /* sm3tt2b v6.4s, v23.4s, v2.4s[2] */ +.inst 0xce42aee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[3] */ +.inst 0xce56b6e5 + /* sm3tt2b v6.4s, v23.4s, v2.4s[3] */ +.inst 0xce42bee6 + eor v22.16b, v3.16b, v4.16b + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[0] */ +.inst 0xce5686e5 + /* sm3tt2b v6.4s, v23.4s, v3.4s[0] */ +.inst 0xce438ee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[1] */ +.inst 0xce5696e5 + /* sm3tt2b v6.4s, v23.4s, v3.4s[1] */ +.inst 0xce439ee6 + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[2] */ +.inst 0xce56a6e5 + /* sm3tt2b v6.4s, v23.4s, v3.4s[2] */ +.inst 0xce43aee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[3] */ +.inst 0xce56b6e5 + /* sm3tt2b v6.4s, v23.4s, v3.4s[3] */ 
+.inst 0xce43bee6 + eor v22.16b, v4.16b, v0.16b + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[0] */ +.inst 0xce5686e5 + /* sm3tt2b v6.4s, v23.4s, v4.4s[0] */ +.inst 0xce448ee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[1] */ +.inst 0xce5696e5 + /* sm3tt2b v6.4s, v23.4s, v4.4s[1] */ +.inst 0xce449ee6 + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[2] */ +.inst 0xce56a6e5 + /* sm3tt2b v6.4s, v23.4s, v4.4s[2] */ +.inst 0xce44aee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[3] */ +.inst 0xce56b6e5 + /* sm3tt2b v6.4s, v23.4s, v4.4s[3] */ +.inst 0xce44bee6 + eor v22.16b, v0.16b, v1.16b + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[0] */ +.inst 0xce5686e5 + /* sm3tt2b v6.4s, v23.4s, v0.4s[0] */ +.inst 0xce408ee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[1] */ +.inst 0xce5696e5 + /* sm3tt2b v6.4s, v23.4s, v0.4s[1] */ +.inst 0xce409ee6 + /* sm3ss1 v23.4s, v5.4s, v20.4s, v6.4s */ +.inst 0xce5418b7 + shl v21.4s, v20.4s, #1 + sri v21.4s, v20.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[2] */ +.inst 0xce56a6e5 + /* sm3tt2b v6.4s, v23.4s, v0.4s[2] */ +.inst 0xce40aee6 + /* sm3ss1 v23.4s, v5.4s, v21.4s, v6.4s */ +.inst 0xce5518b7 + shl v20.4s, v21.4s, #1 + sri v20.4s, v21.4s, #31 + /* sm3tt1b v5.4s, v23.4s, v22.4s[3] */ +.inst 0xce56b6e5 + /* sm3tt2b v6.4s, v23.4s, v0.4s[3] */ +.inst 0xce40bee6 + eor v5.16b, v5.16b, v18.16b + eor v6.16b, v6.16b, v19.16b + /* + * cbnz: 
compare and branch on Nonzero, compares the value in a register + * with zero, and conditionally branches to a label at a PC-relative offset + * if the comparison is not equal. + * 'w2' is the 32-bit name of the general-purpose register to be tested. + * '.Loop' is the program label to be conditionally branched to. + */ + cbnz w2, .Loop + + /* save state, it is the result of one cycle */ + rev64 v5.4s, v5.4s + rev64 v6.4s, v6.4s + ext v5.16b, v5.16b, v5.16b, #8 + ext v6.16b, v6.16b, v6.16b, #8 + st1 {v5.4s,v6.4s}, [x0] + ret +.size sm3_ce_block_compress,.-sm3_ce_block_compress + +.align 3 +.Tj: +/* + * Inserts a list of 32-bit values as data into the assembly. + * In SM3 protocol: + * when 0 <= j <= 15, Tj = 0x79cc4519, + * when 16 <= j <= 63, Tj = 0x9d8a7a87. + */ +.word 0x79cc4519, 0x9d8a7a87 diff --git a/include/drv/arm_arch_ce.h b/include/drv/arm_arch_ce.h new file mode 100644 index 0000000..cad6e33 --- /dev/null +++ b/include/drv/arm_arch_ce.h @@ -0,0 +1,199 @@ +/* SPDX-License-Identifier: Apache-2.0 */ +/* + * Copyright 2011-2022 The OpenSSL Project Authors. All Rights Reserved. + * + * Licensed under the Apache License 2.0 (the "License"). You may not use + * this file except in compliance with the License. You can obtain a copy + * in the file LICENSE in the source distribution or at + *
https://www.openssl.org/source/license.html
+ */
+
+#ifndef __ARM_ARCH_CE_H
+#define __ARM_ARCH_CE_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#if !defined(__ARM_ARCH__)
+# if defined(__CC_ARM)
+#  define __ARM_ARCH__ __TARGET_ARCH_ARM
+#  if defined(__BIG_ENDIAN)
+#   define __ARMEB__
+#  else
+#   define __ARMEL__
+#  endif
+# elif defined(__GNUC__)
+#  if defined(__aarch64__)
+#   define __ARM_ARCH__ 8
+  /*
+   * GCC does not define __ARM_ARCH__, instead it defines
+   * bunch of below macros. See all_architectures[] table in
+   * gcc/config/arm/arm.c.
+   */
+#  elif defined(__ARM_ARCH)
+#   define __ARM_ARCH__ __ARM_ARCH
+#  elif defined(__ARM_ARCH_8A__)
+#   define __ARM_ARCH__ 8
+#  elif defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) || \
+	defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) || \
+	defined(__ARM_ARCH_7EM__)
+#   define __ARM_ARCH__ 7
+#  elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || \
+	defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6M__) || \
+	defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) || \
+	defined(__ARM_ARCH_6T2__)
+#   define __ARM_ARCH__ 6
+#  elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5T__) || \
+	defined(__ARM_ARCH_5E__) || defined(__ARM_ARCH_5TE__) || \
+	defined(__ARM_ARCH_5TEJ__)
+#   define __ARM_ARCH__ 5
+#  elif defined(__ARM_ARCH_4__) || defined(__ARM_ARCH_4T__)
+#   define __ARM_ARCH__ 4
+#  else
+#   error "unsupported ARM architecture"
+#  endif
+# endif
+#endif
+
+#if !defined(__ARM_MAX_ARCH__)
+# define __ARM_MAX_ARCH__ __ARM_ARCH__
+#endif
+
+#if __ARM_MAX_ARCH__ < __ARM_ARCH__
+# error "__ARM_MAX_ARCH__ can't be less than __ARM_ARCH__"
+#elif __ARM_MAX_ARCH__ != __ARM_ARCH__
+# if __ARM_ARCH__ < 7 && __ARM_MAX_ARCH__ >= 7 && defined(__ARMEB__)
+#  error "can't build universal big-endian binary"
+# endif
+#endif
+
+#ifndef __ASSEMBLER__
+extern unsigned int ARMCAP_P;
+extern unsigned int ARM_MIDR;
+#endif
+
+#define ARMV7_NEON (1<<0)
+#define ARMV7_TICK (1<<1)
+#define ARMV8_AES (1<<2)
+#define ARMV8_SHA1 (1<<3)
+#define ARMV8_SHA256 (1<<4)
+#define ARMV8_PMULL (1<<5)
+#define ARMV8_SHA512 (1<<6)
+#define ARMV8_CPUID (1<<7)
+#define ARMV8_RNG (1<<8)
+#define ARMV8_SM3 (1<<9)
+#define ARMV8_SM4 (1<<10)
+#define ARMV8_SHA3 (1<<11)
+#define ARMV8_UNROLL8_EOR3 (1<<12)
+#define ARMV8_SVE (1<<13)
+#define ARMV8_SVE2 (1<<14)
+
+/*
+ * MIDR_EL1 system register
+ *
+ * 63___ _ ___32_31___ _ ___24_23_____20_19_____16_15__ _ __4_3_______0
+ * |            |             |         |         |          |        |
+ * |RES0        | Implementer | Variant | Arch    | PartNum  |Revision|
+ * |____ _ _____|_____ _ _____|_________|_______ _|____ _ ___|________|
+ *
+ */
+
+#define ARM_CPU_IMP_ARM 0x41
+#define HISI_CPU_IMP 0x48
+
+#define ARM_CPU_PART_CORTEX_A72 0xD08
+#define ARM_CPU_PART_N1 0xD0C
+#define ARM_CPU_PART_V1 0xD40
+#define ARM_CPU_PART_N2 0xD49
+#define HISI_CPU_PART_KP920 0xD01
+
+#define MIDR_PARTNUM_SHIFT 4
+#define MIDR_PARTNUM_MASK (0xfffU << MIDR_PARTNUM_SHIFT)
+#define MIDR_PARTNUM(midr) \
+	(((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
+
+#define MIDR_IMPLEMENTER_SHIFT 24
+#define MIDR_IMPLEMENTER_MASK (0xffU << MIDR_IMPLEMENTER_SHIFT)
+#define MIDR_IMPLEMENTER(midr) \
+	(((midr) & MIDR_IMPLEMENTER_MASK) >> MIDR_IMPLEMENTER_SHIFT)
+
+#define MIDR_ARCHITECTURE_SHIFT 16
+#define MIDR_ARCHITECTURE_MASK (0xfU << MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_ARCHITECTURE(midr) \
+	(((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
+
+#define MIDR_CPU_MODEL_MASK \
+	(MIDR_IMPLEMENTER_MASK | \
+	 MIDR_PARTNUM_MASK | \
+	 MIDR_ARCHITECTURE_MASK)
+
+#define MIDR_CPU_MODEL(imp, partnum) \
+	(((imp) << MIDR_IMPLEMENTER_SHIFT) | \
+	 (0xfU << MIDR_ARCHITECTURE_SHIFT) | \
+	 ((partnum) << MIDR_PARTNUM_SHIFT))
+
+#define MIDR_IS_CPU_MODEL(midr, imp, partnum) \
+	(((midr) & MIDR_CPU_MODEL_MASK) == MIDR_CPU_MODEL(imp, partnum))
+
+#if defined(__ASSEMBLER__)
+  /*
+   * Support macros for
+   * - Armv8.3-A Pointer Authentication and
+   * - Armv8.5-A Branch Target Identification
+   * features which require emitting a .note.gnu.property section with the
+   * appropriate architecture-dependent feature bits set.
+   * Read more: "ELF for the Arm® 64-bit Architecture"
+   */
+# if defined(__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT == 1
+#  define GNU_PROPERTY_AARCH64_BTI (1 << 0) /* Has Branch Target Identification */
+#  define AARCH64_VALID_CALL_TARGET hint #34 /* BTI 'c' */
+# else
+#  define GNU_PROPERTY_AARCH64_BTI 0 /* No Branch Target Identification */
+#  define AARCH64_VALID_CALL_TARGET
+# endif
+
+# if defined(__ARM_FEATURE_PAC_DEFAULT) && \
+	(__ARM_FEATURE_PAC_DEFAULT & 1) == 1 /* Signed with A-key */
+#  define GNU_PROPERTY_AARCH64_POINTER_AUTH (1 << 1) /* Has Pointer Authentication */
+#  define AARCH64_SIGN_LINK_REGISTER hint #25 /* PACIASP */
+#  define AARCH64_VALIDATE_LINK_REGISTER hint #29 /* AUTIASP */
+# elif defined(__ARM_FEATURE_PAC_DEFAULT) && \
+	(__ARM_FEATURE_PAC_DEFAULT & 2) == 2 /* Signed with B-key */
+#  define GNU_PROPERTY_AARCH64_POINTER_AUTH (1 << 1) /* Has Pointer Authentication */
+#  define AARCH64_SIGN_LINK_REGISTER hint #27 /* PACIBSP */
+#  define AARCH64_VALIDATE_LINK_REGISTER hint #31 /* AUTIBSP */
+# else
+#  define GNU_PROPERTY_AARCH64_POINTER_AUTH 0 /* No Pointer Authentication */
+#  if GNU_PROPERTY_AARCH64_BTI != 0
+#   define AARCH64_SIGN_LINK_REGISTER AARCH64_VALID_CALL_TARGET
+#  else
+#   define AARCH64_SIGN_LINK_REGISTER
+#  endif
+#  define AARCH64_VALIDATE_LINK_REGISTER
+# endif
+
+# if GNU_PROPERTY_AARCH64_POINTER_AUTH != 0 || GNU_PROPERTY_AARCH64_BTI != 0
+	.pushsection .note.gnu.property, "a";
+	.balign 8;
+	.long 4;
+	.long 0x10;
+	.long 0x5;
+	.asciz "GNU";
+	.long 0xc0000000; /* GNU_PROPERTY_AARCH64_FEATURE_1_AND */
+	.long 4;
+	.long (GNU_PROPERTY_AARCH64_POINTER_AUTH | GNU_PROPERTY_AARCH64_BTI);
+	.long 0;
+	.popsection;
+# endif
+
+#endif /* defined __ASSEMBLER__ */
+
+#define IS_CPU_SUPPORT_UNROLL8_EOR3() \
+	(ARMCAP_P & ARMV8_UNROLL8_EOR3)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __ARM_ARCH_CE_H */
diff --git a/include/drv/isa_ce_sm3.h b/include/drv/isa_ce_sm3.h
new file mode 100644
index 0000000..d08c72f
--- /dev/null
+++ b/include/drv/isa_ce_sm3.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: Apache-2.0 */
+/* Copyright 2020-2021 Huawei Technologies Co.,Ltd. All rights reserved. */
+#ifndef __ISA_CE_SM3_H
+#define __ISA_CE_SM3_H
+
+#include "../wd_alg_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define SM3_DIGEST_SIZE 32
+#define SM3_BLOCK_SIZE 64
+#define SM3_STATE_WORDS 8
+#define HMAC_BLOCK_SIZE 64
+
+#define SM3_IVA 0x7380166f
+#define SM3_IVB 0x4914b2b9
+#define SM3_IVC 0x172442d7
+#define SM3_IVD 0xda8a0600
+#define SM3_IVE 0xa96f30bc
+#define SM3_IVF 0x163138aa
+#define SM3_IVG 0xe38dee4d
+#define SM3_IVH 0xb0fb0e4e
+
+#define PUTU32(p, V) \
+	((p)[0] = (uint8_t)((V) >> 24), \
+	 (p)[1] = (uint8_t)((V) >> 16), \
+	 (p)[2] = (uint8_t)((V) >> 8), \
+	 (p)[3] = (uint8_t)(V))
+
+struct sm3_ce_ctx {
+	/*
+	 * Use an array to represent the eight 32-bits word registers,
+	 * SM3_IVA, SM3_IVB, ..., SM3_IVH, save IV and the final digest.
+	 */
+	__u32 word_reg[SM3_STATE_WORDS];
+	/*
+	 * The length (in bits) of all the msg fragments, the length of the
+	 * whole msg should be less than 2^64 bit, a msg block is 512-bits,
+	 * make a 64-bits number in two parts, low 32-bits - 'Nl' and
+	 * high 32-bits - 'Nh'.
+	 */
+	__u64 nblocks;
+	/*
+	 * Message block, a msg block is 512-bits, use sixteen __u32 type
+	 * element to store it, used in B(i) = W0||W1||W2||...||W15.
+	 * Use a __u8 array to replace the 32-bit array.
+	 */
+	__u8 block[SM3_BLOCK_SIZE];
+	/* The number of msg that need to compute in current cycle or turn. */
+	size_t num;
+};
+
+struct sm3_ce_drv_ctx {
+	struct wd_ctx_config_internal config;
+};
+
+void sm3_ce_block_compress(__u32 word_reg[SM3_STATE_WORDS],
+			   const unsigned char *src, size_t blocks);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __ISA_CE_SM3_H */
diff --git a/include/wd_alg.h b/include/wd_alg.h
index f8b136e..861b7d9 100644
--- a/include/wd_alg.h
+++ b/include/wd_alg.h
@@ -19,6 +19,49 @@ extern "C" {
 #define ALG_NAME_SIZE 128
 #define DEV_NAME_LEN 128
 
+/*
+ * Macros related to arm platform:
+ * ARM puts the feature bits for Crypto Extensions in AT_HWCAP2, whereas
+ * AArch64 used AT_HWCAP.
+ */
+#ifndef AT_HWCAP
+# define AT_HWCAP 16
+#endif
+
+#ifndef AT_HWCAP2
+# define AT_HWCAP2 26
+#endif
+
+#if defined(__arm__) || defined(__arm)
+# define HWCAP AT_HWCAP
+# define HWCAP_NEON (1 << 12)
+
+# define HWCAP_CE AT_HWCAP2
+# define HWCAP_CE_AES (1 << 0)
+# define HWCAP_CE_PMULL (1 << 1)
+# define HWCAP_CE_SHA1 (1 << 2)
+# define HWCAP_CE_SHA256 (1 << 3)
+#elif defined(__aarch64__)
+# define HWCAP AT_HWCAP
+# define HWCAP_NEON (1 << 1)
+
+# define HWCAP_CE HWCAP
+# define HWCAP_CE_AES (1 << 3)
+# define HWCAP_CE_PMULL (1 << 4)
+# define HWCAP_CE_SHA1 (1 << 5)
+# define HWCAP_CE_SHA256 (1 << 6)
+# define HWCAP_CPUID (1 << 11)
+# define HWCAP_SHA3 (1 << 17)
+# define HWCAP_CE_SM3 (1 << 18)
+# define HWCAP_CE_SM4 (1 << 19)
+# define HWCAP_CE_SHA512 (1 << 21)
+# define HWCAP_SVE (1 << 22)
+/* AT_HWCAP2 */
+# define HWCAP2 26
+# define HWCAP2_SVE2 (1 << 1)
+# define HWCAP2_RNG (1 << 16)
+#endif
+
 enum alg_dev_type {
 	UADK_ALG_SOFT = 0x0,
 	UADK_ALG_CE_INSTR = 0x1,
diff --git a/wd_alg.c b/wd_alg.c
index 3b111c8..f34a407 100644
--- a/wd_alg.c
+++ b/wd_alg.c
@@ -9,6 +9,7 @@
 #include <stdbool.h>
 #include <stdlib.h>
 #include <pthread.h>
+#include <sys/auxv.h>
 
 #include "wd.h"
 #include "wd_alg.h"
@@ -90,6 +91,24 @@ static bool wd_check_accel_dev(const char *dev_name)
 	return false;
 }
 
+static bool wd_check_ce_support(const char *dev_name)
+{
+	unsigned long hwcaps = 0;
+
+	#if defined(__arm__) || defined(__arm)
+	hwcaps = getauxval(AT_HWCAP2);
+	#elif defined(__aarch64__)
+	hwcaps = getauxval(AT_HWCAP);
+	#endif
+	if (!strcmp("isa_ce_sm3", dev_name) && (hwcaps & HWCAP_CE_SM3))
+		return true;
+
+	if (!strcmp("isa_ce_sm4", dev_name) && (hwcaps & HWCAP_CE_SM4))
+		return true;
+
+	return false;
+}
+
 static bool wd_alg_check_available(int calc_type, const char *dev_name)
 {
 	bool ret = false;
@@ -99,6 +118,7 @@ static bool wd_alg_check_available(int calc_type, const char *dev_name)
 		break;
 	/* Should find the CPU if not support CE */
 	case UADK_ALG_CE_INSTR:
+		ret = wd_check_ce_support(dev_name);
 		break;
 	/* Should find the CPU if not support SVE */
 	case UADK_ALG_SVE_INSTR:
@@ -280,8 +300,13 @@ struct wd_alg_driver *wd_request_drv(const char *alg_name, bool hw_mask)
 	struct wd_alg_driver *drv = NULL;
 	int tmp_priority = -1;
 
-	if (!pnext || !alg_name) {
-		WD_ERR("invalid: request alg param is error!\n");
+	if (!pnext) {
+		WD_ERR("invalid: request drv pnext is NULL!\n");
+		return NULL;
+	}
+
+	if (!alg_name) {
+		WD_ERR("invalid: alg_name is NULL!\n");
 		return NULL;
 	}
 
@@ -289,7 +314,8 @@ struct wd_alg_driver *wd_request_drv(const char *alg_name, bool hw_mask)
 	pthread_mutex_lock(&mutex);
 	while (pnext) {
 		/* hw_mask true mean not to used hardware dev */
-		if (hw_mask && pnext->drv->calc_type == UADK_ALG_HW) {
+		if ((hw_mask && pnext->drv->calc_type == UADK_ALG_HW) ||
+		    (!hw_mask && pnext->drv->calc_type != UADK_ALG_HW)) {
 			pnext = pnext->next;
 			continue;
 		}
diff --git a/wd_digest.c b/wd_digest.c
index acf341a..8c9a9b7 100644
--- a/wd_digest.c
+++ b/wd_digest.c
@@ -215,7 +215,7 @@ static void wd_digest_clear_status(void)
 }
 
 static int wd_digest_init_nolock(struct wd_ctx_config *config,
-	struct wd_sched *sched)
+				 struct wd_sched *sched)
 {
 	int ret;
 
diff --git a/wd_sched.c b/wd_sched.c
index 419280e..b43834d 100644
--- a/wd_sched.c
+++ b/wd_sched.c
@@ -453,7 +453,7 @@ static struct wd_sched sched_table[SCHED_POLICY_BUTT] = {
 		.poll_policy = session_sched_poll_policy,
 	}, {
 		.name = "None scheduler",
-		.sched_policy = SCHED_POLICY_SINGLE,
+		.sched_policy = SCHED_POLICY_NONE,
 		.sched_init = sched_none_init,
 		.pick_next_ctx = sched_none_pick_next_ctx,
 		.poll_policy = sched_none_poll_policy,
diff --git a/wd_util.c b/wd_util.c
index 6134239..39909ca 100644
--- a/wd_util.c
+++ b/wd_util.c
@@ -91,6 +91,11 @@ struct acc_alg_item {
 	char *algtype;
 };
 
+struct wd_ce_ctx {
+	char *drv_name;
+	void *priv;
+};
+
 static struct acc_alg_item alg_options[] = {
 	{"zlib", "zlib"},
 	{"gzip", "gzip"},
@@ -229,7 +234,6 @@ int wd_init_ctx_config(struct wd_ctx_config_internal *in,
 			ret = -WD_EINVAL;
 			goto err_out;
 		}
-
 		clone_ctx_to_internal(cfg->ctxs + i, ctxs + i);
 		ret = pthread_spin_init(&ctxs[i].lock, PTHREAD_PROCESS_SHARED);
 		if (ret) {
@@ -2612,14 +2616,44 @@ out_freelist:
 	return ret;
 }
 
+static int wd_alg_ce_ctx_init(struct wd_init_attrs *attrs)
+{
+	struct wd_ctx_config *ctx_config = attrs->ctx_config;
+
+	ctx_config->ctx_num = 1;
+	ctx_config->ctxs = calloc(ctx_config->ctx_num, sizeof(struct wd_ctx));
+	if (!ctx_config->ctxs) {
+		WD_ERR("failed to alloc ctxs!\n");
+		return -WD_ENOMEM;
+	}
+	ctx_config->ctxs[0].ctx = (handle_t)calloc(1, sizeof(struct wd_ce_ctx));
+
+	return WD_SUCCESS;
+}
+
+static void wd_alg_ce_ctx_uninit(struct wd_ctx_config *ctx_config)
+{
+	__u32 i;
+
+	for (i = 0; i < ctx_config->ctx_num; i++) {
+		if (ctx_config->ctxs[i].ctx) {
+			free((struct wd_ce_ctx *)ctx_config->ctxs[i].ctx);
+			ctx_config->ctxs[i].ctx = 0;
+		}
+	}
+
+	free(ctx_config->ctxs);
+}
+
 static void wd_alg_ctx_uninit(struct wd_ctx_config *ctx_config)
 {
 	__u32 i;
 
-	for (i = 0; i < ctx_config->ctx_num; i++)
+	for (i = 0; i < ctx_config->ctx_num; i++) {
 		if (ctx_config->ctxs[i].ctx) {
 			wd_release_ctx(ctx_config->ctxs[i].ctx);
 			ctx_config->ctxs[i].ctx = 0;
+		}
 	}
 
 	free(ctx_config->ctxs);
@@ -2633,9 +2667,9 @@ int wd_alg_attrs_init(struct wd_init_attrs *attrs)
 	struct wd_ctx_config *ctx_config = NULL;
 	struct wd_sched *alg_sched = NULL;
 	char alg_type[CRYPTO_MAX_ALG_NAME];
-	char *alg = attrs->alg;
 	int driver_type = UADK_ALG_HW;
-	int ret;
+	char *alg = attrs->alg;
+	int ret = 0;
 
 	if (!attrs->ctx_params)
 		return -WD_EINVAL;
@@ -2646,22 +2680,37 @@ int wd_alg_attrs_init(struct wd_init_attrs *attrs)
 	switch (driver_type) {
 	case UADK_ALG_SOFT:
 	case UADK_ALG_CE_INSTR:
-	/* No need to alloc resource */
-		if (sched_type != SCHED_POLICY_NONE)
+		/* No need to alloc resource */
+		if (sched_type != SCHED_POLICY_NONE) {
+			WD_ERR("invalid sched_type\n");
 			return -WD_EINVAL;
+		}
+
+		ctx_config = calloc(1, sizeof(*ctx_config));
+		if (!ctx_config) {
+			WD_ERR("fail to alloc ctx config\n");
+			return -WD_ENOMEM;
+		}
+		attrs->ctx_config = ctx_config;
 
 		alg_sched = wd_sched_rr_alloc(SCHED_POLICY_NONE, 1, 1,
 					      alg_poll_func);
 		if (!alg_sched) {
 			WD_ERR("fail to alloc scheduler\n");
-			return -WD_EINVAL;
+			goto out_ctx_config;
 		}
+		attrs->sched = alg_sched;
 
-		ret = wd_sched_rr_instance(alg_sched, NULL);
+		ret = wd_alg_ce_ctx_init(attrs);
 		if (ret) {
-			WD_ERR("fail to instance scheduler\n");
+			WD_ERR("fail to init ce ctx\n");
 			goto out_freesched;
 		}
+
+		ret = alg_init_func(ctx_config, alg_sched);
+		if (ret)
+			goto out_pre_init;
+
 		break;
 	case UADK_ALG_SVE_INSTR:
 		/* Todo lock cpu core */
@@ -2720,7 +2769,10 @@ int wd_alg_attrs_init(struct wd_init_attrs *attrs)
 	return 0;
 
 out_pre_init:
-	wd_alg_ctx_uninit(ctx_config);
+	if (driver_type == UADK_ALG_CE_INSTR || driver_type == UADK_ALG_SOFT)
+		wd_alg_ce_ctx_uninit(ctx_config);
+	else
+		wd_alg_ctx_uninit(ctx_config);
 out_freesched:
 	wd_sched_rr_release(alg_sched);
 out_ctx_config:
@@ -2733,10 +2785,19 @@ void wd_alg_attrs_uninit(struct wd_init_attrs *attrs)
 {
 	struct wd_ctx_config *ctx_config = attrs->ctx_config;
 	struct wd_sched *alg_sched = attrs->sched;
+	int driver_type = attrs->driver->calc_type;
 
-	if (ctx_config) {
-		wd_alg_ctx_uninit(ctx_config);
-		free(ctx_config);
+	if (driver_type == UADK_ALG_CE_INSTR || driver_type == UADK_ALG_SOFT) {
+		if (ctx_config) {
+			wd_alg_ce_ctx_uninit(ctx_config);
+			free(ctx_config);
+		}
+	} else {
+		if (ctx_config) {
+			wd_alg_ctx_uninit(ctx_config);
+			free(ctx_config);
+		}
 	}
+
 	wd_sched_rr_release(alg_sched);
 }
--
2.33.0