Hi folks,

Attached is the Linpack benchmark, which I ran with GCC and Clang, both with and without vectorization (though most of the loops are not vectorized).

Judging by the output of the LLVM loop vectorizer, it doesn't do much here either; the net gain comes from the basic-block vectorizer. Does GCC have a similar concept?

The results are also attached.

cheers,
--renato