On 11/13/2015 07:35 PM, Zoltan Kiss wrote:
On 13/11/15 16:19, Nikolai Bozhenov wrote:
On 11/06/2015 05:48 PM, Zoltan Kiss wrote:
Hi,
We have a packaging/linking/optimization problem at LNG, and I hope you can give us some advice on it. (Cc'ing the ODP list in case someone wants to add something.)
We have OpenDataPlane (ODP), an API that sits between userspace applications and hardware SDKs. It is defined in the form of C headers, and we already have several implementations on top of different SDKs (or whatever is actually controlling the hardware), e.g. linux-generic, a DPDK one, etc. And we have applications, like Open vSwitch (OVS), which can now work with any ODP platform implementation of this API.
When it comes to packaging, the ideal scenario would be to create one package for the application, e.g. openvswitch.deb, and one for each platform, e.g. odp-generic.deb, odp-dpdk.deb. The latter would contain the implementation in the form of a libodp.so file, so the application can dynamically load the installed platform's library at runtime, with all the benefits of dynamic linking.
The trouble is that we have several accessor functions in the API which are very short and __very__ frequently used. The best example is "uint32_t odp_packet_len(odp_packet_t pkt)", which returns the length of the packet. odp_packet_t is an opaque type defined by the implementation, often a pointer to the packet's actual metadata, so the function call boils down to a simple load from that metadata pointer (+offset). Wrapping it in a function call costs significant performance: when forwarding 64 byte packets at 10 Gbps, I got 13.2 Mpps with function calls. When I inlined that function I got 13.8 Mpps, which is a ~5% difference. And there are a lot of other frequently used short accessor functions with the same problem.
But obviously if I inline these functions I break the ABI, and I need to compile the application for each platform (and create packages like openvswitch-odp-dpdk.deb, containing the platform statically linked).
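To make the trade-off concrete, here is a minimal sketch of the two variants of the accessor. The struct pkt_hdr layout, its field names, and odp_packet_len_inline are assumptions for illustration only; each real ODP implementation defines its own private metadata structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet metadata for one platform (not the real ODP layout). */
struct pkt_hdr {
    void    *data;
    uint32_t frame_len;
};

/* The opaque handle: here simply a pointer to the metadata. */
typedef struct pkt_hdr *odp_packet_t;

/* ABI-stable variant: exported from libodp.so, so every use in the
 * application is a real function call into the shared library. */
uint32_t odp_packet_len(odp_packet_t pkt)
{
    assert(pkt != NULL);
    return pkt->frame_len;
}

/* Inlined variant (hypothetical name): compiles to a single load, but the
 * offset of frame_len is now baked into the application binary, which is
 * what ties the binary to one platform. */
static inline uint32_t odp_packet_len_inline(odp_packet_t pkt)
{
    return pkt->frame_len;
}
```

Both variants return the same value; the difference is only where the knowledge of the metadata layout ends up, in libodp.so or in the application.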
I've tried to look around on Google and in the gcc manual, but I couldn't find a good solution for this kind of problem. I've checked link time optimization (-flto), but it only helps with static linking. Is there any way to keep the ODP application and platform implementation binaries in separate files while still getting the performance benefit of inlining?
Regards,
Zoltan
Hi!
If all you need is a fast and portable binary, I wonder if you could use relocations to attain your goal. I mean, make the dynamic linker overwrite the call instructions at startup time with some machine-specific absolute values. E.g. with 0xe590000c, which is the binary representation of the 'ldr r0, [r0, #12]' instruction and which seems to be fully equivalent to the call to odp_packet_len.
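For reference, the instruction word for such a load can be computed from its fields. The following is a small sketch of the A32 encoding of 'ldr Rt, [Rn, #imm12]' (condition AL, plain positive offset); the helper name arm_ldr_imm is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Encode A32 "ldr Rt, [Rn, #imm12]": cond=AL, P=1, U=1, W=0, L=1.
 * Rn sits in bits 19..16, Rt in bits 15..12, the offset in bits 11..0. */
static uint32_t arm_ldr_imm(unsigned rt, unsigned rn, unsigned imm12)
{
    assert(rt < 16 && rn < 16 && imm12 < 4096);
    return 0xe5900000u | ((uint32_t)rn << 16) | ((uint32_t)rt << 12) | imm12;
}
```

With this encoding, arm_ldr_imm(0, 0, 12) gives 0xe590000c (destination r0), while arm_ldr_imm(4, 0, 12) gives 0xe590400c (destination r4). Only the r0 form leaves the result where the caller of odp_packet_len expects the return value.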
Something like that would be the best, yes, but it seems gcc and friends don't support that. As others said, probably LLVM has a JIT which can do that.
I don't think you need a JIT. A JIT is obviously overkill for that. All you need is to reserve some space at compile time (e.g. with inline assembly) and then patch that space at startup. The latter sounds like a task for the loader.
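As a rough model of the patching step only, the sketch below treats the code image as a plain array of 32-bit words in which each reserved slot holds an A32 nop, and overwrites the recorded slots with the platform's instruction word. struct patch_site and apply_patches are hypothetical names; no existing loader or toolchain provides them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARM_NOP 0xe320f000u  /* A32 "nop", used here as the placeholder */

/* Hypothetical patch record: index of one reserved slot in the image. */
struct patch_site { size_t slot; };

/* Overwrite every recorded slot that still holds the placeholder with
 * the given instruction word. Returns the number of slots patched. */
static size_t apply_patches(uint32_t *code, size_t code_words,
                            const struct patch_site *sites, size_t nsites,
                            uint32_t insn)
{
    size_t done = 0;
    for (size_t i = 0; i < nsites; i++) {
        size_t s = sites[i].slot;
        assert(s < code_words);
        if (code[s] != ARM_NOP)
            continue;            /* not a reserved slot, leave it alone */
        code[s] = insn;
        done++;
    }
    return done;
}
```

On real hardware the text pages would also have to be made writable (e.g. with mprotect) and the instruction cache flushed afterwards (e.g. with GCC's __builtin___clear_cache), which this model leaves out.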
Though, I don't think there is support for that in any toolchain. It is not typical to have such hot small functions in shared libraries. So, you will have to do some development in the toolchain anyway to support the suggested optimization.
Nikolai