On Fri, Jul 27, 2018 at 7:41 AM, Yongqin Liu yongqin.liu@linaro.org wrote:
Hi, Sumit, John, Amit, All
I am investigating the VtsKernelNetTest failure with the HiKey 4.4 kernel, and I found that the problem is that the socket.connect call returns -11. By adding printk lines to the SYSCALL_DEFINE3 of connect in socket.c, I found that the error is returned by the line "err = sock->ops->connect(sock, (struct sockaddr *)&address, addrlen, sock->file->f_flags);" https://android.googlesource.com/kernel/hikey-linaro/+/android-hikey-linaro-...
At that point I didn't know which connect function is called, so I searched for the .connect assignments in kernel/linaro/hisilicon-4.4/net/ipv4/
and, by adding printk lines, found that the implementation is tcp_v4_connect in net/ipv4/tcp_ipv4.c here: https://android.googlesource.com/kernel/hikey-linaro/+/android-hikey-linaro-...
There, again by adding printk lines, I found that the -11 is returned by the call to ip_route_connect here: https://android.googlesource.com/kernel/hikey-linaro/+/android-hikey-linaro-...
Now I need to go to the definition of ip_route_connect, add printks there, and so on, until I find the real place where -11 is returned and can check the reason.
But this work seems stupid and time consuming; I think there should be smarter methods that I need to learn.
Honestly, for situations like this, where there is some failure in userspace in a subsystem I'm not familiar with, I usually do exactly what you've done. I know that's probably not what you want to hear, though.
I start with strace to figure out which syscall is failing from userspace, then I add printks to the syscall handler in the kernel to figure out which component is failing, then dig down the stack.
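For the first step, it can help to translate the raw return value into a symbolic name before digging into the kernel. The following is a minimal sketch, not part of the original workflow: the helper name `describe_kernel_errno` is made up for illustration, and it relies on the errno numbering of the machine it runs on (on Linux, errno 11 is EAGAIN).

```python
import errno
import os


def describe_kernel_errno(ret):
    """Translate a negative kernel-style return value (e.g. -11) into (name, message).

    Hypothetical helper for illustration; the mapping comes from the errno
    table of the platform this script runs on.
    """
    code = abs(ret)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # -11 is the value reported for the failing connect call in this thread.
    name, message = describe_kernel_errno(-11)
    print(name, message)
```

Knowing the symbolic name makes it easier to grep the kernel sources for the places that can return it.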
ftrace can be useful in some cases (and critical in high-performance cases where you can't slow the system down with printks), but the enabling/dumping steps are usually extra overhead in my process. Kernel debuggers can also be useful, but getting those enabled properly on a random arm board has always been a bigger barrier than I'm willing to jump when I have a small problem.
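For reference, the enabling/dumping steps for ftrace can be scripted. This is only a sketch under assumptions: it assumes the kernel was built with the function_graph tracer, that tracefs is reachable at /sys/kernel/debug/tracing, and that the script runs as root on the board. The helper names are hypothetical; the tracefs file names (`tracing_on`, `current_tracer`, `set_graph_function`, `trace`) are the standard ones.

```python
import os

# Assumed tracefs location; some systems expose it at /sys/kernel/tracing instead.
TRACEFS = "/sys/kernel/debug/tracing"


def graph_trace_plan(function_name):
    """Return the ordered (file, value) writes that trace one function's call graph."""
    return [
        ("tracing_on", "0"),                     # pause tracing while reconfiguring
        ("current_tracer", "function_graph"),    # select the call-graph tracer
        ("set_graph_function", function_name),   # only graph calls below this function
        ("tracing_on", "1"),                     # start tracing
    ]


def apply_plan(plan, root=TRACEFS):
    """Write each value of the plan into the matching control file under root."""
    for filename, value in plan:
        with open(os.path.join(root, filename), "w") as f:
            f.write(value)


def read_trace(root=TRACEFS):
    """Return the current contents of the trace buffer."""
    with open(os.path.join(root, "trace")) as f:
        return f.read()
```

A typical use would be `apply_plan(graph_trace_plan("tcp_v4_connect"))`, then reproducing the failing connect, then calling `read_trace()` to see which callees ran.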
The only other useful tricks I have are:
* git bisection
* intuition-driven git diff comparisons (again, when you have known good and bad commits)
* git log/tig on directories to isolate changes
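When known good and bad commits exist, the bisection can be automated with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. The sketch below shows one way to write such a predicate; the two script paths are hypothetical placeholders for whatever builds, flashes, and tests the board.

```python
import subprocess

# Hypothetical commands; replace with the real build/flash and test steps.
BUILD_CMD = ["./build_and_flash.sh"]
TEST_CMD = ["./run_connect_test.sh"]


def bisect_exit_code(build_ok, test_ok):
    """Map build/test results to the exit codes that git bisect run understands."""
    if not build_ok:
        return 125  # commit cannot be tested, ask git bisect to skip it
    return 0 if test_ok else 1  # 0 marks the commit good, 1 marks it bad


def succeeded(cmd):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(cmd).returncode == 0


def main():
    """Build, then test only if the build worked, and return the bisect exit code."""
    build_ok = succeeded(BUILD_CMD)
    test_ok = build_ok and succeeded(TEST_CMD)
    return bisect_exit_code(build_ok, test_ok)
```

Ending the script with `sys.exit(main())` (after `import sys`) turns it into a predicate that can be passed to `git bisect run` once the good and bad commits have been marked.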
So yeah, printk debugging is a bit "stupid", but it's pretty quick and reliable for chasing down these sorts of problems. The biggest issue is usually unfamiliarity with the code, so while going function by function, reading the code and placing debug messages to trace how the logic runs, might not feel smart, it can have the useful side effect of letting you learn more about the subsystem, which will help next time you're in the area.
Mostly, when doing this, it's important to automate the building/flashing/booting/testing process so you can iterate quickly.
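One way to automate that loop is a small driver that runs each stage in order and stops at the first failure. This is a sketch only: the stage commands in `STEPS` are hypothetical placeholders, and the real ones depend on the board and build setup.

```python
import subprocess
import sys
import time


def run_steps(steps):
    """Run (name, argv) steps in order and stop at the first failure.

    Returns the name of the failing step, or None if every step succeeded.
    """
    for name, argv in steps:
        start = time.monotonic()
        result = subprocess.run(argv)
        elapsed = time.monotonic() - start
        print(f"[{name}] exit={result.returncode} after {elapsed:.1f}s")
        if result.returncode != 0:
            return name
    return None


# Hypothetical stage commands; substitute the real scripts for your setup.
STEPS = [
    ("build", ["./build_kernel.sh"]),
    ("flash", ["./flash_board.sh"]),
    ("boot", ["./wait_for_boot.sh"]),
    ("test", ["./run_connect_test.sh"]),
]
```

Calling `run_steps(STEPS)` after each printk change gives one command per iteration, and the returned step name shows immediately whether the build, the flash, the boot, or the test itself broke.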
thanks -john