On 8 November 2016 at 12:54, Tomeu Vizoso <tomeu.vizoso@collabora.com> wrote:
[Moving to lava-users as suggested by Neil]
Trimming the CC list - the lava-team can reply using lava-users directly.
On 11/07/2016 03:20 PM, Neil Williams (Code Review) wrote:
Neil Williams has posted comments on this change.
https://review.linaro.org/#/c/15203/3/lava_dispatcher/pipeline/actions/deplo...
File lava_dispatcher/pipeline/actions/deploy/tftp.py:
Line 127: def _ensure_device_dir(self, device_dir):
Cannot say that I have fully understood it yet. Would it be correct if the
The Strategy classes must not set or modify anything. The accepts method does some very fast checks and returns True or False. Anything which the pipeline actions need to know must be specified in the job submission or the device configuration. So either this is restricted to specific device-types (so a setting goes into the template) or it has to be set for every job using this method (for situations where the support can be used or not used on the same hardware for different jobs).
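As a rough illustration of that rule (the class name and the 'depthcharge_dir' parameter below are invented for this sketch, not the actual lava_dispatcher code), accepts() only reads the job submission and the device configuration and returns a boolean:

    # Hedged sketch only - hypothetical names, not the real Strategy class.
    class DepthchargeTftp(object):

        @classmethod
        def accepts(cls, device, parameters):
            # Fast, read-only checks; nothing is set or modified here.
            if parameters.get('to') != 'tftp':
                return False
            # The support has to be declared in the device configuration
            # (device-type template) or in the job submission itself.
            return 'depthcharge_dir' in device.get('parameters', {})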
What is this per-device directory anyway, and how is it meant to work with tftpd-hpa, which does not support configuration changes without a restart? Jobs cannot require that daemons restart - other jobs could easily be using the same daemon at the same time.
So each firmware image containing Depthcharge also contains hardcoded values for the IP address of the TFTP server and for the paths of a cmdline.txt file and a FIT image. The FIT image contains a kernel and a DTB, and optionally a ramdisk.
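For illustration only - the root path, the directory name and the FIT image filename below are invented, the real values being baked into each firmware image at flashing time - the layout Depthcharge would expect under the TFTP root looks something like:

    /srv/tftp/                    <- TFTP root
        peach-pit-01/             <- per-device directory, path hardcoded in the firmware
            cmdline.txt           <- kernel command line
            kernel.itb            <- FIT image (kernel + DTB, optionally a ramdisk)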
That's not a system that lends itself easily to automation. Fixed IP addresses are the exception rather than the rule; the dispatcher itself doesn't care if its IP address changes, as long as the restarted slave can still contact the master over ZMQ.
It sounds a lot like a system that can work for single jobs on a developer's desk, but it has clear problems when it comes to automating the process at any level of scale.
Because the paths are set when the firmware image is flashed, we cannot use the per-job directory. Instead, we add a parameter to the device which is set in the device-specific template for Chrome devices.
So that parameter needs to appear in the device-type template and therefore in the device.yaml delivered to the dispatcher. This way, there will be a readily identifiable value in the device['parameters'] block which accepts() can check for. The validate() function in the Action initialised by the Strategy (once the accepts classmethod has finished) can then retrieve the value and do whatever checks are required, setting self.errors if there are problems. (Do not raise exceptions in validate - the purpose is to collect all possible errors in a single run of all the validate functions in the populated pipeline.) That is all standard V2.
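Again as a hedged sketch with invented names (and a simplified errors list - the real base Action handles error collection itself), the validate() side looks roughly like:

    # Hypothetical Action - only meant to show the validate() pattern.
    class DepthchargeAction(object):

        def __init__(self, device):
            self.device = device
            self.errors = []

        def validate(self):
            value = self.device.get('parameters', {}).get('depthcharge_dir')
            if not value:
                # Record the problem; do not raise, so that every validate()
                # in the populated pipeline gets to run and report together.
                self.errors.append("depthcharge_dir not set in device parameters")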
If that parameter is present, then a directory named after the value of that parameter will be created in the root of the TFTP file tree.
So the dispatcher needs read/write access on the TFTP server as well? How is that to be achieved? Where will this TFTP server be located? Is it shared by multiple instances or multiple dispatchers? Who manages any filesystem collisions? What process cleans up the per-job directories when the job completes or fails? Hint: a mountpoint on the dispatcher is not going to be workable either - we already have quite enough issues with SSHFS in V1 to go there again. We need to avoid solutions which make assumptions about local network topology. NFS is for DUTs only.
This is beginning to sound like it needs a full Protocol, similar to LXC or VLANd. A daemon would exist on the remote TFTP server which responds to (authenticated? encrypted?) requests from the dispatchers over a declared protocol (e.g. a TCP socket). The daemon makes the changes requested, manages conflicts, reports errors and generally keeps the remote TFTP server running and tidy. The dispatcher uses support just like VLANd or MultiNode to communicate with the daemon remotely, getting the IP address of the daemon from the device parameters.
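To make that concrete without prescribing anything, a very rough sketch of the dispatcher side of such a protocol could look like the following - the message format, the 'create_dir' request and the daemon itself are all invented for illustration:

    import json
    import socket

    def request_tftp_dir(daemon_ip, daemon_port, job_id, dirname):
        # Ask the (hypothetical) daemon on the remote TFTP server to create
        # a directory in the TFTP root for this job; the daemon owns the
        # filesystem, manages conflicts and cleans up when told to.
        request = {'request': 'create_dir', 'job_id': job_id, 'name': dirname}
        conn = socket.create_connection((daemon_ip, daemon_port), timeout=10)
        try:
            conn.sendall(json.dumps(request).encode('utf-8') + b'\n')
            reply = json.loads(conn.recv(4096).decode('utf-8'))
        finally:
            conn.close()
        if reply.get('result') != 'ok':
            raise RuntimeError("TFTP daemon refused the request: %s" % reply.get('error'))
        return reply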
The TFTP server doesn't need to be restarted because its configuration is left unchanged; we just create a directory where Depthcharge will look for the files.
That's the basic problem: "we just create" something on a remote server is not going to work. The 'we' there needs to be fully automated; it needs to cope with multiple simultaneous jobs, tidy up only when instructed, report errors and keep itself running. Otherwise, none of this will actually scale.