Discussion:
[lxc-devel] [PATCH] rootfs pinning: make file hidden, don't delete it, encode pid
Christian Seiler
2013-09-28 06:36:34 UTC
Permalink
This makes the following changes to the rootfs pinning mechanism:

- Contrary to a recent change, the file will not be deleted
immediately after creation. This should make NFS happy.

- To reduce nuissance for administrators, make the file hidden by
default.

- Delete the file at container shutdown. (i.e. clean up after oneself)

- Because the file is now deleted, and a rootfs could be shared
between containers, the filename now encodes the pid of the
lxc-start process, so that the rw-hold on the filesystem will only
be removed once the last container exits.

Signed-off-by: Christian Seiler <christian at iwakd.de>
---
src/lxc/conf.c | 24 +++++++++++++++---------
src/lxc/conf.h | 2 +-
src/lxc/start.c | 10 +++++++++-
src/lxc/start.h | 1 +
4 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/src/lxc/conf.c b/src/lxc/conf.c
index ecbcf41..4cd9462 100644
--- a/src/lxc/conf.c
+++ b/src/lxc/conf.c
@@ -695,15 +695,18 @@ static int mount_rootfs_block(const char *rootfs, const char *target)

/*
* pin_rootfs
- * if rootfs is a directory, then open ${rootfs}/lxc.hold for writing for
- * the duration of the container run, to prevent the container from marking
- * the underlying fs readonly on shutdown. unlink the file immediately so
- * no name pollution is happens
+ * if rootfs is a directory, then open ${rootfs}/.lxc-hold-$pid for writing
+ * for the duration of the container run, to prevent the container from
+ * marking the underlying fs readonly on shutdown. $pid is the pid of
+ * lxc-start. At shutdown the hold file will be removed again. The pid is
+ * encoded in the filename, so that if multiple containers use the same
+ * root filesystem, a hold will be kept open until the last container is
+ * stopped.
* return -1 on error.
* return -2 if nothing needed to be pinned.
* return an open fd (>=0) if we pinned it.
*/
-int pin_rootfs(const char *rootfs)
+int pin_rootfs(const char *rootfs, char **filename_ptr)
{
char absrootfs[MAXPATHLEN];
char absrootfspin[MAXPATHLEN];
@@ -725,16 +728,19 @@ int pin_rootfs(const char *rootfs)
if (!S_ISDIR(s.st_mode))
return -2;

- ret = snprintf(absrootfspin, MAXPATHLEN, "%s/lxc.hold", absrootfs);
+ ret = snprintf(absrootfspin, MAXPATHLEN, "%s/.lxc-hold-%lu", absrootfs, (unsigned long)getpid());
if (ret >= MAXPATHLEN)
return -1;

+ if (filename_ptr) {
+ *filename_ptr = strdup(absrootfspin);
+ if (!*filename_ptr)
+ return -1;
+ }
+
process_lock();
fd = open(absrootfspin, O_CREAT | O_RDWR, S_IWUSR|S_IRUSR);
process_unlock();
- if (fd < 0)
- return fd;
- (void)unlink(absrootfspin);
return fd;
}

diff --git a/src/lxc/conf.h b/src/lxc/conf.h
index 84acce8..e3a962a 100644
--- a/src/lxc/conf.h
+++ b/src/lxc/conf.h
@@ -331,7 +331,7 @@ extern int detect_shared_rootfs(void);
extern struct lxc_conf *lxc_conf_init(void);
extern void lxc_conf_free(struct lxc_conf *conf);

-extern int pin_rootfs(const char *rootfs);
+extern int pin_rootfs(const char *rootfs, char **filename_ptr);

extern int lxc_create_network(struct lxc_handler *handler);
extern void lxc_delete_network(struct lxc_handler *handler);
diff --git a/src/lxc/start.c b/src/lxc/start.c
index 7538403..26c71c4 100644
--- a/src/lxc/start.c
+++ b/src/lxc/start.c
@@ -386,6 +386,14 @@ static void lxc_fini(const char *name, struct lxc_handler *handler)
if (run_lxc_hooks(name, "post-stop", handler->conf, handler->lxcpath, NULL))
ERROR("failed to run post-stop hooks for container '%s'.", name);

+ /* remove rootfs pin file after we shutdown this
+ * container
+ */
+ if (handler->pin_filename) {
+ unlink(handler->pin_filename);
+ free(handler->pin_filename);
+ }
+
/* reset mask set by setup_signal_fd */
if (sigprocmask(SIG_SETMASK, &handler->oldmask, NULL))
WARN("failed to restore sigprocmask");
@@ -708,7 +716,7 @@ int lxc_spawn(struct lxc_handler *handler)
* marking it readonly.
*/

- handler->pinfd = pin_rootfs(handler->conf->rootfs.path);
+ handler->pinfd = pin_rootfs(handler->conf->rootfs.path, &handler->pin_filename);
if (handler->pinfd == -1)
INFO("failed to pin the container's rootfs");

diff --git a/src/lxc/start.h b/src/lxc/start.h
index c35c5c4..345b4e0 100644
--- a/src/lxc/start.h
+++ b/src/lxc/start.h
@@ -50,6 +50,7 @@ struct lxc_handler {
struct lxc_operations *ops;
void *data;
int sv[2];
+ char *pin_filename;
int pinfd;
const char *lxcpath;
struct cgroup_process_info *cgroup;
--
1.7.10.4
Serge Hallyn
2013-10-01 03:36:51 UTC
Permalink
Hi,
I'm attaching a patch that makes additional chanes to the rootfs
<https://github.com/chris-se/lxc/tree/rootfs-pinning>
This patch implements things that were floating around in discussions
earlier. I know it is still in flux of how to best handle ro-remounts
at shutdown, so the idea is that this patch would be applied if lxc
will stick with the pinning mechanism. (Which seems likely to me at
this point because of the kernel change that remount without bind will
still remount the whole filesystem, see automounting improvements.)
- Don't auto-delete the file after creation because of NFS problems.
Barring misunderstanding on my part, I disagree with this.
The deleted filename just gets renamed to another hidden
file, which is deleted on file close. If two files named 'b'
are both held open and deleted by two different tasks (even
on the same host) they are given different (sequential) names.

So I think it's better to continue to delete them as we were.

Is there another side effect I'm not considering?
- But then the file will lurk around the system for the whole time, so
make it hidden, i.e. make the filename start with a dot.
- Cleanup after ourselves at shutdown. This will still leave files
in case of a power failure.
- Because we now cleanup after ourselves at shutdow, and a rootfs
could be used by potentially multiple containers, encode the pid of
lxc-start in the filename. That way, even with cleanup, only after
the last container using that rootfs ends the last hold file will be
released.
- Continue to NOT use O_EXCL, see previous discussions, since we don't
want container startup to fail after reboot with power failure.
Note that the hold file is inside the rootfs now (previously it was
in a directory one level higher), but that was already the case with my
previous rootfs pinning commit that's already applied to master. The
rationale behind this is that I've run into the situation where I have
rootfs on a separate filesystem, but that this filesystem is already
mounted on the host (for various reasons), such that a hold file in the
parent directory will be of no use. But the current change means that
the hold file will leak into the container. I do not think this is
problematic, since the worst thing someone inside the container could
do is unlink it, and that does not affect the basic functionality.
-- Christian
------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60133471&iu=/4140/ostg.clktrk
_______________________________________________
Lxc-devel mailing list
Lxc-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel
Loading...