[GSoC][RPC] Project Proposal: "Introducing Job control to the storage driver"

Prathamesh Chavan pc44800 at gmail.com
Tue Mar 31 00:27:26 UTC 2020


GSoC Proposal - 2020
====================

Project Name: Introducing Job control to the storage driver
===========================================================

About Me:
=========

Name:       Prathamesh Chavan
University: Indian Institute of Technology Kharagpur
Major:      Computer Science and Engineering (Dual Degree)
Email:      pc44800 at gmail.com
Blog:       pratham-pc.github.io
Contact:    +91-993-235-8333
Time Zone:  IST (UTC +5:30)


Background:
===========

I am a final-year dual degree (B.Tech & M.Tech) student in the Department
of Computer Science and Engineering at IIT Kharagpur. During my first year
of college, I was introduced to open source through the Kharagpur Open
Source Society, and I later became a part of it.

For my master's thesis project, I am working on a tiered file system with
Software Wear Management for NVM Technologies. I have always wanted to get
involved in storage technologies and their development process, and Google
Summer of Code is a great way to achieve that.

I was part of Google Summer of Code in my second year of college, under
the Git organization. It was my first experience with a large codebase.
Information about it is available in the GSoC 2017 archive[1].

Last summer, I interned at Nutanix, where I worked on Logbay, a
configuration-based data collection, archiving and streaming tool used by
all services on the Nutanix Controller-VM (CVM). I added multi-platform
support to Logbay by identifying all of its dependencies on the other
CVM-based services and introducing an interface between those dependencies
and Logbay-Core, so that Logbay can be used on platforms where the services
are not available. I also implemented the interface on a Dev-VM and added
multi-port support to it, allowing multiple instances of Logbay to run on a
single Dev-VM and simulate a multi-node cluster, which lets developers test
their changes on their Dev-VM itself.


The Project:
============

Introducing job control to the storage driver
Summary: Implement abstract job control and use it to improve the storage driver.
Mentor: Pavel Hrdina

Abstract:
=========

Currently, libvirt supports job cancellation and progress reporting on
domains. That is, if there is a long-running job on a domain, e.g.
migration, libvirt reports how much data has already been transferred to
the destination and how much still needs to be transferred. However,
libvirt lacks such reporting in the storage area, which libvirt developers
refer to as the storage driver. The aim is to report progress on several
storage tasks, like volume wiping, file allocation and others.


Job Control in Domain:
======================

In src/qemu/qemu_domain.h we find the struct qemuDomainJobObj, the job
object for domains. It is used to coordinate between jobs, to identify
which API call currently owns the job object, and to hold additional
information about the normal job, the agent job and the async job.
qemuDomainJobObj is embedded in the struct qemuDomainObjPrivate, which is
the main object the driver's APIs interact with when running jobs on a
domain.

Whenever an API call is made, specific locks are acquired depending on the
type of the job, and then the job is carried out. The exact design details
of such APIs are documented in `src/qemu/THREADS.txt`.
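
The calling convention documented there looks roughly like the following
(a paraphrased sketch, not an exact quote; qemuDomainSomeAPI is just a
placeholder name):

    static int
    qemuDomainSomeAPI(virQEMUDriverPtr driver, virDomainObjPtr vm)
    {
        int ret = -1;

        /* wait for and acquire a normal (modify) job on the domain */
        if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0)
            return -1;

        /* ... do the actual work on 'vm' while holding the job ... */
        ret = 0;

        /* release the job so other API calls can proceed */
        qemuDomainObjEndJob(driver, vm);
        return ret;
    }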


Job Control in Storage:
=======================

Whenever an API call is made on a storage volume, the member `in_use` of
the struct `virStorageVolDef` is used as a reference counter. It tells us
whether the storage volume is already in use and whether the current API
call can be carried out. Once the API call exits, the reference counter is
decremented.
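
A simplified sketch of this behaviour (based on the checks in
src/storage/storage_driver.c; locking and exact error messages omitted):

    /* refuse the call if another API call is already using the volume */
    if (voldef->in_use) {
        virReportError(VIR_ERR_OPERATION_INVALID,
                       _("volume '%s' is still in use"), voldef->name);
        goto cleanup;
    }

    voldef->in_use++;    /* mark the volume busy */
    /* ... long-running work, e.g. wiping or uploading the volume ... */
    voldef->in_use--;    /* release the volume when the call exits */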


Reporting of job-progress as done in case of Domains:
=====================================================

Additionally, when an async job is running, the job object also holds a
qemuDomainJobInfo that stores the progress data of the async job, and
another qemuDomainJobInfo that stores the statistics of a recently
completed job.

The functions virDomainGetJobInfo() and virDomainGetJobStats() in
libvirt-domain.c extract information about the progress of a background
job on a domain.
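
A minimal example of this existing domain-side query, which the storage
driver currently has no equivalent of:

    #include <stdio.h>
    #include <libvirt/libvirt.h>

    static void
    print_job_progress(virDomainPtr dom)
    {
        virDomainJobInfo info;

        if (virDomainGetJobInfo(dom, &info) < 0)
            return;

        if (info.type == VIR_DOMAIN_JOB_BOUNDED ||
            info.type == VIR_DOMAIN_JOB_UNBOUNDED)
            printf("data: %llu of %llu bytes processed\n",
                   info.dataProcessed, info.dataTotal);
    }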


Plan to implement something similar in Storage Jobs:
====================================================

Firstly, it is important to bring the notion of jobs to the storage driver.
Right now the API calls are executed directly once the required mutex locks
are acquired, which gives other API calls little information about what is
currently running or holding the locks. Furthermore, domain jobs carry a
lot of additional information that would also be useful for storage API
calls.

The first step is to identify which API calls operate on storage volumes
in the storage driver and to classify them into something similar to
normal jobs and async jobs (the long-running ones). Some of the API calls
will not acquire a job at all (the ones which do not touch the reference
counter today).

After this, a document similar to src/qemu/THREADS.txt needs to be written
for storage job handling. It should describe the new design of the existing
storage APIs, which acquire jobs and the appropriate locks as required.

New APIs also need to be implemented for the creation and deletion of
storage jobs. These would be similar to the domain job APIs in
qemu/qemu_domain.h, such as qemuDomainObjBeginJob(). This specifically
also includes storage equivalents of virDomainGetJobInfo() and
virDomainGetJobStats(), which long-running storage jobs would use to
report completion progress; a rough sketch of what these could look like
follows.
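
A purely illustrative sketch of what such an API could look like (none of
these names exist in libvirt today; they simply mirror the domain job API):

    typedef enum {
        VIR_STORAGE_VOL_JOB_NONE = 0,
        VIR_STORAGE_VOL_JOB_MODIFY,   /* short, exclusive job */
        VIR_STORAGE_VOL_JOB_ASYNC,    /* long-running job: wipe, build, upload */
    } virStorageVolJobType;

    int  virStorageVolObjBeginJob(virStorageVolObjPtr vol,
                                  virStorageVolJobType job);
    void virStorageVolObjEndJob(virStorageVolObjPtr vol);

    /* progress reporting, mirroring virDomainGetJobInfo()/virDomainGetJobStats() */
    int  virStorageVolGetJobInfo(virStorageVolPtr vol,
                                 virDomainJobInfoPtr info);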

Finally, the existing storage APIs need to be reimplemented with this new
job notion, and the reference counter member 'in_use' removed.


Other desired changes:
======================

1. Unification of the notion of jobs throughout libvirt: one step in this
direction would be to keep the storage job API implementation as close as
possible to the domain job API, so that unification becomes easier later
on. Unification itself is, in my opinion, future scope for the project.

2. Prediction of the time remaining until job completion, based on the
progress reported by the various jobs.


BiteSizedTask attempted:
========================

Clean up variables in tools/libvirt-guests.sh.in
Mentor: Martin Kletzander
A patch was posted to the mailing list and its latest version can be found
here[2].


Rest of the plans for the summers:
==================================

Due to the ongoing COVID-19 pandemic, my internship was canceled, so I
will be available full-time for the project. In August, I will be joining
Nutanix as a full-time employee.


PS:
===

1. It was already quite late by the time I decided to take part in GSoC'20,
so I was not able to give this proposal the amount of preparation it
requires. If it is okay, I would still like to keep updating this proposal
after the deadline and add a few important things, such as solid project
deliverables along with a timeline.

2. As Pavel Hrdina is the mentor of this project, and as this project was
suggested by Michal Privoznik, I've cc'd both of them to this email.

3. Since I have not yet spent enough time understanding the details of the
existing APIs, I may have gone wrong in a few places, and I would be glad
to have those pointed out.

4. One of the requirements mentioned on libvirt's GSoC FAQ page[3] is
passing an interview. When does this interview typically take place in the
GSoC timeline?

5. A Google Doc of this proposal can be found here[4]. Comments on the doc
are welcome.

[1]: https://summerofcode.withgoogle.com/archive/2017/projects/5434523185577984/
[2]: https://www.redhat.com/archives/libvir-list/2020-March/msg01303.html
[3]: https://wiki.libvirt.org/page/Google_Summer_of_Code_FAQ
[4]: https://docs.google.com/document/d/1Js-yi1oNrl9fRhzvMJwBHyygYCAtr_q7Bby8k1jOdig/edit?usp=sharing




