Google Drive Management Tool
This repository contains a custom management tool for scanning Google user drives for shared files, logging information about shared files, and changing sharing permissions for files owned by users who will be deleted.
Configuration is performed via a configuration file. See the example configuration file, configuration.yaml.example, for more information.
Usage
The tool can be invoked from the command line:
$ gdrivemanager scan
By default this logs what would be done without making any changes. To actually perform the management operation, the --write flag is required:
$ gdrivemanager scan --write
See the output of gdrivemanager --help for more information on valid command-line flags and commands.
Unless overridden on the command line, the tool searches for its configuration file in the following places, in this order:
- A configuration.yaml file in the current directory.
- ~/.gdrivemanager/configuration.yaml.
- /etc/gdrivemanager/configuration.yaml.
The first file found is used.
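The lookup order above can be sketched as follows. This is a minimal illustration, not the tool's own code: the function names are hypothetical, and it assumes a command-line override is used as given without checking that it exists.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def default_config_locations(cwd: Path, home: Path) -> List[Path]:
    """Candidate configuration files, in the documented search order."""
    return [
        cwd / "configuration.yaml",
        home / ".gdrivemanager" / "configuration.yaml",
        Path("/etc/gdrivemanager/configuration.yaml"),
    ]


def find_config(candidates: Iterable[Path], override: Optional[Path] = None) -> Optional[Path]:
    """Return the override if one was given, else the first candidate that exists."""
    if override is not None:
        return override  # assumption: an explicit path is trusted as-is
    for path in candidates:
        if path.is_file():
            return path
    return None  # no configuration file found
```

A typical call would be find_config(default_config_locations(Path.cwd(), Path.home())).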
Installation
The command-line tool can be installed directly from the git repository:
$ pip3 install git+https://gitlab.developers.cam.ac.uk/uis/gsuite/synctool.git
For developers, the script can be installed from a cloned repo using pip. It is recommended to do this in a virtual environment:
$ cd /path/to/this/repo
$ python -m venv .venv
$ . ./.venv/bin/activate
$ pip3 install -e .
Operations
UCam fields
Google users in the domain being scanned should have a set of custom fields defined in a schema called Ucam. These fields are:
- mydrive-shared-action
- mydrive-shared-result
- mydrive-shared-filecount
- mydrive-shared-doc
The mydrive-shared-action field is a string field set by the directory synchronisation tool and should contain one of the values:
- scan-files
- scan-folders
- recover
- scan (deprecated)
The mydrive-shared-result field is a string field set by gdrivemanager; it will contain one of the values:
- permissions-none
- permissions-removed
- permissions-recovered
The mydrive-shared-filecount and mydrive-shared-doc fields are debug fields containing, respectively, the number of files with shared permissions removed and the name of the permissions recovery document written as part of the scan.
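In the Admin SDK Directory API, custom schema values appear on a user resource under customSchemas, keyed by schema name and then field name. The sketch below pulls the four fields out of such a resource. The schema key is an assumption (this document spells it both "Ucam" and "UCam"), and the dataclass is purely illustrative.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Assumption: the exact schema key; the text uses both "Ucam" and "UCam".
SCHEMA_NAME = "UCam"


@dataclass
class UCamFields:
    action: Optional[str] = None
    result: Optional[str] = None
    filecount: Optional[int] = None
    doc: Optional[str] = None


def parse_ucam_fields(user: Mapping[str, Any]) -> UCamFields:
    """Extract the mydrive-shared-* fields from a Directory API user resource."""
    fields = user.get("customSchemas", {}).get(SCHEMA_NAME, {})
    count = fields.get("mydrive-shared-filecount")
    return UCamFields(
        action=fields.get("mydrive-shared-action"),
        result=fields.get("mydrive-shared-result"),
        # The count may come back as a string or an int; normalise to int.
        filecount=int(count) if count is not None else None,
        doc=fields.get("mydrive-shared-doc"),
    )
```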
Scan
The scan operation retrieves a list of all users marked for scan and removes the sharing permissions from any of their shared files. The scan operates in two passes: the first removes all shared folder permissions and the second removes all other shared permissions.
Users are marked for scan if their UCam.mydrive-shared-action field is set to scan-folders, scan-files, or scan; the last of these is deprecated and treated the same as scan-folders.
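The mapping from mydrive-shared-action to scan pass can be written as a small lookup. This is a sketch, not the tool's implementation; it assumes any other value (recover, a "manual-" prefixed action, or no value) means the user is not scanned.

```python
from typing import Optional


def scan_pass(action: Optional[str]) -> Optional[int]:
    """Return 1 (folders pass), 2 (files pass) or None if not marked for scan."""
    if action in ("scan-folders", "scan"):  # "scan" is deprecated: same as scan-folders
        return 1
    if action == "scan-files":
        return 2
    return None  # e.g. "recover", "manual-scan-files", or no action set
```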
The first-pass scan happens when mydrive-shared-action is set to scan-folders. A list of all the user's shared folders is retrieved and the sharing permissions applied to them are removed. The removed permissions are recorded in a YAML document in the Google shared drive configured by google.shared_storage_drive. Finally, the user's UCam fields are updated: mydrive-shared-action is set to scan-files, and the filecount and document fields are set to the number of shared folders processed and the name of the written YAML document.
The second-pass scan happens when mydrive-shared-action is set to scan-files. A list of all the user's shared files is retrieved and the sharing permissions applied to them are removed. The removed permissions are recorded in a YAML document in the Google shared drive configured by google.shared_storage_drive. If a permissions document from the first-pass scan exists, that document is updated rather than a new one being created.
Once the second-pass scan is complete, the user's UCam fields are updated and mydrive-shared-action is removed. mydrive-shared-result is set to permissions-removed if any permissions were removed in either the first- or second-pass scan, or to permissions-none if no permissions were removed. The filecount and document fields are set to the number of shared folders and files processed over both scans and the name of the written YAML document.
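The field updates at the end of each pass can be sketched as two functions returning the new UCam values. The function names and the use of None to mean "clear this field" are assumptions for illustration; how the real tool clears mydrive-shared-action is not described here.

```python
from typing import Any, Dict


def fields_after_first_pass(folders_processed: int, doc_name: str) -> Dict[str, Any]:
    """UCam values written once the scan-folders pass finishes."""
    return {
        "mydrive-shared-action": "scan-files",  # hand over to the second pass
        "mydrive-shared-filecount": folders_processed,
        "mydrive-shared-doc": doc_name,
    }


def fields_after_second_pass(
    first_pass_count: int, files_processed: int, permissions_removed: int, doc_name: str
) -> Dict[str, Any]:
    """UCam values written once the scan-files pass finishes.

    permissions_removed is the total removed over both passes.
    """
    return {
        "mydrive-shared-action": None,  # assumption: None stands for "remove the field"
        "mydrive-shared-result": (
            "permissions-removed" if permissions_removed > 0 else "permissions-none"
        ),
        "mydrive-shared-filecount": first_pass_count + files_processed,
        "mydrive-shared-doc": doc_name,
    }
```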
The scan operation has three limits. max_scanned_users limits how many users will be scanned in each pass. A new user won't be scanned if the total elapsed time of the current pass has exceeded max_total_scan_duration (in minutes).
max_user_scan_duration is the number of minutes that listing a user's personal drive files is allowed to take before being aborted. If a user's scan is aborted this way, their mydrive-shared-action is prefixed with "manual-" to indicate they need manual attention and to prevent them from blocking further progress with other users.
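The two pass-level limits can be sketched as a loop with an injectable clock. This is illustrative only: scan_user is a hypothetical callback assumed to enforce max_user_scan_duration itself, and manual_action shows the "manual-" prefixing applied when a listing is aborted.

```python
import time
from typing import Callable, Iterable, List


def run_scan_pass(
    users: Iterable[str],
    scan_user: Callable[[str], None],
    max_scanned_users: int,
    max_total_scan_duration: float,  # minutes
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Start scanning users until either pass-level limit is reached."""
    start = clock()
    started: List[str] = []
    for user in users:
        if len(started) >= max_scanned_users:
            break
        if (clock() - start) / 60 > max_total_scan_duration:
            break  # the pass is over time: don't start another user
        started.append(user)
        scan_user(user)  # hypothetical; handles max_user_scan_duration itself
    return started


def manual_action(action: str) -> str:
    """Value written to mydrive-shared-action when a user's listing is aborted."""
    return action if action.startswith("manual-") else "manual-" + action
```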
Recover
The recover operation uses the permissions YAML document written during the scan operation to reinstate any shared permissions that were removed.
Users are marked for recovery if their UCam.mydrive-shared-action field is set to recover.
The recovery operation retrieves the permissions YAML document from the configured shared drive and applies all the permissions found in it to the user's files.
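The layout of the permissions document is not described here, so the sketch below assumes a hypothetical shape once the YAML is loaded: a mapping from file ID to the list of Drive API v3 permission resources that were removed. It turns that into (file ID, request body) pairs, keeping only fields a caller sets when creating a permission.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical document shape: {file_id: [permission_resource, ...]}.
CREATE_FIELDS = ("type", "role", "emailAddress", "domain", "allowFileDiscovery")


def recovery_requests(
    doc: Dict[str, List[Dict[str, Any]]]
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (file_id, body) pairs for re-creating each recorded permission."""
    for file_id, permissions in doc.items():
        for perm in permissions:
            # Drop server-assigned fields such as "id"; keep caller-settable ones.
            body = {k: v for k, v in perm.items() if k in CREATE_FIELDS}
            yield file_id, body
```

Each pair could then be passed to the Drive API's permissions.create method as fileId and body.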
Shared Drive Usage
The "shared-drive-usage" operation maintains a cache file listing the shared drives with their permissions, usage and number of files. Each drive, and the list as a whole, has a last_updated timestamp.
This cache file is stored in the shared drive configured by google.shared_storage_drive, with a name configured by google.shared_drive_list_file (defaults to "shared_drive_list.yaml").
Each run of this operation first checks whether the list needs updating, i.e. whether the list's last_updated is older than google.shared_drive_list_cache_days days (default 7).
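The staleness check amounts to comparing the age of last_updated with the configured number of days. A minimal sketch, assuming timestamps are timezone-aware datetimes and that a missing timestamp counts as stale; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(
    last_updated: Optional[datetime], cache_days: int = 7, now: Optional[datetime] = None
) -> bool:
    """True if last_updated is missing or older than cache_days days."""
    if last_updated is None:
        return True  # assumption: never updated means the list must be rebuilt
    now = now or datetime.now(timezone.utc)
    return now - last_updated > timedelta(days=cache_days)
```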
If so, all the shared drives are relisted, their current permissions are added, and the cached usage, files, last_updated, last_scanned and last_scan_success values are merged in.
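The merge step can be sketched as copying the cached per-drive values onto the freshly listed drives. This assumes each entry is a dict keyed by the drive's "id"; drives that no longer appear in the fresh listing simply drop out.

```python
from typing import Any, Dict, List

CACHED_KEYS = ("usage", "files", "last_updated", "last_scanned", "last_scan_success")


def merge_cached(
    fresh: List[Dict[str, Any]], cached: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Return the fresh drive list with cached usage/scan fields carried over."""
    by_id = {d["id"]: d for d in cached}
    merged = []
    for drive in fresh:
        entry = dict(drive)  # fresh name and permissions win
        old = by_id.get(drive["id"], {})
        for key in CACHED_KEYS:
            if key in old:
                entry[key] = old[key]
        merged.append(entry)
    return merged
```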
A list of shared drives to be scanned is obtained and sorted. These are drives whose last_scanned (or last_updated as a fallback) is older than google.shared_drive_usage_cache_days days (default 7). Drives without a last_scanned or last_updated (i.e. new to the list) are scanned first, with the rest sorted from oldest to newest.
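The selection and ordering rule can be sketched with a two-part sort key. This is illustrative; it assumes drives are dicts holding datetime values under last_scanned and last_updated.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional

Drive = Dict[str, Any]


def last_seen(drive: Drive) -> Optional[datetime]:
    # last_scanned, with last_updated as the fallback
    return drive.get("last_scanned") or drive.get("last_updated")


def drives_to_scan(drives: List[Drive], now: datetime, cache_days: int = 7) -> List[Drive]:
    cutoff = now - timedelta(days=cache_days)
    due = [d for d in drives if last_seen(d) is None or last_seen(d) < cutoff]
    # Never-scanned drives first (False sorts before True), then oldest to newest.
    return sorted(due, key=lambda d: (last_seen(d) is not None, last_seen(d) or now))
```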
A successful scan updates last_updated; last_scanned and last_scan_success are always updated when a drive scan is completed.
While the run has not exceeded limits.max_shared_drive_usage_duration minutes, each drive's usage and file count is determined. Unfortunately, there is currently no way to get these values directly without scanning through all of a drive's files and summing their quotaBytesUsed values.
The shared drive cache is rewritten after each shared drive is updated, as there is a high chance of API calls dropping out when counting a large number of files in a shared drive.
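The counting loop described above can be sketched as follows. list_pages and save_cache are hypothetical callbacks: list_pages is assumed to yield pages of Drive file resources for a drive, and save_cache to rewrite the cache file. The broad exception handler stands in for whatever errors a dropped API call raises.

```python
import time
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable, List, Tuple

Drive = Dict[str, Any]


def count_drive(pages: Iterable[List[Dict[str, Any]]]) -> Tuple[int, int]:
    """Return (file count, total quotaBytesUsed) over pages of file resources."""
    files = total = 0
    for page in pages:
        for item in page:
            files += 1
            # Google APIs return int64 fields such as quotaBytesUsed as strings.
            total += int(item.get("quotaBytesUsed", 0))
    return files, total


def update_usage(
    drives: List[Drive],
    list_pages: Callable[[str], Iterable[List[Dict[str, Any]]]],
    save_cache: Callable[[List[Drive]], None],
    max_minutes: float,
    clock: Callable[[], float] = time.monotonic,
    now: Callable[[], Any] = lambda: datetime.now(timezone.utc),
) -> None:
    start = clock()
    for drive in drives:
        if (clock() - start) / 60 >= max_minutes:
            break  # run budget used up; remaining drives wait for the next run
        try:
            drive["files"], drive["usage"] = count_drive(list_pages(drive["id"]))
            drive["last_updated"] = now()  # only on success
            drive["last_scan_success"] = True
        except Exception:  # e.g. an API call dropping out part-way through
            drive["last_scan_success"] = False
        drive["last_scanned"] = now()  # always recorded
        save_cache(drives)  # rewrite the cache after every drive
```

With the Drive API v3, list_pages might wrap files.list called with driveId, corpora="drive", includeItemsFromAllDrives=True, supportsAllDrives=True and fields="nextPageToken, files(quotaBytesUsed)".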
Reporting
The "shared-drive-report" operation uses the shared drive cache file to compile the report data. It also looks up all users in Lookup to add their institutions and place users in the appropriate fields.
This report will be written to the file specified by report.all_shared_drive_filename (defaults to shared-drive-usage-{timestamp}.csv).
If report.output_location is configured (as a Google Drive folder ID), the report output file is saved to that location; otherwise it is saved locally.
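The {timestamp} and {id} placeholders in the configured filenames can be filled with Python's str.format. The timestamp format used below is an assumption for illustration; the tool may render timestamps differently.

```python
from datetime import datetime


def report_filename(template: str, when: datetime, **fields: str) -> str:
    """Fill {timestamp} (and any other placeholders such as {id}) in a filename template."""
    # Assumption: a compact sortable timestamp; the real format is not specified here.
    return template.format(timestamp=when.strftime("%Y%m%d-%H%M%S"), **fields)
```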
User, Inst or Group Reporting
Specifying who to report on
The "report" operation can be given a single user (--user=CRSID), an institution (--instid=INSTID) or a group (--groupid=GROUPID). For the latter two, the active and cancelled members of the Lookup institution or group are obtained. For --groupid, the group's short name (e.g. uis-devops-hamilton) can be used instead. For institutions, the --children option can be added to include members of all child institutions too.
If report.output_location is configured (as a Google Drive folder ID), the report output files are saved to that location.
Instead of specifying an institution, group or user directly, the --request option can be used to produce a report based on the existence of a report request file (its content is irrelevant) in report.output_location (or locally if that is not specified).
The format of this file's name should be report-request-{type}-{id}, where {type} is one of:
- "i", "inst" or "institution", with {id} being the instid
- "g" or "group", with {id} being either the groupid or the group's short name
- "u" or "user", with {id} being the CRSid
This report request file will be deleted after the report output files have been saved.
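Because group short names can contain hyphens (e.g. uis-devops-hamilton), a request filename should be split only at the first hyphen after the prefix. A minimal parsing sketch; the function name and the normalised type labels are hypothetical.

```python
from typing import Optional, Tuple

PREFIX = "report-request-"
TYPE_ALIASES = {
    "i": "inst", "inst": "inst", "institution": "inst",
    "g": "group", "group": "group",
    "u": "user", "user": "user",
}


def parse_request_name(name: str) -> Optional[Tuple[str, str]]:
    """Return (type, id) for a report request filename, or None if it doesn't match."""
    if not name.startswith(PREFIX):
        return None
    # Split at the first hyphen only, so hyphenated IDs survive intact.
    kind, sep, ident = name[len(PREFIX):].partition("-")
    if not sep or not ident or kind not in TYPE_ALIASES:
        return None
    return TYPE_ALIASES[kind], ident
```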
Report process
For all the users in the institution or group, or just the single user, their MyDrive usage is first gathered and exported to a CSV file with a name configured by report.mydrive_filename (defaults to {id}-mydrive-{timestamp}.csv, with {id} being the CRSid, InstID or GroupID).
Next, all shared drives are checked using the shared drive cache file (see Shared Drive Usage above). Any drive that has a manager or content manager matching at least one of the users is included in another CSV file with a name configured by report.shared_drive_filename (defaults to {id}-shared-drive{timestamp}.csv).
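In the Drive API v3, the Manager and Content manager roles on a shared drive appear as the permission roles "organizer" and "fileOrganizer". The sketch below filters cached drives by those roles, assuming each drive carries a "permissions" list and that users are matched by email address (a drive managed only through a group would not match here).

```python
from typing import Any, Dict, Iterable, List

# Drive API v3 role names for "Manager" and "Content manager".
MANAGER_ROLES = {"organizer", "fileOrganizer"}


def drives_for_users(drives: Iterable[Dict[str, Any]], user_emails: Iterable[str]) -> List[Dict[str, Any]]:
    """Return drives with at least one manager or content manager in user_emails."""
    emails = {e.lower() for e in user_emails}
    return [
        d for d in drives
        if any(
            p.get("role") in MANAGER_ROLES and p.get("emailAddress", "").lower() in emails
            for p in d.get("permissions", [])
        )
    ]
```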
Note: if a shared drive's usage has yet to be counted, it will have blank values in the CSV and the report may need regenerating later.
Testing against gdev.apps.cam.ac.uk (UCam test Google Workspace)
Download the gdrive-management-bot-test GCP service account credentials from 1Password and save them as credentials.json.
Copy the configuration.yaml.example file to configuration.yaml.
The following configuration has already been performed for this service account.
Preparing a service account (Admin Roles)
Google have updated the API so that service accounts can access it directly, without needing domain-wide delegation.
Therefore, to read and write users' custom schema fields, the service account only needs to be added to the "User Management" admin role. It does not need to impersonate an admin user (nor the admin.directory.user scope that impersonation would require).
- Create a service account in the Google Cloud Platform Console for this script.
- Copy the service account's full email address.
- In the Google Workspace admin panel, go to "Account" > "Admin Roles" and open the "User Management" role.
- Add the service account to the role using the "Assign service accounts" option when viewing the role's admins.
Required API scopes
For the tool to impersonate the users whose files and permissions it needs to read and write, the service account needs the following scopes:
https://www.googleapis.com/auth/drive.metadata.readonly
https://www.googleapis.com/auth/drive
These need to be added to the Domain-Wide-Delegation configuration for the domain:
- View the service account in the Google Cloud Platform Console.
- Copy the service account's unique ID (this is the same as the client_id).
- In the Google Workspace admin panel, go to "Security Settings" > "Access and data control" > "API Controls".
- Click "Manage Domain-Wide Delegation" then "Add new".
- Paste in the service account's Client ID and add a comma-separated list of the scopes above.
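The comma-separated scope string for the delegation form, and delegated credentials that act as a given user, can be produced as below. This is a sketch: delegated_credentials uses the google-auth library's service_account.Credentials.from_service_account_file and with_subject, imported lazily so the rest runs without google-auth installed; the function names themselves are hypothetical.

```python
SCOPES = [
    "https://www.googleapis.com/auth/drive.metadata.readonly",
    "https://www.googleapis.com/auth/drive",
]


def dwd_scope_list() -> str:
    """Comma-separated scope list to paste into the domain-wide delegation form."""
    return ",".join(SCOPES)


def delegated_credentials(key_file: str, user_email: str):
    """Service account credentials impersonating user_email (requires google-auth)."""
    from google.oauth2 import service_account  # lazy import: optional dependency

    creds = service_account.Credentials.from_service_account_file(key_file, scopes=SCOPES)
    return creds.with_subject(user_email)
```

For testing, key_file would be the credentials.json downloaded above.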