Commit 4af9bd71 authored by Dave Hart

Post-review refactor Lookup API code. (#35)

The review discovered that the sync tool, when run on production, would never
succeed because the membership list of some groups/institutions was so large
that the API response exceeded the size limit imposed by API Gateway (due to
Lookup API returning detailed lists of members). This commit refactors the code
to fetch group/institution membership for each user and use that to populate
the membership set for each group/institution.

Use Python 3.9 in Dockerfile as new Python features are now used.

Update version of Flake8 so the checks are compatible with the newer version
of Python.

Remove redundant function `_extract_uid()`.

Modify Flake8 configuration to exclude the standard set of folders used across
projects.
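
The core idea of the refactor described above, building each group's member set from per-user membership data rather than asking the API for each group's full member list, can be sketched as follows. This is a minimal illustration only: `build_membership` and the sample data are hypothetical names introduced here, not code from the commit.

```python
def build_membership(groups_by_uid, eligible_group_ids):
    """Invert a per-user group listing into a per-group member set."""
    # Start with an empty member set for every eligible group.
    members = {group_id: set() for group_id in eligible_group_ids}
    for uid, group_ids in groups_by_uid.items():
        for group_id in group_ids:
            # Ignore memberships of groups that are not in the eligible set.
            if group_id in members:
                members[group_id].add(uid)
    return members


# Hypothetical sample data: CRSid -> groupIDs the user belongs to.
groups_by_uid = {
    'abc123': {'100', '200'},
    'xyz789': {'200', '999'},
}
membership = build_membership(groups_by_uid, eligible_group_ids={'100', '200', '300'})
```

Each API response then scales with the number of groups a single user belongs to, not with the number of members in the largest group.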
parent 50564659
1 merge request: !27 Switch to using Lookup API instead of Ibis client
Pipeline #238212 passed with warnings
# This Dockerfile is intended only to support the Auto-DevOps pipeline on GitLab.
# It's not intended to package the application.
- FROM registry.gitlab.developers.cam.ac.uk/uis/devops/infra/dockerimages/python:3.7-alpine
+ FROM registry.gitlab.developers.cam.ac.uk/uis/devops/infra/dockerimages/python:3.9-alpine
WORKDIR /usr/src/app
......
......@@ -177,34 +177,43 @@ api_gateway:
lookup:
# Filter to use to determine the "eligible" list of users. If a non-admin user
# is found on Google who isn't in this list, their account will be suspended.
# Filter defined in LQL ( https://www.lookup.cam.ac.uk/lql ).
eligible_user_filter: "person: crsid != ''"
# Filter to use to determine the "eligible" list of groups. If a group is
# found on Google that isn't in this list, it will be deleted.
# Filter defined in LQL.
eligible_group_filter: "group: groupid != ''"
# Filter to use to determine the "eligible" list of institutions. If an
# institution is found on Google that isn't in this list, it will be deleted.
# Filter defined in LQL.
eligible_inst_filter: "inst: instid != ''"
- # Filter to use to determine the "managed" list of users. If a user appears in
- # this list who isn't in Google their account is created. If the user metadata
- # for a user in this list changes, the change is propagated to Google. If
- # null, the value of "eligible_user_filter" is used. Default: null.
+ # Filter to use to determine the "managed" list of users (a subset of the
+ # "eligible" users). If a user appears in this list who isn't in Google their
+ # account is created. If the user metadata for a user in this list changes,
+ # the change is propagated to Google.
+ # Filter defined in Python.
+ # Default: null.
managed_user_filter: null
- # Filter to use to determine the "managed" list of groups. If a group appears
- # in this list that isn't in Google it is created. If the group metadata or
- # list of members for a group in this list changes, the change is propagated
- # to Google. If null, the value of "eligible_group_filter" is used.
+ # Filter to use to determine the "managed" list of groups (a subset of the
+ # "eligible" groups). If a group appears in this list that isn't in Google
+ # it is created. If the group metadata or list of members for a group in this
+ # list changes, the change is propagated to Google.
+ # Filter defined in Python.
+ # Default: null.
managed_group_filter: null
- # Filter to use to determine the "managed" list of institutions. If an
- # institution appears in this list that isn't in Google it is created. If the
- # institution metadata or list of members for an institution in this list
- # changes, the change is propagated to Google. If null, the value of
- # "eligible_inst_filter" is used. Default: null.
+ # Filter to use to determine the "managed" list of institutions (a subset of
+ # the "eligible" institutions). If an institution appears in this list that
+ # isn't in Google it is created. If the institution metadata or list of
+ # members for an institution in this list changes, the change is propagated
+ # to Google.
+ # Filter defined in Python. The Python data structure that defines an
+ # institution is the same as the one that defines a group.
+ # Default: null.
managed_inst_filter: null
# Details about the Google Domain we're managing.
......
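
The configuration comments above say the managed filters are now "defined in Python". Judging from the `eval(..., {}, entry._asdict())` calls later in this commit, a filter is a Python expression evaluated with the entry's fields as local names. The sketch below illustrates that reading; `apply_managed_filter`, the sample entries and the filter string `"len(uids) > 0"` are hypothetical examples, not values taken from the project.

```python
import collections

# Same field layout as the GroupEntry namedtuple shown in the diff.
GroupEntry = collections.namedtuple('GroupEntry', 'groupID groupName description uids')


def apply_managed_filter(entries, managed_filter):
    """Return the entries for which the filter expression is truthy.

    A filter of None means every eligible entry is managed.
    """
    if managed_filter is None:
        return list(entries)
    # The namedtuple fields (groupID, groupName, ...) become local names
    # that the expression can refer to.
    return [e for e in entries if eval(managed_filter, {}, e._asdict())]


entries = [
    GroupEntry('100', 'staff', 'All staff', {'abc123', 'xyz789'}),
    GroupEntry('200', 'empty', 'No members', set()),
]
managed = apply_managed_filter(entries, "len(uids) > 0")
unfiltered = apply_managed_filter(entries, None)
```

Because `eval` executes arbitrary Python, a filter written this way is only as trustworthy as the configuration file it comes from.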
......@@ -3,6 +3,7 @@ Load current user, group and institution data from Lookup.
"""
import collections
+ import functools
import logging
import yaml
......@@ -16,16 +17,15 @@ from .base import ConfigurationStateConsumer
LOG = logging.getLogger(__name__)
- # Number of entries to request with each API request to list entities
- LIST_FETCH_LIMIT = 1000
+ # Default number of entries to request with each API request to list entities
+ DEFAULT_LIST_FETCH_LIMIT = 1000
# The scheme used to identify users
UID_SCHEME = 'crsid'
# Extra attributes to fetch when checking managed entities
+ USER_FETCH = 'all_groups,all_insts,firstName'
- MANAGED_USER_FETCH = 'firstName'
- USER_INST_FETCH = 'all_insts'
- MANAGED_GROUP_FETCH = 'all_members'
# Properties containing user/group/institution information in search query results
USER_RESULT_PROPERTY = 'people'
......@@ -33,7 +33,9 @@ GROUP_RESULT_PROPERTY = 'groups'
INST_RESULT_PROPERTY = 'institutions'
# User and group information we need to populate the Google user directory.
- UserEntry = collections.namedtuple('UserEntry', 'uid cn sn displayName givenName licensed')
+ UserEntry = collections.namedtuple(
+ 'UserEntry', 'uid cn sn displayName givenName groupIDs groupNames instIDs licensed'
+ )
GroupEntry = collections.namedtuple('GroupEntry', 'groupID groupName description uids')
......@@ -49,10 +51,13 @@ class LookupRetriever(ConfigurationStateConsumer):
self.inst_api_client = InstitutionApi(self.lookup_api_client)
def retrieve_users(self):
- # Get a set containing all CRSids. These are all the people who are eligible to be in our
- # GSuite instance. If a user is in GSuite and is *not* present in this list then they are
+ """
+ Retrieve information about Lookup users.
+ """
+ # Get a set containing the CRSids of all people who are eligible to be in our GSuite
+ # instance. If a user is in GSuite and is *not* present in this list then they are
# suspended.
LOG.info('Reading eligible user entries from Lookup')
eligible_uids = self.get_eligible_uids()
LOG.info('Total Lookup user entries: %s', len(eligible_uids))
......@@ -89,7 +94,15 @@ class LookupRetriever(ConfigurationStateConsumer):
'licensed_uids': licensed_uids,
})
+ self.has_retrieved_users = True
def retrieve_groups(self):
+ """
+ Retrieve information about Lookup groups and institutions.
+ Must be run after `retrieve_users()`.
+ """
# Get a set containing all groupIDs. These are all the groups that are eligible to be in
# our GSuite instance. If a group is in GSuite and is *not* present in this list then it
# is deleted.
......@@ -169,112 +182,73 @@ class LookupRetriever(ConfigurationStateConsumer):
###
# Functions to perform Lookup API calls
###
- def get_eligible_uids(self):
- """
- Return a set containing all CRSids who are eligible to have a Google account.
- """
- return {
- uid for uid in [
- _extract_uid(person)
- for person in self._fetch_all_list_results(
- self.person_api_client.person_search,
- USER_RESULT_PROPERTY,
- self.lookup_config.eligible_user_filter
- )
- ] if len(uid) > 0
- }
- def get_eligible_groupIDs(self):
- """
- Return a set containing all groupIDs that are eligible for Google.
- """
- return {
- group_id for group_id in [
- group.get('groupid', '')
- for group in self._fetch_all_list_results(
- self.group_api_client.group_search,
- GROUP_RESULT_PROPERTY,
- self.lookup_config.eligible_group_filter
- )
- ] if len(group_id) > 0
- }
- def get_eligible_instIDs(self):
+ @functools.cached_property
+ def eligible_users_by_uid(self):
"""
- Return a set containing all instIDs that are eligible for Google.
+ Dictionary mapping CRSid to UserEntry instances. An entry exists in the dictionary for each
+ person who is eligible to have a Google account.
"""
+ LOG.info('Reading eligible user entries from Lookup')
return {
- inst_id for inst_id in [
- inst.get('instid', '')
- for inst in self._fetch_all_list_results(
- self.inst_api_client.institution_search,
- INST_RESULT_PROPERTY,
- self.lookup_config.eligible_inst_filter
- )
- ] if len(inst_id) > 0
- }
- def get_managed_user_entries(self):
- """
- Return a list containing all managed user entries as UserEntry instances.
- """
- search_filter = (
- self.lookup_config.managed_user_filter
- if self.lookup_config.managed_user_filter is not None
- else self.lookup_config.eligible_user_filter
- )
- return [
+ person.identifier.value:
UserEntry(
- uid=_extract_uid(person),
+ uid=person.identifier.value,
cn=person.get('registered_name', ''),
sn=person.get('surname', ''),
displayName=person.get('display_name', ''),
givenName=_extract_attribute(person, 'firstName'),
- licensed=len(_extract_attribute(person, 'misAffiliation')) > 0,
+ groupIDs={group.groupid for group in person.groups},
+ groupNames={group.name for group in person.groups},
+ instIDs={institution.instid for institution in person.institutions},
+ licensed=len(person.get('mis_affiliation', '')) > 0,
)
for person in self._fetch_all_list_results(
self.person_api_client.person_search,
USER_RESULT_PROPERTY,
- search_filter,
- extra_props=dict(fetch=MANAGED_USER_FETCH)
- )
- ]
+ self.lookup_config.eligible_user_filter,
+ extra_props=dict(fetch=USER_FETCH)
+ ) if person.identifier.scheme == UID_SCHEME and len(person.identifier.value) > 0
+ }
- def get_managed_group_entries(self):
+ @functools.cached_property
+ def eligible_groups_by_groupID(self):
"""
- Return a list containing all managed group entries as GroupEntry instances.
+ Dictionary mapping groupID to GroupEntry instances. An entry exists in the dictionary for
+ each group that is eligible for Google. Information about eligible users is used to
+ populate the member list of each group as fetching the member list directly from Lookup
+ API results in errors for groups with very large numbers of members (exceeds the API
+ Gateway limit for the response size).
"""
- search_filter = (
- self.lookup_config.managed_group_filter
- if self.lookup_config.managed_group_filter is not None
- else self.lookup_config.eligible_group_filter
- )
- return [
+ groups = {
+ group.groupid:
GroupEntry(
- groupID=group.get('groupid', ''),
+ groupID=group.groupid,
groupName=group.get('name', ''),
description=group.get('description', ''),
- uids=set([
- member.identifier.value for member in group.get('members', [])
- if member.identifier.scheme == UID_SCHEME
- ])
+ uids=set()
)
for group in self._fetch_all_list_results(
self.group_api_client.group_search,
GROUP_RESULT_PROPERTY,
- search_filter,
- extra_props=dict(fetch=MANAGED_GROUP_FETCH)
- )
- ]
- def get_managed_inst_entries(self):
+ self.lookup_config.eligible_group_filter
+ ) if len(group.groupid) > 0
+ }
+ for crsid, person in self.eligible_users_by_uid.items():
+ for groupID in person.groupIDs:
+ if groupID in groups:
+ groups[groupID].uids.add(crsid)
+ return groups
+ @functools.cached_property
+ def eligible_insts_by_instID(self):
"""
- Return a list containing all managed institution entries as GroupEntry instances.
+ Dictionary mapping instID to GroupEntry instances. An entry exists in the dictionary for
+ each institution that is eligible for Google. Information about eligible users is used to
+ populate the member list of each institution as fetching the member list directly from
+ Lookup API results in errors for institutions with very large numbers of members (exceeds
+ the API Gateway limit for the response size).
Note that we return GroupEntry instances here since Lookup institutions become groups in
Google, and this simplifies the sync code by allowing us to handle institutions in the same
......@@ -284,46 +258,84 @@ class LookupRetriever(ConfigurationStateConsumer):
allows longer strings, and so will not truncate the name).
"""
- # This requires 2 Lookup queries. First find the managed institutions.
- search_filter = (
- self.lookup_config.managed_inst_filter
- if self.lookup_config.managed_inst_filter is not None
- else self.lookup_config.eligible_inst_filter
- )
- managed_insts = [
+ insts = {
+ inst.instid:
GroupEntry(
- groupID=group.get('instid', ''),
- groupName=group.get('name', ''),
- description=group.get('name', ''),
+ groupID=inst.instid,
+ groupName=inst.get('name', ''),
+ description=inst.get('name', ''),
uids=set()
)
- for group in self._fetch_all_list_results(
+ for inst in self._fetch_all_list_results(
self.inst_api_client.institution_search,
INST_RESULT_PROPERTY,
- search_filter
- )
+ self.lookup_config.eligible_inst_filter
+ ) if len(inst.instid) > 0
+ }
+ for crsid, person in self.eligible_users_by_uid.items():
+ for instID in person.instIDs:
+ if instID in insts:
+ insts[instID].uids.add(crsid)
+ return insts
+ def get_eligible_uids(self):
+ """
+ Return a set containing all CRSids who are eligible to have a Google account.
+ """
+ return self.eligible_users_by_uid.keys()
+ def get_eligible_groupIDs(self):
+ """
+ Return a set containing all groupIDs that are eligible for Google.
+ """
+ return self.eligible_groups_by_groupID.keys()
+ def get_eligible_instIDs(self):
+ """
+ Return a set containing all instIDs that are eligible for Google.
+ """
+ return self.eligible_insts_by_instID.keys()
+ def get_managed_user_entries(self):
+ """
+ Return a list containing all managed user entries as UserEntry instances.
+ """
+ return [
+ person for _, person in self.eligible_users_by_uid.items()
+ if self.lookup_config.managed_user_filter is None
+ or eval(self.lookup_config.managed_user_filter, {}, person._asdict())
]
- managed_insts_by_instID = {g.groupID: g for g in managed_insts}
- # Then get each eligible user's list of institutions and use that data to populate each
- # institution's uid list.
- eligible_users = self._fetch_all_list_results(
- self.person_api_client.person_search,
- USER_RESULT_PROPERTY,
- self.lookup_config.eligible_user_filter,
- extra_props=dict(fetch=USER_INST_FETCH)
- )
- for user in eligible_users:
- uid = user.identifier.value if user.identifier.scheme == UID_SCHEME else ''
- if len(uid) > 0:
- for inst in user.institutions:
- if inst.instid in managed_insts_by_instID:
- managed_insts_by_instID[inst.instid].uids.add(uid)
+ def get_managed_group_entries(self):
+ """
+ Return a list containing all managed group entries as GroupEntry instances.
- return managed_insts
+ """
+ return [
+ group for _, group in self.eligible_groups_by_groupID.items()
+ if self.lookup_config.managed_group_filter is None
+ or eval(self.lookup_config.managed_group_filter, {}, group._asdict())
+ ]
+ def get_managed_inst_entries(self):
+ """
+ Return a list containing all managed institution entries as GroupEntry instances.
+ """
+ return [
+ inst for _, inst in self.eligible_insts_by_instID.items()
+ if self.lookup_config.managed_inst_filter is None
+ or eval(self.lookup_config.managed_inst_filter, {}, inst._asdict())
+ ]
- def _fetch_all_list_results(self, api_func, result_prop, query, extra_props={}):
+ def _fetch_all_list_results(
+ self, api_func, result_prop, query, extra_props={}, limit=DEFAULT_LIST_FETCH_LIMIT
+ ):
"""
Repeatedly make an API call with incrementing offset until no more results are returned,
at which point return all the retrieved results as a single list.
......@@ -331,18 +343,16 @@ class LookupRetriever(ConfigurationStateConsumer):
"""
offset = 0
result = []
- all_results = []
- while offset <= 0 or len(result) >= LIST_FETCH_LIMIT:
+ while offset <= 0 or len(result) >= limit:
LOG.info(f'Fetching from Lookup API {result_prop}, offset {offset}')
response = api_func(
- query=query, **self.default_api_params, **extra_props, limit=LIST_FETCH_LIMIT,
+ query=query, **self.default_api_params, **extra_props, limit=limit,
offset=offset
)
result = response.get('result', {}).get(result_prop, [])
- all_results.extend(result)
- offset += LIST_FETCH_LIMIT
- return all_results
+ for entity in result:
+ yield entity
+ offset += limit
def _get_lookup_client(self):
"""
......@@ -362,11 +372,6 @@ class LookupRetriever(ConfigurationStateConsumer):
return LookupApiClient(config, pool_threads=10)
- def _extract_uid(person):
- identifier = person.identifier
- return identifier.value if identifier.scheme == UID_SCHEME else ''
def _extract_attribute(entity, attr):
return next(
(x.value for x in entity.attributes if x.scheme == attr), ''
......
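
The change to `_fetch_all_list_results` above turns an accumulate-then-return loop into a generator that yields entities page by page. A simplified, standalone sketch of that offset-based pagination pattern follows. `fetch_all` and `fake_search` are hypothetical stand-ins written for illustration; they do not use the real Lookup API client, and the loop condition is restructured slightly from the one in the diff.

```python
def fetch_all(api_func, limit=1000):
    """Yield every entity from a paginated API, one page at a time."""
    offset = 0
    while True:
        page = api_func(limit=limit, offset=offset)
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < limit:
            break
        offset += limit


# Hypothetical stand-in for an API call: 25 records served in pages.
DATA = list(range(25))
requested_offsets = []


def fake_search(limit, offset):
    requested_offsets.append(offset)
    return DATA[offset:offset + limit]


results = list(fetch_all(fake_search, limit=10))
```

A generator of this kind is lazy and single-pass, so a caller that needs the length or a second iteration has to materialise it first, for example with `list(...)` or a dict comprehension as the cached properties in the diff do.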
......@@ -57,7 +57,7 @@ deps=
# We specify a specific version of flake8 to avoid introducing "false"
# regressions when new checks are introduced. The version of flake8 used may
# be overridden via the TOXINI_FLAKE8_VERSION environment variable.
- flake8=={env:TOXINI_FLAKE8_VERSION:3.6.0}
+ flake8=={env:TOXINI_FLAKE8_VERSION:3.9.2}
commands=
flake8 --version
flake8 .