GDPR: allow more types of pruning
We want to be able to prune data from old incomplete registrations. These can't be found by looking up foreign key references to person(id). This adds a collection of additional pruning queries to be run at the end of the per-person pruning which can have arbitrary SQL queries defined to determine what to delete.
In order not to duplicate the code for producing reverse files and counting the deleted rows the function that deletes rows with person(id) foreign keys has been reworked so that it can be used to do the work for both cases; it now gets passed an SQL query to find the rows to remove and some information about the schema/table it needs to remove from and the column it should search for. It uses those to construct the reversing and deleting queries. This then gets called with the appropriate parameters by both the per-person 'delete whole rows' part of the code, and at the end for the 'additional pruning' code.
Closes #3 (closed)
Merge request reports
Activity
203 "role_enrol_audit": { 204 "query": sql.SQL("SELECT l.id FROM leave_recording.role_enrol_audit l JOIN post_history p ON l.post_id = p.id WHERE p.person_id = {person_id}"), 205 "key_column": "id" 206 }, 207 "role_enrol": { 208 "query": sql.SQL("SELECT l.id FROM leave_recording.role_enrol l JOIN post_history p ON l.post_id = p.id WHERE p.person_id = {person_id}"), 209 "key_column": "id" 210 }, 211 "pending_leave_request": { 212 "query": sql.SQL("SELECT l.id FROM leave_recording.pending_leave_request l JOIN post_history p ON l.post_id = p.id WHERE p.person_id = {person_id}"), 213 "key_column": "id" 214 }, 215 "mm_role_shared_calendar_group": { 216 "query": sql.SQL("SELECT l.id FROM leave_recording.mm_role_shared_calendar_group l JOIN post_history p ON l.post_id = p.id WHERE p.person_id = {person_id}"), 217 "key_column": "id" 218 }, - Comment on lines +190 to +218
@ajh221 I finally got around to updating the GDPR purging code so it can handle more sophisticated purging where given a person.id it runs a custom query to identify rows in other tables it should remove. I've pulled out what I think are the right queries for the leave recording schema in order to remove a person's leave when we purge their records. Could you sanity check it please?
assigned to @ajh221
@ajh221 I've updated the constraint on leave_audit_notified, added the missing tables keyed on post_history.id and given gdpr_purge rights on them. I haven't tried to do anything about rows in tables keyed with person.id foreign keys, but given the way you've designed it I'm not sure we need to. The rows with a person.id foreign key seem to refer to things the referred to person did in a management role such as approving other people's leave, managing shared calendars, delegating approval rights. I think those are things we wouldn't want to remove just because that person had been gone seven years (or whatever) - they aren't personal data of that person.
Maybe there should be a policy of removing leave records older than a certain number of years whether the people involved are still here or not? I wouldn't like to say how long that should be without consulting with HR.
I notice there's a 'generic_audit' table and haven't tracked down what populates it. This probably needs to have old data cleaned out too and again I'm not sure what the time limit should be, although seven years as we do on the main audit table is probably plenty.