Commit e712b2f5 authored by Vitor Trovisco

Upload New File - FullCantoSOP.txt

Comprehensive SOP document for the installation, setup, configuration and use of canto for phenotype curation 
A. CREATE A VIRTUAL MACHINE
One may need to set up a virtual machine, for example to run a test Canto instance:
1 - Install VirtualBox, a common Virtual Machine creation suite - https://www.virtualbox.org/
2 - Create a Virtual Machine and install a Linux OS - Debian or Ubuntu
a) Use guided installation and follow the default settings
b) Make it bootable
3 - Update the Debian/Ubuntu virtual machine
(keep in mind that most of the following terminal instructions will have to be performed as su/sudo)
a) Login and open Terminal
b) install sudo
(i) enable su-mode:
su -
(ii) install sudo by running:
apt-get install sudo -y
c) give sudo rights to your own user
usermod -aG sudo <yourusername>
d) make sure your sudoers file has a rule for the sudo group, by using 'visudo':
(i) type:
visudo
(ii) if missing, add a sudo group rule by appending these three lines to the sudoers file - the last line starts with your username:
# Allow members of group sudo to execute any command
%sudo ALL=(ALL:ALL) ALL
<username> ALL=(ALL) NOPASSWD:ALL
(iii) reboot/restart the system
e) update the package index
sudo apt-get update
f) Install packages to allow apt to use a repository over HTTPS:
sudo apt-get install apt-transport-https ca-certificates curl gnupg2 software-properties-common
Now the virtual machine should be set up and ready for Canto installation
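The sudo set-up from step 3 can be double-checked before moving on. A minimal sketch: the two shell functions below (hypothetical names, not part of Debian or Canto) only read the sudoers file and group membership and change nothing. They assume the Debian/Ubuntu default /etc/sudoers and do not follow files pulled in via #includedir.

```shell
#!/bin/sh
# Checks for the sudo set-up in step A.3. Reading /etc/sudoers needs root.

# Succeeds if FILE (default /etc/sudoers) has an uncommented "%sudo ..." rule.
has_sudo_group_rule() (
    grep -Eq '^[[:space:]]*%sudo[[:space:]]' "${1:-/etc/sudoers}"
)

# Succeeds if USER (default: the current user) is listed in the "sudo" group.
user_in_sudo_group() (
    id -nG "${1:-$(id -un)}" | tr ' ' '\n' | grep -qx 'sudo'
)
```

Run as root, e.g. `has_sudo_group_rule && user_in_sudo_group <yourusername> && echo 'sudo is set up'`.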
B. INSTALL CANTO
If you are working in a root shell (opened with 'sudo -i'), there is no need to prefix the commands below with sudo
1 - Install Docker, a container platform that makes the same software behave identically across different operating systems/environments
Docker's installation guide for Debian: https://docs.docker.com/install/linux/docker-ce/debian/
a) Add Docker’s official GPG key:
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
b) Verify that you now have the key with the fingerprint 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88, by searching for the last 8 characters of the fingerprint.
sudo apt-key fingerprint 0EBFCD88
c) Use the following command to set up a stable Docker repository
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
d) update the package index
sudo apt-get update
e) install the latest version of Docker CE
sudo apt-get install docker-ce
f) Verify that Docker CE is installed correctly by running the hello-world image.
sudo docker run hello-world
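Step b) relies on comparing a 40-character fingerprint by eye. A small sketch that compares fingerprints while ignoring spaces and letter case; the expected value is the one quoted in step b), and the function names are hypothetical.

```shell
#!/bin/sh
# Compare a GPG fingerprint with Docker's expected fingerprint (step B.1.b).

DOCKER_GPG_FPR='9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88'

# Print a fingerprint with all whitespace removed and letters upper-cased.
normalise_fpr() (
    printf '%s' "$1" | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]'
)

# Succeeds if fingerprint $1 equals fingerprint $2 (default: Docker's key).
fpr_matches() (
    [ "$(normalise_fpr "$1")" = "$(normalise_fpr "${2:-$DOCKER_GPG_FPR}")" ]
)
```

Paste the fingerprint printed by `sudo apt-key fingerprint 0EBFCD88` as the argument, e.g. `fpr_matches '9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88' && echo 'key OK'`.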
2 - Follow PomBase's installation instructions @ https://curation.pombase.org/docs/canto_admin/installation
a) the following commands will install and initialise a Canto instance (retrieve the Canto code, start a temporary Docker container and then initialise Canto)
i) the following directory can be anywhere and have any name, but the convention is to call it 'canto-space'
mkdir canto-space
ii) make sub-directories and then clone 'canto' from PomBase's GitHub repository:
cd canto-space
mkdir data
mkdir import_export
git clone https://github.com/pombase/canto.git
iii) initialise the data directory - this must be done from the canto-space directory, where the previous commands leave you:
sudo ./canto/script/canto_start_docker --initialise /data
b) Now the 'data' directory is initialised and the default 'canto/canto_deploy.yaml' configuration file has been created - 'canto_deploy.yaml' will have to be changed according to your needs (see below)
(i) to test that the installation was successful, use the command that starts Canto in conventional mode:
sudo ./canto/script/canto_start_docker
(ii) The running canto instance will be accessible via a web browser at http://localhost:5000/. If a page loads, the installation was successful.
(iii) return to the terminal window and stop Canto with 'Control+C'
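Steps i) to iii) of a) can be collected into one re-runnable function. This is a sketch, not part of PomBase's tooling: `setup_canto_space` and the `CANTO_SKIP_CLONE` switch are hypothetical names, and `mkdir -p` plus the existing-clone check make repeated runs harmless.

```shell
#!/bin/sh
# Create the canto-space layout from step B.2.a and clone Canto once.
# Set CANTO_SKIP_CLONE=1 to create only the directories (e.g. offline).

setup_canto_space() (
    space=${1:-canto-space}
    mkdir -p "$space/data" "$space/import_export" || exit 1
    cd "$space" || exit 1
    if [ "${CANTO_SKIP_CLONE:-0}" != 1 ] && [ ! -d canto ]; then
        git clone https://github.com/pombase/canto.git || exit 1
    fi
    # Next step, from inside canto-space:
    #   sudo ./canto/script/canto_start_docker --initialise /data
)
```

For example: `setup_canto_space && cd canto-space && sudo ./canto/script/canto_start_docker --initialise /data`.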
3 - Initial setup
a) add the Drosophila melanogaster taxon:
sudo ./canto/script/canto_docker ./script/canto_add.pl --organism "Drosophila melanogaster" 7227 "fruit fly"
b) if the genes to use cannot be queried from a database, you must also copy a genes file (e.g. genes.tsv) into the 'import_export' sub-directory and load it (FlyBase genes can be queried from Chado, so this step is not needed there):
sudo ./canto/script/canto_docker ./script/canto_load.pl --genes /import_export/genes.tsv --for-taxon 7227
c) to add users:
(i) arguments are <user's name>, <user's email address>, <password>, <ORCID> and <user type> (user or admin). e.g.:
sudo ./canto/script/canto_docker ./script/canto_add.pl --person "Susan Testuser" testuser@pombase.org pass 0000-0001-5000-0007 admin
(ii) you should add a mock user, to be used for the bulk import of publications into canto:
sudo ./canto/script/canto_docker ./script/canto_add.pl --person "FlyBase Curator" ignore@flybase.org pass '0000-0000-0000-0000' admin
d) You must create CV terms in Canto's internal database for the triage statuses (publication sub-categories) and priority score range that are going to be needed
(i) examples of triage statuses:
sudo ./canto/script/canto_docker ./script/canto_add.pl --cvterm "Canto publication triage status" "DISEASE"
sudo ./canto/script/canto_docker ./script/canto_add.pl --cvterm "Canto publication triage status" "PHENO"
...
(ii) examples of priority scores:
sudo ./canto/script/canto_docker ./script/canto_add.pl --cvterm "Canto curation priorities" "0" FB:cantoscore0
sudo ./canto/script/canto_docker ./script/canto_add.pl --cvterm "Canto curation priorities" "1" FB:cantoscore1
sudo ./canto/script/canto_docker ./script/canto_add.pl --cvterm "Canto curation priorities" "2" FB:cantoscore2
sudo ./canto/script/canto_docker ./script/canto_add.pl --cvterm "Canto curation priorities" "3" FB:cantoscore3
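The CV-term commands in d) differ only in their last arguments, so a loop avoids typos. A sketch assuming you run it from canto-space; `add_canto_cvterms` and the `CANTO_DOCKER` override are hypothetical, and the status list holds only the two examples above, so extend it with the statuses you actually need.

```shell
#!/bin/sh
# Add triage statuses and priority scores (step B.3.d) in loops.
# CANTO_DOCKER defaults to the wrapper used in this SOP; set it to "echo"
# to print the commands instead of running them.

add_canto_cvterms() (
    # $cmd is left unquoted on purpose so "sudo" and the path split apart
    cmd=${CANTO_DOCKER:-"sudo ./canto/script/canto_docker"}
    for status in DISEASE PHENO; do
        $cmd ./script/canto_add.pl --cvterm "Canto publication triage status" "$status" || exit 1
    done
    for score in 0 1 2 3; do
        $cmd ./script/canto_add.pl --cvterm "Canto curation priorities" "$score" "FB:cantoscore$score" || exit 1
    done
)
```

Preview first with `CANTO_DOCKER=echo add_canto_cvterms`, then run `add_canto_cvterms` for real.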
4 - Load ontologies
a) copy all ontology files to the 'import_export' sub-directory of 'canto-space'
(i) ontology files can also be reached via a URL, in which case there is no need for a local file (see step f below)
b) Create an 'extension_config.tsv' file that defines the rules for usage of certain qualifiers (extensions in PomBase's lingo) with certain terms; e.g. lethality phenotype annotations cannot include developmental stage qualifiers
(i) instructions can be found here: https://github.com/pombase/canto/wiki/AnnotationExtensionConfig
(ii) 'extension_config.tsv' is the conventional name but it can be different
(iii) the file must live in the 'canto' sub-directory of 'canto-space'
c) add the extension_config filename to the canto_deploy.yaml file, under 'extension_conf_files':
(i) if only one file:
extension_conf_files:
- extension_config.tsv
(ii) if several files
extension_conf_files:
- extension_config1.tsv
- extension_config2.tsv
- extension_config3.tsv
- extension_config-GO.tsv
- extension_config-pheno.tsv
- ...
d) Because ontology files are very large, loading them is memory-intensive and can fail. To prevent this, increase the memory available to OWLTools (Java) by running:
export OWLTOOLS_MEMORY=20g
e) All needed ontologies must be loaded in a single command, following the scheme below; '--process-extension-config' is only needed if there are extension_config files with rules for the usage of qualifiers/extensions
sudo ./canto/script/canto_docker ./script/canto_load.pl --process-extension-config --ontology /import_export/<ontology file 1> --ontology /import_export/<ontology file 2> --ontology /import_export/<ontology file ...>
f) Ontologies can also be loaded from a URL, in which case there is no need for a local file in the 'import_export' sub-directory
e.g. sudo ./canto/script/canto_docker ./script/canto_load.pl --ontology http://purl.obolibrary.org/obo/go/go-basic.obo
g) To check that the ontologies have been loaded, one can ask for the list of loaded ontologies by running:
sudo ./canto/script/canto_docker sqlite3 /data/track.sqlite3 'select name from cv'
Canto has a few built-in ontologies, which will also show up.
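Because step e) requires every ontology in one canto_load.pl call, the `--ontology` arguments can be built from the files in import_export instead of typed by hand. A sketch under stated assumptions: ontology files end in .obo or .owl, extension_config files exist (drop `--process-extension-config` otherwise), OWLTOOLS_MEMORY has already been exported as in d), and `load_all_ontologies` is a hypothetical name.

```shell
#!/bin/sh
# Load all ontology files from import_export in ONE canto_load.pl call.
# Run from canto-space. CANTO_DOCKER=echo prints the command instead.

load_all_ontologies() (
    cmd=${CANTO_DOCKER:-"sudo ./canto/script/canto_docker"}
    set --                                    # collects the --ontology args
    for f in import_export/*.obo import_export/*.owl; do
        [ -e "$f" ] || continue               # skip unmatched glob patterns
        # host path import_export/X is /import_export/X inside the container
        set -- "$@" --ontology "/import_export/${f#import_export/}"
    done
    [ "$#" -gt 0 ] || { echo "no ontology files found" >&2; exit 1; }
    $cmd ./script/canto_load.pl --process-extension-config "$@"
)
```

Check the generated command with `CANTO_DOCKER=echo load_all_ontologies` before running it for real.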
5 - Add write permission to the 'canto' folder, so that the configuration file canto_deploy.yaml can be edited
sudo chmod 0777 -R ./canto/
The remaining configuration is done via the canto_deploy.yaml configuration file, which lives in the 'canto' sub-directory.
C. CONFIGURE CANTO IN CANTO_DEPLOY.YAML
??to do??
Instructions to edit canto_deploy.yaml are here: 'https://curation.pombase.org/docs/canto_admin/configuration_file'
i) ATTENTION: spaces in 'namespaces' should be replaced with underscores e.g. 'FlyBase miscellaneous cv' should be 'FlyBase_miscellaneous_cv'
D. START/STOP/RESTART CANTO
Canto can run in conventional mode (a single copy/worker active at a time) or in server mode (several workers/copies running in parallel).
1. Conventional mode:
You can run canto in conventional mode from the 'canto-space' directory.
a) to start run:
sudo ./canto/script/canto_start_docker
b) to stop, press 'Ctrl+C'
Canto running in conventional mode is accessible via a web-browser at 'http://flybase-vm.pdn.cam.ac.uk:5000' (if installed in your local machine, 'localhost:5000')
2. Server mode:
The Canto instance in flybase-vm runs automatically in server mode. This mode has the advantage of staying available through updates and restarts: while one worker updates or restarts, the remaining workers stay active.
a) Enable server mode:
This mode needs two files that can be found in the repository https://gitlab.developers.cam.ac.uk/jwrn3/pdn-canto-config:
(i) 'canto-docker-initd' - defines the filepath of the 'canto-space' directory, the number of simultaneous canto copies (i.e. workers) that will run and the 'start', 'stop' and 'restart' actions
1. ensure that the CANTO_SPACE variable has the correct path to the 'canto-space' directory
2. have identical copies of the file in the '/sbin/' directory and in the 'canto' sub-directory of 'canto-space' and make them executable.
If the starting file is in 'canto-space':
cp canto-docker-initd /sbin/
chmod a+x /sbin/canto-docker-initd
cp canto-docker-initd ./canto/etc/
chmod a+x ./canto/etc/canto-docker-initd
rm canto-docker-initd
(ii) 'canto_for_etc-initd' - calls/executes 'canto-docker-initd'
1. move 'canto_for_etc-initd' to the '/etc/init.d/' directory, rename it 'canto' and make it executable.
If the starting file is in 'canto-space':
mv canto_for_etc-initd /etc/init.d/canto
chmod a+x /etc/init.d/canto
b) Start/Stop/Restart:
To start, stop or restart canto respectively run:
/etc/init.d/canto start
/etc/init.d/canto stop
/etc/init.d/canto restart
Canto running in server mode is accessible via a web-browser at 'http://flybase-vm.pdn.cam.ac.uk:7000' (if installed in your local machine, 'localhost:7000')
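The copying in D.2.a(i) can be done in one step that also confirms the two copies of canto-docker-initd are identical, which the troubleshooting in section I relies on. A sketch only: `install_initd_copies` is a hypothetical name, the target directories default to the ones named above and can be overridden for a dry run, and writing to /sbin needs root.

```shell
#!/bin/sh
# Install canto-docker-initd into /sbin and ./canto/etc, make both copies
# executable, and succeed only if the two installed copies are identical.

install_initd_copies() (
    src=${1:-canto-docker-initd}
    sbin_dir=${2:-/sbin}
    etc_dir=${3:-./canto/etc}
    [ -f "$src" ] || { echo "not found: $src" >&2; exit 1; }
    for dir in "$sbin_dir" "$etc_dir"; do
        cp "$src" "$dir/canto-docker-initd" || exit 1
        chmod a+x "$dir/canto-docker-initd" || exit 1
    done
    cmp -s "$sbin_dir/canto-docker-initd" "$etc_dir/canto-docker-initd"
)
```

For example, as root from canto-space: `install_initd_copies canto-docker-initd`. Unlike the manual steps, it leaves the source file in place; delete it afterwards if you want the same end state.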
3. Find if canto is running:
Because canto is Dockerised, starting canto (conventional or server mode) will create an active canto instance within a docker container.
a) To know if canto is active, first list the running docker containers ('docker ps -a' would also list stopped ones):
docker ps
The canto container will have the image 'pombase/canto-base:v12';
b) Knowing the id of the canto container provides an alternative way of stopping or restarting canto:
docker stop <container_id>
docker restart <container_id>
c) If no such container exists, then canto is not running and you can start it again, as described above in D.2.b.
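Picking the canto container out of the `docker ps` listing can be scripted. A sketch with a hypothetical function name: it reads "ID IMAGE" lines on stdin, so the parsing works without Docker, and it matches any tag of pombase/canto-base because the v12 tag above may differ between installations.

```shell
#!/bin/sh
# Print the ID(s) of running containers built from pombase/canto-base.
# Input: "ID IMAGE" lines, e.g. from docker ps --format '{{.ID}} {{.Image}}'
# Exit status 1 means no canto container is running.

find_canto_container() (
    awk '$2 == "pombase/canto-base" || $2 ~ /^pombase\/canto-base:/ {
             print $1; found = 1
         }
         END { exit !found }'
)
```

For example: `id=$(sudo docker ps --format '{{.ID}} {{.Image}}' | find_canto_container) && sudo docker restart "$id"`.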
E. AUTOMATED RUN OF CANTO IN FLYBASE-VM
The canto instance in flybase-vm is set to be on autopilot, as it goes through updates and data exports which are performed by automated weekly runs of two scripts: weekly_routine.sh and weekly_export.sh
1. weekly_routine.sh - Mondays, 1am
Runs Mondays at 1am and performs an array of tasks to keep the canto instance as much in sync with the epicycle as possible:
a) checks for changes in ontologies (function 'update_obo_file'), by comparing the local ontology files (in 'canto-space/import_export') with the source files (in '/data/export/curfiles/ontologies/trunk/').
(i) If any ontology changes:
1. the local file/s is/are updated
2. the FBbt-GO.obo file, which merges the FBbt and GO CC ontologies, is re-created
3. the 'extension_config.tsv' file, which configures the rules of usage of qualifiers (extensions in PomBase's lingo), is recreated by an R script
4. the ontologies and the rules for qualifier usage (set in 'extension_config.tsv') are reloaded
b) checks for new version of the production chado (function 'check_if_canto_restart_required'), by comparing the newest version number in fbadmin@deneb.pdn.cam.ac.uk:instance/canto_done with the one in use.
(i) If the database version changes:
1. the version number is updated in the configuration canto_deploy.yaml file
2. the support script 'get_fbrfs_to_add_to_canto.pl' creates (and archives) a 'fbrf_input_list.tsv' file with the list of newly thin-curated publications submitted to '/data/export/staging/'
3. the support script 'canto_json_input_maker.pl' creates (and archives) a 'import-fbrfs.json' file, with all the data associated with each of the publications in the aforementioned 'fbrf_input_list.tsv' file
4. the data in 'import-fbrfs.json' is loaded into canto
5. the memory/cache is refreshed and canto is restarted
c) all activity from weekly_routine.sh is logged in the 'canto_weekly_update.log' file, stored in the 'logs' subdirectory of 'canto-space'
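The ontology check in a) boils down to "copy the source file over the local one only if they differ, and report whether anything changed". A hypothetical illustration of that comparison, not the actual `update_obo_file` function (whose code is not reproduced in this SOP); it assumes the ontology files end in .obo.

```shell
#!/bin/sh
# sync_if_changed SOURCE LOCAL
#   exit 0 = LOCAL updated, 1 = already identical, 2 = error
sync_if_changed() (
    src=$1
    dest=$2
    [ -f "$src" ] || exit 2
    if [ -f "$dest" ] && cmp -s "$src" "$dest"; then
        exit 1                              # identical: nothing to do
    fi
    cp "$src" "$dest" || exit 2
)

# sync_ontology_dir SRC_DIR DEST_DIR
#   succeeds if at least one .obo file changed, i.e. a reload is needed
sync_ontology_dir() (
    changed=1                               # 1 = nothing changed (yet)
    for f in "$1"/*.obo; do
        [ -e "$f" ] || continue             # no .obo files in SRC_DIR
        name=$(basename "$f")
        rc=0
        sync_if_changed "$f" "$2/$name" || rc=$?
        case $rc in
            0) echo "updated: $name"; changed=0 ;;
            2) echo "error syncing: $f" >&2 ;;
        esac
    done
    exit "$changed"
)
```

For example, from canto-space: `sync_ontology_dir /data/export/curfiles/ontologies/trunk import_export && echo 'reload needed'`.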
2. weekly_export.sh - Wednesdays, noon
Runs on Wednesdays at noon to export approved sessions from Canto:
a) exports all 'APPROVED' sessions in canto into a 'canto_server_export_latest.json' file in the 'import_export' sub-directory of 'canto-space'
b) the 'APPROVED' sessions in canto become relabelled as 'EXPORTED'
c) the 'canto_server_export_latest.json' file also gets archived as 'canto_server_export_<date>' in the 'archive' sub-directory of 'canto-space'
d) all activity from the weekly export is logged in the 'canto_weekly_export.log' file in the 'logs' subdirectory of 'canto-space'
If you have to run the script manually, run it from the 'canto-space' folder and using 'sudo'. If the sudo mode is enabled (by 'sudo -i'), there is no need to prefix the commands with 'sudo'.
cd /data/export/canto-space
sudo bash weekly_export.sh
3. Logs
If the log files 'canto_weekly_update.log' and 'canto_weekly_export.log' do not show the expected activity, the scripts have not run. This may mean that the timer functions have been disabled or that flybase-vm has crashed, which will have to be fixed. If flybase-vm has crashed, canto will also have to be started again (see D.2.b above).
The scripts can still be run manually, from 'canto-space' and using 'sudo'. If the sudo mode is enabled (by 'sudo -i') there is no need for the sudo prefix:
cd /data/export/canto-space
sudo bash weekly_routine.sh
or
cd /data/export/canto-space
sudo bash weekly_export.sh
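Whether the weekly logs "show the expected activity" can be checked first from their modification times. A sketch with hypothetical function names; it assumes the logs live in the 'logs' sub-directory of canto-space and treats a log as recent if it was modified within the last 8 days (one weekly run plus some slack).

```shell
#!/bin/sh
# Succeeds if FILE was modified less than DAYS (default 8) days ago.
log_is_recent() (
    [ -f "$1" ] || exit 1
    [ -n "$(find "$1" -mtime -"${2:-8}")" ]
)

# Report each weekly log as OK or STALE; fail if any log is stale.
check_weekly_logs() (
    dir=${1:-logs}
    status=0
    for log in "$dir/canto_weekly_update.log" "$dir/canto_weekly_export.log"; do
        if log_is_recent "$log"; then
            echo "OK:    $log"
        else
            echo "STALE: $log (check the timers and flybase-vm)"
            status=1
        fi
    done
    exit "$status"
)
```

Run `check_weekly_logs` from canto-space; a STALE line is the cue to run the scripts manually as shown above.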
F. IMPORT OF PUBLICATION DATA INTO CANTO
1. Automated import of publication data into Canto at flybase-vm
The automated import of publication data into Canto is included in tasks of weekly_routine.sh, a script that is set to run automatically every Monday at 1am (see E.1 above)
If you have to run the script manually, run it from the 'canto-space' folder and using 'sudo'. If the sudo mode is enabled (by 'sudo -i'), there is no need to prefix the commands with 'sudo'.
cd /data/export/canto-space
sudo bash weekly_routine.sh
2. Manual import of publication data into Canto at flybase-vm
You can manually import publication-associated data into canto:
a) create a .tsv file with the list of publications to import into canto, with one FBrf ID followed by a tab character per line; e.g. FBrf0100000\t\n
(i) the list can be manually made/custom
(ii) or it can be generated automatically from the newly thin-curated publications, using the support script 'get_fbrfs_to_add_to_canto.pl', which extracts the FBrf IDs of all 'thin' records in staging; the .tsv file must live in the 'canto-space' directory
sudo /usr/bin/perl /data/export/support_scripts/get_fbrfs_to_add_to_canto.pl /data/export/support_scripts/modules_server.cfg > <pub_list.tsv>
b) create a .json file with all relevant data for each of the publications in the aforementioned .tsv file, by applying the support script 'canto_json_input_maker.pl'; the json file must live in the 'import_export' sub-directory.
sudo /usr/bin/perl /data/export/support_scripts/canto_json_input_maker.pl /data/export/support_scripts/modules_server.cfg <pub_list.tsv> > ./import_export/<import_file.json>
c) load that publication-associated data into canto:
sudo ./canto/script/canto_docker ./script/canto_add.pl --sessions-from-json /import_export/<import_file.json> <curator_email@example.com> <taxon_id>
e.g. sudo ./canto/script/canto_docker ./script/canto_add.pl --sessions-from-json /import_export/import-fbrfs.json "ignore@flybase.org" 7227
This loading step will:
1. import publications and associated data into canto
2. or update those publications already present in canto, namely:
(i) genes (list and names)
(ii) alleles (list and names)
(iii) triage status/publication type ('HIGH PRIORITY', 'DISEASE', etc)
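A custom publication list (a)(i)) can be written and checked with a couple of helpers, since a missing tab is easy to miss by eye. A sketch: the function names are hypothetical, and the check assumes FlyBase reference IDs are "FBrf" followed by 7 digits, as in the example above.

```shell
#!/bin/sh
# Print each FBrf ID given as an argument followed by a tab and a newline.
write_pub_list() (
    for fbrf in "$@"; do
        printf '%s\t\n' "$fbrf"
    done
)

# Succeeds only if FILE is non-empty and every line is "FBrf<7 digits><TAB>".
check_pub_list() (
    [ -s "$1" ] || exit 1
    tab=$(printf '\t')
    ! grep -Ev "^FBrf[0-9]{7}${tab}\$" "$1" | grep -q .
)
```

For example, from canto-space: `write_pub_list FBrf0100000 FBrf0100001 > pub_list.tsv && check_pub_list pub_list.tsv`.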
G. EXPORT OF CURATION FROM CANTO (& CONVERSION TO PROFORMAE RECORDS)
1. Automated export of approved sessions from Canto at flybase-vm
The automated export of approved sessions from Canto is performed by weekly_export.sh, set to run every Wednesday at noon (see E.2 above)
2. Manual export of approved sessions from Canto (export vs dump)
a) Manual export of approved sessions
If for some reason the 'weekly_export.sh' script does not work, you can run the command lines below to get an export file that you can then archive; if the sudo mode is already enabled (by sudo -i) there is no need for the sudo prefix.
cd /data/export/canto-space
sudo ./canto/script/canto_docker ./script/canto_export.pl canto-json --export-curator-names --export-approved > <filepath>+<filename>
cp <filepath>+<filename> /data/export/canto-space/archive/<filename><date>
b) Manual dump of approved sessions (without marking sessions as exported!)
Alternatively, to get the curation data in json format without formally exporting the approved sessions (i.e. without the sessions becoming labelled 'exported' in canto), you can run the command lines below. Keep in mind that these sessions will remain eligible for export in the future, so be careful not to submit the same publication data twice!
If the sudo mode is already enabled (by sudo -i) there is no need for the sudo prefix.
cd /data/export/canto-space
sudo ./canto/script/canto_docker ./script/canto_export.pl canto-json --export-curator-names --dump-approved > <filepath>+<filename>
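The manual export in a) and its archive copy can be wrapped so the archive step is never forgotten. A sketch only: `export_approved` is a hypothetical name, the default output and archive names mirror those described in E.2, and the YYYY-MM-DD date format is an assumption. Note that `--export-approved` relabels the sessions as EXPORTED, unlike the dump in b).

```shell
#!/bin/sh
# Export approved sessions, then keep a dated copy in the archive directory.
# Run from canto-space. CANTO_DOCKER=echo turns the export into a dry run.

export_approved() (
    out=${1:-import_export/canto_server_export_latest.json}
    archive_dir=${2:-archive}
    # $cmd is left unquoted on purpose so "sudo" and the path split apart
    cmd=${CANTO_DOCKER:-"sudo ./canto/script/canto_docker"}
    $cmd ./script/canto_export.pl canto-json --export-curator-names \
        --export-approved > "$out" || exit 1
    [ -s "$out" ] || { echo "export is empty: $out" >&2; exit 1; }
    mkdir -p "$archive_dir" || exit 1
    cp "$out" "$archive_dir/canto_server_export_$(date +%Y-%m-%d).json"
)
```

For a dry run, point it at scratch paths so the real latest export is not overwritten, e.g. `CANTO_DOCKER=echo export_approved /tmp/test.json /tmp/archive`.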
3. Convert export json file into proformae files
To convert the export json file into proformae you need to use the support script 'canto_json_output_parser.pl':
/usr/bin/perl /export-vm/support_scripts/canto_json_output_parser.pl /export-vm/support_scripts/modules.cfg <filepath>+<filename>
This will create a 'temp_record_folder' in your home directory with corresponding .phen records for all the approved/exported sessions in canto. These .phen records can now be copied into your sub-folder /export-vm/records/ where they can be Peeves-checked and submitted.
H. CURATION
1. Access Canto tool
Canto is a web-based tool accessible using any browser.
The canto instance in flybase-vm runs automatically in server mode, accessible at:
'http://flybase-vm.pdn.cam.ac.uk:7000'
('localhost:7000' if running on a local/test instance)
If for some reason it needs to run in conventional mode, it is accessible at:
'http://flybase-vm.pdn.cam.ac.uk:5000'
('localhost:5000' if running on a local/test instance)
2. Log in/admin mode
a) Admin login
(i) click on the wheel icon on the top-right of the page and select 'Admin login'
(ii) then type your user ID and password to log in
(iii) click again on the wheel icon on the top-right and select 'Advanced mode' for enhanced features
Logging in as admin enables enhanced features including viewing/adding internal notes and creating diploid genotypes
3. Curation management
a) View all loaded publications:
(i) click on the wheel icon on the top-right of any page and select 'Admin pages'
(ii) On the Reports section, you can navigate through all loaded publications by triage status (e.g. 'high priority', 'disease', 'pheno') and curation state ('active', 'approved', 'exported')
(iii) click on any of the links to view a table with the list of publications of the selected type
b) Select a publication:
To go to a publication's detail page, either:
(i) click on a PMID link
(ii) or search for a specific PMID
c) Start curating a publication
On the publication's details page, click on the 'Go to the curation session' link
d) Remove a curation session
Occasionally, data import for a publication previously loaded into canto will create a second session for that publication.
To remove the redundant session:
(i) go to the publication's details page
(ii) click on the 'Curs key' link to go to the curation session's details page
(iii) click on the 'Remove this session' button
I. LOG OF CANTO'S ACTIVITY - canto_running_log.sh
# comment 1: ??Kim is working on a system to continually log canto's curation activity into an easy-access file, so any of the following instructions may soon become unnecessary, or only needed if Kim's logging system fails
# comment 2: ??for Gillian: Running any of the commands or the script needs admin access/sudo, but it may be useful to make it available to any user, to help troubleshoot any problems
Canto can run in conventional mode (a single copy/worker active at a time) or in server mode (several workers/copies running in parallel).
The Canto instance on flybase-vm runs automatically in server mode, which, unlike the conventional mode, shows minimal activity log in the terminal window, making it hard to troubleshoot any issues.
Because canto is Dockerised, a running canto instance lives in a docker container, so its full activity log can be retrieved with the 'docker logs' command.
The canto_running_log.sh script will:
a) identify the canto docker container
b) and use the 'docker logs' command to retrieve and print all the activity messages of that container to the 'canto_server_running.log' file, which will be stored in the 'logs' sub-directory of 'canto-space'.
The script lives in, and should be run from, the canto-space directory. It must be run with sudo; if sudo mode is already enabled (by sudo -i), there is no need for the sudo prefix:
cd /data/export/canto-space
sudo bash ./canto_running_log.sh
The terminal message "Canto not running - no log messages to show" means that canto is not running, so there is no logged activity to retrieve.
The terminal message "Canto running on container <container_id> - see canto_server_running.log for the log messages" means that canto is running and all its log messages are stored in the file 'canto_server_running.log'. If problems persist, further debugging is needed: check whether both copies of the file 'canto-docker-initd', in '/sbin/' and './canto/etc/', have the variable CANTO_SPACE set to the correct filepath for the 'canto-space' directory.
You can also run the 'docker logs' command alone, and from any directory; if you are already in sudo mode there is no need for the sudo prefix:
a) For all the activity log since the canto container was started, you can run
sudo docker logs canto
b) For all the activity log in paginated form, run:
sudo docker logs canto | less
c) You can run the command with the "-f" option, which will show you the most recent message and will update as more messages become available:
sudo docker logs -f canto
If any of these commands gives the terminal message 'Error: No such container: canto' it usually means that Canto is not running, which may explain any problems when using Canto. But it can also mean that the canto server is running but not set up correctly (e.g. wrong location), which will need debugging.
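What canto_running_log.sh is described as doing can be sketched as a single function. This is a hypothetical re-creation for illustration, not the script itself: it assumes the container is named "canto" (as in the 'docker logs canto' commands above), prints the same two terminal messages, and lets the docker command be overridden through a `DOCKER` variable (e.g. `DOCKER="sudo docker"`).

```shell
#!/bin/sh
# Write the canto container's log (stdout and stderr) to LOG_FILE.
# Exit status 1 if no container named "canto" is running.

dump_canto_log() (
    # $docker_cmd is left unquoted on purpose so "sudo docker" splits apart
    docker_cmd=${DOCKER:-docker}
    log_file=${1:-logs/canto_server_running.log}
    id=$($docker_cmd ps --format '{{.ID}} {{.Names}}' | awk '$2 == "canto" { print $1 }')
    if [ -z "$id" ]; then
        echo "Canto not running - no log messages to show"
        exit 1
    fi
    mkdir -p "$(dirname "$log_file")" || exit 1
    $docker_cmd logs canto > "$log_file" 2>&1 || exit 1
    echo "Canto running on container $id - see $(basename "$log_file") for the log messages"
)
```

Run from canto-space as root, e.g. `dump_canto_log`, then inspect logs/canto_server_running.log with `less`.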
J. QUICK SET-UP OF A CANTO INSTANCE FOR PHENOTYPE CURATION
To quickly set up a canto instance you can use the starting pack in the repository - https://gitlab.developers.cam.ac.uk/jwrn3/pdn-canto-config
1. If a virtual machine is needed, please follow the instructions above, in A.
2. Follow instructions above to install Canto - B. INSTALL CANTO
3. Clone repository 'https://gitlab.developers.cam.ac.uk/jwrn3/pdn-canto-config' into the 'canto-space' directory set up in step B.2
4. make sure all file-paths to 'canto-space' are accurate in:
a) file 'weekly_routine.sh' - variable 'CANTOSPACE'
b) file 'weekly_export.sh' - variable 'CANTOSPACE'
c) file 'canto_running_log.sh' - variable 'CANTOSPACE'
d) file 'canto-docker-initd' - variable 'CANTOSPACE'
5. configure canto to your requirements by editing the canto_deploy.yaml file from the repository
a) make sure the 'Model::ChadoModel:' field points to the desired database and that the user and password are accurate
6. run 'starting-pack.sh' in sudo mode
7. ??don't know how to make the changes to enable the timed execution of 'weekly_routine.sh' and 'weekly_export.sh'??
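One possible way to handle step 7, assuming cron is used as the timer (this SOP does not say which timer flybase-vm actually uses) and that the entries go in root's crontab, since both scripts need sudo. `canto_crontab_lines` is a hypothetical helper that only prints the entries matching the schedule in section E: weekly_routine.sh on Mondays at 01:00 and weekly_export.sh on Wednesdays at 12:00.

```shell
#!/bin/sh
# Print crontab entries for the weekly scripts (fields: minute, hour,
# day of month, month, day of week; 1 = Monday, 3 = Wednesday).

canto_crontab_lines() (
    space=${1:-/data/export/canto-space}
    printf '0 1 * * 1 cd %s && bash weekly_routine.sh\n' "$space"
    printf '0 12 * * 3 cd %s && bash weekly_export.sh\n' "$space"
)
```

After reviewing the output, the entries could be appended to root's crontab with `{ sudo crontab -l 2>/dev/null; canto_crontab_lines; } | sudo crontab -`, run in a shell where the function has been defined.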