GeneList

Overview

GeneList is the heart of GenIE-Sys and the entry point to many of its tools and workflows. The entire GenIE-Sys database is designed around the GeneList tables. Tables whose names start with the gene_ or transcript_ prefix are considered GeneList tables. There are two types of GeneList tables: primary tables and annotation tables. The transcript_info and gene_info tables are the primary tables; all other GeneList tables are annotation tables.

GeneList tables

Primary tables

There should be only two primary tables (transcript_info and gene_info) in a GenIE-Sys database. Primary tables hold basic gene and transcript information. Since the smallest data unit is identified by a transcript ID or a gene ID, the primary tables use transcript_i or gene_i as the primary key.

Data can be loaded into the primary tables using the dedicated scripts in the geniesys/scripts folder. First, locate the GFF3 and FASTA files for the species that will be loaded into GenIE-Sys.

Creating Primary tables

⚠️ You do not need to create the following tables separately; instead, use this script to create all tables at once. Then move on to the Loading data into Primary tables section.

The following example shows how to load basic information into the primary tables.

Loading data into Primary tables

head input/Potra01-gene-mRNA-wo-intron.gff3

Use the GFF3 file to generate the source input file for the gene_info MySQL table.

Use the GFF3 file to generate the source input file for the transcript_info MySQL table.
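
As a rough illustration of this kind of extraction, the sketch below pulls gene and mRNA features out of standard nine-column GFF3 lines. It is not one of the GenIE-Sys scripts. The function names, the sample lines, and the output layout (ID, parent, chromosome, start, end, strand) are assumptions made for the example; the column order that gene_info and transcript_info actually expect may differ.

```python
def parse_attributes(field):
    """Split the GFF3 attributes column ('ID=x;Parent=y') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_rows(lines, feature_type):
    """Yield (id, parent, seqid, start, end, strand) for one feature type."""
    for line in lines:
        # Skip blank lines and GFF3 directives/comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # A valid GFF3 record has exactly nine tab-separated columns.
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = parse_attributes(cols[8])
        yield (attrs.get("ID", ""), attrs.get("Parent", ""),
               cols[0], cols[3], cols[4], cols[6])


# Hypothetical GFF3 excerpt, not taken from the Potra01 file.
SAMPLE = (
    "##gff-version 3\n"
    "Chr01\tdemo\tgene\t100\t900\t.\t+\t.\tID=geneA\n"
    "Chr01\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=geneA.1;Parent=geneA\n"
)

genes = list(gff3_rows(SAMPLE.splitlines(), "gene"))
transcripts = list(gff3_rows(SAMPLE.splitlines(), "mRNA"))

# Write tab-separated rows, the usual shape for a bulk-load source file.
for row in genes + transcripts:
    print("\t".join(row))
```

Keeping the Parent attribute for each mRNA row preserves the link from a transcript back to its gene, which later steps rely on.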

The resulting file (transcript_info.txt) looks like the following.

The two files are now ready to be loaded into the primary tables. The load_data.sh script, found in the geniesys/scripts folder, loads them into the database.

The following two lines load the transcript_info.txt and gene_info.txt files into their respective tables.

Now we just need to fill the description column in the gene_info and transcript_info tables. For that, we need files similar to the following example.

There is a script called update_description.sh in the geniesys/scripts folder. The script looks like the following.

We can use the update_description.sh script to load descriptions into the gene_info and transcript_info tables.
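
To make the effect of this step concrete, here is a minimal sketch of a description update. It is not the update_description.sh script: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and it assumes a two-column, tab-separated description file (ID, then description) and a gene_info table keyed by a gene_id column. The sample IDs and descriptions are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene_info (gene_id TEXT PRIMARY KEY, description TEXT)"
)
conn.executemany(
    "INSERT INTO gene_info (gene_id) VALUES (?)", [("geneA",), ("geneB",)]
)

# Hypothetical description file content: ID<TAB>description per line.
description_lines = [
    "geneA\thypothetical kinase\n",
    "geneB\thypothetical transporter\n",
    "geneZ\tnot in table\n",  # unknown ID: the UPDATE matches no row
]

updated = 0
for line in description_lines:
    gene_id, _, description = line.rstrip("\n").partition("\t")
    cur = conn.execute(
        "UPDATE gene_info SET description = ? WHERE gene_id = ?",
        (description, gene_id),
    )
    updated += cur.rowcount  # number of rows changed by this statement
conn.commit()

result = dict(conn.execute("SELECT gene_id, description FROM gene_info"))
print(updated, result)
```

Counting the updated rows gives a quick check that every line in the description file matched an existing ID.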

Finally, update the gene_i column in the transcript_info table using update_gene_i.sh.

Run the following command
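
Conceptually, this step fills gene_i in transcript_info by matching each transcript to its gene. The sketch below is not the update_gene_i.sh script; it illustrates that lookup with Python's built-in sqlite3 module as a stand-in for MySQL. The table layouts are assumptions: in particular, it assumes transcript_info carries a gene_id column that can be matched against gene_info.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed, simplified schemas; the real tables have more columns.
    CREATE TABLE gene_info (
        gene_i  INTEGER PRIMARY KEY,
        gene_id TEXT UNIQUE
    );
    CREATE TABLE transcript_info (
        transcript_i  INTEGER PRIMARY KEY,
        transcript_id TEXT UNIQUE,
        gene_id       TEXT,
        gene_i        INTEGER
    );
    INSERT INTO gene_info (gene_i, gene_id) VALUES (1, 'geneA'), (2, 'geneB');
    INSERT INTO transcript_info (transcript_i, transcript_id, gene_id) VALUES
        (1, 'geneA.1', 'geneA'),
        (2, 'geneA.2', 'geneA'),
        (3, 'geneB.1', 'geneB');
    """
)

# Copy the numeric gene key onto every transcript of that gene.
conn.execute(
    """
    UPDATE transcript_info
    SET gene_i = (
        SELECT g.gene_i FROM gene_info AS g
        WHERE g.gene_id = transcript_info.gene_id
    )
    """
)
conn.commit()

mapping = conn.execute(
    "SELECT transcript_id, gene_i FROM transcript_info ORDER BY transcript_i"
).fetchall()
print(mapping)
```

In MySQL the same effect is often written as a multi-table UPDATE with a JOIN; the correlated subquery is used here because it also runs in SQLite.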

Annotation tables

Whenever a user needs to integrate a new annotation field into the GeneList, they can create a new table, known as an annotation table. Users can create as many annotation tables as they require.

Data can be loaded into the annotation tables using the corresponding scripts in the geniesys/scripts folder. First, we need to create the source file that fills the annotation table. The source file should contain two fields: the first should be either a gene_id or a transcript_id, and the second should be the annotation.
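
Before loading, it can help to check that a source file really has this two-field shape. The following sketch assumes tab-separated fields; the function name and sample lines are hypothetical and not part of GenIE-Sys.

```python
def read_annotation_source(lines):
    """Return (records, problems) from 'id<TAB>annotation' lines.

    records  -- list of (id, annotation) tuples for well-formed lines
    problems -- list of (line_number, raw_line) for malformed lines
    """
    records, problems = [], []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        # Require exactly two fields and a non-empty ID in the first one.
        if len(fields) != 2 or not fields[0].strip():
            problems.append((number, line))
            continue
        records.append((fields[0].strip(), fields[1].strip()))
    return records, problems


# Hypothetical source lines, two of them malformed on purpose.
SAMPLE = [
    "tx1\tPF00001\n",
    "broken line\n",
    "\tPF00002\n",
    "tx2\tPF00003\n",
]

records, problems = read_annotation_source(SAMPLE)
print(records)
print(problems)
```

Reporting line numbers for rejected lines makes it easier to fix the source file before it is loaded.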

Load data into transcript_[go/pfam/kegg] tables

Now we need to create a MySQL annotation table to hold the best BLAST results.

The load_data.sh script used earlier can load the best BLAST results into the transcript_atg table.

Finally, update transcript_i in the transcript_atg table using the following script.

Run the following command to update transcript_i

Load data into gene_[go/pfam/kegg] tables

Although it is recommended that all annotations be based on transcript IDs, some annotations may be based on gene IDs. The following example shows how to load gene ID-based annotation files into the GenIE-Sys database.

As the example above shows, one gene ID can be associated with several Gene Ontology IDs. The results therefore need to be reformatted before they can be loaded into a MySQL annotation table for the GO results. The following parse.py script can be used for this.
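
One plausible form of this reshaping is sketched below. It is not the parse.py script itself. It assumes the input has one line per gene, with the GO IDs in the second tab-separated field joined by a semicolon, and that the target is one row per gene and GO ID pair. Both the delimiter and the direction of the conversion are assumptions, and the sample lines are hypothetical.

```python
def explode_annotations(lines, sep=";"):
    """Turn 'gene<TAB>GO:1;GO:2' lines into one (gene, GO ID) pair per row."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        gene_id, _, terms = line.partition("\t")
        for term in terms.split(sep):
            term = term.strip()
            if term:  # genes without any GO ID produce no rows
                rows.append((gene_id.strip(), term))
    return rows


# Hypothetical input lines.
SAMPLE = [
    "geneA\tGO:0000001;GO:0000002\n",
    "geneB\tGO:0000003\n",
    "geneC\t\n",
]

pairs = explode_annotations(SAMPLE)
for gene_id, go_id in pairs:
    print(f"{gene_id}\t{go_id}")
```

If the table instead expects a single row per gene, the same loop can be inverted to group GO IDs by gene and join them with the chosen separator.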

The output will then look similar to the following.

Now we need to create a table to hold the newly generated annotation data.

The previously used load_data.sh script can load the go_gene results into the gene_go table.

Finally, update gene_i in the gene_go table using the following script.

Run the following command to update gene_i

Installation

  1. Download the genelist.zip file and unzip it into the plugins directory.

  2. Edit the database details in the services/settings.php file.

Usage

Navigate to http://[your server name]/geniesys/genelist
