Instruction manual for MetClassifier
Introduction
Although the chemical structures of around 50,000 secondary metabolites are known, information on their metabolic pathways is very limited. In this work we have developed an algorithm that predicts metabolic pathways from the chemical structures of metabolites by exploiting the information contained in their cyclic substructures. To handle this large number of metabolites and to predict metabolic pathways automatically, we have also developed a software tool called MetClassifier. MetClassifier is written in C and uses the OpenGL, GLUT and AntTweakBar libraries.
Paper
(1) K. Tanaka, K. Nakamura, T. Saito, H. Osada, A. Hirai, H. Takahashi, S. Kanaya and Md. Altaf-Ul-Amin, Metabolic pathway prediction based on inclusive relation between cyclic substructures, Plant Biotechnology, 26, 459-468 (2009).
1. Preparation
1.1 Download MetClassifier
The compressed file MetClassifier.zip can be downloaded from the link below.
Download
For Windows OS
Current version: MetClassifier.zip
Plant Biotechnology version: 20090623_MetClassifier.zip
1.2 Uncompressing MetClassifier.zip
When MetClassifier.zip is uncompressed, the 'MetClassifier' directory contains four directories, 'Database', 'input', 'output' and 'Testdata'; two dynamic-link libraries, 'glut32.dll' and 'AntTweakBar.dll'; and the executable file 'MetClassifier.exe'.

2. Metabolic pathway prediction based on inclusive relation between cyclic substructures
The details of the prediction algorithm are described in the paper (1).
The execution times mentioned in this manual were measured on a machine with the following specifications: Intel Core 2 Duo processor T7600, 2.33 GHz, 1 GB RAM.
2.1 Execution of MetClassifier
Start MetClassifier by double-clicking the executable file 'MetClassifier.exe'.
The main window appears as shown below.

2.2 Opening a Database
Clicking 'Menu->Database->Open' opens a File Chooser Dialog.

As an example, we select 'MetClassifier/Database/KNApSAcK_20090415_version3.dat' here.


A separate window entitled 'Compound' then opens automatically, an 'Information bar' is added to the 'Main Window', and related menus are added to the 'Menu bar'.


Tips
MetClassifier uses the MDL Molfile and SDfile chemical structure data formats (Symyx, http://www.symyx.com/).
Users who have their own Molfiles or an SDfile can create a new database as follows:
1. Click 'Menu->Database->New(Molfile)' or 'Menu->Database->New(Sdfile)'.
2. Select the directory containing the Molfiles, or the SDfile.
3. Click 'Menu->Database->Save'; a File Chooser Dialog opens.
4. Enter the desired database name.
5. Click the 'Save' button in the File Chooser Dialog.
Caution
To create a new database from a set of Molfiles, all Molfiles must be in the same directory.
Before creating a new database, close the current database by selecting 'Menu->Database->Close'.
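Because every database is built from MDL Molfiles, it may help to see what a program has to read from them. The C sketch below parses the V2000 connection table from an in-memory array of lines: after three header lines, the counts line gives the numbers of atoms and bonds in fixed-width 3-character fields, followed by one line per atom and one line per bond. It is not MetClassifier's own reader; the names Molecule, parse_molfile and field_int and the MAX_ATOMS/MAX_BONDS limits are hypothetical, and V3000 Molfiles are not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limits for this sketch only. */
#define MAX_ATOMS 256
#define MAX_BONDS 512

/* Minimal connection table: enough to search for rings later. */
typedef struct {
    int n_atoms;
    int n_bonds;
    char element[MAX_ATOMS][4];  /* atom symbol such as "C" or "O" */
    int bond_a[MAX_BONDS];       /* 1-based index of the first atom */
    int bond_b[MAX_BONDS];       /* 1-based index of the second atom */
    int bond_type[MAX_BONDS];    /* 1 single, 2 double, 3 triple, 4 aromatic */
} Molecule;

/* Read an integer from a fixed-width field; start is a 0-based column.
   Missing or blank fields read as 0. */
static int field_int(const char *line, size_t start, size_t width) {
    char buf[16];
    size_t len = strlen(line);
    assert(width < sizeof buf);
    if (start >= len) return 0;
    if (start + width > len) width = len - start;
    memcpy(buf, line + start, width);
    buf[width] = '\0';
    return atoi(buf);
}

/* Parse a V2000 Molfile given as lines without trailing newlines.
   Returns 0 on success and -1 if the input is malformed or too large. */
int parse_molfile(const char **lines, int n_lines, Molecule *m) {
    if (n_lines < 4) return -1;
    const char *counts = lines[3];          /* lines 1-3 form the header block */
    m->n_atoms = field_int(counts, 0, 3);   /* columns 1-3: number of atoms */
    m->n_bonds = field_int(counts, 3, 3);   /* columns 4-6: number of bonds */
    if (m->n_atoms < 0 || m->n_atoms > MAX_ATOMS ||
        m->n_bonds < 0 || m->n_bonds > MAX_BONDS ||
        n_lines < 4 + m->n_atoms + m->n_bonds)
        return -1;

    for (int i = 0; i < m->n_atoms; i++) {
        const char *al = lines[4 + i];
        size_t len = strlen(al), k = 0;
        /* columns 1-30 hold x, y, z; the symbol is left-justified in 32-34 */
        for (size_t c = 31; c < 34 && c < len && al[c] != ' '; c++)
            m->element[i][k++] = al[c];
        m->element[i][k] = '\0';
    }
    for (int j = 0; j < m->n_bonds; j++) {
        const char *bl = lines[4 + m->n_atoms + j];
        m->bond_a[j] = field_int(bl, 0, 3);     /* columns 1-3 */
        m->bond_b[j] = field_int(bl, 3, 3);     /* columns 4-6 */
        m->bond_type[j] = field_int(bl, 6, 3);  /* columns 7-9 */
        if (m->bond_a[j] < 1 || m->bond_a[j] > m->n_atoms ||
            m->bond_b[j] < 1 || m->bond_b[j] > m->n_atoms)
            return -1;
    }
    return 0;
}
```

Fixed-width fields are copied out before conversion because neighbouring fields may touch (for example a 3-digit bond count directly after the atom count), which whitespace-based scanning would misread. In an SDfile the same parsing would apply to each record, records being separated by '$$$$' lines.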
2.3 Selecting target metabolites
In paper (1) we considered all metabolites in KNApSAcK for pathway prediction. However, users interested in specific metabolites only can select them as follows:
1. Search for target metabolites with the search functions 'Menu->Compound Window->Search->xxx'.
2. Mark the found metabolites as accepted metabolites with 'Menu->Compound Window->Refine->Accept'.
Users who have a 'CompoundID List' or a 'Species List' can choose target metabolites as follows:
1. Click 'Menu->Compound Window->Refine->(Compound ID List) or (Species List)'.
2.4 Unique cyclic subgraph extraction
Select the cyclic subgraph extraction mode via 'Menu->Compound Window->Extract->Extract Mode->CyclicSubstructure'.

The extraction mode and related options are then set automatically.

Cyclic substructures are extracted by clicking 'Menu->Compound Window->Extract->Execute'.
Extracting the cyclic subgraphs of all metabolites in KNApSAcK takes about 20 seconds.

The extracted cyclic substructures are then displayed automatically in a separate window entitled 'Substructure'.
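This manual does not spell out how cyclic subgraphs are extracted; paper (1) gives the details. As a rough illustration only, the C sketch below finds every bond that lies on a ring (a bond is on a ring exactly when it is not a bridge of the molecular graph, which a depth-first search detects) and then groups ring atoms joined by ring bonds into ring systems, so fused rings form one system. Treating a ring system as a "cyclic substructure" is an assumption of this sketch, and the names Graph, ring_systems, MAXN and MAXE are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAXN 256   /* hypothetical maximum number of atoms */
#define MAXE 512   /* hypothetical maximum number of bonds */

/* Undirected molecular graph: atoms 0..n-1, bond e joins eu[e] and ev[e]. */
typedef struct {
    int n, m;
    int eu[MAXE], ev[MAXE];
} Graph;

static int disc[MAXN], low[MAXN], dfs_time;
static int ring_bond[MAXE];   /* 1 if the bond lies on at least one ring */

/* Tarjan-style DFS: the bond to a DFS child v is a bridge when nothing in
   v's subtree reaches back to u or above, i.e. low[v] > disc[u]. */
static void dfs(const Graph *g, int u, int parent_edge) {
    disc[u] = low[u] = ++dfs_time;
    for (int e = 0; e < g->m; e++) {
        int v;
        if (e == parent_edge) continue;
        if (g->eu[e] == u) v = g->ev[e];
        else if (g->ev[e] == u) v = g->eu[e];
        else continue;
        if (disc[v] == 0) {
            dfs(g, v, e);
            if (low[v] < low[u]) low[u] = low[v];
            if (low[v] > disc[u]) ring_bond[e] = 0;   /* bridge: not in a ring */
        } else if (disc[v] < low[u]) {
            low[u] = disc[v];                         /* back edge closes a ring */
        }
    }
}

static int find_root(int *p, int x) {
    while (p[x] != x) {
        p[x] = p[p[x]];   /* path halving */
        x = p[x];
    }
    return x;
}

/* Sets label[a] to the ring-system index of atom a, or -1 if a is in no
   ring. Returns the number of ring systems. */
int ring_systems(const Graph *g, int *label) {
    int parent[MAXN], in_ring[MAXN], count = 0;
    assert(g->n <= MAXN && g->m <= MAXE);

    memset(disc, 0, sizeof disc);
    dfs_time = 0;
    for (int e = 0; e < g->m; e++) ring_bond[e] = 1;
    for (int u = 0; u < g->n; u++)
        if (disc[u] == 0) dfs(g, u, -1);

    /* Union the two ends of every ring bond; rings sharing an atom
       (fused or spiro) end up in the same set. */
    for (int u = 0; u < g->n; u++) { parent[u] = u; in_ring[u] = 0; label[u] = -1; }
    for (int e = 0; e < g->m; e++) {
        if (!ring_bond[e]) continue;
        in_ring[g->eu[e]] = in_ring[g->ev[e]] = 1;
        int ra = find_root(parent, g->eu[e]), rb = find_root(parent, g->ev[e]);
        parent[ra] = rb;
    }
    for (int u = 0; u < g->n; u++) {
        if (!in_ring[u]) continue;
        int r = find_root(parent, u);
        if (label[r] == -1) label[r] = count++;
        label[u] = label[r];
    }
    return count;
}
```

Atoms are numbered from 0 here, so 1-based Molfile atom indices would need to be decremented first. The edge scan inside dfs costs atoms times bonds, which is acceptable for molecules of natural-product size but is not tuned for speed.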

2.5 Determination of inclusive relations between metabolites
Right-click in the 'Substructure' window to display a popup menu.
Selecting 'Analysis->1.Make Fingerprint->Inclusive relation of Substructure' determines the inclusive relations between the cyclic structures of metabolites.
For the KNApSAcK database, this process takes about 5 minutes.
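The menu item name suggests that inclusive relations are computed with fingerprints, but the fingerprint actually used by MetClassifier is defined in paper (1), not here. The C sketch below shows only the generic screening idea under that assumption: if each bit marks the presence of a structural feature, one substructure can be contained in another only when its bit set is a subset of the other's. Pairs that pass this screen would still need an exact subgraph-isomorphism check, which is not shown. Fingerprint, fp_set, fp_subset, inclusion_matrix and the 256-bit width are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FP_WORDS 4   /* 4 x 64 = 256 bits, an arbitrary size for this sketch */

typedef struct { uint64_t w[FP_WORDS]; } Fingerprint;

/* Mark feature number bit as present. */
void fp_set(Fingerprint *fp, unsigned bit) {
    assert(bit < 64 * FP_WORDS);
    fp->w[bit / 64] |= (uint64_t)1 << (bit % 64);
}

/* 1 if every bit set in a is also set in b. For presence-type fingerprints
   this is necessary, but not sufficient, for a to be contained in b. */
static int fp_subset(const Fingerprint *a, const Fingerprint *b) {
    for (size_t i = 0; i < FP_WORDS; i++)
        if ((a->w[i] & ~b->w[i]) != 0) return 0;
    return 1;
}

/* incl[i*n + j] = 1 when substructure i may be included in substructure j. */
void inclusion_matrix(const Fingerprint *fps, size_t n, unsigned char *incl) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            incl[i * n + j] = (i != j) && fp_subset(&fps[i], &fps[j]);
}
```

The all-pairs loop is quadratic in the number of unique cyclic substructures; the cheap bitwise test is what makes it affordable to run before the expensive exact check.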

2.6 Determination of parent-child relations between metabolites
As before, right-click in the 'Substructure' window to display the popup menu.
Selecting 'Analysis->3.Find parents->Inclusive relation' determines the parent-child relations between the cyclic structures of metabolites.
For the KNApSAcK database, this process takes less than 1 second.
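One way to turn inclusive relations into parent-child relations is a transitive reduction: i is treated as a parent of j when i is included in j and no third substructure k satisfies i ⊂ k ⊂ j. The C sketch below does this and also assigns each substructure a layer number, defined here as 0 for a substructure without parents and otherwise one more than the highest parent layer. Both definitions are assumptions made for illustration; the rules MetClassifier actually uses, including how the layer numbers in Section 2.8 are computed, are described in paper (1). The names is_parent, layer_of and compute_layers are hypothetical, and the inclusion matrix is assumed to be a strict partial order (no self-inclusion, no cycles).

```c
#include <assert.h>
#include <stddef.h>

/* incl[i*n + j] != 0 means substructure i is strictly included in j.
   Assumed to be a strict partial order: no self-inclusion, no cycles. */

/* 1 if i is a direct parent of j: i is included in j and no substructure k
   lies strictly between them (transitive reduction). */
int is_parent(const unsigned char *incl, size_t n, size_t i, size_t j) {
    assert(i < n && j < n);
    if (!incl[i * n + j]) return 0;
    for (size_t k = 0; k < n; k++)
        if (incl[i * n + k] && incl[k * n + j]) return 0;
    return 1;
}

/* Memoised layer: 0 without parents, else 1 + the largest parent layer. */
static int layer_of(const unsigned char *incl, size_t n, size_t j, int *layer) {
    if (layer[j] >= 0) return layer[j];
    int best = 0;
    for (size_t i = 0; i < n; i++) {
        if (!is_parent(incl, n, i, j)) continue;
        int l = layer_of(incl, n, i, layer) + 1;
        if (l > best) best = l;
    }
    layer[j] = best;
    return best;
}

/* Fills layer[0..n-1] for every substructure. */
void compute_layers(const unsigned char *incl, size_t n, int *layer) {
    for (size_t j = 0; j < n; j++) layer[j] = -1;
    for (size_t j = 0; j < n; j++) layer_of(incl, n, j, layer);
}
```

Listing every pair (i, j) for which is_parent returns 1 yields rows of the parent-child kind described in Section 2.8. The naive loops are cubic in the number of substructures, which is fine for a sketch but not for large inputs.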

2.7 Output
The output of MetClassifier consists of predicted metabolic pathways, expressed as parent-child relations between the cyclic structures of metabolites. Users can save the following two types of output.
(a) Pathways related to selected species
This type of output requires species-metabolite relations and is currently available only when KNApSAcK_20090415_version3.dat is used as the database (see Section 2.2, Opening a Database). Right-click in the 'Substructure' window to display the popup menu and select 'File->Output: Pathway with # of organism'. A File Chooser Dialog opens, from which a file containing a species list can be selected; a second File Chooser Dialog in save mode then opens, in which the output can be saved. The following figures show this process for Camellia sinensis.
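Columns 5 and 6 of this output (Section 2.8) count the species to which both the parent and the child belong. Assuming each cyclic substructure carries the set of species it occurs in, these counts are intersection sizes, which the C sketch below computes with bitsets: column 5 corresponds to intersecting parent and child over all species, and column 6 to additionally intersecting with the selected species. SpeciesSet, species_add, shared_species and the 512-species capacity are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPECIES_WORDS 8   /* 8 x 64 bits: room for 512 species in this sketch */

typedef struct { uint64_t w[SPECIES_WORDS]; } SpeciesSet;

/* Mark species number idx (0-based) as present in the set. */
void species_add(SpeciesSet *s, unsigned idx) {
    assert(idx < 64 * SPECIES_WORDS);
    s->w[idx / 64] |= (uint64_t)1 << (idx % 64);
}

/* Count set bits by repeatedly clearing the lowest one. */
static int popcount64(uint64_t x) {
    int c = 0;
    while (x) { x &= x - 1; c++; }
    return c;
}

/* Number of species containing both parent and child. If selected is not
   NULL, only species in that set are counted (column 6); with NULL, all
   species are counted (column 5). */
int shared_species(const SpeciesSet *parent, const SpeciesSet *child,
                   const SpeciesSet *selected) {
    int count = 0;
    for (size_t i = 0; i < SPECIES_WORDS; i++) {
        uint64_t both = parent->w[i] & child->w[i];
        if (selected != NULL) both &= selected->w[i];
        count += popcount64(both);
    }
    return count;
}
```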




(b) Pathways involving all metabolites
Alternatively, the predicted metabolic pathways involving all metabolites can be saved. Right-click in the 'Substructure' window and select 'File->Output: Pathway' from the popup menu; a File Chooser Dialog in save mode opens, in which the output can be saved, as shown in the figure below.

2.8 Output file data format
If the output is saved by process (a) of Section 2.7, the data format is as follows:
1st column is the cyclic substructure ID of the parent.
2nd column is the cyclic substructure ID of the child.
3rd column is the layer number of the parent cyclic substructure, which is useful for visualizing the hierarchical pathway.
4th column is the layer number of the child cyclic substructure.
5th column is the number of species to which both parent and child belong, based on all KNApSAcK data.
6th column is the number of species to which both parent and child belong, among the selected species.
If the output is saved by process (b) of Section 2.7, the output file contains the 1st to 4th columns but not the 5th and 6th columns.
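For downstream processing, rows of an output file can be read back with a few lines of C. The sketch below assumes that columns are separated by whitespace (tabs or spaces) and that substructure IDs contain no spaces, since this manual does not state the delimiter; it accepts both the six-column rows of process (a) and the four-column rows of process (b). PathwayEdge and parse_pathway_line are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of a pathway output file (Section 2.8). */
typedef struct {
    char parent_id[64];     /* column 1 */
    char child_id[64];      /* column 2 */
    int parent_layer;       /* column 3 */
    int child_layer;        /* column 4 */
    int species_all;        /* column 5, or -1 if absent (process (b)) */
    int species_selected;   /* column 6, or -1 if absent (process (b)) */
} PathwayEdge;

/* Parses one line, assuming whitespace-separated columns.
   Returns 6 or 4 (the number of columns found) or 0 for any other layout. */
int parse_pathway_line(const char *line, PathwayEdge *e) {
    assert(line != NULL && e != NULL);
    memset(e, 0, sizeof *e);   /* IDs stay empty strings on failure */
    int n = sscanf(line, "%63s %63s %d %d %d %d",
                   e->parent_id, e->child_id,
                   &e->parent_layer, &e->child_layer,
                   &e->species_all, &e->species_selected);
    if (n == 6) return 6;
    if (n == 4) {
        e->species_all = e->species_selected = -1;
        return 4;
    }
    return 0;
}
```

In practice the function would be called on each line obtained with fgets while reading the saved file; a return value of 0 flags a line that matches neither layout.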
MetClassifier
Copyright (C) 2006-2009 Kenichi Tanaka
All rights reserved.