SPInDel workbench

Workbench

The SPInDel workbench is a computational platform that facilitates the planning and management of SPInDel projects, the alignment of nucleotide sequences, the visualization and selection of conserved regions, the calculation of PCR primer properties, the prediction of SPInDel profiles and a range of statistical and phylogenetic analyses. It includes a large dataset comprising nearly 1,800 numeric profiles for the identification of eukaryotic, prokaryotic and viral species.

SPInDel - version 1.1 Documentation (1 February 2012)

1. About

SPInDel workbench version 1.1

Population Genetics group (http://www.portugene.com)

IPATIMUP - Institute of Molecular Pathology and Immunology of the University of Porto, Portugal (http://www.ipatimup.pt)

Copyright © 2009, 2010, 2011, 2012 by IPATIMUP. All rights reserved. Software developed by João Carneiro and Filipe Pereira.

2. License Agreement

Terms of license:

The SPInDel workbench is provided “as is”, “with all faults” and without any express or implied warranty. In no event shall the authors or IPATIMUP be held liable for any damages arising out of the use of or inability to use this software, even if its authors or IPATIMUP have been advised of the possibility of such damages. If you do not want to accept the terms of this license, you must not install the SPInDel workbench. By choosing to install this software you are accepting these terms.

3. System requirements

  • Windows 95/98/NT/2000/XP/Vista/7 or Linux.
  • At least 190 MB of free hard disk space.
  • A minimum of 64 MB of RAM.

4. Installation

Windows 95/98/NT/2000/XP

Windows VISTA/7

  • Download the SPInDelv1.1.exe file from http://www.portugene.com or http://sourceforge.net/projects/spindel/files/ to any directory.
  • Execute SPInDelv1.1.exe and run the Installation Wizard with administrative privileges. If you install the SPInDel workbench in the default directory “C:\Program Files”, or in any other directory for which you do not have administrative privileges, and you experience problems running the program, right-click the SPInDel workbench shortcut and select Properties. Go to the Compatibility tab and check the box ‘Run this program as an administrator’.

Linux

5. General Features

-> Projects viewer

Displays current SPInDel projects.

  • ‘New project’ button: Creates a new project by loading a DNA sequence alignment in the FASTA format (projects can be added or removed at any point).
  • ‘Remove project’ button: Deletes the currently selected project from the SPInDel workbench.

-> Alignment editor

Displays the DNA sequence alignment from the current loaded project.

  1. SPInDel project box:

    • ‘Undo all changes’ button: Undoes all previous changes made to a SPInDel project.
    • ‘Save project’ button: Saves all changes made to a project.
    • ‘Add sequences’ button: Adds sequences to a project (sequences must be in a FASTA file).
    • ‘Remove sequences’ button: Removes the selected sequences from the current project.
  2. Conserved region box:

    • ‘Add’ button: Adds a conserved region to the current project.
    • ‘Remove’ button: Removes a conserved region.
  3. Profiles:

    • ‘Calculate profiles’ button: Retrieves the list of numeric profiles defined by selected conserved regions (see theoretical background for details on calculations).
  4. Graphic options - Shows basic features of the current sequence alignment:

    • ‘Track 1’ combobox: Selects the current alignment feature to be displayed in track 1.
    • ‘Track 2’ combobox: Selects the current alignment feature to be displayed in track 2.
    • ‘Window (track 2)’ combobox: Selects the window length used to calculate the feature displayed in track 2.
    • ‘Step (track 2)’ combobox: Selects the step value used to calculate the feature displayed in track 2.
    • ‘In (Zoom box)’ combobox: Zooms in on the selected column range in the graphical display of track 2.
    • ‘Out (Zoom box)’ combobox: Zooms out of the column range in the graphical display of track 2.

-> SPInDel alignment options

Performs sequence alignments using PyCogent TreeAlign.

-> SPInDel profiles frame

Shows profiles and general statistics.

  1. Hypervariable regions box:

    • ‘Undo changes’: Recalculates general statistics using all regions defined in the alignment.
    • ‘Remove selected’: Recalculates general statistics using unselected columns.
    • ‘Remove unselected’: Recalculates general statistics using selected columns.
  2. SPInDel calculations box:

    • ‘Region by region’ button: Calculates the frequency of species-specific alleles and average pairwise differences for each hypervariable region.

    • ‘Mismatch distribution’ button: Calculates the number of pairwise differences between all profiles.

    • ‘UPGMA tree’ button: Calculates the UPGMA tree using the matrix of pairwise differences between profiles.

    • ‘Primers properties’ button: Calculates several PCR primer properties (sequence length, Tm, GC content).

      • ‘Export primers’ button: Exports PCR primer properties in Excel CSV format.
    • ‘Combinations’ button: Generates all m-combinations without repetition, that is, subsets of m distinct elements of the set of all possible regions. For each m-combination, the Nsp and Ndp values are displayed in tables and graphs. The algorithm also includes a ‘multiplex PCR option’ that retrieves only the m-combinations whose regions do not share conserved regions.

    • ‘Search profile’ button: Searches the current databases to identify an unknown profile.

    • ‘PCA analysis’ button: Performs a principal component analysis using the profiles matrix.

  3. SPInDel exporter:

    • ‘Profiles’ button: Exports profiles in Excel CSV format.
    • ‘Pairwise matrix’ button: Exports the matrix of pairwise differences in Excel CSV format.
    • ‘General statistics’ button: Exports general statistics in text format.
    • ‘UPGMA tree’ button: Exports the UPGMA tree in Newick format.
    • ‘PCA’ button: Exports principal component analysis results.
    • ‘Print’ button: Prints profiles and general statistics.
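
The enumeration behind the ‘Combinations’ button described above can be illustrated with a minimal Python sketch. This is not the workbench's own code: the function name `region_combinations`, the region names (H1, H2, H3) and the representation of each hypervariable region as a pair of flanking conserved regions are assumptions made for illustration only, and the per-combination statistics are omitted.

```python
from itertools import combinations


def region_combinations(regions, m, multiplex=False):
    """Enumerate m-combinations (without repetition) of hypervariable regions.

    regions: dict mapping a region name to the pair of conserved regions
             that flank it (hypothetical representation).
    multiplex: if True, keep only combinations in which no conserved
               region is used by more than one hypervariable region.
    """
    selected = []
    for combo in combinations(sorted(regions), m):
        if multiplex:
            # Collect every conserved region used by this combination.
            used = [c for name in combo for c in regions[name]]
            if len(used) != len(set(used)):
                continue  # at least one conserved region is shared
        selected.append(combo)
    return selected


# Hypothetical layout: C1 - H1 - C2 - H2 - C3 - H3 - C4
regions = {"H1": ("C1", "C2"), "H2": ("C2", "C3"), "H3": ("C3", "C4")}
print(region_combinations(regions, 2))
print(region_combinations(regions, 2, multiplex=True))
```

With this layout, the standard option yields three pairs, while the multiplex option keeps only the pair H1 and H3, because H2 shares a conserved region with each of its neighbours.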

-> SPInDel profiles evaluation frame

Shows the results of f(ts) and f(dp) for combinations of profiles with n regions:

  1. SPInDel profiles box:

    • ‘Standard’ or ‘Multiplex PCR’ combobox: Filters hypervariable regions for standard SPInDel profiles (defined by all conserved regions) or multiplex PCR SPInDel profiles (only hypervariable regions not sharing conserved regions).
    • ‘Graph’: Displays a graphic representation of f(sp) for all profiles from 1 to n regions.
  2. Exporter tools box:

    • ‘PCR primers’ button: Exports PCR primers to an Excel CSV (*.csv) file.
    • ‘Tables’ button: Exports tables with f(sp) and f(dp) values.

-> SPInDel search frame

Identifies unknown samples in the current database.

  1. Profiles box:

    • ‘Add’: Adds target profile.
    • ‘Remove’: Removes profile.
    • ‘Search’: Retrieves profiles from the database equal or similar to the target profile.
  2. k-nearest neighbor box:

    • Combobox: Selects the k value to use in k-nearest neighbor calculations.
    • ‘Cross-validation’ button: Gives the prediction accuracy of the k-nearest neighbor model for the selected k value.

The SPInDel software was written in Python 2.6 using libraries that include Biopython (http://biopython.org/) and SciPy (http://www.scipy.org/). The graphical interface was created using the VisualWX Rapid Application Development (RAD) environment (http://visualwx.altervista.org), and the Eclipse platform was used to debug and test the software. A single EXE file was created using the Inno Setup software (http://www.jrsoftware.org/isinfo.php) for installation purposes.

6. Theoretical background

Alignment calculations (Identity, GC and AT content, GC and AT skews)

An identity value is plotted for each nucleotide position by estimating the frequency of the most common nucleotide in that position (indels are ignored). Conserved regions can be easily identified by observing the graphic output for identity values (highest conservation represented in green and lowest represented in red) and can be defined directly in the alignment window using column selection. GC and AT skews and content were implemented using the GenomeDiagram Utilities.
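
As an illustration of the identity calculation described above, the following minimal Python sketch computes the frequency of the most common nucleotide in each alignment column while ignoring indels. It is not the workbench's own code: the function name `column_identity`, the set of gap symbols and the score of 0.0 for a column made only of gaps are assumptions.

```python
from collections import Counter

GAP_CHARS = {"-", "."}  # assumption: symbols treated as indels


def column_identity(alignment):
    """Return the frequency of the most common nucleotide in each column.

    alignment: list of aligned sequences (strings of equal length).
    Gap characters are ignored; an all-gap column scores 0.0 (assumption).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    identities = []
    for col in range(length):
        bases = [seq[col].upper() for seq in alignment
                 if seq[col] not in GAP_CHARS]
        if not bases:
            identities.append(0.0)
            continue
        top_count = Counter(bases).most_common(1)[0][1]
        identities.append(top_count / len(bases))
    return identities


# Made-up toy alignment, for illustration only.
alignment = ["ACGT-A", "ACGTTA", "ACCT-A", "ATGT-C"]
print(column_identity(alignment))
```

Columns with values close to 1 are candidates for conserved regions. Note that a column with a single non-gap base also scores 1, so gap-rich columns may deserve separate inspection.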

Alignment algorithm

Projects with previously aligned sequences can be loaded, but alignments can also be computed with the PyCogent progressive alignment implemented in the workbench. The user can select among different nucleotide substitution models (JC69, F81, HKY85 and GTR) to perform the alignment.

Calculations on SPInDel profiles (pairwise differences, mismatch distribution, f(ts), f(sh) and f(dp) )

‘SPInDel conserved regions’: regions with no or small variability at the sequence level.

‘SPInDel hypervariable regions’: regions containing multiple indels across species that potentially allow for differentiation by the determination of sequence length.

‘Standard SPInDel profile’: the combination of the fragment lengths of all contiguous SPInDel hypervariable regions observed in a sequence.

‘Multiplex PCR SPInDel profile’: similar to a standard profile but only including SPInDel hypervariable regions that do not share the same conserved region.

‘Species-specific SPInDel profiles’: profiles that are found in only one species within a taxonomic group and allow its unequivocal identification.

‘Frequency of species-specific SPInDel profiles’:

$$f_n^G = \frac{N_{sp}}{N},$$

where G denotes the taxonomic group under investigation according to a two-letter code, n is the number of SPInDel hypervariable regions included in the profile, Nsp is the number of species-specific SPInDel profiles and N is the total number of sequences represented in group G.

'Number of species-shared profiles' (Nsh): number of profiles that were found in more than one species inside a taxonomic group.
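
The following Python sketch illustrates how Nsp, Nsh and the frequency of species-specific profiles could be counted. It is not the workbench's own code. It assumes that each sequence contributes one profile and that Nsp and Nsh are counted per sequence; the function name `species_specific_frequency` and the example values are made up for illustration.

```python
from collections import defaultdict


def species_specific_frequency(records):
    """Return (n_sp, n_sh, f) for a list of (species, profile) pairs.

    Assumption: one profile per sequence, counted per sequence.
    n_sp: profiles observed in a single species only.
    n_sh: profiles observed in more than one species.
    f:    n_sp / N, where N is the number of sequences.
    """
    if not records:
        raise ValueError("at least one sequence is required")
    species_by_profile = defaultdict(set)
    for species, profile in records:
        species_by_profile[tuple(profile)].add(species)
    n_sp = sum(1 for _, profile in records
               if len(species_by_profile[tuple(profile)]) == 1)
    n_sh = len(records) - n_sp
    return n_sp, n_sh, n_sp / len(records)


# Made-up fragment lengths for three hypervariable regions.
records = [
    ("species A", (120, 85, 60)),
    ("species B", (120, 85, 62)),
    ("species C", (118, 90, 60)),
    ("species D", (118, 90, 60)),  # same profile as species C
]
print(species_specific_frequency(records))
```

In this toy group, species C and D share a profile, so only two of the four profiles are species-specific and the frequency is 0.5.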

'Average number of pairwise differences':

$$p_n^G = \frac{\sum_{k=1}^{N} \sum_{l>k}^{N} d_{kl}}{N(N-1)/2},$$

where k and l are indices that refer to individual SPInDel profiles, dkl is the number of SPInDel hypervariable regions (from the total set of n) that differ in length between profiles k and l, and N is the total number of sequences represented in group G.

'Average number of pairwise differences per locus':

$$\frac{p_n^G}{n},$$

where n is the number of loci (i.e., hypervariable regions).
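
The two averages defined above can be sketched in Python as follows. This is an illustrative sketch rather than the workbench's own code; the function names and the example profiles are assumptions.

```python
from itertools import combinations


def pairwise_differences(profile_k, profile_l):
    """d_kl: number of hypervariable regions that differ in length."""
    return sum(1 for x, y in zip(profile_k, profile_l) if x != y)


def average_pairwise_differences(profiles):
    """Sum of d_kl over all pairs k < l, divided by N(N-1)/2."""
    n_profiles = len(profiles)
    if n_profiles < 2:
        raise ValueError("at least two profiles are required")
    total = sum(pairwise_differences(a, b)
                for a, b in combinations(profiles, 2))
    return total / (n_profiles * (n_profiles - 1) / 2)


def average_pairwise_differences_per_locus(profiles):
    """Average number of pairwise differences divided by n regions."""
    n_regions = len(profiles[0])
    return average_pairwise_differences(profiles) / n_regions


# Made-up profiles with n = 3 hypervariable regions each.
profiles = [(120, 85, 60), (120, 85, 62), (118, 90, 60), (118, 90, 60)]
print(average_pairwise_differences(profiles))
print(average_pairwise_differences_per_locus(profiles))
```

For these four profiles the six pairwise comparisons sum to 11 differences, giving an average of 11/6 per pair and 11/18 per locus.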

UPGMA tree

UPGMA (Unweighted Pair Group Method with Arithmetic mean) is used to build a guide tree to discriminate between species in each database. The distance between any two profiles A and B is taken to be the average of all distances between pairs of hypervariable regions “x” in A and “y” in B, that is, the mean distance between elements of each profile. The PyCogent UPGMA algorithm is used to cluster profiles based on the dissimilarity matrix obtained from the number of differences between profiles in each database.
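
For readers unfamiliar with UPGMA, the clustering step can be sketched in plain Python. The workbench relies on the PyCogent implementation; the code below is an independent, simplified illustration and does not use the PyCogent API. It assumes that the distance between two profiles is the number of regions that differ in length, and the function name `upgma_newick` and the example data are hypothetical.

```python
from itertools import combinations


def upgma_newick(labels, profiles):
    """Cluster profiles by UPGMA and return the tree as a Newick string."""
    if not labels:
        raise ValueError("at least one profile is required")
    # Each cluster is stored as (newick text, number of leaves, height).
    clusters = {i: (label, 1, 0.0) for i, label in enumerate(labels)}
    dist = {}
    for i, j in combinations(range(len(labels)), 2):
        differences = sum(x != y for x, y in zip(profiles[i], profiles[j]))
        dist[frozenset((i, j))] = float(differences)
    next_id = len(labels)
    while len(clusters) > 1:
        # Pick the closest pair; sorted ids make tie-breaking deterministic.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: dist[frozenset(pair)])
        d_ab = dist[frozenset((a, b))]
        nw_a, size_a, h_a = clusters.pop(a)
        nw_b, size_b, h_b = clusters.pop(b)
        height = d_ab / 2
        newick = "(%s:%g,%s:%g)" % (nw_a, height - h_a, nw_b, height - h_b)
        # Size-weighted average distance from the new cluster to the others.
        for c in clusters:
            d_ac = dist[frozenset((a, c))]
            d_bc = dist[frozenset((b, c))]
            dist[frozenset((next_id, c))] = (
                (size_a * d_ac + size_b * d_bc) / (size_a + size_b))
        clusters[next_id] = (newick, size_a + size_b, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


# Made-up labels and profiles, for illustration only.
labels = ["A", "B", "C", "D"]
profiles = [(120, 85, 60), (120, 85, 62), (118, 90, 60), (118, 90, 60)]
print(upgma_newick(labels, profiles))
```

The sketch does not escape special characters in labels, so labels containing parentheses, commas or colons would need quoting before the output is loaded in a tree viewer.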

Primers properties

Calculations on PCR primers were implemented using OligoCalc. For sequences shorter than 14 nucleotides,

Tm= (wA+xT)*2 + (yG+zC)*4 - 16.6*log10(0.050) + 16.6*log10([Na+])

where w, x, y and z are the numbers of the bases A, T, G and C in the sequence, respectively. The term 16.6*log10([Na+]) adjusts the Tm for changes in the salt concentration, and the term 16.6*log10(0.050) sets the reference for that adjustment at 50 mM Na+. Other monovalent and divalent salts will have an effect on the Tm of the oligonucleotide, but sodium ions are much more effective at forming salt bridges between DNA strands and therefore have the greatest effect in stabilizing double-stranded DNA, although trace amounts of divalent cations have significant and often overlooked effects (see Nakano et al. (1999) Nucleic Acids Res. 27:2957-65).

For sequences longer than 13 nucleotides,

Tm= 100.5 + (41 * (yG+zC)/(wA+xT+yG+zC)) - (820/(wA+xT+yG+zC)) + 16.6*log10([Na+])

This equation is accurate for sequences in the 18-25mer range (Howley, P.M., Israel, M.F., Law, M-F. and Martin, M.A. (1979) J Biol Chem 254:4876-4883).
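
The two Tm equations above can be combined into a single helper, sketched below in Python. This is not the workbench's own code: the function name `melting_temperature` and the default sodium concentration of 0.05 M are assumptions, and the sketch accepts only unambiguous A, C, G and T bases.

```python
import math


def melting_temperature(primer, na_molar=0.05):
    """Estimate the primer Tm (in degrees Celsius) from base counts.

    primer:   primer sequence containing only A, C, G and T.
    na_molar: Na+ concentration in mol/L (default is an assumption).
    """
    seq = primer.upper()
    w, x, y, z = (seq.count(base) for base in "ATGC")
    length = w + x + y + z
    if length == 0 or length != len(seq):
        raise ValueError("primer must contain only A, C, G and T")
    salt_term = 16.6 * math.log10(na_molar)
    if length < 14:
        # Short primers: 2 degrees per A/T and 4 degrees per G/C.
        return (w + x) * 2 + (y + z) * 4 - 16.6 * math.log10(0.050) + salt_term
    # Primers of 14 nucleotides or more.
    return 100.5 + 41.0 * (y + z) / length - 820.0 / length + salt_term


print(melting_temperature("ACGTACGTAC"))            # 10-mer, short formula
print(melting_temperature("ACGTACGTACGTACGTACGT"))  # 20-mer, long formula
```

At the reference concentration of 50 mM Na+ the two salt terms of the short-primer formula cancel, so the 10-mer above evaluates to approximately 30 °C.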

K-nearest neighbors implementation in ‘Search profile’

SPInDel profiles of unknown origin can be predicted by a k-nearest neighbor method using a database of known profiles. The k-nearest neighbor algorithm is a supervised learning approach that finds the k closest matches in a database of known profiles using a distance metric. SPInDel uses the discrete metric:

if x = y then d(x,y) = 0; otherwise, d(x,y) = 1.

We implemented the algorithm using Biopython and added the discrete distance metric. Classification accuracy can be estimated within the SPInDel workbench by testing the performance of the k-nearest neighbor model with a modified leave-one-out cross-validation on profiles from known species. The modified leave-one-out cross-validation ensures that classes with only one species or genus are not left out of the reference set. To perform the leave-one-out cross-validation test, the profile labels in the dataset should be in the following format: "taxonomic-level-1 + underscore + taxonomic-level-2".
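
The search and cross-validation steps can be sketched in plain Python as follows. The workbench itself builds on Biopython; this standalone illustration does not use the Biopython API. It assumes that the discrete metric is applied to each hypervariable region and summed over the profile, that the class of a profile is the part of its label before the underscore, and that ties are resolved in favour of the first match in the database. All function names and example data are hypothetical.

```python
from collections import Counter


def discrete_distance(profile_a, profile_b):
    """Sum of the discrete metric (0 if equal, 1 otherwise) over regions."""
    return sum(0 if x == y else 1 for x, y in zip(profile_a, profile_b))


def knn_predict(target, reference, k):
    """Return the majority label among the k closest reference profiles.

    reference: list of (label, profile) pairs.
    """
    ranked = sorted(reference,
                    key=lambda item: discrete_distance(target, item[1]))
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]


def class_of(label):
    """Taxonomic level 1, i.e. the text before the first underscore."""
    return label.split("_", 1)[0]


def modified_leave_one_out(reference, k):
    """Accuracy of class prediction; singleton classes are never left out."""
    sizes = Counter(class_of(label) for label, _ in reference)
    tested = correct = 0
    for i, (label, profile) in enumerate(reference):
        if sizes[class_of(label)] < 2:
            continue  # the only member of its class stays in the reference
        rest = [(class_of(other), p)
                for j, (other, p) in enumerate(reference) if j != i]
        tested += 1
        correct += knn_predict(profile, rest, k) == class_of(label)
    return correct / tested if tested else float("nan")


# Made-up labels and profiles, for illustration only.
reference = [
    ("GenusA_sp1", (120, 85, 60)),
    ("GenusA_sp2", (120, 85, 62)),
    ("GenusB_sp1", (118, 90, 60)),
    ("GenusB_sp2", (118, 90, 61)),
    ("GenusC_sp1", (110, 70, 55)),  # singleton class
]
print(knn_predict((120, 85, 61), reference, k=1))
print(modified_leave_one_out(reference, k=1))
```

Skipping singleton classes avoids scoring a prediction that could never be correct, since removing the only member of a class leaves no profile of that class in the reference set.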

7. SPInDel Versions

Version 1.1 Win32 (1 February 2012)

New features:

  • Three taxonomic groups added to database.
  • Windows operating system standard resolution 1024x768 is now supported.
  • Graphics output with PyQt4 design.

Version 1.0.1 Win32 and Linux (19 July 2010)

New features:

  • Leave-one-out cross validation in search profile module.

Version 1.0 Linux32 (13 May 2010)

Version 1.0 Win32 (20 April 2010)

8. Support

If you are experiencing problems with the SPInDel workbench, please contact jcarneiro@ipatimup.pt or fpereira@ipatimup.pt.