Frequently Asked Questions


What is Spliceman?

Spliceman is an online tool that predicts how likely distant mutations around annotated splice sites are to disrupt splicing. It takes a set of DNA sequences with point mutations and returns a ranked list predicting the effects of those mutations on pre-mRNA splicing.

Why hexamers?

This is a web implementation of a previously published method. The choice of word size is not critical at the clustering step, because hexamers, pentamers, or other k-mers can be aligned to make motifs of length equal to or greater than k. In this sense, the word choice is self-correcting.

What is the L1 distance reported in the results page?

The mutation analysis fragments an 11-mer into a set of overlapping hexamers, each shifted by a single position. A point mutation changes six hexamers into new hexamers with new L1 distances. For instance, the 11-mer "acgta(a/c)gtagt" results in the following 6 comparisons, as illustrated in the figure below.



We report the comparison that results in the highest L1 distance.
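The fragmentation step described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not Spliceman's own code: the function names hexamer_comparisons and highest_distance are hypothetical, and the distance argument is a stand-in supplied by the caller, since how the L1 distance between two hexamers is computed is not described in this FAQ.

```python
import re


def hexamer_comparisons(variant, k=6):
    """Split a variant such as 'acgta(a/c)gtagt' into (wild-type, mutant) k-mer pairs.

    Hypothetical helper for illustration only; it expects at least k-1
    flanking nucleotides on each side of the mutated position.
    """
    match = re.fullmatch(r"([acgt]*)\(([acgt])/([acgt])\)([acgt]*)", variant.lower())
    if match is None:
        raise ValueError("expected a sequence like 'acgta(a/c)gtagt'")
    left, ref, alt, right = match.groups()
    if len(left) < k - 1 or len(right) < k - 1:
        raise ValueError("need at least k-1 flanking nucleotides on each side")

    # Keep exactly k-1 flanking bases, giving a window of 2k-1 (11 for k=6).
    left, right = left[-(k - 1):], right[:k - 1]
    wild = left + ref + right
    mutant = left + alt + right

    # Each of the k windows overlapping the mutated position yields one comparison.
    return [(wild[i:i + k], mutant[i:i + k]) for i in range(k)]


def highest_distance(variant, distance):
    """Return the largest distance over all comparisons.

    `distance` is a caller-supplied function taking (wild_hexamer, mutant_hexamer);
    it is an assumption here and stands in for the tool's L1 distance.
    """
    return max(distance(w, m) for w, m in hexamer_comparisons(variant))


if __name__ == "__main__":
    for wild_hexamer, mutant_hexamer in hexamer_comparisons("acgta(a/c)gtagt"):
        print(wild_hexamer, "vs", mutant_hexamer)
```

For the example 11-mer, the first comparison is acgtaa vs acgtac and the last is agtagt vs cgtagt; the mutated nucleotide moves one position to the left in each successive hexamer.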

Is the order of mutation important?

The order is not important. Following the example used above, switching the order to (c/a) will generate the same 6 comparisons, and the highest L1 distance among the 6 comparisons will be the same.
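The reason is that an L1 distance is symmetric: swapping the two sides of every comparison leaves each distance, and therefore the maximum, unchanged. The short Python sketch below illustrates this, assuming each hexamer is represented by a numeric profile vector. That representation, the function name l1_distance, and the numbers are made up for illustration and are not taken from Spliceman.

```python
def l1_distance(p, q):
    """Sum of absolute coordinate-wise differences between two equal-length vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical (wild-type profile, mutant profile) pairs; the values are invented.
pairs = [
    ([1, 4, 2], [3, 1, 2]),
    ([0, 2, 5], [0, 2, 1]),
    ([2, 2, 2], [2, 3, 2]),
]

# Highest distance with the alleles in the original order ...
forward = max(l1_distance(wild, mutant) for wild, mutant in pairs)
# ... and with the two alleles swapped in every comparison.
swapped = max(l1_distance(mutant, wild) for wild, mutant in pairs)

assert forward == swapped
```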

What is the actual classifier of the ROC statistics?

We used a binary classifier: ‘0’ corresponds to true positive samples, derived from a set of 618 confirmed splicing mutations found in the Human Gene Mutation Database (HGMD), and ‘1’ corresponds to false positive samples, constructed from a set of simulated mutations. For each intron or exon that contains an HGMD mutation, we randomly selected a position in the same region and simulated a point mutation with equal rates of transitions and transversions (e.g. a nucleotide A has two chances of mutating to a G, one chance to a C, and one chance to a T).
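One way to simulate such a mutation is sketched below in Python. It is a minimal illustration of the 2:1:1 weighting described above, not the code used to build the published control set; the function name simulate_point_mutation and the example region are hypothetical.

```python
import random

# Transition partners: purine <-> purine (A/G), pyrimidine <-> pyrimidine (C/T).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def simulate_point_mutation(region, rng=random):
    """Pick a random position in `region` and a substituted nucleotide.

    The single transition gets weight 2 and each of the two transversions
    gets weight 1, so transitions and transversions are equally likely overall.
    Returns (position, reference_base, mutated_base).
    """
    pos = rng.randrange(len(region))
    ref = region[pos].upper()
    if ref not in TRANSITION:
        raise ValueError(f"unexpected nucleotide {ref!r} at position {pos}")

    candidates = [base for base in "ACGT" if base != ref]
    weights = [2 if base == TRANSITION[ref] else 1 for base in candidates]
    alt = rng.choices(candidates, weights=weights, k=1)[0]
    return pos, ref, alt


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    print(simulate_point_mutation("ACGTTGCAAGT", rng))
```

Passing a seeded random.Random instance keeps a simulated control set reproducible across runs.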

Is the distance based solely on the 6 affected 6-mers, or does the context affect the scoring?

The distance is based on the 6 affected hexamers: out of the 6 comparisons, the tool picks the one that results in the highest L1 distance.

What genomes are included in this implementation?

In addition to human, we added 10 more genomes to the web server: chimp, rhesus, mouse, rat, dog, cat, chicken, guinea pig, frog, and zebrafish.