Yes, of course, I can explain it in more detail!

I performed an NGS analysis of 454 data from a non-model organism for which no genome data are available. I selected hits from our BLAST results by keyword and, since the data came from a transcriptome, aligned the selected reads to an open reading frame reference sequence. This let me separate the 454 reads according to the known proteins they came from, so I ended up with different groups of related 454 reads, each group belonging to a given protein.

All the aligned reads (that is, the coding ones) were then assembled with MIRA in order to recover as many different coding sequences as I had for a given protein. On that basis, the number of resulting contigs could give an idea of the number of expressed genes for a given protein.

My doubts come from the debris file. What should I do with those sequences? They really do encode protein and they are not included in any contig, so I think they should be considered putative different... "isoforms"? But why are they included in the debris file? Are they perhaps singletons?

I hope I have explained the issue better. Thanks a lot.

2012/4/23 Bastien Chevreux <bach@xxxxxxxxxxxx>

> On Apr 20, 2012, at 12:49 , Jordi Durban wrote:
> > [...]
> > What do you think about such an approach??
>
> Hi Jordi,
>
> I, uh, am not sure I completely understood what you are trying to do,
> sorry for that. Care to explain in more detail?
>
> B.
>
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
>

--
Jordi