[mira_talk] Re: De novo assemblie. Reference genome and plasmids genomes?

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 20 May 2015 00:12:56 +0200

On 19 May 2015, at 18:43 , Dietmar Fernandez <dietmar.fernandez@xxxxxxxxxxxx>
wrote:

You can find the unassembled reads in the debris list in the info? folder. To
select the reads (no overlap and not aligned ...) I developed a small Python
script I could send you […]

Scripts are certainly good practice. You might be quicker with the command
line though. E.g., to extract read names which have either the NO_OVERLAP or
the TINY_CLUSTER code in the debris file, use this:

grep -w -E "NO_OVERLAP|TINY_CLUSTER" debris.txt | cut -f 1 >myreadnames.txt

I’m sure you will be able to change that for other or more codes in one go.
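As a rough sketch of how the one-liner could be generalised: the function below takes the debris file and any number of codes, joins the codes with "|" and runs the same grep and cut. The name debris_names is made up for this example, and it assumes (as the cut -f 1 above implies) that the debris file is tab-separated with the read name in the first column.

```shell
# Hypothetical helper, not part of MIRA: print the names of all reads whose
# debris line contains one of the given codes as a whole word.
# Assumption: tab-separated debris file, read name in column 1.
debris_names() {
    file=$1
    shift
    # Join the remaining arguments with "|" to form an alternation for grep -E.
    pattern=$(IFS='|'; printf '%s' "$*")
    grep -w -E "$pattern" "$file" | cut -f 1
}
```

Calling debris_names debris.txt NO_OVERLAP TINY_CLUSTER >myreadnames.txt should then give the same result as the command above; further codes are simply appended as extra arguments.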

If you want, you could afterwards filter them from the original FASTQ file
(there is another script, I don't remember which one right now; Google?) […]

There are certainly scripts for that. Or there is miraconvert:

miraconvert -n myreadnames.txt input.fastq myfilteredreads.fastq
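Should miraconvert not be at hand, a plain awk filter is one possible alternative. This is only a sketch under a few assumptions: every FASTQ record occupies exactly four lines (no wrapped sequences), the read name is the first whitespace-delimited word after the "@", and the names file has one read name per line. The function name filter_fastq is invented for this example.

```shell
# Hypothetical helper: keep only the FASTQ records whose read name is listed
# in the names file.
# Assumptions: strict 4-line FASTQ records; name = first word after "@".
filter_fastq() {
    # $1 = file with read names (one per line), $2 = input FASTQ
    awk 'NR == FNR { keep[$1]; next }   # first file: remember each name
         FNR % 4 == 1 {                 # header line of a FASTQ record
             name = substr($1, 2)       # drop the leading "@"
             wanted = (name in keep)
         }
         wanted' "$1" "$2"              # print all lines of wanted records
}
```

Usage would be along the lines of filter_fastq myreadnames.txt input.fastq >myfilteredreads.fastq. For anything beyond simple four-line FASTQ, miraconvert or a dedicated tool is probably the safer choice.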

B.

