On Apr 9, 2012, at 14:28, visam wrote:
> I (a PhD student) am working on a project that we (some professors and other
> PhD students) want to have some organisms' genomic and transcriptomic
> sequences, starting from Yeast. Around the globe, who/which company would you
> recommend? The criteria for me would be the price/reliability. I guess for
> each project we will be having one 454 and one of other type of seq.
> technology runs, so that we can have some hybrid assemblies.

Note to the list: the above question from Visam is not the first to arrive in my private inbox, nor will it probably be the last. I encouraged him to post it to MIRA talk as I hope to generate some additional viewpoints, accounts of experience etc. on this topic.

I'd like to ask for one favour though: please no names on the public list. You can, if you wish to do so, send a second mail with likes/dislikes regarding companies and/or institutions to Visam personally (CC me, pretty please, I'm curious :-)

Hello Visam,

this response got, errrr, a little longer, but allow me to note that I will not give you names. The reasons are manifold:

- once upon a time I worked for a sequencing company
- the company I am currently employed with is not in the sequencing provider business, but it uses more than one sequencing provider on a regular basis and I get to see quite some data
- due to my development on MIRA in my free time, I'm getting insight into a number of highs and lows of sequencing technologies at different sequencing providers which I would not get if I were to expose them publicly

... I do not want to jeopardise these relationships.

That being said, there are a number of general considerations which could help you. Excuse me in case the detours I am going to make are obvious to you, but I'm writing this also for future reference.
Also, please bear with me if I look at "sequencing" a bit differently than you might be accustomed to from academia, but I have worked for quite some time now in industry ... and there the cost-effectiveness, or rather the "probability of success", of a project as a whole is paramount. I'll come back to that further down.

There's one - and only one - question which you, as sequencing customer, need to be able to answer ... if necessary in every excruciating detail, but you must know the answer. The question is:

"WHAT DO YOU WANT?!"

---------------------------------------------------
Detour - Sequencing -

For me, every "sequencing project", be it genomic or transcriptomic, really consists of four major phases:

- data generation
This can be broadly seen as everything needed to get the DNA/RNA ready to be sent off to sequencing (usually something the client does), the library prep at the sequencing provider and finally the sequencing itself (including base calling). An area of a thousand pitfalls where each step (and the communication) is crucial and even one slight oversight can make the difference between a "simple" project and a "hard" project. E.g.: taking DNA from growing cells (especially bacteria in the exponential growth phase) might not be a good idea ... it makes assembly more difficult. Some DNA extraction methods generate more junk than good fragments, etc.
The reason I am emphasizing this is simple: nowadays, the "sequencing" itself is not the most expensive part of a sequencing project, the next two steps are (most of the time anyway).

- assembly & finishing
Still a hard problem. Even a "simple" bacterium can take weeks of effort to get right if it's riddled with phages, prophages, transposon elements, genetically engineered repeats etc. And with eukaryotes the real fun starts: ploidy, retrotransposons etc. make for an unbelievable genome plasticity and almost always have their own surprises.
I've seen "simple" Saccharomyces cerevisiae - where biologists swore to high heaven they were "close to the publicly sequenced strains" - being *very* different from what they were expected to be, both on the DNA level and the genome organisation level. Getting eukaryotes right "down to the last base" might cost quite some money, especially when looping back to step 1 (data generation) to tackle difficult areas.

- annotation
Something many people forget: give the sequence a meaning. Here too, things can get quite costly if done "right", i.e., with hand curation. Especially on organisms which are not part of the more commonly sequenced species or are generally more complex. Annotation of a de-novo transcriptome assembly is also not for the faint of heart, especially if done on short, unpaired read assemblies.

- "Using the sequencing data" ... for whatever it was generated for.
---------------------------------------------------

The above makes it clear that, depending on what you are really interested in within your project and what you expect to be able to do with the sequencing data, one can cut corners and reduce cost here and there (but not everywhere).

And therefore, the above question "What do you want?" is one which - after the initial chit-chat of "hi, hello, nice to meet you, a pleasure to be here, etc." - every good representative of respectable sequencing providers I have met so far will ask as the very first question. Usually in the form of "what do you want to sequence and what will you want to use the data for (and what not)?"

Every other question - like where to sequence, which sequencing technology to use, how to process the sequencing data afterwards - is incidental and subordinate to your answer(s) to the question of "what do you want?!"

But often sequencing customers get their priorities wrong by putting forward another question: "WHAT WILL IT COST ME?" or "Can you make it cheaper?"
---------------------------------------------------
Detour - Putting things into perspective -

Come to think of it, people sometimes have very interesting ideas regarding costs. Interesting as in "outright silly." It may be because they do not really know what they want or feel unsure on terrain unfamiliar to them, and often they instead focus their energy on single aspects of a wider project because they feel more at home there. And suddenly the focus lies on haggling and bartering over prices because, after all, this is something everyone knows how to do, right?

As I hinted earlier, the pure sequencing costs are nowadays probably not the biggest factor in any sequencing project: 454, Illumina, IonTorrent and other technology providers have seen to that. E.g., in 2003/2004 it still cost somewhere between 150 - 200 k€ to get an 8x Sanger coverage of a moderately sized bacterium (4 to 5 Mb). Nowadays, for the same organism, you get coverages in the dozens (going with 454) for a few thousand Euro ... or coverages in the hundreds or even thousands (going with Illumina) for a few hundred Euro.

Costs for assembly, finishing and annotation have not followed the same decrease. Yes, advances in algorithms have made things easier in some parts, but not really on the same scale. Furthermore, the "short read" technologies have more than offset those gains through the algorithmic complexity they add when compared to the old Sanger reads. Maybe "(ultra)long read" technologies will alleviate the problem, but I would not hold my breath for them to really work well.

One thing however has almost not changed at all: your costs of actually doing followup experiments and data interpretation! Remember that sequencing in itself is most of the time not the ultimate goal, you actually want to gain something out of it.
Be it abstract knowledge for a paper or concrete hints for producing some compounds or whatever, chances are that you will actually devote a substantial amount of your resources (time, manpower, mental health) to followup activities (lab experiments, genetic engineering, writing papers) to turn the abstract act of sequencing into something tangible, be it papers, fame, new products, money, or whatever you want to achieve.

And this is the place where it pays to stop and think: "what do I want? what are my strengths and where are my weaknesses? where are my priorities?"

The English have a nice saying: "Being penny-wise and pound-foolish is not wise." I may add: especially not if you are basing man months / years of lab work and your career on the outcome of something like sequencing.

Maybe I'm spoiled because I left academia quite some time ago, but in sequencing I always prefer to throw a bit more money at the sequencing process itself to minimise the risks of the later stages.
---------------------------------------------------

There's one last detour I'd like to make, and that is the question of "where to sequence?"

---------------------------------------------------
Detour - Public or private, old-timers or young-timers ? -

Choosing a sequencing provider is highly dependent on your answer to "what do you want?"

Wanting to keep the sequencing data (or the very act of sequencing) secret (even only for some time) will probably lead you to commercial sequencing companies. There you more or less have complete control over the data. Paranoid people might perhaps argue that you can have that only with your own sequencing equipment and personnel, but I have the feeling that only a minority is able to cough up the necessary money for purchasing sequencing equipment for a small one-time project.

Instead of companies you could however also look at whether one of the existing sequencing centers in the world might be a good cooperation candidate.
Especially if you are doing this project within the scope of your university. Note however that there might be a number of gotchas lurking there, besides the obvious "the data is not really secret anymore": sometimes the raw sequencing data needs to be publicly released, maybe earlier than you would like; or the sequencing center requires that each and every paper you publish based on that data lists them as (co-)first author.

A related problem is "whom do I trust to deliver good work?"

Intuition says that institutes with a long sequencing history have amassed quite some knowledge in this field, making them experts in all three aspects (data generation, assembly & finishing, annotation) of a sequencing project ... and intuition probably isn't wrong there. The same is probably true for sequencing companies which have existed for more than just a couple of years, though what I have seen so far is that - due to size - sequencing companies sometimes really focus on the data generation and rely on partner companies for "assembly" and "annotation".

This is not to say that younger companies are bad. Incidentally, it is my belief that in this field, people are still more important than technology ... and every once in a while good people split off from a well known institute (or company) to try their luck in a company of their own. Always look for references there.

The following statement is a personal opinion (and you can call me biased for that): I am however quite wary of sequencing done at locations where a sequencer exists because someone got a grant to buy one (because it was chic & en vogue to get a shiny new toy) but where the instrument then slowly starts to collect dust after the initial flurry ... and because people often do not factor in the chemistry costs which arise if they really want to use the machine 24/7.
I want to know that technicians actually work with those things every day, that they know the ins and outs of the work, the protocols, the chemistry, the moods of the machine (even an instrument can have a bad day). I honestly do not believe that one can build up enough expertise when operating these things "every once in a while".
---------------------------------------------------

All of the above means that depending on what I need the data for, I have the freedom to choose among different providers.

- In case I just need masses of raw data and potential savings are substantial, I might go with the cheapest whom I know to generate good data.
- If I want good service and a second round of data in case I am not 110% satisfied with the first round (somehow people have stopped questioning me there), this is usually not the cheapest provider ... but the additional costs are not really high.
- If I wanted my data really, really quickly, I'd search for a provider with Ion Torrent, or MiSeq (I am actually looking for one with a MiSeq, so if anyone knows a good one, preferably in Europe -> mail me).
- Though I already did transcriptomics on eukaryotes, in case I needed larger eukaryotes assembled de-novo & also annotated, I would probably look for the help of a larger sequencing center as this starts to get dangerously near the fringe of my field of expertise.

In closing this part, here are a couple of guidelines for choosing sequencing providers which have not failed me so far:

- Building a good relationship helps. In case your institute / university already has good (or OK) experience with a provider, ask there first.
- It is a lot easier to build a good relationship with someone who speaks your language ... or a good(!) English.
- I will not haggle over a couple of hundred Euros in a single project, but I'll certainly reconsider this when savings are in the tens of thousands.
- Managing expectations: some sequencing projects are high risk from the start, for lots of possible reasons (underfunded, bad starting material, unclear organism). This is *sometimes* (!) OK as long as everyone involved knows and acknowledges this. However, you should always have a clear target ("what am I looking for?") and preferably know in advance how to treat the data to get there.
- Errors occur, stay friendly at first. In case the expectations were clear (see above), the material and organism are not at fault but the data quality somehow is bad, it is not too difficult to have the sequencing provider acknowledge this and get additional sequencing at no added cost.

> It would be more than good if one can comment on the mixture of independent
> technologies (sequencing tech.), in respect to what kind of projects would
> yield higher quality with which kind of sequencers.

Really depends on what you want to do :-) And note that I base my answers on technologies available today without bigger problems: 454, Illumina, with IonTorrent as a joker for quick projects. PacBio and Oxford Nanopore might become game changers, but are not just yet.

Here's what I would do, which might not necessarily be what others would:

- On a gene fishing expedition? Probably Illumina HiSeq, at least 100bp, 150 to 200bp if your provider supports it well. Ion if it is a small organism and you need it quick without caring about possible frameshifts.
- Want some larger contigs? 454 Titanium + Illumina 100bp (150 to 200bp if the provider supports it).
- The same as above, but maybe cheaper? Ion Torrent (long chemistry) & Illumina. (Never tried that myself)
- Even larger contigs and scaffolding? 454 Titanium + 454 paired-end + Illumina HiSeq (also paired-end if you need more coverage).
- Larger scaffolds? Like above, but different library sizes in the paired-end libraries.
- Feeling adventurous or have a complex eukaryote? PacBio + at least Illumina, preferably with 2x to 4x 454 mixed in.
The raw PacBio single-reads I have seen so far were ... ummm ... difficult to handle (85% accuracy, with very unevenly spaced errors). 3-pass reads should alleviate most problems (at the cost of read length), maybe 2-pass also. Make sure the provider does the whole read-processing and collapsing for you or else you are in for some fun time.
- Have some good friends at Oxford Nanopore who can give you some MinION engineering samples? Man, I'd kill for some bacterial test sets with those (especially Bacillus subtilis 168).

Hope that helps,
  Bastien