[mira_talk] Re: off topic- suggestion for sequencing companies

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 10 Apr 2012 02:37:21 +0200

On Apr 9, 2012, at 14:28 , visam wrote:
> I (a PhD student) am working on a project that we (some professors and other 
> PhD students) want to have some organisms' genomic and transcriptomic 
> sequences, starting from Yeast. Around the globe, who/which company would you 
> recommend? The criteria for me would be the price/reliability. I guess for 
> each project we will be having one 454 and one of other type of seq. 
> technology runs, so that we can have some hybrid assemblies.

Note to the list: the above question from Visam is not the first to arrive in 
my private inbox, nor will it probably be the last. I encouraged him to post 
it to MIRA talk as I hope to generate some additional viewpoints, accounts of 
experiences etc. on this topic. I'd like to ask for one favour though: please 
no names on the public list. You can, if you wish to do so, send a second mail 
with likes/dislikes regarding companies and/or institutions to Visam personally 
(CC me, pretty please, I'm curious :-) ).

Hello Visam,

this response got, errrr, a little longer, but allow me to note that I will 
not give you names. The reasons are manifold:
  • once upon a time I worked for a sequencing company
  • the company I am currently employed with is not in the sequencing provider 
    business, but it uses more than one sequencing provider on a regular 
    basis and I get to see quite some data
  • due to my development of MIRA in my free time, I get insight into a 
    number of highs and lows of sequencing technologies at different 
    sequencing providers which I would not get if I were to expose them 
    publicly ... I do not want to jeopardise these relationships.

That being said, there are a number of general considerations which could help 
you. Excuse me in case the detours I am going to make are obvious to you, but 
I'm writing this also for future reference. Also, please bear with me if I 
look at "sequencing" a bit differently than you might be accustomed to from 
academia, but I have worked for quite some time now in industry ... and there 
the cost-effectiveness, or rather the "probability of success", of a project 
as a whole is paramount. I'll come back to that further down.

There's one - and only one - question which you, as a sequencing customer, need 
to be able to answer ... if necessary in every excruciating detail, but you 
must know the answer. The question is:

"WHAT DO YOU WANT?!"

---------------------------------------------------
Detour - Sequencing -

For me, every "sequencing project", be it genomic or transcriptomic, really 
consists of four major phases:
1. data generation
   This can be broadly seen as everything needed to get the DNA/RNA ready to 
   be sent off to sequencing (usually something the client does), the library 
   prep at the sequencing provider and finally the sequencing itself 
   (including base calling). An area of a thousand pitfalls where each step 
   (and the communication) is crucial and even one slight oversight can make 
   the difference between a "simple" project and a "hard" project. E.g.: 
   taking DNA from growing cells (especially bacteria in the exponential 
   growth phase) might not be a good idea ... it makes assembly more 
   difficult. Some DNA extraction methods generate more junk than good 
   fragments, etc.
   The reason I am emphasizing this is simple: nowadays, the "sequencing" 
   itself is not the most expensive part of a sequencing project, the next two 
   steps are (most of the time anyway).
2. assembly & finishing
   Still a hard problem. Even a "simple" bacterium can take weeks of effort to 
   get right if it's riddled with phages, prophages, transposon elements, 
   genetically engineered repeats etc. And with eukaryotes the real fun 
   starts: ploidy, retrotransposons etc. make for an unbelievable genome 
   plasticity and almost always hold their own surprises. I've seen "simple" 
   Saccharomyces cerevisiae - where biologists swore to high heaven they were 
   "close to the publicly sequenced strains" - being *very* different from 
   what they were expected to be, both on the DNA level and the genome 
   organisation level.
   Getting eukaryotes right "down to the last base" might cost quite some 
   money, especially when looping back to step 1 (data generation) to tackle 
   difficult areas.
3. annotation
   Something many people forget: give the sequence a meaning. Here too, things 
   can get quite costly if done "right", i.e., with hand curation. Especially 
   for organisms which are not among the more commonly sequenced species or 
   are generally more complex.
   Annotation of a de-novo transcriptome assembly is also not for the faint of 
   heart, especially if done on short, unpaired read assemblies.
4. "using the sequencing data"
   ... for whatever it was generated for.
---------------------------------------------------

The above makes it clear that, depending on what you are really interested in 
within your project and what you expect to be able to do with the sequencing 
data, one can cut corners and reduce cost here and there (but not everywhere). 
And therefore, the above question "What do you want?" is one which - after the 
initial chit-chat of "hi, hello, nice to meet you, a pleasure to be here, etc." 
- every good representative of respectable sequencing providers I have met so 
far will ask as the very first question. Usually in the form of "what do you 
want to sequence and what will you want to use the data for (and what not)?"

Every other question - like where to sequence, which sequencing technology to 
use, how to process the sequencing data afterwards - is incidental and 
subordinate to your answer(s) to the question of "what do you want?!" But 
often sequencing customers get their priorities wrong by putting forward 
another question:

"WHAT WILL IT COST ME?" or "Can you make it cheaper?"

---------------------------------------------------
Detour - Putting things into perspective -

Come to think of it, people sometimes have very interesting ideas regarding 
costs. Interesting as in "outright silly." It may be because they do not really 
know what they want or feel unsure on terrain unfamiliar to them, and so they 
often focus their energy on single aspects of a wider project where they feel 
more at home. And suddenly the focus lies on haggling and bartering over 
prices because, after all, this is something everyone knows how to do, 
right?

As I hinted earlier, the pure sequencing costs are nowadays probably not the 
biggest factor in any sequencing project: 454, Illumina, IonTorrent and other 
technology providers have seen to that. E.g., in 2003/2004 it still cost 
somewhere between 150 - 200 k€ to get an 8x Sanger coverage of a moderately 
sized bacterium (4 to 5 Mb). Nowadays, for the same organism, you get 
coverages in the dozens (going with 454) for a few thousand Euro ... or 
coverages in the hundreds or even thousands (going with Illumina) for a few 
hundred Euro.
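
For readers who have not done the arithmetic before: average coverage is 
simply the total number of sequenced bases divided by the genome size. The 
following is a minimal Python sketch of that calculation. The function names 
and all read counts and read lengths in it are illustrative assumptions and 
not figures from any provider; only the 8x coverage and the 4 to 5 Mb genome 
size come from the text above.

```python
import math


def coverage(num_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases / genome size."""
    return num_reads * mean_read_length / genome_size


def reads_needed(target_coverage: float, mean_read_length: float,
                 genome_size: int) -> int:
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / mean_read_length)


if __name__ == "__main__":
    genome = 4_500_000  # a moderately sized bacterium, 4 to 5 Mb

    # ASSUMPTION: ~800 bp usable length per Sanger read (illustrative only).
    print("Sanger reads for 8x:", reads_needed(8, 800, genome))

    # ASSUMPTION: hypothetical run yields, chosen only to show the scale.
    print("454-like run:      %.0fx" % coverage(500_000, 400, genome))
    print("Illumina-like run: %.0fx" % coverage(10_000_000, 100, genome))
```

Note that this gives only the average depth: it says nothing about how evenly 
the reads are spread over the genome or whether repeats can be resolved.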

Costs for assembly, finishing and annotation have not followed the same 
decrease. Yes, advances in algorithms have made things easier in some parts, 
but not really on the same scale. Furthermore, the "short read" technologies 
have more than made up for those advances by adding algorithmic complexity 
compared to the old Sanger reads. Maybe "(ultra)long read" technologies will 
alleviate the problem, but I would not hold my breath waiting for them to 
really work well.

One thing however has almost not changed at all: your costs of actually doing 
followup experiments and data interpretation! Remember that sequencing in 
itself is most of the time not the ultimate goal, you actually want to gain 
something out of it. Be it abstract knowledge for a paper or concrete hints for 
producing some compounds or whatever, chances are that you will actually devote 
a substantial amount of your resources (time, manpower, mental health) into 
followup activities (lab experiments, genetic engineering, writing papers) to 
turn the abstract act of sequencing into something tangible, be it papers, 
fame, new products, money, or whatever you want to achieve.

And this is the place where it pays to stop and think: "what do I want? what 
are my strengths and where are my weaknesses? where are my priorities?" The 
English have a nice saying: "Being penny-wise and pound-foolish is not wise." I 
may add: especially not if you are basing man-months / years of lab work and 
your career on the outcome of something like sequencing. Maybe I'm spoiled 
because I left academia quite some time ago, but in sequencing I always prefer 
to throw a bit more money at the sequencing process itself to minimise the 
risks of the later stages.
---------------------------------------------------

There's one last detour I'd like to make, and that is the question of "where to 
sequence?"

---------------------------------------------------
Detour - Public or private, old-timers or young-timers ? -

Choosing a sequencing provider is highly dependent on your answer to "what do 
you want?" Wanting to keep the sequencing data (or the very act of sequencing) 
secret, even if only for some time, will probably lead you to commercial 
sequencing companies. There you have more or less complete control over the 
data. Paranoid people might perhaps argue that you can have that only with 
your own sequencing equipment and personnel, but I have the feeling that only 
a minority is able to cough up the necessary money for purchasing sequencing 
equipment for a small one-time project.

Instead of companies you could however also look whether one of the existing 
sequencing centers in the world might be a good cooperation candidate, 
especially if you are doing this project within the scope of your university. 
Note however that there might be a number of gotchas lurking there, besides the 
obvious "the data is not really secret anymore": sometimes the raw sequencing 
data needs to be publicly released, maybe earlier than you would like; or the 
sequencing center requires that each and every paper you publish based on that 
data has them as (co-)first author.

A related problem is "whom do I trust to deliver good work?" Intuition says 
that institutes with a long sequencing history have amassed quite some 
knowledge in this field, making them experts in all three aspects (data 
generation, assembly & finishing, annotation) of a sequencing project ... and 
intuition probably isn't wrong there. The same thing is probably true for 
sequencing companies which have existed for more than just a couple of years, 
though what I have seen so far is that - due to size - sequencing companies 
sometimes really focus on the data generation and rely on partner companies 
for "assembly" and "annotation". This is not to say that younger companies are 
bad. It is my belief that in this field, people are still more important than 
technology ... and every once in a while good people split off from a 
well-known institute (or company) to try their luck in a company of their own. 
Always look for references there.

The following statement is a personal opinion (and you can call me biased for 
that): I am quite wary of sequencing done at locations where a sequencer exists 
because someone got a grant to buy one (because it was chic & en vogue to get a 
shiny new toy) but where the instrument then slowly starts to collect dust 
after the initial flurry ... often because people did not budget for the 
chemistry costs which arise when really using the machine 24/7. I want to know 
that technicians actually work with those things every day, that they know the 
ins and outs of the work, the protocols, the chemistry, the moods of the 
machine (even an instrument can have a bad day). I honestly do not believe 
that one can build up enough expertise when operating these things "every once 
in a while".

---------------------------------------------------


All of the above means that depending on what I need the data for, I have the 
freedom to choose among different providers. In case I just need masses of raw 
data and potential savings are substantial, I might go with the cheapest whom I 
know to generate good data. If I want good service and a second round of data 
in case I am not 110% satisfied with the first round (somehow people have 
stopped questioning me there), I usually do not end up with the cheapest 
provider ... but the additional costs are not really high. If I wanted my data 
really, really quickly, I'd search for a provider with Ion Torrent or MiSeq (I 
am actually looking for one with a MiSeq, so if anyone knows a good one, 
preferably in Europe -> mail me). Though I have already done transcriptomics on 
eukaryotes, in case I needed larger eukaryotes assembled de novo & also 
annotated, I would probably look for the help of a larger sequencing center as 
this starts to get dangerously near the fringe of my field of expertise.


In closing this part, here are a couple of guidelines which have not failed me 
so far for choosing sequencing providers:
  • Building a good relationship helps. In case your institute / university 
    already has good (or OK) experience with a provider, ask there first.
  • It is a lot easier to build a good relationship with someone who speaks 
    your language ... or good(!) English.
  • I will not haggle over a couple of hundred Euros in a single project, but 
    I'll certainly reconsider this when savings are in the tens of thousands.
  • Managing expectations: some sequencing projects are high risk from the 
    start, for lots of possible reasons (underfunded, bad starting material, 
    unclear organism). This is *sometimes* (!) OK as long as everyone involved 
    knows and acknowledges this. However, you should always have a clear 
    target ("what am I looking for?") and preferably know in advance how to 
    treat the data to get there.
  • Errors occur, stay friendly at first. In case the expectations were clear 
    (see above), the material and organism are not at fault but the data 
    quality is somehow bad, it is not too difficult to have the sequencing 
    provider acknowledge this and get additional sequencing at no added cost.

> It would be more than good if one can comment on the mixture of independent 
> technologies (sequencing tech.), in respect to what kind of projects would 
> yield higher quality with which kind of sequencers.

Really depends on what you want to do :-) And note that I base my answers on 
technologies available today without bigger problems: 454, Illumina, with 
IonTorrent as a joker for quick projects. PacBio and Oxford Nanopore might 
become game changers, but are not there just yet. Here's what I would do, which 
might not necessarily be what others would:

  • On a gene fishing expedition? Probably Illumina HiSeq, at least 100bp, 150 
    to 200bp if your provider supports it well. Ion if it is a small organism 
    and you need it quick without caring about possible frameshifts.
  • Want some larger contigs? 454 Titanium + Illumina 100bp (150 to 200bp if 
    the provider supports it).
  • The same as above, but maybe cheaper? Ion Torrent (long chemistry) & 
    Illumina. (Never tried that myself.)
  • Even larger contigs and scaffolding? 454 Titanium + 454 paired-end + 
    Illumina HiSeq (also paired-end if you need more coverage).
  • Larger scaffolds? Like above, but with different library sizes in the 
    paired-end libraries.
  • Feeling adventurous or have a complex eukaryote? PacBio + at least 
    Illumina, preferably with 2x to 4x 454 mixed in. The raw PacBio 
    single-reads I have seen so far were ... ummm ... difficult to handle (85% 
    accuracy, with very unevenly spaced errors), 3-pass reads should alleviate 
    most problems (at the cost of read length), maybe 2-pass also. Make sure 
    the provider does the whole read-processing and collapsing for you or else 
    you are in for some fun time.
  • Have some good friends at Oxford Nanopore who can give you some MinION 
    engineering samples? Man, I'd kill for some bacterial test sets with those 
    (especially Bacillus subtilis 168).

Hope that helps,
  Bastien



