Hi Dr. Bolstad,

Thank you very much for your helpful comments and for sending me your thesis. I have read the section on background correction, and I think I now have the big picture of your method. So the current version uses only the PM values on each array and finds the background by fitting a model. I really appreciate it!
I went back to check how much of the variation is explained by each component. The first component explains 26.2% of the total variation, while the second component (which separates the batches) explains 13.1%. So more than 10% of the total variation in the whole data set is due to batch. I guess I still need to correct for the batch effect (maybe by fitting a model that includes batch as a term). Your comments will be very much appreciated.
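To make the two quantities above concrete, here is a minimal sketch of how the per-component percentages can be computed, and of the simplest form of "including batch in a model": per-gene centering on the batch means. Everything in it is an assumption for illustration: the function names are hypothetical, the data are synthetic rather than the 120 arrays discussed here, and the adjustment is plain mean-centering, not an empirical Bayes method or a probe-level normalization. It also assumes that batch is not confounded with case/control status; if it were, centering on batch would remove biological signal as well.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component.

    X is a samples x genes matrix of expression values.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each gene across samples
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, largest first
    var = s ** 2                                 # proportional to PC variances
    return var / var.sum()

def remove_batch_means(X, batch):
    """Center each gene within each batch, then restore its overall mean.

    This gives the residuals of a per-gene one-way model with batch as the
    only factor, shifted back to the gene's overall mean.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    grand = X.mean(axis=0)
    out = X.copy()
    for b in np.unique(batch):
        rows = batch == b
        out[rows] = X[rows] - X[rows].mean(axis=0) + grand
    return out

# Synthetic example: 3 batches of 40 samples, 200 genes, with a
# gene-specific shift added to every sample in a batch.
rng = np.random.default_rng(0)
batch = np.repeat([0, 1, 2], 40)
shifts = rng.normal(scale=1.0, size=(3, 200))
X = rng.normal(scale=0.5, size=(120, 200)) + shifts[batch]

before = explained_variance_ratio(X)
after = explained_variance_ratio(remove_batch_means(X, batch))
print("first two PCs before adjustment:", before[:2])
print("first two PCs after adjustment: ", after[:2])
```

In a real analysis the batch term would normally go into the same linear model as the case/control term, so that both are estimated together rather than batch being removed first.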
Best,
Jun

----------------------------
Jun Ding, Ph.D. student
Department of Biostatistics
University of Michigan
Ann Arbor, MI, 48105
----------------------------

Quoting Ben Bolstad <bmb@xxxxxxxxxxxxx>:
Hi Jun,

As you have noticed, RMA did not completely remove the batch effect even when you processed everything as a single dataset. I take this as both good and bad. The good is that the batch effect is now reflected in the second principal component rather than the first, as it was in your previous analysis. That means you have at least reduced its effect, and that it was better to do the combined analysis than to process separate batches of 40 CEL files. You don't say how much of the variation is explained by each of the components, but if you feel that the batch effect is still significant you should consider accounting for it in subsequent analysis.

In regards to the RMA algorithm:

1) The background adjustment is done separately for each chip; no information is combined between arrays. It does not use MM probe intensities at all (although very early versions, almost 5 years ago, did). The background is based upon a convolution model (it is more complicated than just taking the mode). You may read about the RMA background model here:

http://www.bmbolstad.com/Dissertation/Bolstad_2004_Dissertation.pdf

See particularly pages 17-20 for the details of the background adjustment step.

2) Quantile normalization is done after background correction of the PM probe intensities but before summarization. While it does indeed reduce technical variation, it is no magical panacea for poor experimental design. As with most normalization methods, it does have its limitations and may not remove all technical variability in every situation.

Best,

Ben

On Fri, 2007-01-26 at 16:19 -0500, Jun Ding wrote:

Hi Dr. Bolstad,

I have tried to process all 120 arrays together in RMA, hoping to eliminate the batch effect (as I mentioned in my last email, when I used RMA to process each batch separately and then combined them, the sample means were completely different between batches, while within batches the means were similar). What I get this time is pretty interesting.
First, as I expected, all the sample means are similar. When I make the PCA plot, the first component separates cases and controls, while the second component separates the batches. So I guess there is still a batch effect in my data set? What do you think?

I have two more questions that are related to this.

First, is background adjustment done completely separately for each microarray? From reading your papers, my understanding is that you take the distribution of the MM probes on one array, use its mode as the background, and then subtract this background from each PM intensity. So if the MM values on one array are consistently higher than the MM values on another array, this background adjustment can correct for it. If that is true, I think this step should actually help eliminate the batch effect.

Second, is quantile normalization done before summarizing the PMs of one transcript into one measure? For example, if I have 10,000 probe sets and each of them has 16 probes, is quantile normalization dealing with a vector of 160,000 values? I would think this step also helps eliminate the batch effect.

Thank you!

Best,
Jun

----------------------------
Jun Ding, Ph.D. student
Department of Biostatistics
University of Michigan
Ann Arbor, MI, 48105
----------------------------

Quoting Jun Ding <junding@xxxxxxxxx>:
> Hi Dr. Bolstad,
>
> Thank you very much for your quick and detailed suggestions! It is
> surely very helpful to me.
>
> Just a little follow-up. Yesterday I told you that I used RMA to
> analyze each batch individually and then when I combined them
> together, the PCA plot perfectly separated three batches of samples.
> I tried to calculate the mean of genes' expression for each sample
> (i.e. the mean of ~54,000 transcripts of each sample). It turns out
> that within each batch, the means of samples are very close to each
> other, however, across batches, means are significantly different.
> I guess this would be the reason that the first principal component
> would perfectly separate three batches of samples. Meanwhile, the
> variances of genes' expression for samples are pretty similar to each
> other across batches. Are all these what you expect?
>
> I will try to use RMA to analyze all 120 samples together and let you
> know what happens.
>
> Thanks!
>
> Jun
> ----------------------------
> Jun Ding, Ph.D. student
> Department of Biostatistics
> University of Michigan
> Ann Arbor, MI, 48105
> ----------------------------
>
> Quoting Ben Bolstad <bmb@xxxxxxxxxxxxx>:
>
>> Jun,
>>
>> An interesting question, and an issue I am well aware of. Hopefully,
>> your experiment is not such that the batch effect is confounded
>> with the treatment effect. My instinct would be that it is still better
>> to process all 120 together rather than as 3 sets of 40 if you intend to
>> do an analysis involving all the samples.
>>
>> As for dealing with the remaining batch effect:
>>
>> One solution might be to include a batch effect parameter in your
>> subsequent analysis.
>>
>> Another that might be worth your time:
>>
>> W. Evan Johnson, Cheng Li, and Ariel Rabinovic
>> Adjusting batch effects in microarray expression data using
>> empirical Bayes methods
>> Biostatistics Advance Access published on January 1, 2007, DOI
>> 10.1093/biostatistics/kxj037.
>> Biostat 8: 118-127.
>>
>> http://biostatistics.oxfordjournals.org/cgi/content/abstract/8/1/118
>>
>> I do have a probe-level normalization which does remove batch effects.
>> However, I have yet to publish on it and it will be some months yet
>> before it is incorporated into RMAExpress.
>>
>> Best,
>>
>> Ben
>>
>> On Tue, 2007-01-23 at 19:08 -0500, Jun Ding wrote:
>>> Hi Dr. Bolstad,
>>>
>>> I have a question regarding how to use RMA correctly.
>>>
>>> We have data of 120 microarrays. But those 120 microarrays were not
>>> done all together at one time.
>>> Actually, we collected 40 samples every
>>> time and then went ahead to do microarrays on those 40 samples. So
>>> basically we have 3 batches of microarrays (microarrays from the same
>>> batch were done at the same time and there was a gap of several months
>>> between two batches). I wonder in this case, when I use RMA, whether I
>>> should analyze those 120 microarrays together or I should analyze each
>>> batch of microarrays separately. I don't know the details of RMA, so I
>>> really don't know which way I should take.
>>>
>>> I have tried to use RMA to analyze each batch of microarrays separately
>>> and then combined them together. I used PCA (principal component
>>> analysis) to do an unsupervised analysis and what I found was that the
>>> first principal component could perfectly separate three batches. I
>>> guess that means there is an obvious batch effect in the data after RMA.
>>>
>>> Look forward to getting your suggestions! Thanks a lot!
>>>
>>> Jun
>>>
>>> ----------------------------
>>> Jun Ding, Ph.D. student
>>> Department of Biostatistics
>>> University of Michigan
>>> Ann Arbor, MI, 48105
>>> ----------------------------