
Evaluation of putative reference genes for gene expression normalization in soybean by quantitative real-time RT-PCR



Real-time quantitative reverse transcription PCR (RT-qPCR) data needs to be normalized for its proper interpretation. Housekeeping genes are routinely employed for this purpose, but their expression level cannot be assumed to remain constant under all possible experimental conditions. Thus, a systematic validation of reference genes is required to ensure proper normalization. For soybean, only a small number of validated reference genes are available to date.


A systematic comparison of 14 potential reference genes for soybean is presented. These included seven commonly used genes (ACT2, ACT11, TUB4, TUA5, CYP, UBQ10, EF1b) and seven new candidates (SKIP16, MTP, PEPKR1, HDC, TIP41, UKN1, UKN2). Expression stability was examined by RT-qPCR across 116 biological samples, representing tissues at various developmental stages, varied photoperiodic treatments, and a range of soybean cultivars. Expression of all 14 genes was variable to some extent, but that of SKIP16, UKN1 and UKN2 was overall the most stable. A combination of ACT11, UKN1 and UKN2 would be appropriate as a reference panel for normalizing gene expression data among different tissues, whereas the combination SKIP16, UKN1 and MTP was most suitable for developmental stages. ACT11, TUA5 and TIP41 were the most stably expressed when the photoperiod was altered, and TIP41, UKN1 and UKN2 when the light quality was changed. Across six cultivars grown under long-day (LD) and short-day (SD) conditions, expression stability did not vary significantly, and ACT11, UKN2 and TUB4 were the most stable genes. To validate the reference genes selected in this study, the relative expression of GmFTL3, a soybean ortholog of Arabidopsis FT (FLOWERING LOCUS T), was measured.


None of the candidate reference genes was uniformly expressed across all experimental conditions, and the most suitable reference genes depend on the experimental conditions, tissue, developmental stage and cultivar. Most of the new reference genes performed better than the conventional housekeeping genes. These results should guide the selection of reference genes for gene expression studies in soybean.


Gene expression analysis plays an important role in furthering our understanding of the signalling and metabolic pathways that underlie developmental and cellular processes. Real-time quantitative reverse transcription PCR (RT-qPCR) is a particularly suitable technology platform for this purpose, thanks to its sensitivity, specificity, dynamic range and high-throughput capacity [1-4]. To avoid experimental errors arising from variation in the quantity and integrity of the RNA template, as well as in the efficiency of the RT reaction used to synthesize cDNA, a normalization step is an essential prerequisite. The most common way to achieve normalization is to include one, or a small number of, reference genes whose expression is assumed to be constitutive [5-7]. Ideally, such genes are expressed at a constant level in all tissues, independent of the growing environment [1, 5-8]. Commonly used reference genes include ribosomal RNA (18SrRNA) and a number of housekeeping genes, such as those encoding actin (ACT), tubulin (TUB), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), polyubiquitin (UBQ) and elongation factor 1-α (EF1α) [1, 6, 9, 10]. Typically, these genes have simply been assumed to be constitutively expressed, because they are involved in basic and ubiquitous cellular processes [1, 5, 9, 11]. However, the evidence is that transcript levels of housekeeping genes can vary considerably with experimental conditions and/or tissue type, so none of the commonly used genes can be regarded as a universal reference. Instead, the onus is on the experimenter to select a panel of genes appropriate for the specific experimental conditions and tissue types chosen [7, 8, 12-14]. In many cases a single reference gene is inadequate, and relying on one is likely to produce erroneous conclusions about expression patterns [15-18].

The importance of expression stability in the choice of reference genes has prompted the development of software packages, such as geNorm [19] and NormFinder [20], for identifying stably expressed genes [17, 21]. A number of reference gene validation studies have been reported [22-29], and in plants these have covered both model and crop species: Arabidopsis thaliana [9, 30], rice [31, 32], Brachypodium sp. [33], wheat [34], barley [35], soybean [36, 37], tomato [38], potato [39], sugarcane [40], grape [16] and poplar [15, 41]. The A. thaliana ATH1 array has been used to identify a set of reference genes superior to the conventionally applied housekeeping genes [9], and the wider relevance of this set has been demonstrated in Brachypodium sp. [33], tomato [38], grape [16] and poplar [15].

Soybean is the leading legume crop, and has been used as a model plant for studying the flowering response to photoperiod. Many of these studies have used TUB and/or ACT as a reference gene (Additional file 1). A literature search based on the keywords "soybean" and "gene expression" produced 54 hits in PubMed (publication period 2001 to 2009). In 23 of these studies (43%) TUB was the reference gene, in 15 (28%) ACT, and in six (11%) 18SrRNA. All of the studies surveyed used a single reference gene, and none reported a preliminary validation (Additional file 1). To date, only a limited number of statistically validated reference genes have been identified in soybean. A comparison of the performance of ten conventional housekeeping genes across 21 soybean samples allowed the identification of a panel of genes suitable for gene expression normalization [36]. However, the limited number of samples tested meant that developmental stages and tissues/organs could not be fully represented. A complementary set of new reference genes, chosen for constancy of expression over a range of experimental conditions, was mined from multiple soybean microarray datasets [37]. In the present report, we compare the performance of seven commonly used housekeeping genes and seven new candidate reference genes across a large set of biological samples representing various developmental stages, tissues, photoperiod treatments and cultivars of soybean. The recently released soybean whole-genome sequence [42] has facilitated genome-wide mining for reference genes in soybean. Based on sequence homology, soybean orthologs of the three best A. thaliana reference genes were identified, and a further four genes were selected because they had shown stable expression on a microarray platform [37]. Our data indicate that many of these newer reference genes indeed have greater expression stability than the conventionally used housekeeping genes. As a result, combinations of these reference genes should provide a more reliable means of normalizing gene expression.


Transcription profiling of soybean reference genes

An RT-qPCR assay based on SYBR Green detection was carried out to examine the expression stability of the 14 candidate genes (Table 1). The full sample set was included in each technical replicate to exclude artefacts due to between-run variation. Each RT reaction was repeated once, and three independent technical replicates were performed for each experiment. The expression levels of the candidate reference genes are presented as quantification cycle (Cq) values (Figure 1). The mean Cq values of the genes ranged from 17 to 32, with most lying between 20 and 25. CYP was the most highly expressed of the set (mean Cq of 19.6), and HDC the least (mean Cq of 32.7). EF1b showed the least variation (coefficient of variation, CV, of 5.6%), while ACT2/7 (7.3%) and TUB4 (7.7%) were the most variable. The variation in Cq is illustrated as a scatter diagram in Additional file 2.
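The CV values quoted above summarize how much each gene's raw Cq varies across samples. The Python sketch below shows one way to compute them; it assumes the CV is the sample standard deviation (n - 1 denominator) divided by the mean Cq, expressed as a percentage, because the exact formula is not stated here. The function names and the gene/Cq values in the usage lines are hypothetical placeholders, not data from this study.

```python
from statistics import mean, stdev


def cq_summary(cq_by_gene):
    """Return {gene: (mean Cq, SD, CV %)} from raw Cq values.

    Assumption: CV = sample SD (n - 1 denominator) / mean Cq * 100.
    """
    summary = {}
    for gene, values in cq_by_gene.items():
        if len(values) < 2:
            raise ValueError(f"{gene}: need at least two Cq values")
        m = mean(values)
        sd = stdev(values)  # sample standard deviation
        summary[gene] = (m, sd, 100.0 * sd / m)
    return summary


def rank_by_cv(cq_by_gene):
    """Gene names ordered from lowest to highest CV of Cq."""
    s = cq_summary(cq_by_gene)
    return sorted(s, key=lambda gene: s[gene][2])


# Hypothetical example values (not from this study):
example = {"GENE_A": [20.1, 20.3, 19.9, 20.2], "GENE_B": [24.0, 26.5, 23.1, 25.4]}
print(rank_by_cv(example))
```

Because raw Cq values also reflect sample-to-sample differences in RNA input, a low CV on its own does not establish stability, which is why geNorm and NormFinder were also used in this study.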

Table 1 Reference genes used for gene expression normalization in soybean.
Figure 1

Expression levels of the candidate reference genes across experimental samples. Values are given in the form of RT-qPCR quantification cycle numbers (Cq values). The boxes represent mean Cq values, the bars standard deviations.

The variation in relative transcript quantity of the reference genes across all samples is shown in Figure 2. Here, transcript quantities are represented as percentages of the aggregated reference transcript pool of each sample. The proportions of SKIP16, UKN2 and UKN1 transcripts remained relatively constant across samples, while those of HDC and TUB4 were rather variable, especially with respect to developmental stage and tissue type. Although the expression level of UKN2 was fairly constant among almost all the samples, it was particularly low in the second trifoliolate leaf at the stage when the third trifoliolate had fully expanded. In contrast, the expression of HDC was particularly high in this tissue/developmental stage combination. TUA5 expression varied widely across developmental stages and tissue types, but was largely unaffected by photoperiodic treatment or cultivar. Thus, the transcript level of none of the reference genes was truly constant; rather, it varied both temporally and spatially.
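Figure 2 expresses each gene's transcript quantity as a share of the summed quantities of all 14 candidates in the same sample. A minimal Python sketch of that calculation follows, assuming efficiency-corrected relative quantities are already available for every gene and sample (for example from the delta-Cq conversion described under Statistical analysis); the function name and example values are illustrative only.

```python
def pool_percentages(rq_by_gene):
    """Express each gene's relative quantity as % of the per-sample pool.

    rq_by_gene: {gene: [relative quantity in sample 0, sample 1, ...]}
    Returns {gene: [percentage in sample 0, ...]}; within each sample
    the percentages of all genes sum to 100.
    """
    genes = list(rq_by_gene)
    n_samples = len(rq_by_gene[genes[0]])
    if any(len(rq_by_gene[g]) != n_samples for g in genes):
        raise ValueError("all genes need the same number of samples")
    # Aggregated transcript pool of each sample
    totals = [sum(rq_by_gene[g][i] for g in genes) for i in range(n_samples)]
    return {
        g: [100.0 * rq_by_gene[g][i] / totals[i] for i in range(n_samples)]
        for g in genes
    }


# Hypothetical two-gene, two-sample example:
print(pool_percentages({"GENE_A": [1.0, 3.0], "GENE_B": [3.0, 1.0]}))
```

A stable gene keeps a roughly constant percentage across samples, although a change in any other gene of the pool also shifts its share, so this view is descriptive rather than a stability test.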

Figure 2

Distribution of relative transcript quantities of the reference genes across all samples. Transcript quantities are represented as percentages of the aggregated 14-transcript pool for each sample. 1-20: across various developmental stages; 21-44: across different tissues; 45-56: across cultivars; 57-92: response to short day (SD) and long day (LD) photoperiods; 93-116: response to exposure to red (RL) and blue (BL) light. Detailed sample information is given in Additional file 5.

PCR efficiency analyses

Melting curve analyses were performed following the RT-qPCR. The specificity of the amplicons was confirmed by the presence of a single peak (a representative trace is shown in Additional file 3). Electrophoretic separation of the amplicons produced a single fragment of the expected size in all cases, with no visible primer-dimer products. Five primer pairs were designed either to span an intron or to target exon-exon junctions (Table 2), and were used to compare amplicons derived from genomic DNA template with those from cDNA template. This comparison demonstrated that the cDNA template was free of contaminating gDNA. No amplification was detectable in the absence of template. Standard curves were generated using a ten-fold serial dilution of a cDNA pool, and had coefficients of determination (R²) of 0.994-0.999. Based on the slopes of these standard curves, the estimated PCR amplification efficiencies ranged from 94% to 106% (Table 2 and Additional file 4).

Table 2 Reference gene primer sequences and amplicon characteristics.

Gene expression stability analyses

The expression stability of the candidate reference genes was examined with geNorm, which calculates for each gene an expression stability measure (M), defined as the average pairwise variation of that gene with all other genes tested (Figure 3). Stepwise exclusion of the least stable gene allowed the genes to be ranked according to their M value (the lower the M value, the higher the gene's expression stability) [17], as depicted in Figure 3A. All the genes had an M value below the geNorm threshold of 1.5. Across all the samples, SKIP16 and UKN1 were the most stably expressed, and HDC the least; HDC was therefore the first to be excluded from the analysis (Figure 3A). Among the various developmental stages, SKIP16 and UKN1 remained the most stable, and CYP was the least stable. ACT11 and UKN1 were the most highly ranked across the set of tissues at the various developmental stages, while ACT2/7 was the least stable. In response to the short day (SD) and long day (LD) treatments, ACT11 and TUA5 were the most stable genes, and HDC the least; in response to blue light (BL) and red light (RL) treatment, TIP41 and UKN2 were the most stable, and HDC the least.
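The M value described above follows the definition of Vandesompele et al. [17]: for each pair of genes j and k, the pairwise variation V_jk is the standard deviation across samples of log2(RQ_j/RQ_k), and M_j is the mean of V_jk over all other genes. The Python sketch below reimplements that definition together with the stepwise exclusion. It is not the geNorm software itself; the use of the sample standard deviation, the function names and the toy data are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def genorm_m(rq_by_gene):
    """geNorm-style stability measure M for each gene.

    rq_by_gene: {gene: [relative quantity per sample]} (all positive).
    V_jk = SD across samples of log2(RQ_j / RQ_k); M_j = mean of V_jk
    over all other genes k. Lower M means more stable expression.
    """
    genes = list(rq_by_gene)
    if len(genes) < 2:
        raise ValueError("need at least two genes")
    logs = {g: [math.log2(x) for x in rq_by_gene[g]] for g in genes}
    m_values = {}
    for j in genes:
        pairwise = [
            stdev([a - b for a, b in zip(logs[j], logs[k])])
            for k in genes
            if k != j
        ]
        m_values[j] = mean(pairwise)
    return m_values


def genorm_ranking(rq_by_gene):
    """Stepwise exclusion: returns genes from least to most stable.

    The gene with the highest M is removed and M is recomputed on the
    rest; the final two genes cannot be separated and are listed last.
    """
    remaining = dict(rq_by_gene)
    order = []
    while len(remaining) > 2:
        m_values = genorm_m(remaining)
        worst = max(m_values, key=m_values.get)
        order.append(worst)
        del remaining[worst]
    order.extend(sorted(remaining))
    return order


# Toy data (hypothetical): B is exactly 2 x A, C fluctuates independently.
toy = {"A": [1.0, 2.0, 4.0, 8.0], "B": [2.0, 4.0, 8.0, 16.0], "C": [1.0, 8.0, 2.0, 4.0]}
print(genorm_m(toy))
print(genorm_ranking(toy))
```

The toy data also show why co-regulated genes can score well together: A and B are perfectly proportional, so their mutual pairwise variation is zero regardless of how much either changes between samples.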

Figure 3

Gene expression stability and pairwise variation of the candidate genes as predicted by geNorm. A. Mean expression stability (M) following stepwise exclusion of the least stable gene across all treatment groups. The least stable genes are on the left, and the most stable on the right. B. The optimal number of reference genes required for effective normalization. The pairwise variation (Vn/Vn+1) between the normalization factors NFn and NFn+1 was calculated with the geNorm program to determine the optimal number of reference genes required for RT-qPCR data normalization.

To determine the optimal number of genes required for normalization, geNorm was used to calculate the pairwise variation (Vn/Vn+1) between sequential normalization factors (NFn and NFn+1) [17]. Following Vandesompele et al. (2002), a threshold value of 0.15 was adopted [17]. In the SD/LD comparison, three genes were sufficient for normalization, since the V3/4 value was well below 0.15 (Figure 3B). Differences in the expression stability of the candidate reference genes were less marked in the RL and BL light-quality treatment series than in the other series (Figure 3). The V2/3 value for the RL/BL comparison was 0.091, so TIP41 together with UKN2 would be sufficient for normalization. Among the cultivars, the pair ACT11 and UKN2 produced a V2/3 value of 0.073. However, for the comparisons based on developmental stage and tissue type, four genes were necessary, since the V3/4 values lay above the threshold. When all the experimental samples were considered together, the V2/3 value was 0.196 and the V3/4 value 0.137, indicating that three genes were sufficient and that adding a fourth gene would not materially improve the normalization (Figure 3B). Overall, the combination SKIP16, UKN1 and UKN2 was appropriate for all sets of samples.
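The pairwise variation can be written out in the same way: NF_n is the per-sample geometric mean of the relative quantities of the n most stable genes, and V_n/n+1 is the standard deviation across samples of log2(NF_n/NF_n+1) [17]. The Python sketch below is a hypothetical reimplementation of that definition (function names and toy data are illustrative), plus a helper that returns the smallest n whose V falls below the 0.15 threshold. With the all-sample values reported above (V2/3 = 0.196, V3/4 = 0.137) the helper returns 3.

```python
import math
from statistics import stdev


def geometric_mean(values):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalization_factors(rq_by_gene, ranked_genes, n):
    """Per-sample NF_n: geometric mean of the n top-ranked genes."""
    top = ranked_genes[:n]
    n_samples = len(rq_by_gene[top[0]])
    return [geometric_mean([rq_by_gene[g][i] for g in top]) for i in range(n_samples)]


def pairwise_variation(rq_by_gene, ranked_genes):
    """{n: V_n/n+1} for n = 2 .. len(ranked_genes) - 1.

    ranked_genes must be ordered from most to least stable.
    """
    result = {}
    for n in range(2, len(ranked_genes)):
        nf_n = normalization_factors(rq_by_gene, ranked_genes, n)
        nf_next = normalization_factors(rq_by_gene, ranked_genes, n + 1)
        result[n] = stdev([math.log2(a / b) for a, b in zip(nf_n, nf_next)])
    return result


def optimal_number(v, threshold=0.15):
    """Smallest n whose V_n/n+1 is below the threshold, or None."""
    for n in sorted(v):
        if v[n] < threshold:
            return n
    return None


# V values reported above for the full sample set:
print(optimal_number({2: 0.196, 3: 0.137}))
```

As noted in the Discussion, 0.15 is a guideline rather than a strict cut-off, so the trend across successive V values is worth inspecting rather than relying on the first value below the threshold alone.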

Expression stability was then re-analysed using NormFinder, which is based on a variance estimation approach [21] and ranks the genes according to their stability under a given set of experimental conditions. The ranking generated by this approach differed slightly from that determined by geNorm (Table 3). ACT11 and UKN1 were still ranked highest for tissue samples, and ACT11 and UKN2 highest for inter-cultivar comparisons. HDC, CYP and ACT2/7 consistently ranked poorly. Among developmental stages, EF1b and MTP emerged as the most stably expressed genes (they were ranked second and third by geNorm; Figure 3). Both NormFinder and geNorm identified ACT11 and TUA5 as being among the three most stable genes under SD and LD treatments. When evaluated across all the experimental samples, the same four genes were identified by both programs, although their rank order differed slightly.

Table 3 Expression stability of the reference genes, as calculated by NormFinder.

Reference gene validation

The expression pattern of GmFTL3, a soybean FLOWERING LOCUS T (FT) ortholog, was analysed using the selected reference genes (Figure 4). In A. thaliana, FT acts as a floral promoter and an integrator of various flowering pathways [43-47]. GmFTL3 has been proposed as a flowering promoter, since its ectopic over-expression in A. thaliana is associated with an extremely early flowering phenotype (unpublished data). Its expression was assessed at five distinct vegetative growth stages. When normalized using SKIP16, UKN1, MTP and EF1b as reference genes, GmFTL3 transcript abundance gradually increased over time, peaking at the onset of flowering, when the fourth trifoliolate leaf had fully expanded (Figure 4E). Similar expression patterns were obtained when either three or two of the most stable genes (as identified by geNorm) were used for normalization (Figure 4C and 4D). When only one reference gene was employed, the resulting pattern was also broadly similar (Figure 4A and 4B), but the estimated transcript abundance differed: it was higher when normalized against SKIP16 than against UKN1, presumably because the UKN1 transcript level was greater than that of SKIP16 (Figure 1). Normalization against either of the less stable genes CYP or TUB4 suggested that the GmFTL3 transcript level was constant during the vegetative growth stages (Figure 4F and 4G). The apparently lower GmFTL3 level at the onset of flowering in these cases was a consequence of CYP and TUB4 up-regulation during this period. This suggests that not only the stability but also the abundance of a reference gene affects the normalized results.
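How an up-regulated reference gene can flatten a target's profile is easiest to see with a short calculation. The Python sketch below assumes the normalization factor is the per-sample geometric mean of the chosen reference genes (as in geNorm [17]) and expresses the normalized target relative to the first sampling stage, as in Figure 4. The function name and all numbers are hypothetical, not the study's data.

```python
import math


def normalized_fold_change(target_rq, reference_rq, calibrator=0):
    """Normalize a target gene and express it relative to one sample.

    target_rq: relative quantities of the target gene, one per sample.
    reference_rq: {reference gene: [relative quantity per sample]}.
    NF per sample = geometric mean of the reference genes; the result is
    (target / NF), rescaled so that the calibrator sample equals 1.
    """
    n = len(target_rq)
    nf = []
    for i in range(n):
        logs = [math.log(reference_rq[g][i]) for g in reference_rq]
        nf.append(math.exp(sum(logs) / len(logs)))
    normalized = [t / f for t, f in zip(target_rq, nf)]
    base = normalized[calibrator]
    return [x / base for x in normalized]


# Hypothetical target rising over five stages (U, T1, T2, T3, T4):
target = [1.0, 1.5, 2.0, 3.0, 4.0]
stable_refs = {"REF1": [1.0] * 5, "REF2": [2.0] * 5}
rising_ref = {"REF3": [1.0, 1.5, 2.0, 3.0, 4.0]}  # up-regulated with the target

print(normalized_fold_change(target, stable_refs))  # follows the target's rise
print(normalized_fold_change(target, rising_ref))   # flat profile
```

With constant references the target's rise is preserved, whereas a reference that rises in step with the target cancels it out, analogous to the flat GmFTL3 profiles obtained with CYP or TUB4.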

Figure 4

Relative quantification of GmFTL3 expression using validated reference genes for normalization. A: SKIP16; B: UKN1; C: SKIP16 and UKN1; D: SKIP16, UKN1 and MTP; E: SKIP16, UKN1, MTP and EF1b; F: CYP; G: TUB4. The results are presented as mean fold change in relative expression compared with the first sampling stage (U). cDNA samples were taken from the same set used for the gene expression stability analysis: U, T1, T2, T3 and T4 indicate, respectively, the aerial parts of plants collected at the full expansion of the unifoliolate, first trifoliolate, second trifoliolate, third trifoliolate and fourth trifoliolate leaf.


Reference genes are routinely used as a means of quantifying gene expression. Ideal reference genes should be expressed at a constant level throughout the plant and not be influenced by exogenous treatment [1, 5]. Housekeeping genes, such as those involved in basic cellular processes (EF1α, UBQ and CYP) or cell structure maintenance (ACT, TUB), have been extensively used, but it has become increasingly apparent that their expression level is not as independent of experimental conditions as had been expected [6-8, 13, 14, 18, 48]. This implies a need to test the expression stability of any proposed reference gene(s) in advance, a procedure that is often not followed in the literature. Normalization based on several reference genes is becoming the standard, supported by the development of software such as geNorm and NormFinder [17, 21]. However, prior validation of reference genes remains uncommon in plant research, although it is the norm in human and animal research [22-25, 32, 49-54].

Soybean has been used as a model plant for the study of photoperiodic floral induction [45], but the molecular mechanism underlying this induction remains poorly understood. In soybean, ACT, TUB and UBQ are the most frequently used reference genes (Additional file 1), but there is increasing evidence that their expression is not particularly stable under certain conditions. More recently, some alternative reference genes have emerged [36, 37]. Although four of these (SKIP16, MTP, PEPKR1 and UKN2) have been shown by RT-qPCR to be stably expressed under a limited range of experimental conditions, no detailed validation has yet been carried out to test their suitability in experiments involving photoperiodic treatments.

In the present study, we used a more finely subdivided sample set to make the data more representative (Additional file 5). To our knowledge, this is the first systematic study of reference gene expression stability in soybean across such a large number of samples under varied light regimes (SD/LD/DD/LL, RL and BL). The new candidate reference genes generally outperformed the conventional housekeeping genes, and the poor performance of commonly used genes such as ACT2/7 and TUB4 was of particular note (Figure 3). SKIP16, UKN1 and UKN2 were overall the most stable and are good candidates for general gene expression normalization. However, each set of samples had its own best reference genes (Figure 3). For example, ACT11 was one of the best reference genes for both the tissue and the photoperiod samples, whereas TIP41 performed better than ACT11 for samples harvested under different light qualities (blue and red light), and SKIP16 was the best reference for developmental material.

The weakness of ACT2 as a reference in soybean, rice, potato and sugarcane has been noted previously [32, 37, 39, 40], and ACT2/7 was rather variable in A. thaliana [9]. However, ACT2/7 was judged to be the most stable of a set of ten conventional housekeeping genes across 21 soybean samples covering a range of developmental stages [36]. Similarly, TUB performed poorly as a reference gene in grape, potato and soybean [16, 36, 39]. UBQ10, which ranked poorly in the present experiments, was previously deemed unsatisfactory as a reference in soybean [36] and in grape [16], but was very stably expressed in A. thaliana and Brachypodium sp. [9, 33]. EF1b was among the most stable genes both in this study and in a previous study of soybean [36], while in both potato and rice EF1α was very stably expressed under biotic and abiotic stress [39]. The same gene was also identified as highly stable in its expression across tissues of rice [31], but was unstable across tissues and organs of tomato at various developmental stages [38]. TUA5 was identified as highly stable across development in soybean [36], and in poplar TUA was very stably expressed across different tissues [41]. Here, TUA5 expression was hardly affected by changes in photoperiod. Overall, the best-performing genes were SKIP16, UKN1, UKN2 and TIP41, and the worst were PEPKR1 and HDC. TIP41 and UKN2 have been reported to show stable expression across tissues and development in both tomato [38] and aspen [15]. However, TIP41 performed poorly during grape berry development [16], and in the roots and leaves of A. thaliana plants under cadmium or copper stress [30]. In aspen cambial cells, UKN2 expression was too unstable for the gene to be used for normalization [15]. Thus, reference genes that are stably expressed in one plant species may not be well suited for use in others. Consequently, reference genes need to be validated beforehand under the specific experimental conditions of each gene expression study.

We applied several mathematical and statistical approaches to minimize bias in the quantification of gene expression in soybean. The first was a conventional statistical measure, the coefficient of variation (CV) of Cq values, which provides an assessment of an individual gene's expression stability. However, because of its low sensitivity and reliability, this method cannot clearly identify the most stably expressed reference genes. The second exploited geNorm software [17], which showed that the stability of the candidate reference genes varied considerably across the sets of samples (Figure 3). The third used the alternative program NormFinder, which ranks the reference genes according to their expression stability [21]. The ranking of genes produced by NormFinder was largely consistent with that generated by geNorm (Table 3). Except for TUB4, all the candidate reference genes were represented in the Genevestigator database [55], and most of the expression patterns revealed by Genevestigator microarray data were consistent with the geNorm and NormFinder outputs for the present data set (Additional files 6 and 7).

It has been argued that co-regulation of genes may confound geNorm analyses, because the software tends to select genes with similar expression profiles [21]. Among the genes tested, two pairs (TUA5/TUB4 and ACT2/7/ACT11) each belong to the same gene family, and thus may be prone to co-regulation. However, co-regulation within these pairs is unlikely to have biased the present analysis (Figure 3), given that ACT11 and TUA5 were consistently ranked above ACT2/7 and TUB4, respectively, the only exception being that TUB4 ranked above TUA5 across cultivars.

The transcript abundance of many genes, like that of GmFTL3, is never very high, so any variation in their expression pattern is inevitably subtle. In this study, we normalized GmFTL3 expression with seven normalization factors in total, based on individual reference genes or on combinations of two, three and four reference genes, and obtained similar patterns even though the estimated abundance levels differed. Normalization with combinations of more genes, however, resulted in improved accuracy. This suggests that the number of reference genes needed depends on the researcher's purpose. If the aim is only to show the approximate expression pattern of a gene, a single reference gene may suffice, provided it has been confirmed to be stably expressed. If, however, the aim is to compare expression among different samples or to quantify expression levels accurately, more reference genes (as indicated by the geNorm threshold of 0.15) should be used. In this context, the geNorm threshold is not a strict cut-off, and the observed trend in pairwise variation values is equally informative [17, 33, 56].


In the present study, we have investigated the expression of 14 candidate reference genes across a large number of soybean samples in an attempt to identify those most suitable for normalizing gene expression. No gene was consistently superior to the others, but most of the new candidate genes showed greater expression stability than the conventionally used housekeeping genes. A combination of the three genes SKIP16, UKN1 and UKN2 provided the most robust platform for transcript normalization across experimental conditions in this study.


Plant Materials

The soybean cultivar Kennong18 (KN18) was used for most experiments. Plants were grown in a growth chamber under short day conditions (8 h light/16 h dark) at 25-28°C. Seedling tissues were harvested before the expansion of the unifoliolate leaf. The root, hypocotyl, epicotyl, cotyledon, unifoliolate leaf and shoot apex (including the apical meristem and immature leaves) were sampled when the unifoliolate leaves had fully expanded (about two weeks after sowing). A further sample of the root, along with the stem, unifoliolate leaves, various trifoliolate and lateral leaves, the petiole and the flowers, was harvested when the fourth trifoliolate had fully expanded (45 days after sowing, at flowering onset). Pods and seeds were sampled at 7, 14 and 21 days after flowering, and at maturity. The aerial parts of plants were also harvested when the unifoliolate, first, second, third and fourth trifoliolate leaves, respectively, had fully expanded (Additional file 5, indicated in yellow and green). To study the effect of altering the photoperiod, seedlings were exposed to either a long day (LD, 18 h light/6 h dark) or a short day (SD, 8 h light/16 h dark) regime. Fully expanded unifoliolate leaves were collected at 4 h intervals over 48 h; the seedlings were then transferred to either constant white light (LD-grown seedlings) or constant darkness (SD-grown seedlings), and the unifoliolate leaves were re-sampled at 4 h intervals over a further 48 h (Additional file 5, indicated in grey). The effect of exposure to red (RL) or blue (BL) light was monitored in etiolated seedlings subjected to red (Red-LED, 658 nm) or blue (Blue-LED, 436 nm) light in a growth chamber under LD conditions. The unifoliolate leaves were harvested at 4 h intervals over 48 h (Additional file 5, indicated in red and blue). Six further soybean cultivars were included: Heihe 27 (HH27), Zhonghuang 13 (ZH13), Jidou 12 (JD12), Tiefeng 31 (TF31), Suinong 14 (SN14) and Fudou 1 (FD1). These seedlings were grown under either SD or LD conditions, and the unifoliolate leaves were sampled 30 min before the lights were turned off (Additional file 5, indicated in purple). In total, the experimental samples comprised 44 at various stages of development, 60 exposed to various photoperiod treatments, and 12 involving six different cultivars (Additional file 5). All samples were immediately frozen in liquid nitrogen and stored at -80°C until required.

Total RNA isolation and cDNA synthesis

Total RNA was extracted using TRIzol reagent (Invitrogen, CA, USA) according to the manufacturer's instructions, except that total RNA from the petioles was isolated by the CTAB method [57]. Only RNA preparations with an A260/A280 ratio of 1.8-2.0 and an A260/A230 ratio >2.0 were used for subsequent analysis. RNA integrity was verified by 2% agarose gel electrophoresis followed by SYBR Green staining. Before cDNA synthesis, the RNA was treated with RQ1 RNase-free DNase (Promega, Madison, WI, USA) according to the manufacturer's instructions, and first-strand cDNA was synthesized from 4 μg RNA using the RevertAid first strand cDNA synthesis kit (Fermentas, St. Leon-Roth, Germany) and oligo-dT primers, according to the manufacturer's protocol.

Selection of candidate soybean genes

A set of 14 candidate reference genes was selected. This comprised seven conventionally used housekeeping genes; the soybean orthologs of the A. thaliana reference genes TIP41 (At4G34270), HDC (At1G58050) and UKN2 (At4G33380); and SKIP16 (At1G06110), MTP (At2G41790), PEPKR1 (At1G12580) and UKN1 (At3G13410), which were identified as potential reference genes via a soybean microarray gene expression analysis [37].

PCR primer design and test of amplification efficiency

Primers were designed using Beacon Designer v7.0 (Premier Biosoft International, Palo Alto, California, USA) with melting temperatures of 58-60°C, primer lengths of 20-24 bp and amplicon lengths of 60-134 bp. Experimental details are given in Table 2. Exon/intron boundaries were determined by aligning each cDNA sequence with its corresponding genomic sequence, downloaded from Phytozome. Five primer pairs were designed to anneal to different exons or to span an exon-exon junction of the cDNA (Table 2). For each primer pair, reaction efficiency estimates were derived from a standard curve generated from a serial dilution of pooled cDNA. Mean quantification cycle (Cq) values of each ten-fold dilution were plotted against the logarithm of the cDNA dilution factor. PCR efficiency was estimated as $E = (10^{-1/S} - 1) \times 100\%$, where $S$ is the slope of the linear regression [58].
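The efficiency formula above can be checked numerically. The Python sketch below fits an ordinary least-squares line of Cq against log10 of the relative template amount (so a ten-fold dilution series gives a negative slope S), then reports R² and E = 10^(-1/S) - 1 as a fraction; a slope of about -3.32 corresponds to 100% efficiency. The function name and the dilution-series Cq values in the usage lines are hypothetical.

```python
def standard_curve(log10_quantity, cq):
    """Least-squares fit of Cq = intercept + S * log10(quantity).

    Returns (slope S, intercept, R^2, efficiency E as a fraction),
    with E = 10 ** (-1 / S) - 1, so E = 1.0 means 100 % efficiency.
    """
    n = len(log10_quantity)
    if n != len(cq) or n < 3:
        raise ValueError("need at least three paired points")
    mx = sum(log10_quantity) / n
    my = sum(cq) / n
    sxx = sum((x - mx) ** 2 for x in log10_quantity)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_quantity, cq))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Coefficient of determination of the fitted line
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(log10_quantity, cq))
    ss_tot = sum((y - my) ** 2 for y in cq)
    r2 = 1.0 - ss_res / ss_tot
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, r2, efficiency


# Hypothetical five-point, ten-fold dilution series (undiluted = 0):
log_q = [0, -1, -2, -3, -4]
cqs = [18.1, 21.4, 24.8, 28.1, 31.5]
slope, intercept, r2, eff = standard_curve(log_q, cqs)
print(f"slope={slope:.3f}  R2={r2:.4f}  E={eff * 100:.1f}%")
```

If Cq is instead plotted against log10 of the dilution factor itself (1, 10, 100, ...), the slope changes sign, and the formula would then need 10^(1/S) rather than 10^(-1/S).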

Real-time quantitative RT-PCR

RT-qPCR was conducted using an ABI StepOne Detection System (Applied Biosystems, USA) with SYBR Premix Ex Taq (TaKaRa, Kyoto, Japan). Each 15 μl reaction comprised 4 μl template, 7.5 μl 2× SYBR Premix, 0.3 μl (200 nM) of each primer and 0.3 μl ROX. The reactions were subjected to an initial denaturation step of 95°C for 10 s, followed by 40 cycles of 95°C for 5 s and 60°C for 60 s. A melting curve analysis was performed at the end of the PCR run over the range 60-95°C, increasing the temperature stepwise by 0.5°C every 10 s. Baseline and quantification cycle (Cq) values were automatically determined using the StepOne Software v2.0. No-template controls were included for each primer pair, and each PCR reaction was carried out in triplicate.

Statistical analysis

Cq values were converted into relative quantities via the delta-Cq method using the sample with the lowest Cq as calibrator and incorporating the calculated amplification efficiencies for each primer pair (Table 2). The stability of reference gene expression was analysed with the geNorm (v3.5) and NormFinder (v0.953) software packages [19, 20]. The former derives a stability measure (M), and via a stepwise exclusion of the least stable gene, creates a stability ranking. It also estimates the number of genes required to calculate a robust normalization factor (NF). NormFinder uses an ANOVA-based model to estimate intra- and inter-group variation, and combines these estimates to provide a direct measure of the variation in expression for each gene. All other statistical analyses were performed with SPSS (v13, SPSS Inc., Chicago, IL).
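The efficiency-corrected delta-Cq conversion described above can be sketched as RQ = (1 + E)^(Cq_min - Cq), where E is the primer pair's efficiency as a fraction and Cq_min is the lowest Cq for that gene (the calibrator sample, which receives RQ = 1). This minimal Python illustration rests on that assumption; it is not the exact spreadsheet procedure used with geNorm v3.5, and the function name and Cq values are hypothetical.

```python
def relative_quantities(cq_values, efficiency):
    """Convert one gene's Cq values into relative quantities.

    efficiency: amplification efficiency as a fraction (0.94 for 94 %).
    The sample with the lowest Cq is the calibrator and gets RQ = 1;
    each additional cycle means a (1 + efficiency)-fold lower amount.
    """
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    cq_min = min(cq_values)
    amplification = 1.0 + efficiency  # fold increase per cycle
    return [amplification ** (cq_min - cq) for cq in cq_values]


# Hypothetical Cq values for one gene with 94 % efficiency
# (within the range estimated from the standard curves):
print(relative_quantities([22.4, 23.1, 25.0], 0.94))
```

Using each primer pair's own efficiency, rather than assuming perfect doubling, matters most for genes with large Cq spreads, because the error compounds with every cycle of difference.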

Microarray data analysis

The stability of the reference gene set was validated using the 3,092 Genevestigator soybean genome microarray dataset, available at [55]. The Meta-Profile Analysis tool was used to represent each reference gene's expression stability according to its UniGene IDs (see Table 1).



Abbreviations

RT-qPCR: quantitative real-time reverse transcriptase PCR
Cq: quantification cycle
GAPDH: glyceraldehyde-3-phosphate dehydrogenase
EF1b: eukaryotic translation elongation factor-1 β
UBQ10: ubiquitin 10
SKIP16: SKP1/ASK-interacting protein 16
PEPKR1: phosphoenolpyruvate carboxylase-related kinase 1
HDC: helicase domain containing
TIP41: TIP41-like gene
UKN1, UKN2: genes of unknown function
CV: coefficient of variation
ANOVA: analysis of variance
NF: normalization factor


References

1. Bustin SA: Quantification of mRNA using real-time reverse transcription PCR (RT-PCR): trends and problems. J Mol Endocrinol. 2002, 29 (1): 23-39. 10.1677/jme.0.0290023
2. Bustin SA, Benes V, Nolan T, Pfaffl MW: Quantitative real-time RT-PCR--a perspective. J Mol Endocrinol. 2005, 34 (3): 597-601. 10.1677/jme.1.01755
3. Gachon C, Mingam A, Charrier B: Real-time PCR: what relevance to plant studies? J Exp Bot. 2004, 55: 1445-1454. 10.1093/jxb/erh181
4. Walker NJ: Tech. Sight. A technique whose time has come. Science. 2002, 296 (5567): 557-559. 10.1126/science.296.5567.557
5. Huggett J, Dheda K, Bustin S, Zumla A: Real-time RT-PCR normalisation; strategies and considerations. Genes Immun. 2005, 6 (4): 279-284. 10.1038/sj.gene.6364190
6. Radonic A, Thulke S, Mackay IM, Landt O, Siegert W, Nitsche A: Guideline to reference gene selection for quantitative real-time PCR. Biochem Biophys Res Commun. 2004, 313 (4): 856-862. 10.1016/j.bbrc.2003.11.177
7. Suzuki T, Higgins PJ, Crawford DR: Control selection for RNA quantitation. Biotechniques. 2000, 29 (2): 332-337.
8. Thellin O, Zorzi W, Lakaye B, De Borman B, Coumans B, Henne G, Grisar T, Igout A, Heinen E: Housekeeping genes as internal standards: use and limits. J Biotechnol. 1999, 75 (2-3): 197-200. 10.1016/S0168-1656(99)00163-7
9. Czechowski T, Stitt M, Altmann T, Udvardi MK, Scheible WR: Genome-wide identification and testing of superior reference genes for transcript normalization in Arabidopsis. Plant Physiol. 2005, 139 (1): 5-17. 10.1104/pp.105.063743
10. Gutierrez L, Mauriat M, Pelloux J, Bellini C, Van Wuytswinkel O: Towards a systematic validation of references in real-time rt-PCR. Plant Cell. 2008, 20 (7): 1734-1735. 10.1105/tpc.108.059774
11. Dheda K, Huggett JF, Bustin SA, Johnson MA, Rook G, Zumla A: Validation of housekeeping genes for normalizing RNA expression in real-time PCR. Biotechniques. 2004, 37 (1): 112-114, 116, 118-119.
12. Ruan W, Lai M: Actin, a reliable marker of internal control? Clin Chim Acta. 2007, 385 (1-2): 1-5. 10.1016/j.cca.2007.07.003
13. Selvey S, Thompson EW, Matthaei K, Lea RA, Irving MG, Griffiths LR: Beta-actin an unsuitable internal control for RT-PCR. Mol Cell Probes. 2001, 15 (5): 307-311. 10.1006/mcpr.2001.0376
14. Thorrez L, Van Deun K, Tranchevent LC, Van Lommel L, Engelen K, Marchal K, Moreau Y, Van Mechelen I, Schuit F: Using ribosomal protein genes as reference: a tale of caution. PLoS ONE. 2008, 3 (3): e1854. 10.1371/journal.pone.0001854
15. Gutierrez L, Mauriat M, Guenin S, Pelloux J, Lefebvre JF, Louvet R, Rusterucci C, Moritz T, Guerineau F, Bellini C, et al: The lack of a systematic validation of reference genes: a serious pitfall undervalued in reverse transcription-polymerase chain reaction (RT-PCR) analysis in plants. Plant Biotechnol J. 2008, 6 (6): 609-618. 10.1111/j.1467-7652.2008.00346.x
16. Reid KE, Olsson N, Schlosser J, Peng F, Lund ST: An optimized grapevine RNA isolation procedure and statistical determination of reference genes for real-time RT-PCR during berry development. BMC Plant Biol. 2006, 6: 27. 10.1186/1471-2229-6-27
17. Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, Speleman F: Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 2002, 3 (7): RESEARCH0034. 10.1186/gb-2002-3-7-research0034
18. Zhu J, He F, Song S, Wang J, Yu J: How many human genes can be defined as housekeeping with current expression data? BMC Genomics. 2008, 9: 172. 10.1186/1471-2164-9-172
19.
20.
21. Andersen CL, Jensen JL, Orntoft TF: Normalization of real-time quantitative reverse transcription-PCR data: a model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Res. 2004, 64 (15): 5245-5250. 10.1158/0008-5472.CAN-04-0496
22. Boda E, Pini A, Hoxha E, Parolisi R, Tempia F: Selection of reference genes for quantitative real-time RT-PCR studies in mouse brain. J Mol Neurosci. 2009, 37 (3): 238-253. 10.1007/s12031-008-9128-9
23. Coulson DT, Brockbank S, Quinn JG, Murphy S, Ravid R, Irvine GB, Johnston JA: Identification of valid reference genes for the normalization of RT qPCR gene expression data in human brain tissue. BMC Mol Biol. 2008, 9: 46. 10.1186/1471-2199-9-46
24. Hoogewijs D, Houthoofd K, Matthijssens F, Vandesompele J, Vanfleteren JR: Selection and validation of a set of reliable reference genes for quantitative sod gene expression analysis in C. elegans. BMC Mol Biol. 2008, 9: 9. 10.1186/1471-2199-9-9
25. Infante C, Matsuoka MP, Asensio E, Canavate JP, Reith M, Manchado M: Selection of housekeeping genes for gene expression studies in larvae from flatfish using real-time PCR. BMC Mol Biol. 2008, 9: 28. 10.1186/1471-2199-9-28
26. Perez R, Tupac-Yupanqui I, Dunner S: Evaluation of suitable reference genes for gene expression studies in bovine muscular tissue. BMC Mol Biol. 2008, 9: 79. 10.1186/1471-2199-9-79
27. Pilbrow AP, Ellmers LJ, Black MA, Moravec CS, Sweet WE, Troughton RW, Richards AM, Frampton CM, Cameron VA: Genomic selection of reference genes for real-time PCR in human myocardium. BMC Med Genomics. 2008, 1: 64. 10.1186/1755-8794-1-64
28. Tang R, Dodd A, Lai D, McNabb WC, Love DR: Validation of zebrafish (Danio rerio) reference genes for quantitative real-time RT-PCR normalization. Acta Biochim Biophys Sin (Shanghai). 2007, 39 (5): 384-390. 10.1111/j.1745-7270.2007.00283.x
29. Spinsanti G, Panti C, Lazzeri E, Marsili L, Casini S, Frati F, Fossi CM: Selection of reference genes for quantitative RT-PCR studies in striped dolphin (Stenella coeruleoalba) skin biopsies. BMC Mol Biol. 2006, 7: 32. 10.1186/1471-2199-7-32
30. Remans T, Smeets K, Opdenakker K, Mathijsen D, Vangronsveld J, Cuypers A: Normalisation of real-time RT-PCR gene expression measurements in Arabidopsis thaliana exposed to increased metal concentrations. Planta. 2008, 227 (6): 1343-1349. 10.1007/s00425-008-0706-4
31. Jain M, Nijhawan A, Tyagi AK, Khurana JP: Validation of housekeeping genes as internal control for studying gene expression in rice by quantitative real-time PCR. Biochem Biophys Res Commun. 2006, 345 (2): 646-651. 10.1016/j.bbrc.2006.04.140
32. Kim BR, Nam HY, Kim SU, Kim SI, Chang YJ: Normalization of reverse transcription quantitative-PCR with housekeeping genes in rice. Biotechnol Lett. 2003, 25 (21): 1869-1872. 10.1023/A:1026298032009
33. Hong SY, Seo PJ, Yang MS, Xiang F, Park CM: Exploring valid reference genes for gene expression studies in Brachypodium distachyon by real-time PCR. BMC Plant Biol. 2008, 8: 112. 10.1186/1471-2229-8-112
34. Paolacci AR, Tanzarella OA, Porceddu E, Ciaffi M: Identification and validation of reference genes for quantitative RT-PCR normalization in wheat. BMC Mol Biol. 2009, 10 (1): 11. 10.1186/1471-2199-10-11
35. Faccioli P, Ciceri GP, Provero P, Stanca AM, Morcia C, Terzi V: A combined strategy of "in silico" transcriptome analysis and web search engine optimization allows an agile identification of reference genes suitable for normalization in gene expression studies. Plant Mol Biol. 2007, 63 (5): 679-688. 10.1007/s11103-006-9116-9
36. Jian B, Liu B, Bi Y, Hou W, Wu C, Han T: Validation of internal control for gene expression study in soybean by quantitative real-time PCR. BMC Mol Biol. 2008, 9: 59. 10.1186/1471-2199-9-59
37. Libault M, Thibivilliers S, Bilgin D, Radwan O, Benitez M, Clough S, Stacey G: Identification of four soybean reference genes for gene expression normalization. The Plant Genome. 2008, 1: 44-54. 10.3835/plantgenome2008.02.0091
38. Exposito-Rodriguez M, Borges AA, Borges-Perez A, Perez JA: Selection of internal control genes for quantitative real-time RT-PCR studies during tomato development process. BMC Plant Biol. 2008, 8: 131. 10.1186/1471-2229-8-131
39. Nicot N, Hausman JF, Hoffmann L, Evers D: Housekeeping gene selection for real-time RT-PCR normalization in potato during biotic and abiotic stress. J Exp Bot. 2005, 56 (421): 2907-2914. 10.1093/jxb/eri285
40. Iskandar HM, Simpson RS, Casu RE, Bonnett GD, MacLean DJ, Manners JM: Comparison of reference genes for quantitative real-time polymerase chain reaction analysis of gene expression in sugarcane. Plant Mol Biol Rep. 2004, 22: 325-337. 10.1007/BF02772676
41. Brunner AM, Yakovlev IA, Strauss SH: Validating internal controls for quantitative plant gene expression studies. BMC Plant Biol. 2004, 4: 14. 10.1186/1471-2229-4-14
42. Soybean Genome.
43. Jaeger KE, Wigge PA: FT protein acts as a long-range signal in Arabidopsis. Curr Biol. 2007, 17 (12): 1050-1054. 10.1016/j.cub.2007.05.008
44. Li C, Zhang K, Zeng X, Jackson S, Zhou Y, Hong Y: A cis element within flowering locus T mRNA determines its mobility and facilitates trafficking of heterologous viral RNA. J Virol. 2009, 83 (8): 3540-3548. 10.1128/JVI.02346-08
45. Liu H, Wang H, Gao P, Xu J, Xu T, Wang J, Wang B, Lin C, Fu YF: Analysis of clock gene homologs using unifoliolates as target organs in soybean (Glycine max). J Plant Physiol. 2009, 166 (3): 278-289. 10.1016/j.jplph.2008.06.003
46. Mathieu J, Warthmann N, Kuttner F, Schmid M: Export of FT protein from phloem companion cells is sufficient for floral induction in Arabidopsis. Curr Biol. 2007, 17 (12): 1055-1060. 10.1016/j.cub.2007.05.009
47. Notaguchi M, Abe M, Kimura T, Daimon Y, Kobayashi T, Yamaguchi A, Tomita Y, Dohi K, Mori M, Araki T: Long-distance, graft-transmissible action of Arabidopsis FLOWERING LOCUS T protein to promote flowering. Plant Cell Physiol. 2008, 49 (11): 1645-1658. 10.1093/pcp/pcn154
48. Robinson TL, Sutherland IA, Sutherland J: Validation of candidate bovine reference genes for use with real-time PCR. Vet Immunol Immunopathol. 2007, 115 (1-2): 160-165. 10.1016/j.vetimm.2006.09.012
49. Ahn K, Huh JW, Park SJ, Kim DS, Ha HS, Kim YJ, Lee JR, Chang KT, Kim HS: Selection of internal reference genes for SYBR green qRT-PCR studies of rhesus monkey (Macaca mulatta) tissues. BMC Mol Biol. 2008, 9: 78. 10.1186/1471-2199-9-78
50. Cicinnati VR, Shen Q, Sotiropoulos GC, Radtke A, Gerken G, Beckebaum S: Validation of putative reference genes for gene expression studies in human hepatocellular carcinoma using real-time quantitative RT-PCR. BMC Cancer. 2008, 8: 350. 10.1186/1471-2407-8-350
51. Fernandes JM, Mommens M, Hagen O, Babiak I, Solberg C: Selection of suitable reference genes for real-time PCR studies of Atlantic halibut development. Comp Biochem Physiol B Biochem Mol Biol. 2008, 150 (1): 23-32. 10.1016/j.cbpb.2008.01.003
52. He JQ, Sandford AJ, Wang IM, Stepaniants S, Knight DA, Kicic A, Stick SM, Pare PD: Selection of housekeeping genes for real-time PCR in atopic human bronchial epithelial cells. Eur Respir J. 2008, 32 (3): 755-762. 10.1183/09031936.00129107
53. Jung M, Ramankulov A, Roigas J, Johannsen M, Ringsdorf M, Kristiansen G, Jung K: In search of suitable reference genes for gene expression studies of human renal cell carcinoma by real-time PCR. BMC Mol Biol. 2007, 8: 47. 10.1186/1471-2199-8-47
54. Langnaese K, John R, Schweizer H, Ebmeyer U, Keilhoff G: Selection of reference genes for quantitative real-time PCR in a rat asphyxial cardiac arrest model. BMC Mol Biol. 2008, 9: 53. 10.1186/1471-2199-9-53
55.
56. Perez S, Royo LJ, Astudillo A, Escudero D, Alvarez F, Rodriguez A, Gomez E, Otero J: Identifying the most suitable endogenous control for determining gene expression in hearts from organ donors. BMC Mol Biol. 2007, 8: 114. 10.1186/1471-2199-8-114
57. Suzuki Y, Mae T, Makino A: RNA extraction from various recalcitrant plant tissues with a cethyltrimethylammonium bromide-containing buffer followed by an acid guanidium thiocyanate-phenol-chloroform treatment. Biosci Biotechnol Biochem. 2008, 72 (7): 1951-1953. 10.1271/bbb.80084
58. Ginzinger DG: Gene quantification using real-time quantitative PCR: an emerging technology hits the mainstream. Exp Hematol. 2002, 30 (6): 503-512. 10.1016/S0301-472X(02)00806-8



This work was supported in part by the Transgenic Program (Nos 2008ZX08009-001, 2008ZX08004-005, 2008ZX08010-004, and 2009ZX08009-133B), the Chinese National Key Basic Research "973" Program (2010CB125906), the Chinese National "863" Program (Nos 2006AA10Z107, 2006AA10A111, and 2007AA10Z119), the Chinese National Science Foundation (30671245), and the Key Technology R&D Program (2007BAD59B02).

Author information



Corresponding author

Correspondence to Yong-Fu Fu.

Additional information

Authors' contributions

RH performed all experimental procedures and data analysis, and drafted the manuscript. CF participated in the statistical analysis and helped to draft the manuscript. HL and QZ provided the samples and participated in RNA and cDNA preparation. YF designed the project, supervised the study and critically revised the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material


Additional file 1: List of reference genes used for gene expression studies in soybean. The list comprises 54 hits from a search (January 2001 to March 2009) of PubMed, using "soybean" and "gene expression" as keywords. (PDF 32 KB)


Additional file 2: Transcription profiles of the individual reference genes, given as absolute Cq values across all samples. The scatter plots show the expression level of each reference gene as quantification cycle (Cq) values. (JPEG 333 KB)


Additional file 3: Representative amplification plots and melting curves obtained in the RT-qPCR efficiency test. Four to five ten-fold serial dilutions were plotted against the logarithm of cDNA template concentration. Amplification plots and melting curve images were collected using StepOne software v2.0 (Applied Biosystems). (JPEG 521 KB)

Additional file 4: RT-qPCR primer efficiency plots. Mean quantification cycle (Cq) values of each ten-fold serial dilution series plotted against the logarithm of cDNA template concentration. The reaction efficiency is given by $E = (10^{-1/S} - 1) \times 100\%$, where $S$ is the slope of the linear regression line. (JPEG 448 KB)


Additional file 5: Tissue/organ sample sets used for the analysis of gene expression. See Methods section for details. (JPEG 1017 KB)


Additional file 6: Expression profiles of six conventional housekeeping genes, based on microarray data from Genevestigator. Profiles were generated from representative UniGene IDs with the Meta-Profile Analysis tool. No probes were available for TUB4. (PDF 227 KB)


Additional file 7: Expression profiles of the seven new candidate reference genes, based on microarray data from Genevestigator. Profiles were generated from representative UniGene IDs with the Meta-Profile Analysis tool. (PDF 228 KB)


Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Hu, R., Fan, C., Li, H. et al. Evaluation of putative reference genes for gene expression normalization in soybean by quantitative real-time RT-PCR. BMC Molecular Biol 10, 93 (2009).



  • Reference Gene
  • Expression Stability
  • Stable Gene
  • Candidate Reference Gene
  • Potential Reference Gene