# CFSE Case Study Examples

In Section 1 we describe the theory underlying the parameter estimation and in Section 2 we validate it using synthetic datasets. In Section 3 we describe how to deal with statistical issues that may arise when the method is applied to experimental data, and illustrate this with an analysis of data from an *in vitro* T cell proliferation experiment.

### 1. Cell kinetics as a branching process

#### Calculating the probability distribution of cell counts

To apply a maximum likelihood method to estimate parameters of a stochastic model of cell division and death from CFSE data, we need to characterise the probability distribution of cell counts predicted by the model. In this section we outline this calculation for a general branching process model in discrete time, or a Galton-Watson process [23].

In these models, during each timestep a cell can do one of the following: divide, with probability *γ*; survive without dividing, with probability *δ*; or die, with probability 1 - *γ* - *δ* (Figure 1). A particular model of the kinetics of a cell population specifies these probabilities, which in the simplest case might be assumed to be constant. In general they may depend on the number of divisions the cell has undergone (which we refer to as the generation number), explicitly on time, or both. The key assumptions are that all cells act independently, that their offspring generate their own branching processes according to the same rules, and that cells retain no memory of events in previous timesteps other than the total number of divisions they have undergone.

The parameters of biological interest are usually *γ* and *α* (the probabilities of division and death). However, in the formalism we use here it proves simpler to work with the quantities *γ* and *δ* (the probability of survival without division). The probability of death *α* can then be calculated from *α* = 1 - *γ* - *δ*. A particular branching process model of cell division is specified by a choice of timestep, a starting condition (the number of cells in each generation at a given time, usually all in generation 0) and a set of parameters that determine the probabilities *γ*_{i}(*t*) and *δ*_{i}(*t*) for each generation at each subsequent timestep.
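As an illustration of the rules above, one timestep of such a process can be simulated as in the following minimal sketch. The function name `step`, the use of NumPy, and the choice to discard daughters of cells in the last tracked generation are assumptions made for this example; they are not taken from the authors' implementation.

```python
import numpy as np

def step(counts, gamma, delta, rng):
    """Advance live-cell counts per generation by one timestep.

    counts[i] is the number of live cells that have divided i times;
    gamma[i] and delta[i] are that generation's probabilities of
    dividing and of surviving without dividing.
    """
    counts = np.asarray(counts)
    new = np.zeros_like(counts)
    for i, n in enumerate(counts):
        # Death takes the remaining probability (clipped against rounding error).
        p_die = max(0.0, 1.0 - gamma[i] - delta[i])
        divide, survive, _die = rng.multinomial(n, [gamma[i], delta[i], p_die])
        new[i] += survive
        if i + 1 < len(counts):
            # Each dividing cell contributes two daughters to the next generation.
            # Assumption: daughters beyond the last tracked generation are dropped.
            new[i + 1] += 2 * divide
    return new
```

Calling `step` repeatedly, starting from all cells in generation 0, gives one stochastic realisation of the cell counts per generation.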

Let the state of the cell population at timestep *t* be the vector **Z**_{t}, whose *i*-th component (*i* = 0, ..., *n*) is a random variable representing the number of live cells that have divided *i* times. The maximum division number *n* is chosen to be at the limit of detectability on a CFSE profile, or the maximum division number of interest. Given a model and a dataset consisting of the cell counts in each generation at two or more timepoints, we wish to estimate the model parameters. To do this we use the data and the joint probability distribution of **Z**_{t} at each timepoint to construct a likelihood. Maximising this with respect to the model parameters and the timestep provides us with best-fit estimates.

We use a probability-generating function (pgf) approach, described in detail in Methods, which allows us to calculate the moments of the distribution of cell numbers in each generation at one timestep given knowledge of their numbers in the previous timestep. Derivatives of the pgf are used to construct a transition matrix **M** which maps a measured set of cell counts **Z**_{t} to their expected values *E*(**Z**_{t+1}) at the following timestep. For stationary (time-independent) parameters, we show in the Methods section that given any set of initial cell counts

*E*(**Z**_{t}|**Z**_{0}) = **Z**_{0}**M**^{t},

where

and the entries in **M** are the probabilities of a cell in generation *i* dividing (*γ*_{i}) or surviving without dividing (*δ*_{i}), and *γ*_{i} + *δ*_{i} ≤ 1. Typically an experiment begins with a population of undivided cells and so **Z**_{0} = (*N*_{0}, 0, ..., 0).
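The explicit form of **M** is not reproduced in this copy of the text. The sketch below assumes one form consistent with the description: a matrix with *δ*_{i} on the diagonal and 2*γ*_{i} immediately above it, since each dividing cell in generation *i* contributes two cells to generation *i* + 1. The function names are illustrative, and the treatment of the last generation (its daughters are not tracked) is likewise an assumption.

```python
import numpy as np

def transition_matrix(gamma, delta):
    """Assumed mean transition matrix: delta_i on the diagonal, 2*gamma_i above it."""
    gamma = np.asarray(gamma, float)
    delta = np.asarray(delta, float)
    M = np.diag(delta)
    idx = np.arange(len(gamma) - 1)
    M[idx, idx + 1] = 2.0 * gamma[:-1]
    return M

def expected_counts(z0, gamma, delta, t):
    """E(Z_t | Z_0) = Z_0 M^t for time-independent parameters."""
    M = transition_matrix(gamma, delta)
    return np.asarray(z0, float) @ np.linalg.matrix_power(M, t)
```

For example, with three tracked generations, `expected_counts([100, 0, 0], [0.5] * 3, [0.25] * 3, 2)` evaluates **Z**_{0}**M**^{2} under this assumed form.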

This stochastic approach also provides the covariance matrix of cell counts in each generation at time *t*, **V**_{t}, in terms of **Z**_{0}, the *E*(**Z**_{t}) and **M** (see Methods). The framework is easily extended to calculate the quantities *E*(**Z**_{t}) and **V**_{t} when the parameters governing cell kinetics are also functions of time. In the analyses we present below, we used Mathematica [24] to generate *E*(**Z**_{t}) and **V**_{t} given initial cell counts **Z**_{0} and a set of parameters that specify a branching process model, *i.e.*, how the probabilities *γ* and *δ* vary with division and/or time.

This approach can also be applied to a qualitatively different class of models, Markovian branching processes in continuous time. In these models cells have exponentially distributed lifetimes, at the end of which they either divide or die. We describe this in Appendix 1. Indeed the method we discuss in the following section applies to any stochastic model which provides the quantities *E*(**Z**_{t}) and **V**_{t} given a set of initial cell counts **Z**_{0}.

#### Parameter estimation using quasi-likelihood

In principle a likelihood can be computed exactly for any branching process and a dataset. While this is feasible for small cell populations or one or two divisions, with the cell numbers encountered in most experimental situations this becomes intractable for combinatorial reasons (see Appendix 2 for a discussion). As a solution, we take a quasi-likelihood (QL) approach which requires only the first two moments of the cell counts [25]. QL yields consistent parameter estimates (that is, the estimates converge to their true values for large sample sizes or large numbers of cells) with minimal confidence intervals [26]. Given the large numbers of cells typically observed in experiments, one might intuitively expect that by the central limit theorem the distribution of cell counts might be well specified by their means and covariances alone.

Let the model parameters be components of the vector **β**, and let **Y** be the observed cell counts obtained from a CFSE fluorescence profile at one timepoint. Let **μ**(**β**) = *E*(**Z**_{t}) and **V**(**β**) be respectively the expectation values and covariances of the cell counts at that timepoint, expressed as functions of the parameters. Then the following (the 'quasi score function') has properties in common with the derivative of a log-likelihood:

$$\mathbf{U}(\boldsymbol{\beta}) = \mathbf{D}^{T}\mathbf{V}^{-1}(\mathbf{Y} - \boldsymbol{\mu}), \qquad D_{ij} = \frac{\partial \mu_i}{\partial \beta_j}. \tag{1}$$

These properties are *E*(**U**) = 0, cov(**U**) = **D**^{T}**V**^{-1}**D** ≡ **i**(**β**) and *E*(*∂U*_{i}(**β**)/*∂β*_{j}) = -**i**(**β**). A QL estimator of **β**, **β**\*, is located at a zero of **U**. The system **U**(**β**) = **0** is a system of *r* nonlinear equations for the *r* components of the maximum QL estimate of the parameter vector **β**\*. We use an iteratively re-weighted least squares (IRLS) algorithm, or a quasi-Newton step using Fisher scoring (that is, using the information matrix **i** as an approximation to the Hessian of **U**) to search for **β**\* given an initial guess **β**^{(0)}:

$$\boldsymbol{\beta}^{(k+1)} = \boldsymbol{\beta}^{(k)} + \mathbf{i}^{-1}(\boldsymbol{\beta}^{(k)})\,\mathbf{U}(\boldsymbol{\beta}^{(k)}). \tag{2}$$

We find convergence with this algorithm is robust to the choice of initial guess. To speed convergence, particularly with complex models, we select an initial condition by randomly generating a large sample of candidate parameter vectors and choose the one that maximises the likelihood as defined in the following section.
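The Fisher scoring search can be sketched as follows. This is a minimal illustration rather than the authors' Mathematica implementation: it assumes the standard quasi-score form **U** = **D**^{T}**V**^{-1}(**Y** - **μ**), takes hypothetical callables `mu_fn` and `cov_fn` that return the model's expected counts and covariance matrix for a parameter vector, and approximates **D** by central differences.

```python
import numpy as np

def numerical_jacobian(mu_fn, beta, eps=1e-6):
    """Central-difference approximation to D_ij = d mu_i / d beta_j."""
    beta = np.asarray(beta, float)
    mu0 = np.asarray(mu_fn(beta), float)
    D = np.empty((mu0.size, beta.size))
    for j in range(beta.size):
        h = np.zeros_like(beta)
        h[j] = eps
        D[:, j] = (np.asarray(mu_fn(beta + h)) - np.asarray(mu_fn(beta - h))) / (2.0 * eps)
    return D

def fisher_scoring(y, mu_fn, cov_fn, beta0, n_iter=50, tol=1e-10):
    """Iterate beta <- beta + i^{-1} U until the step is small.

    Returns the estimate and the inverse information matrix,
    which approximates the covariance of the estimate.
    """
    y = np.asarray(y, float)
    beta = np.asarray(beta0, float)
    for _ in range(n_iter):
        mu = np.asarray(mu_fn(beta), float)
        V = np.asarray(cov_fn(beta), float)
        D = numerical_jacobian(mu_fn, beta)
        Vinv_D = np.linalg.solve(V, D)   # V^{-1} D (V is symmetric)
        U = Vinv_D.T @ (y - mu)          # quasi score D^T V^{-1} (y - mu)
        info = D.T @ Vinv_D              # information matrix D^T V^{-1} D
        delta_beta = np.linalg.solve(info, U)
        beta = beta + delta_beta
        if np.max(np.abs(delta_beta)) < tol:
            break
    return beta, np.linalg.inv(info)
```

With data from several independent timepoints, `mu_fn` can return the concatenated expected counts and `cov_fn` the corresponding block-diagonal covariance matrix, which gives the same sums over timepoints.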

This estimation scheme is easily generalised to use a series of CFSE profiles obtained at multiple timepoints. This overcomes the intrinsic limitation of single CFSE timepoints, which can provide at most 8 or 9 data points, and so increases our confidence in fitted models and ability to discriminate between them. Suppose the experimental data consists of cell counts **Y**_{t} from independent experiments at each of a set of timepoints labeled by the index *t*, and we have a model that provides the corresponding expected cell numbers *μ*_{t} and the covariances **V**_{t}. Since the data at each timepoint are independent they can be used additively to construct the score function. Then if **D**_{t} is the matrix of derivatives of the expected values *μ*_{t} with respect to the parameters **β**, equation (2) holds with

$$\mathbf{U}(\boldsymbol{\beta}) = \sum_t \mathbf{D}_t^{T}\mathbf{V}_t^{-1}(\mathbf{Y}_t - \boldsymbol{\mu}_t)$$

and

$$\mathbf{i}(\boldsymbol{\beta}) = \sum_t \mathbf{D}_t^{T}\mathbf{V}_t^{-1}\mathbf{D}_t.$$

We can extend this further to deal with multiple populations present in unknown proportions, with different kinetics. Take a model in which the total initial cell numbers are known and are thought to comprise *m* distinct subpopulations, present at initial (unknown) frequencies *p*^{(i)}. Each subpopulation labelled by index *i* then has its own expected cell numbers and covariances. We construct the quantities

and use these in the expressions above, with the parameter vector **β** now including the independent unknowns *p*^{(1)}, ..., *p*^{(m - 1)}.

The covariance matrix of the parameter estimates cov(**β**\*) is asymptotically the inverse of the information matrix **i**(**β**). Since **U** is (asymptotically) the derivative of a log likelihood, **i**^{-1}(**β**) is an estimate of the curvature of the log likelihood surface in parameter space. This provides confidence intervals directly if we assume no error in the cell counts **Y**_{t}; that is, if all uncertainty in our parameter estimates comes from the underlying stochasticity of cell behaviour expressed by the model. These confidence intervals are typically rather small given the large numbers of cells usually observed in proliferation assays.

We also note that when the observations are generated by a true branching process the weighting to datapoints provided by the covariance structure is not required for generating point estimates of parameters, since the fitting procedure is essentially a minimisation of a sum of squared residuals, each of which is non-negative and is strictly zero (along with the score function) at the QL estimate of the parameters. The covariance structure is important, however, for the correct estimation of confidence intervals on branching process parameters using the information matrix, and for model discrimination using likelihood ratio tests (see below).

A Mathematica notebook which implements the calculation of the mean and covariances of the cell counts, the generation of the initial parameter estimate and the QL estimation procedure is available on request from the authors (AY and CC).

#### Model comparison

Typically there may be several candidate branching process models that might describe the biology and we want to assess the relative support for each. Again, assuming no measurement error in the observed cell counts **Y**_{t}, the usual procedure for comparing two nested models *A* and *B*, where *A* has *n* additional parameters, is to use the residual deviance [25], defined as twice the difference between the maximum achievable log likelihood given the data and the log likelihood at the QL estimate of the parameters:

$$D(\mathbf{Y}; \boldsymbol{\mu}) = 2L(\mathbf{Y}; \mathbf{Y}) - 2L(\mathbf{Y}; \boldsymbol{\mu}),$$

where *L*(**Y**; **μ**) is the logarithm of the likelihood of a model with expected cell counts **μ** generating the observations **Y**. The quantity *D*_{B} - *D*_{A} (the deviance of the simpler model *B* minus that of the richer model *A*) is asymptotically *χ*^{2}-distributed with *n* degrees of freedom. This is the standard likelihood ratio test.

The obvious approach would be to integrate the score function **U**(**β**) (eqn. (1)) to obtain an estimate of *L*. However, **U**(**β**) cannot be expressed as the gradient of a scalar function, and so the quasi-log likelihood is not uniquely specified by the parameters (see refs. [25,27] for a discussion). Instead, to compare models we propose using a log likelihood based on the generalised Pearson statistic for correlated measurements [28], which is simply the residual sum of squares weighted by the predicted covariances:

$$X^{2} = \sum_t (\mathbf{Y}_t - \boldsymbol{\mu}_t)^{T}\mathbf{V}_t^{-1}(\mathbf{Y}_t - \boldsymbol{\mu}_t).$$

The sum is over each independent timepoint and the expectation values *μ*_{t} and covariance matrices **V**_{t} are evaluated at the QL parameter estimates. We note that the derivative of this quantity with respect to the parameters is the score function (1) if we neglect the terms proportional to the derivative of the covariance matrix with respect to the parameters. These terms are second order in the difference between the data and the QL prediction provided by the model. We then calculate a 'surrogate' log likelihood *ℒ* using the relation

*ℒ* = −½*X*^{2}

This is essentially a multivariate normal approximation to the true log likelihood.

To compare non-nested models, the simplest approach is to compare the absolute values of likelihoods (see, for example, [20]) or to use the Akaike Information Criterion. This is necessary when comparing the fits with different timesteps, of which there are usually a restricted set of discrete choices; these are dictated by the maximum division number observed at each timepoint, and the intervals between these timepoints. It can also be used to compare members of a family of models with the same number of parameters – for example, when division or death probabilities are assumed to change at a given, but unknown, division number.
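The surrogate log likelihood and an AIC comparison can be computed along the following lines. This is a sketch only: the function names are illustrative, the covariance-weighted residual sum of squares is written in its usual quadratic form, and the AIC is taken in its standard form 2*k* - 2*ℒ* with *k* the number of fitted parameters.

```python
import numpy as np

def pearson_statistic(ys, mus, Vs):
    """Covariance-weighted residual sum of squares, summed over timepoints."""
    x2 = 0.0
    for y, mu, V in zip(ys, mus, Vs):
        r = np.asarray(y, float) - np.asarray(mu, float)
        x2 += float(r @ np.linalg.solve(np.asarray(V, float), r))
    return x2

def surrogate_loglik(ys, mus, Vs):
    """Surrogate log likelihood, -X^2 / 2."""
    return -0.5 * pearson_statistic(ys, mus, Vs)

def aic(loglik, n_params):
    """Akaike Information Criterion; lower values indicate more support."""
    return 2.0 * n_params - 2.0 * loglik
```

Here `ys`, `mus` and `Vs` are lists with one entry per timepoint, with the expected counts and covariances evaluated at the QL parameter estimates.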

### 2. Validation of the method

#### Testing the validity of the QL estimator

A condition for consistency and normality of the QL estimate **β**\* is that cell numbers in all generations are large. As a preliminary test of the method, and to confirm that QL estimates are reliable when used with the numbers of cells encountered in experimental situations, we used a Monte Carlo procedure to examine the properties of the estimator. We generated synthetic CFSE profiles with repeated numerical simulations of branching processes with three different models, each starting with 10^{4} cells. These cell numbers are lower than those typically used in proliferation assays. The models are described in detail in Figure 2 (see also Table 1). In model 1, parameters change after the first division; in model 2, the parameters change after the first timestep; and in model 3 we include two populations, one with division-dependent probabilities of division and death, and the other with constant probabilities. For each simulation we calculated the QL estimate of the parameters and their associated confidence intervals assuming asymptotic normality. We then calculated the proportion of simulations in which the predicted confidence intervals contained the true value of each parameter. The close agreement of true and estimated parameters and the accuracy of the predicted 95% and 99% confidence intervals validate our use of QL to estimate parameters with large populations of cells.

Figure 2

**Validation of the quasi-likelihood estimation procedure with artificial datasets**. We generated simulated CFSE datasets using numerical realisations of three different branching processes models of cell kinetics, and tested our estimation procedure by**...**

Table 1

Parameter estimates with synthetic data.

#### Validation of the method in the presence of measurement noise

As a more stringent test we examined how well the QL method could recover branching process parameters in the presence of measurement error (Figure 3). Using model 1 (in which division and death probabilities per timestep changed after the first division) we again used simulated branching processes to generate multiple realisations of a single CFSE timepoint, comprising the cell numbers in 6 generations after 5 timesteps. We then added Gaussian noise of varying amplitudes to (i) the cell counts in each generation (Figure 3, open circles), or (ii) the total cell count (filled circles), preserving the proportions of the population in each generation. The latter scenario is commonly encountered in *in vivo *studies in which recovered cell numbers may be subject to significant uncertainty but the frequencies of cells in each CFSE peak may show little variation between experiments.
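The two noise scenarios can be sketched as below. The text does not specify exactly how the noise was applied, so this example assumes multiplicative Gaussian noise with fractional standard deviation `sigma`, clipped at zero so that counts stay non-negative; the function names are illustrative.

```python
import numpy as np

def noisy_per_generation(counts, sigma, rng):
    """Scenario (i): independent fractional noise on each generation's count."""
    counts = np.asarray(counts, float)
    noisy = counts * (1.0 + sigma * rng.standard_normal(counts.shape))
    return np.clip(noisy, 0.0, None)

def noisy_total(counts, sigma, rng):
    """Scenario (ii): one noise factor on the total, preserving proportions."""
    counts = np.asarray(counts, float)
    scale = max(0.0, 1.0 + sigma * rng.standard_normal())
    return counts * scale
```

In scenario (ii) every generation is scaled by the same factor, so the shape of the CFSE profile is unchanged and only the total number of cells varies.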

Figure 3

**Quasi-likelihood estimation in the presence of noise**. Synthetic CFSE datasets were generated with branching process Model 1, in which probabilities of division and death change after the first division; parameter values were *γ*_{0 }= 0.1, *γ***...**

We make three simple observations here. First, the uncertainty in parameters scales approximately linearly with the amplitude of the noise, and a given fractional uncertainty *σ* in cell counts translates into a comparable fractional uncertainty in parameter estimates. Second, the division probabilities strongly influence the shape of the CFSE profile and so in general are estimated more accurately when total counts are subject to noise than when cell counts in each generation are subject to independent error. Third, the division and death probabilities that apply to more CFSE peaks or measurements (in this example, *γ*_{1+} and *α*_{1+}, which determine the division and death probabilities for all cells in generations 1 and above) can be estimated more accurately than those constrained by fewer measurements (here, *γ*_{0} and *α*_{0} for undivided cells). This effect is again more pronounced when the proportions of cells in each generation are known more accurately than the total numbers.

#### Relation of parameters to more complex models

As described in the introduction, the branching process is perhaps a minimal description of cell kinetics. To investigate how and under what conditions its parameters can be related to those of more detailed models, we used synthetic CFSE datasets generated with the homogeneous Smith-Martin model. In this model cells spend exponentially distributed times in the A-phase (G0/G1), with mean 1/*λ*. Cells triggered to divide then transit through a B-phase (S/G2/M) with duration Δ before generating two daughter cells and returning to the A-phase. We assume death is independent of division and occurs at rate *μ *in both A-or B-phases. In Figure 4 we show that the QL procedure identifies a homogeneous model as the best description of the data.

Figure 4

**Using branching processes to describe data generated with the Smith-Martin model of cell kinetics**. Fitting discrete-time branching process (BP) models to a dataset generated with the homogeneous Smith-Martin (SM) model. The dataset comprises 10^{4} cells**...**

The parameters in the branching process (BP) and Smith-Martin (SM) models can be related with some approximations. In this instance of the SM model the probability of a cell dying during a finite interval *τ*, the branching process parameter *α*, is independent of the cell being in the A or B phase and so we predict that the QL estimate *α *should be given by

*α* = 1 - *e*^{-*μτ*}.

To divide during an interval *τ*, a cell must complete a B-phase during that interval. If Δ <*τ *< 2 Δ, the expected proportion of cells to divide and survive is approximately

We tested the validity of the approximations (5) and (6) by fitting BP models to a series of datasets generated by varying the division rate *λ *in the SM model. For each we compared the quasi-likelihood estimates of the BP parameters *γ *and *α *with their approximations. The results are shown in Figure 5.

Figure 5

**Relating parameters in branching process and Smith-Martin models**. Synthetic CFSE datasets were generated using the homogeneous Smith-Martin model with different division rates *λ *and *μ *= 0.2 day^{-1 }and Δ = 12 hours. Each dataset**...**

The QL procedure identifies the homogeneous model correctly and the estimated death probability *α* agrees closely with the predicted value for all division rates. The QL estimate of division probability *γ* agrees well with the predicted value (6) when the SM division rate *λ* is low, but the two diverge as *λ* increases. The discrete time process does not specify the true (continuous) distribution of interdivision times, but instead 'coarse-grains' this distribution by allowing division at any time within each timestep. For constant probabilities of division and death, this generates a geometric distribution in discrete time, such that (in the absence of cell death) the probability that a given cell observed since *t* = 0 divides between *t* = *nτ* and *t* = (*n* + 1)*τ* is *P*(*n*) = *γ*(1 - *γ*)^{n}; while for the SM model with constant parameters the probability density for the interdivision time *t*, *P*(*t*), is exponential with a delay, or *P*(*t*) = 0 for 0 ≤ *t* ≤ Δ and *P*(*t*) = *λ* exp(-*λ*(*t* - Δ)) for *t* > Δ. These distributions converge for *t* = *nτ* when division rates are low; that is, when the timestep *τ* is smaller than the average time spent in the A-phase (*τ* << 1/*λ*) and when the average time spent in the A-phase is much longer than the B-phase (1/*λ* >> Δ).
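The two interdivision-time distributions can be compared timestep by timestep with a short sketch such as the following. The function and argument names are illustrative; `b_phase` stands for the B-phase duration Δ, and cell death is ignored as in the comparison above.

```python
import math

def geometric_division_prob(n, gamma):
    """Discrete-time model: probability that the first division occurs in
    timestep n (n = 0, 1, ...) for a constant division probability gamma."""
    return gamma * (1.0 - gamma) ** n

def smith_martin_division_prob(n, tau, lam, b_phase):
    """Smith-Martin model: probability that the interdivision time lies in
    [n*tau, (n+1)*tau), for a density that is zero up to b_phase and
    exponential with rate lam afterwards."""
    def cdf(t):
        return 0.0 if t <= b_phase else 1.0 - math.exp(-lam * (t - b_phase))
    return cdf((n + 1) * tau) - cdf(n * tau)
```

Evaluating both functions over a range of `n` for a chosen `tau` shows how the delay of length Δ shifts probability away from the earliest timesteps in the Smith-Martin case, which the geometric distribution cannot reproduce.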

### 3. Dealing with experimental CFSE data

An important issue when quantifying the dynamics of CFSE-labeled cells is assessing our confidence in the observed cell counts **Y**_{t}. In this section we discuss how to deal with various sources of uncertainty in the cell counts and how these impact on model fitting and comparison. Another significant source of disagreement between model and observations, of course, is that the underlying model may not represent the biology well. With this in mind, what we discuss here applies not only to the discrete time branching models we describe here but also to any stochastic model of cell division that can be used to provide likelihood-based parameter estimates.

#### Uncertainties in the assignment of cells to generations from CFSE profiles

The process of assigning a division number to cells in a CFSE profile can be a significant source of error, particularly if the peaks corresponding to cells in one generation are ill-defined. The distributions of neighbouring peaks usually overlap significantly, and cells in the tails of these distributions may be mis-assigned to neighbouring generations. Further, the factor difference in median fluorescence intensity of adjacent peaks is typically not exactly 2, and this error can amount to uncertainties of as much as a whole division for cells that have divided multiple times. This is particularly noticeable in CFSE profiles which contain distinct subpopulations of cells separated by several divisions and with few cells to mark the location of intermediate generations. In many circumstances, then, the 'gating' or assignment of cells to different divisions is itself a process of inference.

We used a standard algorithm to perform this, based on the Expectation-Maximisation (EM) algorithm [29]. EM is a bounded optimisation technique for the computation of maximum likelihoods typically used in incomplete-data problems. CFSE histograms generated in experiments (*i.e*., the plot of event counts against the logarithm of fluorescence intensity) can usually be approximated well by normal mixtures (*i.e*. a superposition of Gaussian distributions) and estimating the parameters for such a normal mixture is a standard application of the EM algorithm. In practice, we find that the algorithm works well only if we provide good initial conditions for the modes (maxima) of each normal component in the mixture, as well as some constraint on the variance of each component. Initial locations for modes are found by first specifying the data range which contains 99% of the total events, then calculating the offset (alignment of entire fit) and stride (the average fold reduction in fluorescent intensity between peaks) that produce the average largest event count. This works well because the inter-peak distances for CFSE profiles tend to be similar, as we would expect if CFSE is equally distributed between daughter cells. As a result, the initial modes are regularly spaced; however, the EM algorithm is then free to adjust the modes to produce the best fit. We heuristically set a constraint such that the variance of each component is less than or equal to that of the component with the tallest peak. Counts are then estimated using the relative area under each normal component scaled by the total number of cells.
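A minimal version of this peak-fitting step is sketched below, using NumPy only. It is not the authors' code: the function name is illustrative, the offset and stride are taken as inputs rather than found by the search described above, intensities are assumed to be on a log scale (for example log2, where perfect halving gives a stride of 1), and the variance constraint is implemented as one possible reading, capping every component's standard deviation at that of the component with the tallest fitted peak.

```python
import numpy as np

def fit_cfse_peaks(log_intensity, n_peaks, offset, stride, sd0, n_iter=200):
    """EM fit of a 1-D normal mixture to log CFSE intensities.

    Initial modes are regularly spaced: generation k starts at
    offset - k * stride. Returns (weights, means, sds).
    """
    x = np.asarray(log_intensity, float)
    means = offset - stride * np.arange(n_peaks)
    sds = np.full(n_peaks, float(sd0))
    weights = np.full(n_peaks, 1.0 / n_peaks)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each event.
        z = (x[:, None] - means[None, :]) / sds[None, :]
        dens = weights * np.exp(-0.5 * z ** 2) / (sds * np.sqrt(2.0 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means and standard deviations.
        nk = resp.sum(axis=0)
        safe_nk = np.maximum(nk, 1e-12)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / safe_nk
        var = (resp * (x[:, None] - means[None, :]) ** 2).sum(axis=0) / safe_nk
        sds = np.maximum(np.sqrt(var), 1e-6)
        # Heuristic constraint (assumed form): no component is wider than
        # the one with the tallest peak, whose height scales as weight / sd.
        tallest = np.argmax(weights / sds)
        sds = np.minimum(sds, sds[tallest])
    return weights, means, sds
```

Estimated counts per generation are then `weights * total_cells`, matching the description of scaling the relative area under each component by the total number of cells.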

We propose that the uncertainty in the assignment of cells to divisions can be used with a Monte Carlo procedure to assign confidence intervals to maximum-likelihood model parameter estimates from a single CFSE dataset. The method is as follows.

1. Use the EM method to identify a maximum-likelihood set of log-normal profiles from a raw CFSE profile containing *N*_{0 }cells. We refer to the resulting set of counts of cells in each generation as **Y**^{(0)}, where the sum of the elements of **Y**^{(0) }equals *N*_{0}.

2. Using **Y**^{(0)} and a model characterised by a set of parameters **β**, calculate a best-fit (QL) set of parameter estimates *β*_{0}.

3. Generate *P *artificial CFSE profiles, as follows. For each generation or peak *k *in the original profile, draw random numbers from the log-normal probability distribution used to fit that peak. This generates a population of *N*_{0 }cells with fluorescent intensities drawn from the predicted distributions. Use this to re-estimate the numbers of cells in each division using the EM method. Repeat this *P *times. This generates a set of new, artificial CFSE fluorescence profiles (**Y**^{(1)}, **Y**^{(2)}, ..., **Y**^{(P)}) derived from the original counts **Y**^{(0)}.

4. For each artificial dataset **Y**^{(i) }calculate a parameter set estimate *β*_{i}.

5. We now have *P *samples from a probability distribution of parameter estimates representing our uncertainty in the assignment of division numbers to cells in the original CFSE profile. Calculate confidence limits on *β*_{0 }from this distribution.
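Steps 3 to 5 can be sketched as follows. The callables `regate` (which re-estimates counts per generation from a set of log intensities, for example by EM) and `estimate` (which returns parameter estimates from those counts, for example by QL) are hypothetical placeholders supplied by the user, and percentile intervals are one assumed way of summarising the resulting distribution.

```python
import numpy as np

def monte_carlo_intervals(counts0, means, sds, regate, estimate,
                          n_rep=200, level=0.95, rng=None):
    """Resample CFSE profiles from fitted peaks and collect parameter estimates.

    counts0[k], means[k], sds[k] describe peak k of the original fit on the
    log-intensity scale (a log-normal in intensity is normal in its log).
    Returns (samples, lower, upper).
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = []
    for _ in range(n_rep):
        # Step 3: draw each peak's cells from its fitted distribution, then re-gate.
        log_intensity = np.concatenate([
            rng.normal(m, s, size=int(k)) for m, s, k in zip(means, sds, counts0)
        ])
        counts = regate(log_intensity)
        # Step 4: re-estimate the parameters from the artificial dataset.
        samples.append(np.atleast_1d(estimate(counts)))
    samples = np.array(samples)
    # Step 5: percentile confidence limits for each parameter.
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(samples, [tail, 100.0 - tail], axis=0)
    return samples, lower, upper
```

Noise in the total cell count can be added inside `regate` or `estimate` if a heuristic estimate of that error is available.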

As noted above, if the procedure provides estimates of the division and quiescence probabilities *γ *and *δ*, probabilities of death *α *can be calculated using *α *= 1 - *γ *- *δ*. It is then straightforward to calculate confidence intervals on *α *given the distribution of estimates of *γ *and *δ*.

We also note that each estimate *β*_{i }comes with its own confidence limits, stemming from the stochasticity of the branching process. We thus have at least two independent sources of uncertainty in parameters – one that stems from the uncertainty in the assignment of cells to different generations, which we estimate with the Monte Carlo procedure above; and the other from the underlying stochasticity of the branching processes – that is the range of parameter values that could reasonably (*i.e*. with some significant probability) have generated each of the datasets (**Y**^{(0)}, **Y**^{(1)}, ..., **Y**^{(P)}).

This procedure assumes high levels of confidence in the measured total cell numbers. If only a single experimental replicate is available, one may have little *a priori* knowledge of the uncertainty in total cell counts and its effect on parameter estimates. This may be significant in *in vitro* experiments, but is particularly important when tracking CFSE-labeled cells *in vivo*. For example, if labeled cells are transferred to an animal and recovered from blood and/or lymphoid tissues at a later timepoint, there may be both loss of cells in the recovery procedure and uncertainty in the number transferred successfully (*e.g.* the initial 'take' after intravenous transfer). We suggest that in the absence of experimental replicates, one approach to this problem is to make a heuristic estimate of the error in total counts, and then apply noise at this level to the total cell counts in the Monte Carlo procedure described above. We describe this in the example that follows.

#### Application to an experimental dataset

To illustrate our method of estimation with branching processes, we apply it to an experimental CFSE dataset (Figure 6). We modeled the response of a polyclonal population of CD8^{+} T cells to stimulation *in vitro* with anti-CD3 and anti-CD28 antibodies, in the presence of IL-2 (a growth factor). CFSE profiles from independent cultures were obtained at days 1–4. Little cell death or division was observed in the first 24 h so the 24 h timepoint was taken as the initial condition. The majority of T cells were expected to respond to this stimulus and so we modeled the system as a single population with division or death parameters varying (possibly) with time and/or generation number.

Figure 6

**Estimating parameters from T cell proliferation data**. The best fit of a heterogeneous discrete-time branching process model to a CFSE timecourse obtained by *in vitro *stimulation of 2.5 × 10^{4 }human CD8^{+ }T cells with anti-CD3 and anti-CD28 in saturating**...**

We fitted a variety of models to these data, allowing parameters to vary with time and/or division. The optimal timestep for all models (as measured by the absolute value of the likelihood) was 12 h, assuming that no divisions took place before 36 h. A reasonable fit was obtained with a four-parameter model that allowed undivided cells (generation 0) and divided cells (generations 1+) to have distinct probabilities of division and death; an extension to six parameters allowed different division and death probabilities in generations 0, 1–3 and 4+. The extended model gave a significantly better fit (*χ*^{2} test on the difference in log likelihoods, on 2 degrees of freedom, *p* < 10^{-6}). The best fit using the six-parameter model and the corresponding parameter estimates are shown in Figure 6 and Table 2.
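A likelihood ratio test of this kind can be sketched with the standard library alone, because the upper tail of the *χ*^{2} distribution has a closed form for 1 and 2 degrees of freedom. The function names are illustrative and no values from the paper's fits are assumed; other degrees of freedom would need a statistics library.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability of a chi-squared variable (dof 1 or 2 only)."""
    if x < 0:
        raise ValueError("statistic must be non-negative")
    if dof == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if dof == 2:
        return math.exp(-x / 2.0)
    raise NotImplementedError("closed form implemented for dof 1 and 2 only")

def likelihood_ratio_test(loglik_small, loglik_large, extra_params):
    """Return the test statistic 2 * (L_large - L_small) and its p-value."""
    stat = 2.0 * (loglik_large - loglik_small)
    return stat, chi2_sf(max(stat, 0.0), extra_params)
```

With the surrogate log likelihood, the statistic equals the difference in *X*^{2} between the simpler and the extended model.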

Table 2

Parameter estimates for the best fit description of the T cell proliferation data.

These results suggest slow recruitment of undivided cells into division after 36 hours, with a significant probability of apoptosis in the undivided population. Cells that have divided once divide again with approximately 40% probability in each 12 h interval, with increased susceptibility to apoptosis; division slows significantly in the fourth generation. Thus the method identifies the slow first division commonly observed in T cell proliferation assays; it also suggests that cells dividing rapidly have an associated high probability of death.

We quote confidence intervals on the parameter estimates using (i) the asymptotic properties of the QL estimator; (ii) the Monte Carlo (MC) method, taking into account the uncertainty of assigning cells to CFSE peaks, and (iii) the more conservative MC method, applying an additional estimate of measurement error (5% Gaussian noise applied to total cell numbers) to each of the MC replicates. We note that the parameters governing the 4th division are not well constrained as their estimation depends on the single measurement of the cell counts in generation 5 at 96 h.

#### Comparing models using estimation of measurement error

An alternative approach with single experimental datasets is to incorporate a contribution **Λ **to the covariance matrices **V**_{t }which represents the combined effects of our uncertainty in the assignment of generation numbers to cells and in total cell counts. The noise is then described by parameters to be estimated directly, and can be considered in the comparison of the fit of different models. Perhaps the simplest reasonable form for **Λ **is

where the next-to-diagonal elements *ρ *represent the misassignment between generations, and the diagonal elements *σ *represent the combination of misassignment and error in total cell counts, if any. The matrix **Λ **may also be expected to vary between timepoints (**Λ **= **Λ**_{t}). We refer to the parameters that characterize these matrices collectively as ** η**.

We cannot apply our QL procedure as it stands to estimate these additional parameters, since they do not appear in the expressions for the expected values of cell counts. Instead, we suggest that the entire parameter set (** β, η**) might be estimated by direct maximisation of a full multivariate normal approximation to the log likelihood,

where now the sum is over all timepoints *t *and over all replicates (Monte Carlo or experimental), *i*.
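The explicit forms of **Λ** and of the log likelihood are not reproduced in this copy of the text. The sketch below therefore rests on stated assumptions: **Λ** is taken to be tridiagonal with *σ* on the diagonal and *ρ* next to it, the same at every timepoint, and the log likelihood is the standard multivariate normal log density with covariance **V**_{t} + **Λ**. Function names are illustrative.

```python
import numpy as np

def noise_matrix(n, sigma, rho):
    """Assumed form of Lambda: sigma on the diagonal, rho next to it."""
    lam = sigma * np.eye(n)
    idx = np.arange(n - 1)
    lam[idx, idx + 1] = rho
    lam[idx + 1, idx] = rho
    return lam

def mvn_loglik(ys, mus, Vs, sigma, rho):
    """Multivariate normal log likelihood summed over datasets.

    ys, mus, Vs hold one entry per timepoint and replicate; the model
    covariance V is augmented by the measurement-noise matrix Lambda.
    """
    total = 0.0
    for y, mu, V in zip(ys, mus, Vs):
        r = np.asarray(y, float) - np.asarray(mu, float)
        C = np.asarray(V, float) + noise_matrix(r.size, sigma, rho)
        sign, logdet = np.linalg.slogdet(C)
        if sign <= 0:
            return -np.inf   # covariance not positive definite
        quad = r @ np.linalg.solve(C, r)
        total += -0.5 * (quad + logdet + r.size * np.log(2.0 * np.pi))
    return total
```

Maximising this function jointly over the kinetic parameters (through `mus` and `Vs`) and over `sigma` and `rho` would then be left to a general-purpose numerical optimiser.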

This quantity can be used directly for model comparison, either with likelihood ratio tests or information criteria statistics such as the AIC [30], although obtaining the estimate of *ℒ* by numerical maximisation of (7) may be difficult for complex models. This has the flavour of a mixed-effects approach [31

As discussed in lesson 6, PI and BrdU staining provide detailed information regarding cellular state but very limited information regarding the kinetics of the cell cycle. CFSE provides significantly greater kinetic information.

**CFSE - The Cell Division Marker**

Carboxyfluorescein succinimidyl ester, or CFSE, is a vital stain which is generally not harmful to cells. Upon entering cells, it undergoes esterase cleavage and diffuses throughout the cytoplasm. As cells divide, the CFSE is split equally between the daughter cells, resulting in diminished CFSE signal detection. This division, and the resultant signal diminishment, occurs with each subsequent cell division. This technique is useful both in vivo and in vitro. The in vivo technique is particularly interesting in that it allows for the harvesting and staining of cells with CFSE followed by the injection of treated cells back into the animal. The cells can then be reharvested at a later date and analyzed for CFSE to indicate how many divisions have occurred in a given time frame. Such analyses are often coupled with drug treatment studies to see how treatment affects the frequency of cell division.
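The halving of signal with each division can be turned into a rough division-number estimate, as in this idealised sketch. It assumes perfect two-fold dilution and ignores autofluorescence and label loss over time, so real profiles will deviate from it; the function names are illustrative.

```python
import math

def expected_intensity(initial, divisions):
    """CFSE signal after a number of divisions, assuming exact halving."""
    return initial / 2 ** divisions

def division_number(initial, observed):
    """Nearest whole number of divisions implied by an observed intensity."""
    return max(0, round(math.log2(initial / observed)))
```

For instance, a cell measured at one eighth of the undivided intensity would be assigned to division 3 under these assumptions.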

In the images above, you can see an example of CFSE analysis via manual gating. Notice in both images, the Initial Population (IP) is the brightest population with each resulting cell division (1-5) showing reduced CFSE signal. Notice also, that the CFSE analysis shown is of a sub-population of cells as identified by PE counterstaining with a selected antibody.

**Peak Fitting**

In cases where the investigator needs to know not only the number of divisions but also the frequency of cells in each division state, manual gating can be of limited utility due to peak overlap. As with PI and cell cycle, this method tends to produce *qualitative* rather than *quantitative* data. When higher accuracy is desired, it is often necessary to use a peak-fitting algorithm. Peak-fitting applies mathematical modeling to the histogram and produces a quantitative analysis, again complete with internal statistical analysis. Many cytometry packages, such as Weasel (with which we are all familiar), FlowJo and Modfit, include peak-fitting routines. The above image shows a Weasel-based analysis. The instructions for this analysis can be found in the Weasel Help file.

**Wrapping Up**

In order to assess your understanding of the material thus far, please email the answers to the following exercise to the address below. After receiving this I will provide you with access to the next module.

- Practice analyzing the CFSE (FL1 axis) with these data files. As with the above example, these cells are counterstained in the PE channel (FL2 axis). Choose a data file and tabulate your results. Be sure to include the method of analysis (curve fitting or manual gating), why you chose the method of analysis, the software used for the analysis, and your overall impressions of the technique. If you have trouble with an analysis, a detailed example can be found here.

*Special thanks to Sean Linkes for providing the CFSE data and graphics. Sean may be reached at the Flow Cytometry Core Main Facility.*


davadams@umich.edu. *Last updated: March 8, 2005*

