doi:10.1162/neco.2007.19.6.1503

A Method for Selecting the Bin Size of a Time Histogram

Hideaki Shimazaki and Shigeru Shinomoto
Department of Physics, Kyoto University, Kyoto 606-8502, Japan

Neural Computation Vol. 19(6) 1503-1527 (2007)

Abstract:

The time-histogram method is the most basic tool for capturing a time-dependent rate of neuronal spikes. Generally in the neurophysiological literature, the bin size that critically determines the goodness of the fit of the time histogram to the underlying spike rate has been subjectively selected by individual researchers. Here, we propose a method for objectively selecting the bin size from the spike-count statistics alone, so that the resulting bar- or line-graph time histogram best represents the unknown underlying spike rate. For a small number of spike sequences generated from a modestly fluctuating rate, the optimal bin size may diverge, indicating that any time histogram is likely to capture a spurious rate. Given a paucity of data, the present method can nevertheless suggest how many experimental trials should be added in order to obtain a meaningful time-dependent histogram with the required accuracy.

Introduction

Neurophysiological studies are based upon the idea that information is transmitted between cortical neurons by spikes [Johnson, 1996,Dayan and Abbott, 2001]. A number of filtering algorithms have been proposed for estimating the instantaneous activity of an individual neuron or the joint activity of multiple neurons [DiMatteo et al., 2001,Wiener and Richmond, 2002,Sanger, 2002,Kass et al., 2003,Brockwell et al., 2004,Kass et al., 2005,Brown et al., 2004]. The most basic and frequently used tool for spike-rate estimation is the time-histogram method. For instance, one aligns spike sequences at the onset of stimuli repeatedly applied to an animal, and describes the response of a single neuron with a peri-stimulus time histogram (PSTH) or the responses of multiple neurons with a joint PSTH [Adrian, 1928,Gerstein and Kiang, 1960,Gerstein and Perkel, 1969,Abeles, 1982].

The shape of a PSTH is largely dependent on the choice of the bin size. With a bin size that is too large, one cannot represent the time-dependent spike rate. On the other hand, with a bin size that is too small, the time histogram fluctuates largely and one cannot discern the underlying spike rate. There is an appropriate bin size for each set of spike sequences, which is based on the goodness of the fit of the PSTH to the underlying spike rate. For most previously published PSTHs, however, the bin size has been subjectively selected by the authors.

For data points distributed compactly, there are classical theories about how the optimal bin size scales with the total number of data points $n$. It was proven that the optimal bin size scales as $n^{-1/3}$ with regard to the bar-graph-density estimator [Révész, 1968,Scott, 1979]. It was recently found that for two types of infinitely long spike sequences, whose rates fluctuate either smoothly or jaggedly, the optimal bin sizes exhibit different scaling relations with respect to the number of sequences, time scale, and amplitude of rate modulation [Koyama and Shinomoto, 2004].

Though interesting, the scaling relations are valid only for a large amount of data, and are of limited use in selecting a bin size. We devised a method of selecting the bin size of a time histogram from the spike data. In the course of our study, we realized that a theory on the empirical choice of the histogram bin size for a probability density function was presented by Rudemo in 1982 (Scandinavian Journal of Statistics 9: 65-78). Although applicable to a Poisson point process, this theory appears to have rarely been used by neurophysiologists in the analyses of PSTHs. In the actual procedure of neurophysiological experiments, the number of trials (spike sequences) plays an important role in determining the resolution of a PSTH and thus in designing experiments. Therefore it is preferable to have a theory that accords with the common protocol of neurophysiological experiments in which a stimulus is repeated to extract a signal from a neuron. Given a set of experimental data, we wish to not only determine the optimal bin size, but also estimate how many more experimental trials should be performed in order to obtain a resolution we deem sufficient.

For a small number of spike sequences derived from a modestly fluctuating rate, the estimated optimal bin size may diverge, implying that by constructing a PSTH, it is likely that one obtains spurious results for the spike-rate estimation [Koyama and Shinomoto, 2004]. Because a shortage of data underlies this divergence, one can carry out more experiments to obtain a reliable rate estimation. Our method can suggest how many sequences should be added in order to obtain a meaningful time histogram with the required accuracy. As an application of this method, we also show that the scaling relations of the optimal bin size that appears for a large number of spike sequences can be examined from a relatively small amount of data. The degree of the smoothness of an underlying rate process can be estimated by this method. In addition to a bar-graph (piecewise constant) time histogram, we also designed a method for creating a line-graph (piecewise linear) time histogram, which is superior to a bar-graph in the goodness of the fit to the underlying spike rate and in comparing multiple responses to different stimulus conditions.

These empirical methods for the bin size selection for a bar- and a line-graph histogram, estimation of the number of sequences required for the histogram, and estimation of the scaling exponents of the optimal bin size were corroborated by theoretical analysis derived for a generic stochastic rate process. In the next section, we develop the bar-graph (peri-stimulus) time histogram (Bar-PSTH) method, which is the most frequently used PSTH. In Appendix, we develop the line-graph (peri-stimulus) time histogram (Line-PSTH) method.

Optimization of the bar-graph time histogram

We consider sequences of spikes repeatedly recorded from a single neuron under identical experimental conditions. A recent analysis revealed that in vivo spike trains are not simply random, but possess inter-spike-interval distributions intrinsic and specific to individual neurons [Shinomoto et al., 2003,Shinomoto et al., 2005]. However, spikes accumulated from a large number of spike trains are in the majority mutually independent, and can be regarded as being derived from a time-dependent Poisson point process [Snyder, 1975,Daley and Vere-Jones, 1988,Kass et al., 2005].

It would be natural to assess the goodness of the fit of the estimator $\hat \lambda_t$ to the underlying spike rate $\lambda_t$ over the total observation period $T$ by the mean integrated squared error (MISE),

\begin{displaymath}
\mbox{MISE} \equiv \frac{1}{T}\int_0^T E ( \hat{\lambda}_t - \lambda_t)^2  dt,
\end{displaymath} (1)

where $E$ refers to the expectation over different realizations of point events, given $\lambda_t$. We begin with a bar-graph time histogram as $\hat \lambda_t$, and explore a method to select the bin size that minimizes the MISE. The difficulty of the present problem comes from the fact that the underlying spike rate $\lambda_t$ is not known.

Figure 1: The Bar-PSTH. A: an underlying spike rate, $\lambda_t$. The horizontal bars indicate the time averaged rates $\theta$ for individual bins of width $\Delta $. B: sequences of spikes derived from the underlying rate. C: a time histogram for the sample sequences of spikes. The estimated rate $\hat {\theta }$ is the total number of spikes $k$ that entered each bin, divided by the number of sequences $n$ and the bin size $\Delta $.
\includegraphics[width=4.25in]{2BarSchematic}

A bar-graph time histogram is constructed simply by counting the number of spikes that belong to each bin of width $\Delta $. For an observation period $T$, we obtain $N = \lfloor T/\Delta \rfloor$ intervals. The number of spikes accumulated from all $n$ sequences in the $i$th interval is counted as $k_i$. The bar height at the $i$th bin is given as $k_i / n \Delta$. Figure 1 shows the schematic diagram for the construction of a bar-graph time histogram.

Given a bin of width $\Delta $, the expected height of a bar graph for $t \in [0, \Delta]$ is the time-averaged rate,

\begin{displaymath}
\theta = \frac{1}{\Delta }\int_0^\Delta {\lambda _t  dt}.
\end{displaymath} (2)

The total number of spikes $k$ from $n$ spike sequences that enter this bin of width $\Delta $ obeys the Poisson distribution
\begin{displaymath}
p(k  \vert  n \Delta \theta) =\frac{\left( n \Delta \theta \right)^{k}}{k!} e^{-n \Delta \theta}   ,
\end{displaymath} (3)

whose expectation is $n\Delta \theta $. The unbiased estimator for $\theta$ is denoted as $\hat {\theta}=k/(n\Delta )$, which is the empirical height of the bar graph for $t \in [0, \Delta]$.

Selection of the bin size

By segmenting the total observation period $T$ into $N$ intervals of size $\Delta $, the MISE defined in Eq. 1 can be rewritten as

\begin{displaymath}
\mbox{MISE} = \frac{1}{\Delta }\int_0^\Delta \frac{1}{N} \su...
...hat \theta_i - \lambda _{t+(i-1)\Delta} }  )^2 \right\}  dt,
\end{displaymath} (4)

where $\hat{\theta}_i \equiv k_i/(n\Delta )$. Hereafter we denote the average over the segmented rate $\lambda _{t+(i-1)\Delta}$ as an average over an ensemble of (segmented) rate functions $\{ \lambda _{t} \}$ defined on an interval of $t \in [0, \Delta]$, as
\begin{displaymath}
\mbox{MISE} =\frac{1}{\Delta }\int_0^\Delta \left\langle {E   (  {\hat \theta - \lambda _t }  )^2 } \right\rangle dt.
\end{displaymath} (5)

The expectation $E$ refers to the average over the spike count, or $\hat {\theta}=k/(n\Delta )$, given a rate function $\lambda _{t}$, or its mean value $\theta$.

The MISE can be decomposed into two parts,

$\displaystyle \mbox{MISE} = \left\langle {E {( {\hat \theta - \theta } )^2 } } ...
...ta {\left\langle {\left( {\lambda _t - \theta } \right)^2 } \right\rangle dt} .$
(6)

The first and second terms are, respectively, the stochastic fluctuation of the estimator $\hat {\theta }$ around the expected mean rate $\theta$, and the averaged temporal fluctuation of $\lambda_t$ around its mean $\theta$ over an interval of length $\Delta $.

The second term of Eq. 6 can further be decomposed into two parts,

\begin{displaymath}
\frac{1}{\Delta }\int_0^\Delta {\left\langle ( \lambda _t - ...
...t( {\theta - \langle\theta\rangle} \right)^2 } \right\rangle .
\end{displaymath} (7)

The first term in Eq. 7 represents a mean squared fluctuation of the underlying rate $\lambda_t$ from the mean rate $\left\langle \theta \right\rangle$, and is independent of the bin size $\Delta $, because
\begin{displaymath}
\frac{1}{\Delta }\int_0^\Delta \left\langle {\left( {\lambda...
...T \left( {\lambda _t - \langle\theta\rangle } \right)^2   dt.
\end{displaymath} (8)

We define a cost function by subtracting this term from the original MISE,
$\displaystyle C_n(\Delta)$ $\textstyle \equiv$ $\displaystyle \mbox{MISE} - \frac{1}{T} \int_0^T \left( {\lambda _t - \langle\theta\rangle } \right)^2   dt$  
  $\textstyle =$ $\displaystyle \left\langle E(\hat \theta - \theta)^2 \right\rangle - \left\langle {\left( {\theta - \langle\theta\rangle} \right)^2 } \right\rangle.$ (9)

This cost function corresponds to the `risk function' in the report by Rudemo, (Eq. 2.3), obtained by direct decomposition of the MISE [Rudemo, 1982]. The second term in Eq. 9 represents the temporal fluctuation of the expected mean rate $\theta$ for individual intervals of period $\Delta $. As the expected mean rate $\theta$ is not an observable quantity, we have to replace the fluctuation of the expected mean rate with that of the observable estimator $\hat {\theta }$. Using the decomposition rule for an unbiased estimator ( $E\hat {\theta }=\theta )$,
\begin{displaymath}
\left\langle E ( {\hat \theta - \langle E\hat \theta \rangle...
...( {\theta - \langle \theta \rangle} \right)^2 } \right\rangle,
\end{displaymath} (10)

the cost function is transformed into
\begin{displaymath}
C_n\left( \Delta \right) =
2 \left\langle E (\hat \theta - \...
...hat \theta - \langle E\hat \theta \rangle } )^2 \right\rangle.
\end{displaymath} (11)

Due to the assumed Poisson nature of the point process, the number of spikes $k$ counted in each bin obeys a Poisson distribution; the variance of $k$ is equal to the mean. For the estimated rate defined as $\hat {\theta}=k/(n\Delta )$, this variance-mean relation corresponds to

\begin{displaymath}
E (\hat \theta - \theta)^2
= \frac{1}{n\Delta}E {\hat \theta} .
\end{displaymath} (12)

By incorporating Eq. 12 into Eq. 11, the cost function is given as a function of the estimator $\hat {\theta }$,
\begin{displaymath}
C_n\left( \Delta \right) =
\frac{2}{{n\Delta}} \left\langle ...
...hat \theta - \langle E\hat \theta \rangle } )^2 \right\rangle.
\end{displaymath} (13)

The optimal bin size is obtained by minimizing the cost function $C_n (\Delta )$, as
\begin{displaymath}
\Delta^{*} \equiv \arg\min_{\Delta}C_n(\Delta).
\end{displaymath} (14)

By replacing the expectation with the sample spike count, the cost function (Eq. 13) is converted into a useful recipe, summarized as `Algorithm 1.'

Algorithm 1: A method for bin size selection for a Bar-PSTH
width
(i)
Divide the observation period $T$ into $N$ bins of width $\Delta $, and count the number of spikes $k_i$ from all $n$ sequences that enter the $i$th bin.
(ii)
Construct the mean and variance of the number of spikes $\{k_i\}$ as,

\begin{displaymath}\bar{k} \equiv \frac{1}{N} \sum_{i=1}^{N} k_i \mbox{, and } \displaystyle v \equiv \frac{1}{N} \sum_{i=1}^{N} (k_i-\bar{k})^2. \end{displaymath}

(iii)
Compute the cost function,

\begin{displaymath}C_n(\Delta) = \frac{2 \bar{k} - v}{(n \Delta)^2}.\end{displaymath}

(iv)
Repeat i through iii while changing the bin size $\Delta $ to search for $\Delta^{*}$ that minimizes $C_n (\Delta )$.

Extrapolation of the cost function

With the method developed in the preceding subsection, we can determine the optimal bin size for a given set of experimental data. In this section, we develop a method to estimate how the optimal bin size decreases when more experimental trials are added to the data set: Given $n$ sequences, the method provides the cost function for $m (\neq n)$ sequences.

Assume that we are in possession of $n$ spike sequences. The fluctuation of the expected mean rate $\left\langle ( \theta -\left\langle \theta \right\rangle )^2 \right\rangle$ in Eq. 10 is replaced with the empirical fluctuation of the time histogram $\mathop {\hat {\theta}}\nolimits_n$ using the decomposition rule for the unbiased estimator $\hat \theta_n $ satisfying $E\hat \theta_n =\theta $,

\begin{displaymath}
\left\langle E ( {\hat \theta_n - \langle E\hat \theta_n \ra...
...gle { ( {\theta - \langle \theta \rangle} )^2 } \right\rangle.
\end{displaymath} (15)

The expected cost function for $m$ sequences can thus be obtained by substituting the above equation into Eq. 9, yielding
\begin{displaymath}
C_{m}\left( \Delta \vert n \right) =
\left\langle E (\hat \t...
...\theta_n - \langle E\hat \theta_n \rangle } )^2 \right\rangle.
\end{displaymath} (16)

Using the variance-mean relation for a Poisson distribution, Eq. 12, and
\begin{displaymath}
E (\hat \theta_m - \theta )^2
= \frac{1}{m\Delta}E {\hat \theta_m }
= \frac{1}{m\Delta}E {\hat \theta_n } ,
\end{displaymath} (17)

we obtain
\begin{displaymath}
C_{m}\left( \Delta \vert n \right) =
\left( \frac{1}{m}-\fra...
...e E {\hat \theta_n } \right\rangle
+ C_n\left( \Delta \right),
\end{displaymath} (18)

where $C_n \left( \Delta \right)$ is the original cost function (Eq. 13) computed using the estimators $\hat \theta_n $. By replacing the expectation with sample spike count averages, the cost function for $m$ sequences can be extrapolated with this formula, using the sample mean $\bar{k}$ and variance $v$ of the numbers of spikes, given $n$ sequences and the bin size $\Delta $. The extrapolation method is summarized as `Algorithm 2.'

Algorithm 2: A method for extrapolating the cost function for a Bar-PSTH
width
(a)
Construct the extrapolated cost function,

\begin{displaymath}C_{m}\left( \Delta \vert n \right) = \left(\frac{1}{m}-\frac{1}{n}\right)\frac{\bar{k}}{n\Delta^2}+ C_n(\Delta), \end{displaymath}

using the sample mean $\bar{k}$ and variance $v$ of the number of spikes obtained from $n$ sequences of spikes. $C_n (\Delta )$ is the cost function computed for $n$ sequences of spikes with the Algorithm 1.
(b)
Search for $\Delta^{*}_m$ that minimizes $C_{m}\left( \Delta \vert n \right)$.
(c)
Repeat A and B while changing $m$, and plot $1/\Delta^{*}_m$ vs $1/m$ to search for the critical value $1/m=1/\hat n_c$ above which $1/\Delta^{*}_m$ practically vanishes.

It may come to pass that the original cost function $C_n (\Delta )$ computed for $n$ spike sequences does not have a minimum, or have a minimum at a bin size comparable to the observation period $T$. Because of the paucity of data, one may consider carrying out more experiments to obtain a reliable rate estimation. The critical number of sequences $n_{c}$ above which the cost function has a finite bin size $\Delta ^\ast $ may be estimated in the following manner. With a large $\Delta $, the cost function can be expanded as

\begin{displaymath}
C_n(\Delta) \sim \mu \left( {\frac{1}{n} - \frac{1}{{n_c }}} \right)\frac{1}{\Delta} + u\frac{1}{\Delta^2},
\end{displaymath} (19)

where we have introduced $n_c $ and $u$, which are independent of $n$. The optimal bin size undergoes a phase transition from the vanishing $1/\Delta^\ast $ for $n<n_c $ to a finite $1/\Delta^\ast $ for $n>n_c $. The inverse optimal bin size is expanded in the vicinity of $n_c $ as $1/\Delta ^\ast \propto \left( {1/n-1/n_c } \right)$. We can estimate the critical value $\hat n_c$ (see Fig. 3) by applying this asymptotic relation to the set of $\hat \Delta_m^\ast$ estimated from $C_m (\Delta \vert n)$ for various values of $m$:
\begin{displaymath}
\frac{1}{\Delta^{*}_m} \propto \left( {\frac{1}{m} - \frac{1}{{\hat n_c }}} \right).
\end{displaymath} (20)

The minimum number of sequences required for the construction of a Bar-PSTH is estimated from $\lceil \hat n_c \rceil$. It should be noted that there are cases that the optimal bin size exhibits a discontinuous divergence from a finite value. Even in such cases, the plot of $\{1/m, 1/\Delta^{*}\}$ could be useful in exploring the discontinuous transition from nonvanishing values of $1/\Delta^{*}$ to practically vanishing values.

In the opposite extreme, with a sufficiently large number of spike sequences, our method selects a small bin size. It is known that the optimal bin size exhibits a power-law scaling with respect to the number of sequences $n$. The exponent of the scaling relation depends on the smoothness of the underlying rate [Koyama and Shinomoto, 2004]. Given a large number of spike sequences, the presently developed method for extrapolating the cost function can also be utilized to estimate the optimal bin size for a larger number of spike sequence $m > n$, and further estimate the scaling exponent representing the smoothness of the underlying rate.

Theoretical analysis on the optimal bin size

To verify the above mentioned empirical methods, we obtain the `theoretical' cost function of a Bar-PSTH directly from a process with a known underlying rate. Note that this theoretical cost function is not available in real experimental conditions in which the underlying rate is not known.

The present estimator $\hat {\theta }\equiv k/(n\Delta )$ is a uniformly minimum variance unbiased estimator (UMVUE) of $\theta$, which achieves the lower bound of the Cramér-Rao inequality [Blahut, 1987,Cover and Thomas, 1991],

\begin{displaymath}
E (\hat \theta - \theta)^2 =
\left[ - \sum_{k=0}^{\infty} p\...
...\partial \theta^2 } \right]^{-1}
= \frac{\theta }{{n\Delta }}.
\end{displaymath} (21)

Inserting this into Eq. 9, the cost function is represented as
$\displaystyle C_n\left( \Delta \right)$ $\textstyle =$ $\displaystyle \frac{\left\langle\theta\right\rangle}{n \Delta} - \left\langle {\left( {\theta - \langle\theta\rangle} \right)^2 } \right\rangle$  
  $\textstyle =$ $\displaystyle \frac{\mu}{n \Delta} - \frac{1}{{\Delta ^2 }}\int_0^\Delta {\int_0^\Delta {\phi \left( {t_1 - t_2 } \right) dt_1 dt_2 } },$ (22)

where $\mu = \langle \theta \rangle$ is the mean rate, and $\phi (t_1-t_2) = \langle (\lambda_{t_1} - \mu)(\lambda_{t_2} - \mu)\rangle$is the autocorrelation function of the rate fluctuation, $\lambda_t-\mu $. To obtain the last equation, we used
$\displaystyle \left\langle {\left( {\theta - \langle\theta\rangle} \right)^2 } \right\rangle$ $\textstyle =$ $\displaystyle \left\langle {\left\{ {\frac{1}{\Delta }\int_0^\Delta {\left( {\lambda _t - \mu } \right)dt} } \right\}^2 } \right\rangle$  
  $\textstyle =$ $\displaystyle \frac{1}{{\Delta ^2 }}\int_0^\Delta {\int_0^\Delta {\left\langle ...
...\right)\left( {\lambda _{t_2 } - \mu } \right)} \right\rangle  dt_1 } dt_2 } .$ (23)

The cost function with a large bin size can be rewritten as

$\displaystyle C_n\left( \Delta \right)$ $\textstyle =$ $\displaystyle \frac{\mu}{n \Delta} - \frac{1}{\Delta^2} \int_{-\Delta}^{\Delta} (\Delta-\vert t\vert) \phi(t)  dt$  
  $\textstyle \sim$ $\displaystyle \frac{\mu}{n \Delta} - \frac{1}{\Delta} \int_{-\infty}^{\infty} \...
...,dt + \frac{1}{{\Delta ^2 }} \int_{-\infty}^{\infty} \vert t\vert \phi(t)  dt,$ (24)

based on the symmetry $\phi (t) = \phi (-t)$ for a stationary process. Eq. 24 can be identified with Eq. 19 with parameters
$\displaystyle n_c$ $\textstyle =$ $\displaystyle \frac{\mu}{ \int_{-\infty }^\infty \phi(t)  dt } ,$ (25)
$\displaystyle u$ $\textstyle =$ $\displaystyle \int_{-\infty}^{\infty} \vert t\vert \phi(t)  dt.$ (26)

For the process giving the correlation of the form $\phi \left( t \right)=\sigma ^2e^{-\vert t\vert /\tau }$ and the process giving a Gaussian correlation $\phi (t)=\sigma ^2e^{-t^2/\tau ^2}$ used in the simulation in the next section, the critical numbers are obtained as $n_c = \mu / 2 \sigma^2 \tau$ and $n_c = \mu / \sigma^2 \tau \sqrt{\pi}$, respectively.

Based on the same theoretical cost function Eq. 22, we can also derive scaling relations of the optimal bin size, which is achievable with a large number of spike sequences. With a small $\Delta $, the correlation of rate fluctuation $\phi (t)$ can be expanded as $\phi \left( t \right)=\phi \left( 0 \right)+{\phi}'\left( {0_+ } \right)\vert t...
...tstyle{1 \over 2}{\phi }''\left( 0 \right)t^2+O\left( {\vert t\vert ^3} \right)$, and we obtain an expansion of Eq. 22 with respect to $\Delta $,

\begin{displaymath}
C_n(\Delta) = \frac{\mu}{n \Delta} - \phi \left( 0 \right) -...
...- \frac{1}{12}\phi ''\left( 0 \right)\Delta ^2 + O(\Delta ^3).
\end{displaymath} (27)

The optimal bin size $\Delta ^\ast $ is obtained from $dC_n \left( \Delta \right)/d\Delta =0$. For a rate that fluctuates smoothly in time, the correlation function is a smooth function of $t$, resulting in ${\phi}'(0)=0$ due to the symmetry $\phi (t) = \phi (-t)$. In this case, we obtain the scaling relation

\begin{displaymath}
\Delta ^ * \sim \left( { - \frac{6 \mu }{{\phi ''\left( 0 \right)n}}} \right)^{1/3}.
\end{displaymath} (28)

For a rate that fluctuates in a zigzag pattern, in which the correlation of rate fluctuation has a cusp at $t=0$ ( ${\phi }'(0_+ )<0)$, we obtain the scaling relation

\begin{displaymath}
\Delta ^* \sim \left(- \frac{3 \mu}{\phi'(0_{+}) n}\right)^{1/2} ,
\end{displaymath} (29)

by ignoring the second order term. Examples that exhibit this type of scaling are the Ornstein-Uhlenbeck process and the random telegraph process [Kubo and Hashitsume, 1985,Gardiner, 1985].

The scaling relations Eq. 28 and Eq. 29 are the generalization of the ones found in Koyama & Shinomoto (2004) for two specific time-dependent Poisson processes.

Application of the method to spike data

In this section, we apply the presently developed methods for the bar-graph (peri-stimulus) time histogram (Bar-PSTH) and the line-graph (peri-stimulus) time histogram (Line-PSTH) to (1) spike sequences (point events) generated by the simulations of time dependent Poisson processes and (2) spike sequences recorded from a neuron in area MT. The methods for a Line-PSTH are summarized in Appendix.

Application to simulation data

We applied the method for estimating the optimal bin size of a Bar-PSTH to a set of spike sequences derived from a time-dependent Poisson process. Figure 2A shows the `empirical' cost function $C_n (\Delta )$ computed from a set of $n$ spike sequences. The empirical cost function is in good agreement with the `theoretical' cost function computed from the mean spike rate $\mu $ and the correlation $\phi (t)$ of the rate fluctuation, according to Eq. 22. In Figure 2D, a time histogram with the optimal bin size is compared with those with non-optimal bin sizes, demonstrating the effectiveness of optimizing the bin size.

Figure 2: Construction of the optimal Bar-PSTH from spike sequences. A: (Dots): an `empirical' cost function, $C_n (\Delta )$, computed from spike data according to the Algorithm 1. Here, 50 sequences of spikes for an interval of 20 [s] are derived from a time-dependent Poisson process characterized by the mean rate $\mu $ and the correlation of the rate fluctuation $\phi (t)=\sigma ^2e^{-t^2/\tau ^2}$, with $\mu =30$ [sp/s], $\sigma =10$ [sp/s], and $\tau =0.1$ [s]. (Solid line): the `theoretical' cost function computed directly from the underlying fluctuating rate using Eq. 22. B: the underlying fluctuating rate $\lambda_t$. C: spike sequences derived from the rate. D: time histograms made using three types of bin sizes: too small, optimal, and too large.
\includegraphics[width=4.25in]{2BarCostFunction}

The Algorithm 2 provides an estimate of how many sequences should be added for the construction of a Bar-PSTH with the resolution we deem sufficient. In this method, the cost function of $m$ sequences is obtained by modifying the original cost function, $C_n \left( \Delta \right)$, computed from the spike count statistics of $n$ sequences of spikes. In the subsequent applications of this extrapolation method, the original cost function, $C_n \left( \Delta \right)$, was obtained by averaging over the initial partitioning positions.

We applied this extrapolation method for a Bar-PSTH to a set of spike sequences derived from the smoothly regulated Poisson process. Figure 3A depicts the extrapolated cost function $C_m \left( {\Delta \vert n} \right)$ for several values of $m$, computed from a given set of $n(=30)$ spike sequences. Figure 3B represents the dependence of an inverse optimal bin size $1/\Delta_m^\ast $ on $1/m$. The inverse optimal bin size, $1/\Delta_m^\ast $, stays near $0$ for $1/m>1/\hat n_c$, and departs from $0$ linearly with $m$ for $1/m\le 1/\mathop {\hat {n}}\nolimits_c $. By fitting the linear function, we estimated the critical number of sequences $\hat n_c$. In Figure 3C, the critical number of sequences $\hat n_c$ estimated from a smaller or larger $n$ is compared to the theoretical value of $n_c $ computed from Eq. 25. It is noteworthy that the estimated $\hat n_c$ approximates the theoretical $n_c $ well even from a fairly small number of spike sequences, with which the estimated optimal bin size diverges.

Figure: The extrapolation method. A: extrapolated cost functions $C_m \left( {\Delta \vert n} \right)$ of a Bar-PSTH for several values of $m$, computed from 30 sequences of spikes derived from a Poisson process characterized by the mean rate $\mu $ and the correlation of the rate fluctuation $\phi (t)=\sigma ^2e^{-t^2/\tau ^2}$, with $\mu =30$, $\sigma =2$, and $\tau =0.1$. The modestly fluctuating rate with $\sigma =2$ is used as compared with the rate with $\sigma =10$ used in Fig. 1. B: the inverse optimal bin size $1/\Delta_m^\ast $ plotted against $1/m$. A solid line is a linear function fitted to the data. C: the critical number of sequences $\hat n_c$ extrapolated from a smaller or larger number of spike sequences. The horizontal axis represents the number of spike sequences $n$ used to obtain the extrapolated cost function $C_m \left( {\Delta \vert n} \right)$. The vertical axis represents an extrapolated critical number $\hat n_c$. The dashed lines represent a theoretical value $n_c $ computed using Eq. 25, and a diagonal.
\includegraphics[width=4.25in]{2BarCriticality}

We also constructed a method for selecting bin size of a Line-PSTH, which is summarized as `Algorithm 3' in Appendix. Figure 4 compares the optimal Bar-PSTH and the optimal Line-PSTH obtained from the same set of spike sequences, demonstrating the superiority of the Line-PSTH to the Bar-PSTH in the sense of the MISE. In addition, the Line-PSTH is suitable for the comparison of multiple time-dependent rates, as is the case for filtering methods. The extrapolation method for a Line-PSTH is summarized as `Algorithm 4' in Appendix.

Figure 4: The comparison of the optimal Bar-PSTH and the optimal Line-PSTH. The spike data used are the same as the data used in Fig. 2. A: the optimal Bar-PSTH constructed from the spike sequences derived from an underlying fluctuating rate process (dashed line). B: the optimal Line-PSTH constructed from the same set of spike sequences.
\includegraphics[width=4.25in]{2LineBar}

With the extrapolation method for a Bar-PSTH (Algorithm 2), one can also estimate how much the optimal bin size decreases (i.e., the resolution increases) with the number of spike sequences. Figure 5A shows log-log plots of the optimal bin sizes $\Delta_m^\ast $ versus $m$ with respect to two rate processes that fluctuate either smoothly according to the stochastic process characterized by the mean spike rate $\mu $ and the correlation of the rate process $\phi \left( t \right)=\sigma ^2e^{-t^2/\tau ^2}$ or in a zigzag pattern according to the Ornstein-Uhlenbeck process characterized by the mean rate $\mu $ and the correlation of the rate process $\phi \left( t \right)=\sigma ^2e^{-\vert t\vert /\tau }$. These plots exhibit power-law scaling relations with distinctly different scaling exponents. The estimated exponents ($-0.34\pm 0.04$ for the smooth rate process and $-0.56\pm 0.04$ for the zigzag rate process) are close to the exponents of $m^{-1/3}$ and $m^{-1/2}$ that were obtained analytically as in Eqs. 28 and 29 (see also ref. Koyama & Shinomoto, 2004). In this way, we can estimate the degree of smoothness of the underlying rate from a reasonable amount of spike data.

Figure 5: Scaling relations for the Bar-PSTH and the Line-PSTH. The optimal bin sizes $\Delta_m^\ast $ are estimated from the extrapolated cost function $C_m \left( {\Delta \vert n} \right)$, computed from $100$ sequences of spikes. The optimal bin sizes $\Delta_m^\ast $ versus $m$ is shown in log-log scale. Two types of rate processes were examined; one that fluctuates smoothly, which is characterized by the correlation of the rate fluctuation $\phi \left( t \right)=\sigma ^2e^{-t^2/\tau ^2}$, and the other that fluctuates jaggedly, which is characterized by the correlation of the rate fluctuation $\phi \left( t \right)=\sigma ^2e^{-\vert t\vert /\tau }$. The parameter values are the same as the ones used in Fig. 2. A: log-log plots of the optimal bin size with respect to $m$ for a Bar-PSTH. Lines are fitted to the data in an interval of $m\in [50,500]$, whose regression coefficients, $-0.34\pm 0.04$ and $-0.56\pm 0.04$, correspond to the scaling exponents for a Bar-PSTH. B: log-log plots of the optimal bin size with respect to $m$ for a Line-PSTH. Lines are fitted to the data in an interval of $m\in [50,500]$, whose regression coefficients, $-0.24\pm 0.04$ and $-0.50\pm 0.05$, correspond to the scaling exponents for a Line-PSTH.
\includegraphics[width=4.25in]{2Scaling}

With the extrapolation method for a Line-PSTH (Algorithm 4 in Appendix), the scaling relations for a Line-PSTH can be examined in a similar manner. Figure 5B represents the optimal bin sizes computed for two rate processes that either fluctuate smoothly or in a zigzag pattern. The estimated exponents are $-0.24\pm 0.04$ for the smooth rate process and $-0.50\pm 0.05$ for the zigzag rate process. The exponents obtained by the extrapolation method are similar to the analytically obtained exponents, $m^{-1/5}$ and $m^{-1/2}$ respectively (see Eqs. 41 and 42). Note that for the smoothly fluctuating rate process, the scaling relation for the Line-PSTH is $m^{-1/5}$, whereas the scaling relation for a Bar-PSTH is $m^{-1/3}$. In contrast, for the rate process that fluctuates jaggedly, the exponents of the scaling relations for both a Bar-PSTH and a Line-PSTH are $m^{-1/2}$.

Application to experimental data

We also applied our method for optimizing a Bar-PSTH to publicly available neuronal spike data [Britten et al., 2004]. The details of the experimental methods are described in the reference articles [Newsome et al., 1989,Britten et al., 1992]. It was reported that the neurons in area MT exhibit highly reproducible temporal responses to identical visual stimuli [Bair and Koch, 1996]. The estimation of a time-dependent rate by a PSTH is, however, sensitive to the choice of the bin size. Therefore it would be preferable that such an appraisal of the reproducibility is tested with our objective method of optimizing the bin size. To make a reliable estimate of the optimal bin size from the spike sequences of a short observation period, we took an average of the cost functions computed under different partitioning positions.

Figure 6: The Bar-PSTHs for the spike sequences recorded from a MT neuron (w052 in nsa2004.1, Britten et al., 2004). (Left): the cost function (averaged over different partitioning positions) (Right): spike sequences sampled and the optimal Bar-PSTH for the first 1 [s] of the recording period 2 [s]. A: the first 5 spike sequences. B: the first 20 spike sequences that include 5 sequences of A. C: the total 50 spike sequences that include 20 sequences of B.
\includegraphics[width=4.25in]{2Experimental}

We examine here the data recorded from a neuron under the repeated application of a random dot visual stimulus with 3.2% of coherent motion (w052, nsa2004.1, Britten et al., 2004). The method for a Bar-PSTH was applied to the data, with close attention paid to how the optimal bin size changes with the number of spike sequences sampled. Figure 6 depicts the results for the first $n=5$, the first $n=20$ and the total $n=50$ sequences of spikes.

The optimal bin size practically diverges ($\Delta^*=1000$ [ms]) for the $n=5$ sequences, implying that with this small amount of data a discussion about the time-dependent response does not make sense, as far as we rely on the histogram method. Even in this stage, it is possible to extrapolate the cost function by means of Algorithm 2. The critical number of trials, above which the optimal bin size is finite, was estimated as $\hat n_c \approx 12$. By increasing the number of spikes to $n=20$, we obtained the optimal bin size $\Delta^*=33$ [ms], implying that a discussion about the neuronal response is possible based on this amount of data. The critical number of trials estimated from this data set ($n=20$) is $\hat n_c \approx 10$. The optimal bin size for the total $50$ sequences is $\Delta^*=23$ [ms]. The critical number of sequences estimated from the total $50$ sequences is $\hat n_c \approx 12$.

Algorithm 1 confirms that the temporal rate modulation can be discussed with a sufficiently large number of sequences, such as $n=20$ or $n=50$. With the Algorithm 2, three sets of sequences with $n=5$, $20$ and $50$ suggest the critical number of sequence to be $\hat n_c \approx 10$-$12$.

Discussion

We have developed a method for selecting the bin size, so that the Bar-PSTH (Section 2) or the Line-PSTH (Appendix) best represents the (unknown) underlying spike rate. The suitability of this method was demonstrated by applying it to not only model spike sequences generated by time-dependent Poisson processes, but also real spike sequences recorded from cortical area MT of a monkey.

For a small number of spike sequences derived from a modestly fluctuating rate, the cost function does not have a minimum ( $C_n (T)<C_n (\Delta )$ for any $\Delta <T)$, or has a minimum at a bin size $\Delta $ that is comparable to the observation period $T$, implying that the time-dependent rate can not be captured by means of the time histogram method. Our method can nevertheless extrapolate the cost function for any number of spike sequences, and suggest how many sequences should be added in order to obtain a meaningful time histogram with the resolution we require. The presently examined model data and real data illustrated that the optimal bin size may `diverge' ( $\Delta ^\ast =O(T)$), and even under such conditions our extrapolation method works reasonably well.

In the present study, we adopted the mean integrated squared error (MISE) as measuring the goodness of the fit of a PSTH to the underlying spike rate. There were studies of the density estimation based on other standards such as the Kullback-Leibler divergence [Hall, 1990] and the Hellinger distance [Kanazawa, 1993]. To our knowledge, however, there are yet no practical algorithms based on these standards that optimize the bin size solely with raw data. It is interesting to explore practical methods based on other standards to the estimation of the time dependent rate, and compare them with the presently developed MISE criterion.

Use of regular (equal) bin size is a constraint particular to the present method. The regular histogram is suitable for stationary time-dependent rates. In practical applications, however, the rate does not necessarily fluctuate in a stationary fashion, but can change abruptly during a localized period of time. In such a situation, a histogram with variable bin width can better fit to the underlying rate than a regular histogram can. In order to make a better estimate of the underlying rate in the sense of the MISE, it is desirable to develop an adaptive method that adjusts the bin size over time. It is also desirable to develop the present optimization method for the Bar-PSTH and the Line-PSTH into a method for higher order spline fittings that can be compared with filtering methods. Nevertheless, as long as the Bar-PSTHs or the Line-PSTHs are used as conventional rate-estimation tools, the present method for selecting the bin size should be used for their construction.

Acknowledgements

The authors thank Shinsuke Koyama, Takeaki Shimokawa and Kensuke Arai for helpful discussions. The authors also acknowledge K. H. Britten, M. N. Shadlen, W. T. Newsome, J. A. Movshon who have made their data available to the public. This study is supported in part by Grants-in-Aid for Scientific Research to SS from the Ministry of Education, Culture, Sports, Science and Technology of Japan (16300068, 18020015) and the 21st Century COE ``Center for Diversity and Universality in Physics". HS is supported by the Research Fellowship of the Japan Society for the Promotion of Science for Young Scientists.

Appendix: Optimization of the line-graph time histogram

A line graph can be constructed by simply connecting top-centers of adjacent bar graphs of the height $k_i /(n\Delta )$. Figure 7 schematically shows the construction of a line-graph time histogram. For the same set of spike sequences, the optimal bin size (window size) of a Line-PSTH is, however, different from that of a Bar-PSTH. We develop here a method for selecting the bin size for a Line-PSTH.

Figure 7: The Line-PSTH. A: an underlying spike rate, $\lambda_t$. The horizontal bars indicate the original bar graphs $\theta^{-}$ and $\theta^{+}$ for adjacent intervals $[-\Delta, 0]$ and $[0, \Delta]$. The expected line graph $L_t$ is a line connecting the two top-centers of adjacent expected bar graphs $(-\frac{\Delta}{2}, \theta^{-})$ and $(\frac{\Delta}{2}, \theta^{+})$. B: sequences of spikes derived from the underlying rate. C: a time histogram for the sample sequences of spikes. The empirical line graph $\hat L_t$ is a line connecting the two top-centers of adjacent empirical bar graphs $(-\frac{\Delta}{2},\hat \theta^{-})$ and $(\frac{\Delta}{2},\hat \theta^{+})$.
\includegraphics[width=4.25in]{2LineSchematic}

The expected heights of adjacent bar graphs for intervals of $[-\Delta, 0]$ and $[0, \Delta]$ are

\begin{eqnarray*}
\theta^{-} &\equiv& \frac{1}{\Delta }\int_{ - \Delta }^0 {\lam...
...a^{+} &\equiv& \frac{1}{\Delta }\int_0^\Delta {\lambda _t  dt}.
\end{eqnarray*}

The expected line graph $L_t$ in an interval of $[-\textstyle{\Delta \over 2},\textstyle{\Delta \over 2}]$ is a line connecting top-centers of these bar graphs,
\begin{displaymath}
L_t = \frac{{\theta^{+} + \theta^{-} }}{2} + \frac{{\theta^{+} - \theta^{-} }}{\Delta }t.
\end{displaymath} (30)

The unbiased estimators of the original bar graphs are $\hat {\theta}^-=k^-/(n\Delta )$ and $\hat {\theta }^+=k^+/(n\Delta )$, where $k^-$ and $k^+$ are the numbers of spikes from $n$ spike sequences that enter the intervals $[-\Delta, 0]$ and $[0, \Delta]$, respectively. The empirical line graph $\hat L_t$ in an interval of $[-\textstyle{\Delta \over 2},\textstyle{\Delta \over 2}]$ is a line connecting two top-centers of adjacent empirical bar graphs,
\begin{displaymath}
\hat L_t = \frac{{\hat \theta^{+} + \hat \theta^{-} }}{2} + \frac{{\hat \theta^{+} - \hat \theta^{-} }}{\Delta }t.
\end{displaymath} (31)

Selection of the bin size

The time average of the MISE (Eq. 1) is rewritten by the average over the $N$ segmented rates,

\begin{displaymath}
\mbox{MISE} = \frac{1}{\Delta }\int_{ - \Delta /2}^{ \Delta...
...\langle E ( {\hat L_t - \lambda _t } )^2 \right\rangle  dt}.
\end{displaymath} (32)

The MISE can be decomposed into two parts,
\begin{displaymath}
\mbox{MISE} = \frac{1}{\Delta }\int_{ - \Delta /2}^{ \Delta ...
...gle{\left( {\lambda _t - L_t } \right)^2 }\right\rangle  dt .
\end{displaymath} (33)

The first term of Eq. 33 is the stochastic fluctuation of the empirical linear estimator $\hat L_t$ due to the stochastic point events, which can be computed as

\begin{displaymath}
\frac{1}{\Delta }\int_{ - \Delta /2}^{ \Delta /2} \left\lang...
...\langle E( {\hat \theta ^ + - \theta ^ + } )^2 \right\rangle .
\end{displaymath} (34)

Here we have used the relations, $E{( {\hat {\theta }^+-\theta ^+} )( {\hat {\theta }^-\theta ^-} )} =0$ and $\left\langle {E( {\hat \theta ^ + - \theta ^ + } )^2 } \right\rangle = \left\langle {E( {\hat \theta ^ - - \theta ^ - } )^2 } \right\rangle$.

The second term of Eq. 33 is the temporal fluctuation of $\lambda_t$ around the expected linear estimator $L_t$. We expands the second term by inserting $\mu=\left\langle {\theta ^-} \right\rangle
=\left\langle {\theta ^+} \right\rangle$, and obtain

$\displaystyle \frac{1}{\Delta }\int_{ - \Delta /2}^{ \Delta /2} \left\langle\left( {\lambda _t - L_t } \right)^2\right\rangle  dt$ $\textstyle =$ $\displaystyle \frac{1}{\Delta }\int_{ - \Delta /2}^{\Delta /2} {\left\langle \left( {\lambda _t - \mu } \right)^2 \right\rangle dt}$  
 
$\displaystyle - \frac{2}{\Delta }\int_{ - \Delta /2}^{\Delta /2} {\left\langle ...
...a /2}^{\Delta /2} {\left\langle \left( {L_t - \mu } \right)^2 \right\rangle dt}$
 
    $\textstyle =$ $\displaystyle \frac{1}{\Delta }\int_{ - \Delta /2}^{\Delta /2} {\left\langle \left( {\lambda _t - \mu } \right)^2 \right\rangle dt}$  
 
$\displaystyle - \left\{ 2\left\langle {\left( {\theta ^ + - \mu } \right)\left(...
...ft\langle {\left( {\theta ^ + - \mu } \right)\theta ^* } \right\rangle \right\}$
 
 
$\displaystyle + \left\{ \frac{2}{3}\left\langle {( {\theta ^ + - \mu } )^2 } \r...
...langle {( {\theta ^ + - \mu } )( {\theta ^ - - \mu } )} \right\rangle \right\},$
(35)
where

\begin{eqnarray*}
\theta^{0} &\equiv& \frac{1}{\Delta }\int_{ - \Delta /2}^{\Del...
...}{\Delta^2 }\int_{ - \Delta /2}^{\Delta /2} {t \lambda _t  dt}.
\end{eqnarray*}

To obtain the above equation, we have used relations, $\left\langle {\theta ^-} \right\rangle =\left\langle {\theta ^+} \right\rangle =\left\langle {\theta ^0} \right\rangle =\mu $, $\left\langle {\theta ^\ast } \right\rangle =0$, $\left\langle {( {\theta ^ + - \mu } )^2 } \right\rangle = \left\langle {( {\theta ^ - - \mu } )^2 } \right\rangle$, $\left\langle {\left( {\theta ^ + - \mu } \right)\left( {\theta ^0 - \mu } \righ...
...ft( {\theta ^ - - \mu } \right)\left( {\theta ^0 - \mu } \right)} \right\rangle$, and $\left\langle {\left( {\theta ^ + - \mu } \right)\theta ^* } \right\rangle = - \left\langle {\left( {\theta ^ - - \mu } \right)\theta ^* } \right\rangle$.

The first term of Eq. 35 represents a mean squared fluctuation of the underlying rate, and is independent of the choice of the bin size $\Delta $ (See Eq. 8). We introduce the cost function by subtracting the variance of the underlying rate from the original MISE:

$\displaystyle C_n\left( \Delta \right)$ $\textstyle \equiv$ $\displaystyle \mbox{MISE} - \frac{1}{T} \int_{0}^{T} (\lambda_t - \mu)^2  dt$  
  $\textstyle =$ $\displaystyle \frac{2}{3}\left\langle {E {({\hat {\theta }^+-\theta ^+} )}^2 } ...
...} \right\rangle -2\left\langle {( {\theta ^+-\mu })\theta ^\ast } \right\rangle$  
    $\displaystyle + \frac{2}{3}\left\langle {E {( {\theta ^+-\mu } )}^2 } \right\ra...
...c{1}{3}\left\langle {( {\theta ^+ -\mu } )( {\theta ^-\mu } )} \right\rangle ,$ (36)

The first term of Eq. 36 can be estimated from the data, using the variance-mean relation (Eq. 12). The last four terms of Eq. 36 are the covariances of the expected rates $\theta ^-,\theta ^+,\theta ^0$, and $\theta ^\ast $, which are not observables. We can estimate them by using the covariance decomposition rule for unbiased estimators:

$\displaystyle \left\langle {( {\theta ^+-\langle {\theta ^+} \rangle } ) ( {\theta ^p-\langle {\theta ^p} \rangle } )} \right\rangle$ $\textstyle =$ $\displaystyle \left\langle {E\left[ {\left( {\hat {\theta }^+-\left\langle {E {...
...t\langle {E {\hat {\theta }^p}} \right\rangle } \right)} \right]} \right\rangle$  
    $\displaystyle - \left\langle {E\left[ {\left( {\hat {\theta }^+-\theta ^+} \right)\left( {\hat {\theta }^p-\theta ^p} \right)} \right]} \right\rangle .$ (37)

where $p$ denotes $-,+,0$, or $\ast $. The first term of Eq. 37 can be computed directly from the data, using the relation $\left\langle {E\hat {\theta }^-} \right\rangle =\left\langle {E\hat {\theta }^+} \right\rangle =\left\langle {E\hat {\theta }^0} \right\rangle =\mu $, and $\left\langle {E\hat {\theta }^\ast} \right\rangle =0$. Unlike the Bar-PSTH, however, the mean-variance relation for the Poisson statistics is not directly applicable to the second term. We suggest estimating these covariance terms from multiple sample sequences, as in `Algorithm 3.'.

Algorithm 3: A method for bin size selection for a Line-PSTH
width
(i)
Divide the observation period $T$ into $N+1$ bins of width $\Delta $.
Count the number of spikes $k_{i}(j)$ that enter $i$th bin from $j$th sequence.
Define $k_{i}^{(-)}(j) \equiv k_{i}(j)$ and $k_{i}^{(+)}(j) \equiv k_{i+1}(j)$ $(i=1,2,\cdots,N)$.
Divide the period $[\Delta/2, T-\Delta/2]$ into $N$ bins of width $\Delta $.
Count the number of spikes $k_{i}^{(0)}(j)$ that enter $i$th bin from $j$th sequence.
In each bin, compute $k_{i}^{(*)}(j) \equiv 2 \sum_{\ell} t_{i}^{\ell}(j)/ \Delta$, where $t_{i}^{\ell}(j)$ is the time of the $\ell$th spike that enters the $i$th bin, measured from the center of each bin.
(ii)
Sum up $k_{i}^{(p)}(j)$ for all sequences: $\displaystyle k_i^{(p)} \equiv \sum_{j = 1}^n k_{i}^{(p)}(j)$, where $p=\{-, +, 0, *\}$.
Average those spike counts with respect to all the bins: $\displaystyle \bar k^{(p)} \equiv \frac{1}{N}\sum_{i = 1}^{N} {k_i^{(p)} }$.
Compute covariances of $k_i^{(+)}$ and $k_i^{(p)}$:

\begin{displaymath}c^{(+, p)} \equiv \frac{1}{{N}}\sum_{i = 1}^{N} {\left( {k_i...
... k^{(+)} } \right)\left( {k_i^{(p)} - \bar k^{(p)} } \right)}. \end{displaymath}

Compute covariances of $k_{i}^{(+)}(j)$ and $k_{i}^{(p)}(j)$ with respect to sequences,
and average over time:

\begin{displaymath}\bar c^{(+, p)} \equiv \frac{1}{{N}}\sum_{i = 1}^{N} \frac{1...
...right)\left( {k_{i}^{(p)} (j) - \frac{k_i^{(p)}}{n} } \right). \end{displaymath}

Finally, compute

\begin{displaymath}\sigma^{(+, p)} \equiv \frac{c^{(+, p)}}{ (n \Delta)^2} - \frac{\bar c^{(+, p)}}{n \Delta^2}. \end{displaymath}

(iii)
Compute the cost function:

\begin{displaymath}C_n(\Delta)=\frac{2}{3} \frac{\bar k^{(+)}}{(n \Delta)^2} - 2...
...} + \frac{2}{3}\sigma^{(+, +)} + \frac{1}{3}\sigma^{(+, -)}. \end{displaymath}

(iv)
Repeat i through iii while changing $\Delta $ to search for $\Delta^*$ that minimizes $C_n (\Delta )$.

Extrapolation of the cost function

As in the case of the Bar-PSTH, the cost function for any $m$ sequences of spike trains can be extrapolated using the variance-mean relation for the Poisson statistics,

$\displaystyle C_m\left( \Delta \vert n \right) =
\frac{2}{3 \Delta} \left(\frac...
...ht)\left\langle {E \hat \theta^{+}_n} \right\rangle
+ C_n\left( \Delta \right),$
(38)

where $C_n \left( \Delta \right)$ is the original cost function (Eq. 36). With the original cost function $C_n \left( \Delta \right)$, we can easily estimate a cost function for $m$ sequences with `Algorithm 4'.

Algorithm 4: A method for extrapolating the cost function for a Line-PSTH
width
(a)
Construct the extrapolated cost function

\begin{displaymath}C_{m}\left( \Delta \vert n \right) = \frac{2}{3} \left(\frac{...
...\frac{1}{n}\right)\frac{\bar k^{(+)}}{n\Delta^2}+ C_n(\Delta), \end{displaymath}

where $C_n (\Delta )$ is the cost function for the line-graph time histogram computed for $n$ sequences of spikes with the Algorithm 3.
(b)
Search for $\Delta^{*}_m$ that minimizes $C_{m}\left( \Delta \vert n \right)$.
(c)
Repeat A and B while changing $m$, and plot $1/\Delta^{*}_m$ vs $1/m$ to search for the critical value $1/m=1/\hat n_c$ above which $1/\Delta^{*}_m$ practically vanishes.

Scaling relations of the optimal bin sizes

We can obtain a `theoretical' cost function of a Line-PSTH directly from the mean spike rate $\mu $ and the correlation $\phi (t)$ of the rate fluctuation. According to the mean-variance relation based on the Cramér-Rao (in)equality (Eq. 21), the cost function (Eq. 36) is given by

$\displaystyle C_n\left( \Delta \right)$ $\textstyle =$ $\displaystyle \frac{2 \mu}{3 n \Delta} - \frac{2}{{\Delta ^2 }} \int_0^\Delta {...
... 1+\frac{2t_2}{\Delta} \right) \phi \left( {t_1 - t_2 } \right)  dt_1 dt_2 } }$  
    $\displaystyle + \frac{2}{{3 \Delta ^2 }} \int_0^\Delta {\int_0^\Delta {\phi \le...
...nt_0^\Delta {\int_{-\Delta}^0 {\phi \left( {t_1 - t_2 } \right) dt_1 dt_2 } }.$ (39)

By expanding the correlation of the rate fluctuation,

\begin{displaymath}\phi \left( t \right) = \phi \left( 0 \right) + \phi '\left( ...
...hi''''\left( 0 \right) t^4 + O\left( {\vert t\vert^5 } \right),\end{displaymath}

we obtain
 

(40)\begin{displaymath}
C_n(\Delta) = \frac{2 \mu}{3 n \Delta} - \phi\left( 0 \right...
...rac{49}{2880}\phi''''\left( 0 \right)\Delta ^4 + O(\Delta ^5).
\end{displaymath}

Unlike the Bar-PSTH, the line graph successfully approximates the original rate to first order in $\Delta $, and therefore the $O(\Delta ^2)$ term in the cost function vanishes for a Line-PSTH.

The optimal bin size $\Delta ^\ast $ is obtained from $dC_n \left( \Delta \right)/d\Delta =0$. For a rate process that fluctuates smoothly in time, the correlation function is a smooth function, resulting in ${\phi}'(0)=0$ and ${\phi }'''(0)=0$ due to the symmetry $\phi (t) = \phi (-t)$, and we obtain the scaling relation

\begin{displaymath}
\Delta ^ * \sim \left( { \frac{1280 \mu }{{181 \phi ''''\left( 0 \right)n}}} \right)^{1/5}.
\end{displaymath} (41)

The exponent $-1/5$ for a Line-PSTH is different than the exponent $-1/3$ for a Bar-PSTH (Eq. 28).

If the correlation of rate fluctuation has a cusp at $t=0$ ( ${\phi }'(0_+ )<0)$, we obtain the scaling relation

\begin{displaymath}
\Delta ^* \sim \left( - \frac{96 \mu}{37 \phi'(0_{+}) n} \right)^{1/2},
\end{displaymath} (42)

by ignoring the higher order terms. The exponent $-1/2$ for a Line-PSTH is the same as the exponent for a Bar-PSTH (Eq. 29).

Bibliography

Abeles, 1982
Abeles, M. (1982).
Quantification, smoothing, and confidence-limits for single-units histograms.
Journal of Neuroscience Methods, 5(4):317-325.

Adrian, 1928
Adrian, E. (1928).
The Basis of Sensation: The Action of the Sense Organs.
W.W. Norton, New York.

Bair and Koch, 1996
Bair, W. and Koch, C. (1996).
Temporal precision of spike trains in extrastriate cortex of the behaving macaque monkey.
Neural Computation, 8(6):1185-1202.

Blahut, 1987
Blahut, R. (1987).
Principles and practice of information theory.
Addison-Wesley, Reading, Mass.

Britten et al., 1992
Britten, K. H., Shadlen, M. N., Newsome, W. T., and Movshon, J. A. (1992).
The analysis of visual-motion - a comparison of neuronal and psychophysical performance.
Journal of Neuroscience, 12(12):4745-4765.

Britten et al., 2004
Britten, K. H., Shadlen, M. N., Newsome, W. T., and Movshon, J. A. (2004).
Responses of single neurons in macaque MT/V5 as a function of motion coherence in stochastic dot stimuli.
The Neural Signal Archive, page nsa2004.1.
http://www.neuralsignal.org.

Brockwell et al., 2004
Brockwell, A. E., Rojas, A. L., and Kass, R. E. (2004).
Recursive bayesian decoding of motor cortical signals by particle filtering.
Journal of Neurophysiology, 91(4):1899-1907.

Brown et al., 2004
Brown, E. N., Kass, R. E., and Mitra, P. P. (2004).
Multiple neural spike train data analysis: state-of-the-art and future challenges.
Nature Neuroscience, 7(5):456-461.

Cover and Thomas, 1991
Cover, T. M. and Thomas, J. A. (1991).
Elements of Information Theory.
John Wiley & Sons, Inc., New York.

Daley and Vere-Jones, 1988
Daley, D. and Vere-Jones, D. (1988).
An Introduction to the Theory of Point Processes.
Springer-Verlag, New York.

Dayan and Abbott, 2001
Dayan, P. and Abbott, L. (2001).
Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems.
The MIT Press, Cambridge.

DiMatteo et al., 2001
DiMatteo, I., Genovese, C. R., and Kass, R. E. (2001).
Bayesian curve-fitting with free-knot splines.
Biometrika, 88(4):1055-1071.

Gardiner, 1985
Gardiner, C. (1985).
Handbook of stochastic methods: for physics, chemistry and the naural sciences.
Springer-Verlag, Berlin.

Gerstein and Kiang, 1960
Gerstein, G. L. and Kiang, N. Y. S. (1960).
An approach to the quantitative analysis of electrophysiological data from single neurons.
Biophysical Journal, 1(1):15-28.

Gerstein and Perkel, 1969
Gerstein, G. L. and Perkel, D. H. (1969).
Simultaneously recorded trains of action potentials - analysis and functional interpretation.
Science, 164(3881):828-830.

Hall, 1990
Hall, P. (1990).
Akaikes information criterion and kullback-leibler loss for histogram density-estimation.
Probability Theory and Related Fields, 85(4):449-467.

Johnson, 1996
Johnson, D. H. (1996).
Point process models of single-neuron discharges.
Journal of Computational Neuroscience, 3(4):275-299.

Kanazawa, 1993
Kanazawa, Y. (1993).
Hellinger distance and akaike information criterion for the histogram.
Statistics & Probability Letters, 17(4):293-298.

Kass et al., 2005
Kass, R. E., Ventura, V., and Brown, E. N. (2005).
Statistical issues in the analysis of neuronal data.
Journal of Neurophysiology, 94(1):8-25.

Kass et al., 2003
Kass, R. E., Ventura, V., and Cai, C. (2003).
Statistical smoothing of neuronal data.
Network-Computation in Neural Systems, 14(1):5-15.

Koyama and Shinomoto, 2004
Koyama, S. and Shinomoto, S. (2004).
Histogram bin width selection for time-dependent poisson processes.
Journal of Physics A-Mathematical and General, 37(29):7255-7265.

Kubo and Hashitsume, 1985
Kubo, R. Toda, M. and Hashitsume, N. (1985).
Nonequilibrium statistical mechanics.
Springer-Verlag, Berlin.

Newsome et al., 1989
Newsome, W. T., Britten, K. H., and Movshon, J. A. (1989).
Neuronal correlates of a perceptual decision.
Nature, 341(6237):52-54.

Révész, 1968
Révész, P. (1968).
The laws of large numbers.
Academic Press, New York.

Rudemo, 1982
Rudemo, M. (1982).
Empirical choice of histograms and kernel density estimators.
Scandinavian Journal of Statistics, 9(2):65-78.

Sanger, 2002
Sanger, T. D. (2002).
Decoding neural spike trains: Calculating the probability that a spike train and an external signal are related.
Journal of Neurophysiology, 87(3):1659-1663.

Scott, 1979
Scott, D. W. (1979).
Optimal and data-based histograms.
Biometrika, 66(3):605-610.

Shinomoto et al., 2005
Shinomoto, S., Miyazaki, Y., Tamura, H., and Fujita, I. (2005).
Regional and laminar differences in in vivo firing patterns of primate cortical neurons.
Journal of Neurophysiology, 94(1):567-575.

Shinomoto et al., 2003
Shinomoto, S., Shima, K., and Tanji, J. (2003).
Differences in spiking patterns among cortical neurons.
Neural Computation, 15(12):2823-2842.

Snyder, 1975
Snyder, D. (1975).
Random Point Processes.
John Wiley & Sons, Inc., New York.

Wiener and Richmond, 2002
Wiener, M. C. and Richmond, B. J. (2002).
Model based decoding of spike trains.
Biosystems, 67(1-3):295-300.