Each row of this matrix, $\mathbf{v}_j$, is called the state change vector of reaction $R_j$; its elements $v_{ji}$ represent the stoichiometric change of species $S_i$ due to reaction $R_j$. This condition can be used to mimic, for instance, the non-limiting availability of some chemical resources, or the execution of in vitro buffering experiments, in which an adequate supply of some species is introduced into the system in order to keep their quantity constant [45]. The traditional way to calculate the stochastic temporal evolution of the system consists in solving the so-called Chemical Master Equation (CME), which describes the probability distribution function associated with the system state [46].

Numerical solution algorithms for the CME are usually based on matrix descriptions of the discrete-state Markov process [47]; however, these methods are computationally expensive and not always feasible, especially for systems consisting of many molecular species, for which the number of reachable states is huge or even countably infinite. Several analytical solution algorithms for the CME exist, for instance those based on uniformization methods [48]–[50], finite state projection algorithms [51], [52] or the sliding window method [53]; other methods were also introduced for special reaction systems characterized by particular initial conditions.

A different strategy to solve the CME consists in generating trajectories of the underlying Markov process. A method of this type is the stochastic simulation algorithm (SSA) [20], [55], which provides exact realizations of the continuous-time, discrete-state jump Markov process associated with a biochemical system, whose initially conditioned density function is determined by the CME itself; as such, SSA is logically equivalent to the CME [21].


Briefly, starting from the system state $\mathbf{x}$ at time $t$, SSA determines which reaction will be executed during the next time interval $[t, t+\tau)$, by calculating the probability of each reaction to occur in the next infinitesimal time step $[t, t+dt)$. This probability is proportional to $a_j(\mathbf{x})\,dt$, with $a_j(\mathbf{x}) = c_j \cdot h_j(\mathbf{x})$ being the propensity function of reaction $R_j$, where $h_j(\mathbf{x})$ is the number of distinct combinations of the reactant molecules occurring in $R_j$ and $c_j$ is a stochastic constant encompassing the physical and chemical properties of $R_j$ [55].

The time $\tau$ before a reaction takes place is chosen according to the following equation:

$$\tau = \frac{1}{a_0(\mathbf{x})} \ln \frac{1}{\rho_1}$$

where $\rho_1$ is a random value sampled in [0,1] with a uniform probability, and $a_0(\mathbf{x}) = \sum_{j=1}^{M} a_j(\mathbf{x})$ is the sum of the propensity functions of all $M$ reactions. The index $j$ of the reaction to be executed is the smallest integer in $[1, M]$ such that

$$\sum_{j'=1}^{j} a_{j'}(\mathbf{x}) > \rho_2 \, a_0(\mathbf{x})$$

where $\rho_2$ is a random value sampled in [0,1] with a uniform probability. In [25] an approximate but faster version of SSA, called tau-leaping, was introduced for the purpose of reducing the computational burden typical of SSA. SSA and tau-leaping share the characteristic that, even starting from the same initial state of the system, repeated executions of the algorithms will usually produce quantitatively, and potentially also qualitatively, different temporal dynamics, thus reflecting the inherent noise of the system.
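The two sampling rules of SSA (a waiting time drawn from the total propensity, and the smallest reaction index whose cumulative propensity exceeds a random fraction of the total) can be sketched in a few lines of Python. This is a minimal illustration and not the cuTauLeaping code: the function names, the dictionary-based encoding of reactants and state changes, and the use of `1 - random()` to keep the first random value away from zero are all assumptions made for the example.

```python
import math
import random
from math import comb


def propensities(x, reactants, c):
    """a_j(x) = c_j * (number of distinct reactant combinations of reaction j)."""
    return [c_j * math.prod(comb(x[i], n) for i, n in r.items())
            for r, c_j in zip(reactants, c)]


def ssa_step(x, reactants, changes, c, rng):
    """Apply one SSA step to x in place; return (tau, j), or None if nothing can fire."""
    a = propensities(x, reactants, c)
    a0 = sum(a)
    if a0 == 0:
        return None
    rho1 = 1.0 - rng.random()   # in (0, 1], so log(1 / rho1) is always defined
    rho2 = rng.random()
    tau = (1.0 / a0) * math.log(1.0 / rho1)
    # Smallest index j whose cumulative propensity exceeds rho2 * a0.
    threshold, acc = rho2 * a0, 0.0
    for j, a_j in enumerate(a):
        acc += a_j
        if acc > threshold:
            break
    for i, v in changes[j].items():
        x[i] += v
    return tau, j
```

Here `reactants[j]` maps a species index to the number of molecules of that species consumed by reaction `j`, and `changes[j]` maps a species index to its net stoichiometric change. Repeated calls with the same initial state but different random streams yield different trajectories, which is the inherent noise discussed above.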


These two algorithms differ, however, in the way reactions are applied at each step: in SSA only one reaction is applied, while with tau-leaping several reactions can be applied. We present here the main features of tau-leaping that help illustrate the choices at the basis of the GPU implementation proposed in this work. We refer to [25], [56] for further details and, especially, to [27], which describes the improved version of the tau-leaping algorithm considered here.

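Given a state $\mathbf{x}$ of the system, let $K_j(\mathbf{x}, t, \tau)$ denote the exact number of times that reaction $R_j$ would be fired in the time interval $[t, t+\tau)$; $\mathbf{K}(\mathbf{x}, t, \tau)$ denotes the probability distribution vector having the $K_j(\mathbf{x}, t, \tau)$ as elements.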


For arbitrary values of $\tau$, the computation of the values $K_j$ can be as difficult as solving the corresponding CME. On the contrary, if $\tau$ is small enough that the change in the state during $[t, t+\tau)$ is so slight that no propensity function will suffer an appreciable change in its value (this is called the leap condition), then it is possible to evaluate a good approximation of $K_j$ by using Poisson random variables with mean and variance $a_j(\mathbf{x})\tau$. In doing so, the stochastic temporal evolution of the system is no longer exact (as it is in the case of SSA); however, the accuracy of tau-leaping can be fixed a priori by means of an error control parameter $\epsilon$, which is involved in the computation of the changes in the propensity functions and of the time increment $\tau$.

The propensity functions change as a consequence of the modification in the molecular amounts of the reactant species, therefore the leap condition must be verified after each state update. This is achieved by evaluating an additional quantity $g_i$ for each species $S_i$, which is related to the highest order of the reactions in which $S_i$ is involved as a reactant (see [27] for details). This information, along with the number of molecules of $S_i$ involved in all highest-order reactions given by the system state $\mathbf{x}$, is then used to bound the relative change of the molecular amount of $S_i$.

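Starting from the state $\mathbf{x}$ and choosing a value $\tau$ that satisfies the leap condition, the state of the system at time $t+\tau$ is updated according to

$$\mathbf{x}(t+\tau) = \mathbf{x} + \sum_{j=1}^{M} \mathbf{v}_j \, P_j(a_j(\mathbf{x})\tau) \tag{2}$$

where $P_j(a_j(\mathbf{x})\tau)$ denotes an independent sample of the Poisson random variable with mean and variance equal to $a_j(\mathbf{x})\tau$. Note that the execution of many reactions per step could lead to negative amounts of the molecular species in the system [25].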
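The state update of a tau-leaping step can be sketched as follows in Python. This is an illustrative sketch under stated assumptions rather than the implementation described in this work: the Poisson sampler uses Knuth's multiplication method (adequate only for modest means, and chosen here only because the standard library has no Poisson generator), and the names `poisson` and `tau_leap_update` are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Poisson sample via Knuth's multiplication method (modest means only)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1


def tau_leap_update(x, changes, a, tau, rng):
    """Return x + sum_j v_j * P_j(a_j * tau), or None if an amount becomes negative."""
    new_x = list(x)
    for v_j, a_j in zip(changes, a):
        k_j = poisson(a_j * tau, rng)   # number of firings of reaction j in [t, t + tau)
        for i, v in v_j.items():
            new_x[i] += v * k_j
    return None if min(new_x) < 0 else new_x
```

Returning `None` signals that the candidate step would produce negative amounts, so the caller can reject it and retry with a smaller time step.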

To be more precise, if the reactions executed during a step consume more reactant molecules than are present in the system, then negative species amounts would be generated; therefore, the simulation step cannot be executed. To avoid these situations, some reactions are considered as critical: a reaction is marked as critical if there are not sufficient reactant molecules to fire it at least $n_c$ times in the next time interval, where $n_c$ is a fixed threshold.
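The critical/non-critical classification can be illustrated with a short Python sketch. It assumes that the number of times a reaction can fire is limited by its scarcest reactant, computed from the reactant stoichiometry; the function name and the data layout are hypothetical, and the threshold is passed explicitly rather than fixed.

```python
def partition_reactions(x, reactants, n_c):
    """Split reaction indices into (non_critical, critical) for the state x."""
    non_critical, critical = [], []
    for j, r in enumerate(reactants):
        # Largest number of firings that the current molecular amounts allow;
        # a reaction without reactants can fire any number of times.
        max_firings = min((x[i] // n for i, n in r.items()), default=float("inf"))
        if max_firings < n_c:
            critical.append(j)
        else:
            non_critical.append(j)
    return non_critical, critical
```

For example, with amounts `[100, 3, 50, 0]`, reactants `[{0: 1, 1: 1}, {2: 1}, {2: 1}]` and an illustrative threshold of 10, the first reaction is critical because only three molecules of its second reactant are available.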

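In this work we use the value of the criticality threshold $n_c$ suggested in [27]. At each iteration of tau-leaping, all reactions are partitioned into the sets of non-critical reactions and critical reactions. Only a single reaction belonging to the set of critical reactions (selected following the SSA procedure) is allowed to fire during $[t, t+\tau)$. The length of the step satisfying the leap condition is calculated as

$$\tau = \min_{i \in I_{rs}} \left\{ \frac{\max\{\epsilon x_i / g_i, 1\}}{|\mu_i(\mathbf{x})|}, \frac{\max\{\epsilon x_i / g_i, 1\}^2}{\sigma_i^2(\mathbf{x})} \right\} \tag{3}$$

where $I_{rs}$ is the set of indices of reactant species not involved in critical reactions, $\epsilon$ is the error control parameter, $g_i$ is the quantity related to the highest order of the reactions in which species $S_i$ appears as a reactant [27], and the values $\mu_i(\mathbf{x})$ and $\sigma_i^2(\mathbf{x})$ are computed over the indices $J_{ncr}$ of the non-critical reactions as follows:

$$\mu_i(\mathbf{x}) = \sum_{j \in J_{ncr}} v_{ji} \, a_j(\mathbf{x}), \qquad \sigma_i^2(\mathbf{x}) = \sum_{j \in J_{ncr}} v_{ji}^2 \, a_j(\mathbf{x}) \tag{4}$$

If the execution of a tau-leaping step would lead to negative amounts of some species, then the value $\tau$ is halved and the number of reactions to execute is sampled ex novo.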
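As a hedged illustration of the step-size selection, the Python sketch below assumes the rule of [27] in its commonly cited form: for every relevant reactant species, the expected change (mean) and the variance of the change due to non-critical reactions are computed, and the step is the largest value for which both stay within a bound proportional to the error control parameter. The function name, the argument layout and the precomputed list `g` are assumptions made for the example.

```python
import math


def select_tau(x, changes, a, non_critical, reactant_species, g, eps):
    """Candidate leap size from mean/variance bounds over non-critical reactions."""
    tau = math.inf
    for i in reactant_species:
        # Mean and variance of the change of species i per unit time.
        mu = sum(changes[j].get(i, 0) * a[j] for j in non_critical)
        sigma2 = sum(changes[j].get(i, 0) ** 2 * a[j] for j in non_critical)
        bound = max(eps * x[i] / g[i], 1.0)
        if mu != 0:
            tau = min(tau, bound / abs(mu))
        if sigma2 != 0:
            tau = min(tau, bound ** 2 / sigma2)
    return tau
```

When there are no non-critical reactions the function returns infinity, meaning that the step length is then decided only by the critical reactions.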

Finally, if $\tau$ is smaller than a multiple of $1/a_0(\mathbf{x})$, which corresponds to the average time increment of SSA, then a certain number of SSA steps is executed because, given the actual state of the system, this will be more accurate and efficient than a tau-leaping step.

The emerging field of GPGPU computing allows developers to exploit the great computational power of modern multi-core GPUs, by giving access to the underlying parallel architecture that was conceived for speeding up real-time three-dimensional computer graphics. CUDA automatically handles the control flow divergence, that is, threads can take different execution paths in a transparent way for the programmer.

Nevertheless, conditional branches should be avoided whenever possible, as they reduce performance because execution is serialized until the threads reconverge. For this reason, the tau-leaping algorithm required a major restructuring in order to reduce the need for conditional branches, as will be described in the Results section.

Threads can be organized in three-dimensional structures named blocks which, in turn, are contained in three-dimensional grids. Whenever the host runs a kernel, the GPU creates the corresponding grid and automatically schedules each block on one streaming multiprocessor (SM) available on the GPU, a solution that allows a transparent scaling of performance on different devices (see Figure 1, left side). CUDA limits the number of threads a block may contain, both in total across the three dimensions and along each single dimension.

The SM organizes scheduled blocks in batches of 32 parallel threads, called warps. Since more than one block can be assigned at once to the same SM, a warp scheduler manages the execution of warps.

Figure 1. Schematic representation of CUDA threads and memory hierarchy. Left side: thread organization. A single kernel is launched from the host (the CPU) and is executed in multiple threads on the device (the GPU); threads can be organized in three-dimensional structures named blocks which can be, in turn, organized in three-dimensional grids. The dimensions of blocks and grids are explicitly defined by the programmer. Right side: memory hierarchy. Threads can access data from many different memories with different scopes; registers and local memories are private to each thread. Shared memory lets threads belonging to the same block communicate, and has low access latency. All threads can access the global memory, which suffers from high latencies but is cached since the introduction of the Fermi architecture. Texture and constant memories can be read from any thread and feature a cache as well.

Threads can read and write data from different kinds of memories (Figure 1, right side): the global memory (visible from all threads), the shared memory (accessible from threads belonging to the same block), and the local memory (registers and arrays, accessible from the owner thread).

Furthermore, all threads can read data from two cached memories: the constant memory and the texture memory. CUDA offers other types of memory, like the page-locked memory, portable memory, and mapped memory; as our implementation does not exploit these additional features, they go beyond the scope of the present paper and will not be further described here. The global memory is generally very large (up to thousands of MBs), but suffers from high access latencies, whilst the shared memory is faster but much smaller (tens of KBs for each SM). Being a very small resource on each multiprocessor, the shared memory poses constraints on the block size, thus limiting the number of threads that can be executed simultaneously.

However, in order to achieve the best performance, the shared memory represents a precious resource that must be exploited as much as possible. These considerations are central in our implementation of tau-leaping and will be described in more detail in the Results section. Since the introduction of the Fermi architecture, the global memory features a small amount of cache, which makes the use of the texture memory counterproductive [57]. This cache resides on the same on-chip memory (64 KB for each SM) that is used for both cache and shared memory, and gives the programmer the opportunity to balance the two memory amounts.

Our GPU implementation of tau-leaping was optimized for the Fermi architecture, since it heavily relies on the availability of shared memory for performance reasons. Stochastic simulation algorithms exploit random number generators (RNGs). Nvidia's software development kit [58] contains several libraries and utilities that help developers create software for this architecture; CURAND is an RNG library which allows the GPU-based generation of random deviates that can be used either by the host (via memory copy) or directly by the device. The Mersenne Twister (MT) generator was not used for the implementation of cuTauLeaping because it has three drawbacks: only a limited number of threads per block can operate simultaneously [59], its memory footprint is larger than that of the other generators [63], and it is much slower than the other two algorithms.

In this section we describe the development of cuTauLeaping and its application to perform stochastic simulations in a massively parallel way, by running multiple independent simulations as parallel CUDA threads. We introduce our GPU-oriented design of tau-leaping, consisting of a four-phase workflow, and present the data structures, the memory allocation strategies and the advanced functions exploited on the Fermi architecture.

We compare the computational performance of cuTauLeaping with the CPU implementation of tau-leaping provided in the software COPASI, a well-known application for the simulation and the analysis of biochemical networks [66]. To this aim, we exploit as benchmarks four stochastic models of biological systems of increasing complexity, formally described in Text S1. In addition, to analyze the influence of the size of the model (i.e., the number of reactions and molecular species) on performance, we execute tests on randomly generated synthetic models. Finally, we show the advantages of using cuTauLeaping to investigate the effects of systematic perturbations on the system dynamics by means of parameter sweep analyses (PSA). Within the chosen sweep ranges, the numerical values of each varied parameter were determined with a linear sampling for the amounts of molecular species; a logarithmic sampling was instead considered for stochastic constants (if not stated otherwise), in order to uniformly span many orders of magnitude.
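The difference between the two sampling schemes can be made concrete with a small Python sketch. The function names are hypothetical and the ranges in the usage note are illustrative only; they are not the ranges used in the analyses.

```python
def linear_sampling(low, high, n):
    """n equally spaced values in [low, high]; requires n >= 2."""
    step = (high - low) / (n - 1)
    return [low + k * step for k in range(n)]


def logarithmic_sampling(low, high, n):
    """n values in [low, high] with a constant ratio; requires n >= 2 and low > 0."""
    ratio = (high / low) ** (1.0 / (n - 1))
    return [low * ratio ** k for k in range(n)]
```

With a linear sampling of a range such as [0.001, 1000] almost all points would fall in the highest decade, whereas `logarithmic_sampling(1e-3, 1e3, 7)` places one point per order of magnitude.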

All PSA were performed by generating a set of different initial conditions — corresponding to different parameterizations of the model under investigation — and then automatically executing the parallel stochastic simulations with cuTauLeaping. In cuTauLeaping, the workflow of the traditional tau-leaping algorithm is partitioned in different phases, which altogether allow a better exploitation of the parallel architecture of the GPU than a monolithic implementation.

The rationale behind this choice is that the resources on each SM are limited, thus they would be quickly consumed by the data structures employed by tau-leaping, causing a low occupancy of the GPU that would then result in worse performances. Therefore, the partitioning of tau-leaping workflow in different phases allows a faster execution of the simulations, thanks to the reduced memory footprint, which yields a higher level of parallelism. In this section we describe in detail the design and the implementation of the different CUDA kernels that stand at the basis of cuTauLeaping.

The four phases that constitute cuTauLeaping are schematized in Figure 2. The phases are iterated until all threads have reached the end of the simulation time, a termination criterion verified during phase P4. Each thread proceeds by applying tau-leaping or SSA steps, which are mutually exclusive, according to the value of a per-thread flag vector: for each thread, the corresponding element takes one value if the thread must execute SSA, a second value if the thread must execute tau-leaping, and a third value that signals a terminated simulation.

In cuTauLeaping the first two phases are implemented in a single kernel, so that the tau-leaping step can be executed right after the calculation of the $\tau$ value, without the need for a global memory write. In particular, during the first two phases, after the computation of the putative $\tau$ value for non-critical reactions, a second putative time step value related to critical reactions is calculated, and the smaller of the two is used in the current tau-leaping step.

If the first putative value is used, then only non-critical reactions are sampled from the Poisson distributions and applied; otherwise, besides non-critical reactions, one critical reaction is also selected and applied as described in [27]. The four phases are implemented in kernels that are executed sequentially by each thread. The first kernel (phases P1 and P2) works as follows: if the simulation of the thread is flagged as terminated, then terminate the kernel; otherwise, calculate the $\tau$ value for non-critical reactions. If $\tau$ is too small for a leap to be convenient, then flag the thread for SSA and terminate the kernel; else flag it for tau-leaping and execute a tau-leaping step, updating the system state according to Equation 2 (by executing a set of non-critical reactions and, possibly, one critical reaction) and the global simulation time by setting $t = t + \tau$. If the end of the simulation time has been reached, then flag the thread as terminated and terminate the kernel.

In Figure 3 we report the pseudocode of the host-side procedure that invokes the CUDA kernels; in Figures 4, 5 and 6 we present the pseudocode of the three kernels.

The kernels are iteratively repeated until the simulation is flagged as terminated for all threads. This termination criterion is efficiently verified by the last kernel, which exploits two advanced CUDA functionalities introduced with the Fermi architecture: synchronizations with predicate evaluation and atomic functions. Synchronization functions are generally used to coordinate the communication between threads, but CUDA also allows these functions to be used to evaluate a predicate for all threads in a block; atomic functions perform read-modify-write operations without interference from any other thread, therefore avoiding race conditions.

A combination of these functionalities makes it possible to determine whether all threads have terminated their execution. In addition, since both functionalities are hardware-accelerated, they are more efficient than other equivalent methodologies.

Figure 3. Host-side pseudocode of cuTauLeaping. As a first step, the stoichiometric information of the reactions is exploited to pre-calculate the data structures needed by the algorithm; all matrices are flattened during this process. Then, once the support memory areas are allocated, the host invokes the CUDA kernels.

Figure 4. Device-side pseudocode of the kernel implementing phases P1 and P2 in cuTauLeaping, that is, the subdivision of threads according to the $\tau$ value and the execution of a tau-leaping step. The kernel starts by loading the vectors that correspond to the current state of the system and to the values of the stochastic constants from the global memory areas that contain these data for all threads. Since this information is frequently accessed, it is immediately copied into the faster shared memory as vectors x and c, respectively. The kernel continues by verifying that the flag of the running thread is not equal to the signal of terminated execution. Then, it calculates the propensity functions of all reactions and accumulates their values in $a_0$; if $a_0 = 0$, the remaining time instants where the dynamics of the system is sampled are set to the current state and the simulation is terminated. The kernel concludes phase P1 by calculating a putative $\tau$ value for the tau-leaping step: if $\tau$ is too small (below a multiple of the average time increment of SSA), then the thread is halted and its flag is set to 0, so that it will perform the SSA steps during the next phase. Otherwise, the tau-leaping algorithm is performed by executing a set of non-critical reactions and possibly one critical reaction and, if the simulation has overrun one of the sampling time instants, the stored state is determined by linear interpolation.

Figure 5. Device-side pseudocode of the kernel implementing the SSA steps in cuTauLeaping. The kernel continues by verifying that the flag of the running thread is equal to the signal corresponding to SSA. Then, it performs a fixed number of SSA steps, where a single reaction is executed at each step, storing the system state at the sampled time instants.

Figure 6. Device-side pseudocode of the kernel implementing the verification of the termination of all simulations in cuTauLeaping. The verification is performed by means of CUDA's hardware-accelerated synchronization and counting features, which make it possible to count the threads of a block that satisfy a specific predicate. By exploiting CUDA's atomic functions, we accumulate the total number of threads that satisfy the predicate: if it is equal to the number of threads, the execution of all parallel simulations is completed.

In order to further improve the performance of the simulation execution, it is better not to code the stoichiometric information by means of matrices. In cuTauLeaping, we flattened the stoichiometric matrices, which are typically sparse, by packing their non-zero elements into arrays of the CUDA vector type uchar4, whose components are accessed by means of x, y, z and w.

Since the flattened vectors that store stoichiometric changes can assume negative values, we use an offset to store their values as unsigned chars, and subtract the offset during the calculations on the GPU to recover the correct negative numbers. An example of this implementation strategy is shown in Figure 7 which schematizes, for the Michaelis-Menten model, the conversion of a stoichiometric matrix into the corresponding flattened representation. By using this strategy, the complexity of the calculations needed for both SSA and tau-leaping no longer depends on the full size of the matrix, but only on the number of its non-zero entries.

For each non-zero entry, we store into the x and y components the corresponding row and column indices of the matrix, respectively; the z component is used to store the stoichiometric value. Note that, even though the w component is left unused, it is more efficient to employ the vector type uchar4 rather than uchar3, because the former is 4-aligned and takes a single instruction to fetch the whole entry, while the latter is 1-aligned, and would require three memory operations to read each entry of the flattened vector.

It is worth noting that the use of an unsigned char data type implies that cuTauLeaping can deal with models with up to 256 reactions and 256 molecular species (the range of an 8-bit index); for larger systems, data types with greater size must be exploited. In any case, the maximum size of a model is also limited by the shared memory available on the GPU; we provide a detailed analysis of this issue in the Discussion section.
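The flattening strategy can be mimicked on the CPU with a short Python sketch, in which each non-zero entry becomes a 4-byte record (row, column, stored value, unused byte), loosely mirroring a 4-component unsigned char vector. The offset value of 128 and the function names are assumptions made for the illustration; `bytes` raises an error for values outside 0..255, which reflects the size limit of the 8-bit representation.

```python
OFFSET = 128  # hypothetical offset mapping signed values to unsigned bytes


def flatten(matrix, offset=0):
    """Pack the non-zero entries of a matrix as (row, col, value + offset, unused)."""
    entries = []
    for r, row in enumerate(matrix):
        for col, value in enumerate(row):
            if value != 0:
                entries.append(bytes((r, col, value + offset, 0)))
    return entries


def apply_firings(x, flat_changes, firings, offset=OFFSET):
    """Update the state in place by visiting only the non-zero entries."""
    for r, col, stored, _unused in flat_changes:
        x[col] += (stored - offset) * firings[r]
    return x
```

For a matrix whose rows are reactions and whose columns are species, `apply_firings` touches one record per non-zero entry, instead of scanning every cell of the matrix.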

Figure 7. The stoichiometry of chemical reactions is generally represented by (usually sparse) matrices, corresponding to the variation of the species appearing either as reactants or products; however, both tau-leaping and SSA exploit only the non-zero values of these matrices. Each stoichiometric matrix can be pre-processed to identify its non-zero values and discard the remaining ones, thus reducing the number of read operations required by the two stochastic algorithms. Our strategy to reduce the size of these matrices consists in flattening each matrix as a vector of triples, each made of the row index, the column index and the non-zero value found at that position. In our implementation, both row and column indices are 0-based and triples are stored using vectors of CUDA's uchar4 data type, which has the advantage of requiring a single instruction to fetch an entry. Note that only four cells of the matrix shown have non-zero values; the bottom part of the figure shows the corresponding vector.

These values are related to each species $S_i$, and represent an estimate of the change of the propensity functions, based on all possible reactions in which the species is involved. For this reason, the flattened representation of the stoichiometric matrix cannot be exploited here; therefore, to obtain an efficient calculation of these values, we introduced the flattened transposed stoichiometric matrix. The first optimization consists in keeping the register pressure low, in order to avoid register spilling into global memory and to increase the occupancy of the GPU.

This is achieved by partitioning the tau-leaping algorithm into multiple kernels, allowing a strong reduction of the consumption of hardware resources. This CUDA optimization technique, known as branch splitting, was shown to achieve a relevant gain in performance [68]. Another typical CUDA optimization is to ensure coalesced access to data. In our implementation we ensured coalesced access to all data structures that are private to each thread, that is, the system state, the stochastic constants, and so on. However, some shared information is not inherently coalesced.

The data structures used to store the stoichiometric information are not modified during the simulation and are common to all threads, and can be conveniently loaded into the constant memory. This peculiar CUDA memory is immutable and cached, so that the uncoalesced access pattern does not have any impact on the performances.


Note that, in contrast to other GPU implementations of tau-leaping [39], cuTauLeaping exploits the flattened vector of reactant stoichiometries to correctly evaluate the propensity functions of the reactions whose reactant species appear also as products in the same reaction: in cases like this, the net balance of the consumed and produced chemicals that is stored in the flattened state change vector would not carry sufficient information to distinguish between reactants and products.

For instance, given a state change vector alone, it can be impossible to establish which of two reactions with the same net balance it corresponds to; on the contrary, the information stored in the vector of reactant stoichiometries makes it possible to discriminate between the two cases. Both vectors, in any case, can be calculated offline by preprocessing the stoichiometric matrices of reactants and products while they are loaded. In addition, the simulation kernels exploit further per-thread vectors. These vectors are coalesced, but frequently exploited by tau-leaping and SSA.
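The ambiguity can be reproduced with two hypothetical reactions (chosen for this illustration, not taken from the models of this work): S0 + S1 → S0 + S2, in which S0 is both a reactant and a product, and S1 → S2. The Python sketch below shows that they share the same state change vector while having different propensity functions; all names are assumptions.

```python
from math import comb, prod


def propensity(x, reactants, c):
    """c times the number of distinct combinations of reactant molecules."""
    return c * prod(comb(x[i], n) for i, n in reactants.items())


def state_change(reactants, products, n_species):
    """Net stoichiometric change of each species (products minus reactants)."""
    return [products.get(i, 0) - reactants.get(i, 0) for i in range(n_species)]


# Hypothetical reactions, each given as (reactants, products).
reaction_a = ({0: 1, 1: 1}, {0: 1, 2: 1})   # S0 + S1 -> S0 + S2
reaction_b = ({1: 1}, {2: 1})               # S1 -> S2
```

Both reactions yield the state change vector `[0, -1, 1]`, but in a state with 4 molecules of S0 and 5 of S1 (and a unit stochastic constant) their propensities are 20 and 5, respectively, so the reactant information is needed to evaluate them correctly.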

In order to minimize the latencies due to the frequent access to the global memory, for each thread we allocate the most frequently accessed of these vectors in the shared memory. Being an on-chip memory, the shared memory has latencies about two orders of magnitude lower than those of the global memory; the use of shared memory allows a reduction of the global bandwidth usage [69] and provides a relevant performance boost. In contrast, we store in the global memory the vector whose values are used only twice during the simulation step to determine the $\tau$ value.

Since cuTauLeaping was specifically designed to be embedded into other applications — in particular, the computational tools for PE, PSA and RE that we previously developed [6] , [7] , [35] , which rely on the execution of a large number of simulations — we also copy the stochastic constants into the shared memory vector. To obtain a more efficient implementation, an additional strategy consisted in restructuring the tau-leaping algorithm in order to avoid the conditional branches as much as possible.

In cuTauLeaping, branches were removed by unrolling loops and by allowing redundant calculations in favor of a uniform control flow. The storage of the entire temporal evolution of all species, associated to each thread on the GPU, cannot be realized, since we cannot determine a priori how many steps each simulation will take. Indeed, whenever a kernel is launched, the required amount of memory must be statically pre-allocated from the host; it is therefore fundamental to set the number of time instants in which the dynamics is sampled, before each simulation starts.

Therefore, we make use of four additional global memory vectors. Finally, in order to fully exploit the SM cache, a number of simulation parameters, together with the sizes of the flattened vectors, are also stored as constants in the constant memory of the GPU. The definition of each model, as well as the values of the initial molecular amounts and of the stochastic constants used to run simulations, are given in Text S1.

COPASI has been recently integrated with a server-side tool, named Condor [33] , that handles COPASI jobs, automatically splits them in sub-jobs and distributes the calculations on a cluster of heterogeneous machines; in the present work we do not use this possibility as we are interested in COPASI as a single-node CPU-bound reference implementation, which is currently single-threaded and does not exploit the physical and logical cores of the CPU.

The value of the error control parameter of tau-leaping was set as suggested in [27]. Table 1 clearly shows the advantage of cuTauLeaping as the number of simulations increases. Interestingly, because of the architectural differences and the different clock rates, a single run of cuTauLeaping may be slower than the CPU counterpart, and it becomes fully profitable only by running multiple simulations. Thus, when only a small number of simulations of this specific pathway is needed, the use of a CPU implementation may be more convenient. Nonetheless, statistical analyses of stochastic temporal evolutions of biological systems require large batches of simulations to derive statistically significant measures of the analyzed system dynamics.

Note that, in general, the analytical determination of the break-even point for an arbitrary model is a hard task, because it depends on its size (the number of reactions and molecular species) as well as on its parameterization, which might lead to stiffness phenomena able to affect the running time of the simulation algorithm. Moreover, if a biological system is characterized by multistability or by very large fluctuations in the dynamics of some molecular species, the running time can be affected by thread divergence. For instance, if two threads simulating the same system reach very different system states, they can take different branches within the code, so that their execution is serialized.

Finally, in order to investigate the influence of the size of the simulated system on the performance of cuTauLeaping, we executed several tests on randomly generated synthetic models (RGSM).

In particular, we analyzed six distinct parameterizations of different RGSM, each one consisting of 35 reactions and 33 species. RGSM were generated according to the methodology proposed by Komarov et al. The results of these tests are given in Table 2 where, for each synthetic model, the average running times (given in seconds) were evaluated by executing batches of parallel simulations. In tests 1 and 2 we randomly selected the values of the stochastic constants of the RGSM with a uniform probability in a fixed range; the initial molecular amounts of all species appearing in the systems were set to one value in test 1 and to a different value in test 2.

In addition, we observe that the initial molecular amounts considered in the parameterizations actually influence the results; in general, higher quantities lead to higher average running times. In tests 3 to 6 we exploited a modified strategy to select the values of stochastic constants, which were logarithmically sampled in the given range in order to uniformly span over different orders of magnitude. Moreover, we observe that the running time is mainly influenced by the initial molecular amounts rather than the values of the stochastic constants used in the different parameterizations.

Bistability is a capacity exhibited by many biological systems, consisting in the possibility of switching between two different stable steady states in response to some chemical signaling. The system considered here is characterized by the fact that, starting from the same initial conditions, its dynamics can reach either the low or the high steady state; switches between the two steady states can also occur due to stochastic fluctuations.

Generally speaking, this kind of investigation allows the implicit identification of attractors and multiple steady states of a system, and helps to empirically determine the probability of reaching a particular state during the dynamical evolution of the system itself.


In particular, to detect the initial jump either to the low or to the high steady state, which takes place in the first time instants of the simulations, we performed a batch of simulations. We used the results of the simulations to calculate the histograms of the molecular amount of the species of interest, which were then exploited to realize a heatmap showing the frequency distribution of this species between the two stable steady states. Figure 9b shows the frequency distribution of reaching either the low or the high steady state, highlighting a larger variance concerning the high steady state.

The frequency distribution was calculated from the results of the simulations, where the dynamics was sampled at a single time instant.

The total running time to execute this PSA-1D was … Figure 10 shows that increasing values of induce a decrease (increment, respectively) in the frequency distribution of concerning the low (high, respectively) steady state, whereas for intermediate values of the system is characterized by an effective bistable behavior.

Each frequency distribution is calculated according to simulations executed by cuTauLeaping, measuring the amount of the molecular species at the time instant a. The figure shows that increasing values of induce a decrease (increment) in the frequency distribution of concerning the low (high) steady state, with intermediate values of characterized by an effective bistable behavior.

The values of the three stochastic constants were uniformly sampled in a three-dimensional lattice; for each sample, we executed simulations for a total of simulations and evaluated the frequency distribution of the amount of species at the time instant a. This set of values was then partitioned according to the reached low or high stable steady state; in Figure 11, the red (blue, respectively) region corresponds to the parameterizations of the model which yield the high (low, respectively) steady state most frequently.

The green region represents a set of conditions in which both steady states are reached equally often. The values of the stochastic constants were uniformly sampled in a three-dimensional lattice; for each sample, we executed simulations with cuTauLeaping for a total of simulations and evaluated the frequency distribution of the amount of the molecular species at the time instant a. This set of values was then partitioned according to the reached low or high stable steady state; in the plot, the red (blue) region corresponds to the parameterizations of the model which yield the high (low) steady state most frequently.
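A minimal sketch of this lattice sweep and of the coloring rule is given below. The helper names, the parameter ranges and the tolerance used to decide when the two steady states count as "equally reached" are all assumptions introduced for illustration; the counts of low and high outcomes would be supplied by the simulations run at each lattice point.

```python
import itertools


def linspace(low, high, n):
    """n evenly spaced values from low to high, both included."""
    if n == 1:
        return [low]
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def lattice(ranges, points_per_axis):
    """Uniform lattice over the given (low, high) ranges, one axis per parameter."""
    axes = [linspace(lo, hi, points_per_axis) for lo, hi in ranges]
    return list(itertools.product(*axes))


def label_region(n_low, n_high, tolerance=0.1):
    """Color of a lattice point: red = mostly high, blue = mostly low, green = balanced."""
    total = n_low + n_high
    if total == 0:
        raise ValueError("no simulations recorded for this point")
    share_high = n_high / total
    if abs(share_high - 0.5) <= tolerance:  # assumed meaning of "equally reached"
        return "green"
    return "red" if share_high > 0.5 else "blue"


# Hypothetical ranges for three stochastic constants, four points per axis.
points = lattice([(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)], points_per_axis=4)
```

The number of lattice points grows as the cube of the points per axis, and each point requires its own batch of simulations, which is what makes this analysis computationally demanding.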

Indeed, stochastic simulations of the functioning of this pathway showed that yeast cells might be able to respond appropriately to an alteration of some basic components (such as the intracellular amount of pivotal proteins, which can be related to the stress level [78], [79]), fostering the maintenance of stable oscillations during the signal propagation. This behavior might suggest a stronger adaptation capability of yeast cells to various environmental stimuli or endogenous variations.

In [6], [44], in particular, we showed that the intracellular pool of guanine nucleotides (GTP, GDP), as well as the molecular amount of protein Cdc25 (which positively regulates the activation of Ras protein, and which is negatively regulated by PKA), are both able to govern the establishment of oscillatory regimes in the dynamics of the second messenger cAMP and of protein PKA.

In turn, this behavior can influence the dynamics of downstream targets of PKA, such as the periodic nucleocytoplasmic shuttling of the transcription factor Msn2 [78], [80]. To this aim, we performed a PSA-2D to simulate the system dynamics in perturbed conditions, where we simultaneously varied the amount of GTP in the interval molecules (corresponding to a reduced nutrient availability, up to a normal growth condition) and the amount of Cdc25 in the interval molecules (ranging from the deletion to a 2-fold overexpression of these regulatory proteins).

A total of different initial parameterizations were uniformly distributed over this bidimensional parameter space. In Figure 12a we plot the amplitude of cAMP oscillations in each of these initial conditions, where an amplitude value equal to zero corresponds to non-oscillating dynamics; the amplitude values of cAMP oscillations were calculated as described in [44].

This figure shows that oscillatory regimes are established for basically any value of GTP when the amount of Cdc25 is at the normal condition or slightly lower; if the amount of Cdc25 increases, no oscillations of cAMP occur when GTP is high, but oscillatory regimes are still present if GTP is low. The figure shows the amplitude of cAMP oscillations, evaluated as described in [44]; an amplitude value equal to zero corresponds to non-oscillating dynamics.

The two batches of parallel and sequential simulations were executed with a comparable computational time. These computational results can suggest possible interesting behaviors of the biological system under investigation.

To reduce the computational costs related to the analyses of mathematical models of real biological systems, two conceptually simple ways can be considered to parallelize stochastic simulations. The easiest solution consists in generating multiple threads on multi-core workstations, but it immediately turns out to be inadequate, since the number of cores on high-end machines can be far lower than the number of simulations required for computational analyses such as PE, PSA, SA and RE.
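A back-of-the-envelope estimate makes the limitation concrete. The sketch below assumes an idealized setting that the text does not state: all simulations take the same time, each worker runs one simulation at a time, and scheduling overhead is ignored. The function names and the figures in the example are hypothetical.

```python
import math


def sequential_rounds(n_simulations, n_workers):
    """Number of back-to-back rounds needed when each worker runs one simulation at a time."""
    if n_simulations < 0 or n_workers < 1:
        raise ValueError("need n_simulations >= 0 and n_workers >= 1")
    return math.ceil(n_simulations / n_workers)


def ideal_speedup(n_simulations, n_workers):
    """Best-case speedup over a single worker, ignoring all overheads."""
    rounds = sequential_rounds(n_simulations, n_workers)
    return n_simulations / rounds if rounds else 1.0


# Illustrative numbers only: 100000 simulations on a 16-core workstation.
rounds = sequential_rounds(100000, 16)
speedup = ideal_speedup(100000, 16)
```

Under these assumptions the speedup can never exceed the number of workers, so a workstation with a few tens of cores still has to execute thousands of rounds for an analysis that requires a very large number of independent simulations.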

The other way consists in distributing the stochastic simulations on a cluster of machines, which may likewise prove inadequate for several reasons. First of all, it is economically expensive and very power-demanding; secondly, it requires a dedicated software infrastructure to handle workload balancing, network communication and the possible errors due to node downtime or server-node communication issues; thirdly, if the nodes of the cluster are heterogeneous, the slowest machines may represent a bottleneck for the whole task.

In addition, a cluster implementation may not always scale well because of two problems: on the one hand, the speedup is approximately proportional to the number of independent simulations that run on a dedicated node (i.e., …