M3C 2017 Final Project

Due date/time: 15/12/2017, 11:59pm GMT

Getting Started: If needed, update your local copy of the course repo (sync and pull). There will be a project directory with files needed for this assignment.

Using software version control (2 pts)

You should develop the code in a private git repository named, m3cproject. You will submit this project by providing me with read-access to your bitbucket repo. You will also have to provide a commit id setting the version of the project that should be marked. You should periodically add/commit updates to your project. The repository should contain three sub-directories, part1, part2, and part3, containing relevant files as described below. The final version of the repository should also contain a readme file with a list of new subroutines/functions/modules added to the template codes accompanied with short 1-sentence descriptions of the new routines.

Part 1: Global bracket-descent (18 pts)

In Part 1, you will develop a global optimization tool based on the bracket-descent method from homework 2. The basic approach will be to carry out \(n^2\) bracket-descents simultaneously (where n is a positive integer). After all triangles have completed one b-d step, the positions of the vertices are shifted towards the centroid with lowest cost: \(\mathbf{x^{(a)}_i} = \mathbf{x^{(a)}_i} - \Delta \mathbf{x^{(a)}}\). Here, \(a=1,2,...,n^2\) , i =1,2, or 3, and \(\mathbf{x^{(a)}_i}\) is the position of the \(i^{th}\) vertex of the \(a^{th}\) triangle. The shift for the \(a^{th}\) triangle, \(\mathbf{\Delta x^{(a)}}\), is defined as, \(\mathbf{\Delta x^{(a)}} = \alpha \frac{(J^a-J^*)}{max(J^b-J^* + 1e-12)}(\mathbf{\tilde{x}^{(a)}} - \mathbf{\tilde{x}^*})\), where \(\mathbf{\tilde{x}^{(a)}}\) is the centroid of triangle a, \(\mathbf{\tilde{x}^*}\) is the centroid with lowest cost, and \(J^a\) and \(J^*\) are costs evaluated at \(\mathbf{\tilde{x}^{(a)}}\) and \(\mathbf{\tilde{x}^*}\), respectively. The max operator is the maximum over the \(n^2\) centroids. Additionally, the parameter \(\alpha\) controls how ‘aggressive’ the shift is and is constrained to satisfy, \(0 \le \alpha \le 1\). So, one iteration of global b-d consists of a b-d step and a shift for all triangles. At the end of an iteration, each triangle’s centroid has its cost tested using the convergence test from homework 2. If the criteria is satisfied, that individual triangle remains frozen for any subsequent iterations. The global method terminates when either all triangles have frozen or when itermax iterations have been completed, whichever occurs first.

1. Implement this method in the subroutine, bdglobal in p11_dev.f90. The initial \(n^2\) triangles are arranged on an n x n grid centered at xguess (so there are in fact \(n^2\) initial guesses). The spacing between the centroids of adjacent triangles in this grid is d in both the \(x_1\) and \(x_2\) directions. The code setting up this configuration is included in the subroutine. Store the values of \(\mathbf{x^*}\) and \(\mathbf{J^*}\) at the end of each iteration in the module variables xpath and jpath. The subroutine should also store the final values of xpath and jpath in the output variables, xf and jf.

2. Analyze the effectiveness of bdglobal for the cost function provided in cost.f90 (in costj). In p1_dev.py, add code to bdg_analyze to generate 1 or more figures illustrating the most important qualitative trends; discuss these trends in the docstring of the function. You should constrain d such that initial centroids satisfy \(0 \le x_1 \le 1000\) and \(0 \le x_2 \le 1000\).

3. In p12_dev.f90, develop bdglobal_mpi so that it implements a version of this method that is parallelized with mpi. The initial distribution of the \(n^2\) nodes to numprocs processes has been set up for you. Each process should iterate triangles istart to iend where istart and iend are set for each process by the MPE_DECOMP1D routine which has been provided. You may assume that the total number of triangles is much larger than the number of processes. You do not need to set xpath and jpath. After iterations have terminated, the final output variables xf and jf should be collected on process 0, and the main program, p12, will write these variables to a file named, p12.dat. Concisely describe your approach to parallelization in a comment below the header of the subroutine.

Part 2: Pedestrian motion (40 pts)

Consider the following model for the motion of n 1st-year Imperial students:

\[ \begin{align}\begin{aligned}\mathbf{x}_{i+1} = \mathbf{x}_i + \mathbf{u}_{i+1}\\\mathbf{y}_{i+1} = \mathbf{y}_i + \mathbf{v}_{i+1}\end{aligned}\end{align} \]

where \(\mathbf{x}_i, \mathbf{y}_i, \mathbf{u}_i, \mathbf{v}_i\) are n-element column vectors corresponding to the (x,y) positions and (u,v) velocities of each student at time, i.

The velocity vector for an individual student, \((u,v)\) has amplitude, \(s_0\) (a constant), and direction defined by the polar angle:

\[\theta_{i+1} = tan^{-1}(\overline{v}_{i}/\overline{u}_{i}) + R_i\]

where \(\overline{u}_i\) is the average horizontal component of the velocity at time i of all students within a distance, d, of the student (including the student herself); \(\overline{v}_i\) is the analogous average vertical velocity component. The parameter, \(R_i\) is a number sampled from a uniform distribution in the range, \([-\alpha \pi, \alpha \pi]\) and \(\alpha\) is a parameter between 0 and 1 (inclusive) which controls the strength of the noise.

Simulations are carried out in a square, \(-L \le x \le L\), \(-L \le y \le L\), and periodic boundary conditions are enforced if a student crosses a boundary. For example, if a student’s position at the end of a step is at \(x = L+h\), with h positive, his position is shifted to \(x=-L+h\). Analogous shifts take place at the other three boundaries: \(x = -L-h \rightarrow L-h\), \(y = L+h \rightarrow -L+h\), \(y =-L-h \rightarrow L-h\).

We will assume that the total number of students is \(n = m^2\) for a positive integer, m. The students are initially placed on an m x m square grid of size L/4 x L/4 centered at the origin (so there is one student at each of \(m^2\) gridpoints). The initial direction of motion should be a random number between \(-\pi\) and \(\pi\) (sampled from a uniform distribution).

A useful measure of synchronization at step, i, is the parameter, \(\Psi_i = \frac{|\sum_{j=1}^n (u^{(j)}_i, v^{(j)}_i)|}{n s_0}\). Here, \(|(u,v)| = \sqrt{(u^2+v^2)}\)

1) Complete the routine simulate in p2.f90 and simulate this model for nt steps and \(m^2\) walkers moving with speed \(s_0\) with noise level, \(\alpha\). The routine should store the paths of all walkers in the output variables, x and y (including the initial condition). Additionally, the synchronization parameter, \(\Psi\), should be computed and stored at each step (including the initial conditions) in the output variable, psi.

2) i) First consider the case with no noise, \(\alpha=0\) , \(s_0=0.1\), and L=25. Complete the function analyze1 in p2.py so that it analyzes how \(\Psi\) depends on time. The function should generate one or more figures illustrating the most important trends. Add a discussion of these trends to the docstring. Your discussion and analysis should focus on general qualitative trends, you are not required to consider explicit functional time dependence(s). It may be convenient to start with the parameters, nt=200, m=10, d=1.

ii) Analyze the effect of noise on \(\Psi\). You should consider the case, m=10, but feel free to include other population sizes. For some cases, you should find that there is an initial transient where \(\Psi\) varies rapidly with time which is then followed by relatively stationary behavior (\(\Psi\) will still vary with time). Focus your analysis on \(\Psi\) after this initial transient has passed. Add code to analyze2 in p3.py generating one or more figures illustrating the most important trends. Add a discussion of these trends to the docstring.

You are encouraged but not required to generate one or two animations (which can take the place of or complement figures) illustrating the model dynamics.

If you are running with very large problem sizes, you may restructure the code to store x and y once very isample steps rather than every time step.

3) i) Complete the subroutine simulate_omp in p2.f90. This routine should have the same functionality as simulate, however it should be carefully parallelized with OpenMP. Parallel regions should run with numthreads threads (numthreads is an input variable).

ii) Complete the function, performance1 in p2.py so that it analyzes the performance of simulate_omp as the number of walkers is varied. Concisely describe your approach to parallelization in a comment below the header of the subroutine.

4) Estimate the number of floating point operations required for distance calculations during one step of your simulation. In the function docstring, discuss both how you have constructed this estimate and its effect on performance. Read about cell lists (e.g. https://en.wikipedia.org/wiki/Cell_lists) and discuss the cost/benefit of using such an approach. You do not need to implement cell lists in your code.

Part 3: Wave propagation on a fractal network (40 pts)

Consider the fractal networks generated by rwnet in rwnetmod.f90. This subroutine is the same as the one used in homework 3. Here, we will interpret these networks as graphs with nodes and links, and investigate wave propagation on these graphs. Consider the following model for waves propagating on a graph with n nodes,

\[ \begin{align}\begin{aligned}\frac{du_i}{dt} = v_i,\\\frac{dv_i}{dt} = c^2 \sum_{j=1}^n \left(A_{ij}(u_j - u_i) \right),\\~ i=1,2,...,n\end{aligned}\end{align} \]

Here, \(u_i(t)\) is the wave amplitude at time, t, of the wave at node, i, and \(A_{ij}\) is the adjacency matrix of the graph. We will set the initial condition to be \(v_i=0\) for all i, and \(u_i=0\) at all i, except at i=n (the last node added to the network) where \(u_n=1\). It will be useful to analyze the local and global wave energies, \(e_i = u_i^2\), \(E = \sum_{i=1}^n e_i\).

In rwnetmod, you have been provided with rwnet for generating a network with X-Y node coordinates. The module also contains routines for 1) counting the total number of nodes in the network ignoring nodes with zero links 2) counting the total number of links 3) generating X-Y positions with zero-link nodes removed, 4) generating the adjacency matrix for input X-Y coordinates and 5) generating the edgelist for input X-Y coordinates. The subroutine wave_init in p31.f90 illustrates how these routines should be used sequentially.

1. Complete the subroutine wave in p31.f90 so that it uses rk4 time marching (routine provided) to simulate wave propagation for nt time steps of size, dt, for a network with height, h, and ntotal node coordinates. Note: the graph on which the simulation will run will typically have n < ntotal. The subroutine rk4 calls the Fortran function RHS four times. You will also have to complete this function so that is returns the right-hand-sides to the 2n ODEs described above. The routine wave should store \(u_i(t)\) in the module variable, u, and \(E(t)\) in the output variable energy.

2. Complete wave_analyze in p3.py so that it calls wave, and analyzes wave propagation from node n to the highest-degree node in the graph. In cases where multiple nodes share the highest degree, select the node that is closest to node n (in terms of path lengths on the graph). You should consider both the expected speed of the wave and shortest paths in the graphs. You are only required to consider cases with h=100, ntotal=200, dim=2 though you should feel free to consider other cases to justify your conclusions. Note: In this and all other rwnet calculations, you should choose ntotal so that nodes are not expected to be added to the network at height, h; networks where such nodes are added should be discarded. Generate one or more figures illustrating the most important trends, add a discussion of these trends to the docstring.

3. Now, you will develop a wave simulation code parallelized with mpi. Time marching will use the explicit Euler method rather than rk4. Browse through the template code, p32.f90. Now, parameters are read in through the input file, data.in and set in the main program. The subroutine euler_mpi will advance the model equations forward in time using the explicit Euler method. First, the n nodes are distributed to numprocs processes using MPE_DECOMP1D. Then, each process should solve the model above for nodes istart to iend using the explicit Euler method. Due to the coupling across links in the graph, each process must typically exchange information with one or more neighbors during each time step.

i) Complete euler_mpi so the 2n ODEs are distributed across the numprocs processes and solved in parallel. The sum in the model equations will have to be very carefully constructed and MPI directives will be needed when components of the sum are on other processes. RHS_mpi will have to be carefully completed as well. You may assume that \(n \gg numprocs > 1\).

ii) At the end of each iteration, compute the total energy, E, and store it in the array energy on process 0. Note: During the k-loop, the full solutions, y should not be collected on any one process. Here, y is a size 2n array containing both u and v.

4. After the k-loop in euler_mpi, each processor’s solution, ylocal, should be gathered into the variable y on myid=0 using MPI_gather and MPI_gatherv. The main program will then write this variable out to the file, fmpi.dat. This file can be loaded in python with np.loadtxt.

Note 1: It is only after the k-loop has been completed that the full solution y should be assembled. During the loop, each process should only work with the portion of y needed to update ylocal.

Further comments

1. Your fortran code should work to double precision accuracy.

2. The title of every figure that your python code produces should have your name and the name of the function in which it was generated.

3. Figures generated by your python functions should be saved and included in your repository following the naming convention pXY.png for the Yth figure created for question X.

3. You will not be assessed on how fast your code is, but points may be deducted if it is ludicrously inefficient. Points will be deducted if parallelization is not carried out correctly.

4. Add module variables and sub-programs as needed, but clearly add comments on substantive additions. Do not remove/modify existing code or module variables unless you are explicitly instructed to do so (or consult with the instructor).

5. Parallelization should be implemented for a general (but reasonable) number of threads/processes.

6. Lab 9 will utilize the rk4 and explicit Euler routines used in question 3.

7. Assessment of code will primarily be based on functionality: does it compile and run, and does it produce the right answer(s)? A moderate portion of the assessment will look at the implementation and efficiency of code. Highly wasteful implementations will lose points. Examples of what to avoid: repeatedly adjusting sizes of allocatable arrays in loops, generally carrying out unnecessary calculations, and creating variables that require excessive amounts of memory. The focus here will be on code used for/during simulations/iterations rather than code used for initialization and post-processing/analysis. You have the option of adding comments explaining/defending your approach.

8. Below the name==main section of each python file, add code that calls your functions and generates the figures you are submitting.

9. If you are using MLC machines and mpif90 and/or mpiexec are not working, run the following in the unix terminal (the first for mpif90, the second for mpiexec):

$ export PATH=$PATH:/usr/lib64/openmpi/bin/

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/openmpi/lib/

10. You may use the following Python modules as needed: numpy, scipy, matplotlib, networkx, time, timeit. Please ask if you would like to use any others.

11. Be sensible when choosing problem sizes. However, at this point in the term, it is fine for calculations to take several minutes. You do not need to run for very large problem sizes, though you should consider large problems when developing and parallelizing your codes.

Submitting the assignment

To submit the assignment for assessment, go to the course blackboard page, and click on the “Assignments” link on the left-hand side of the page, and then click on Project. Click on “Write Submission” and add the statement: “This is all my own unaided work unless stated otherwise.” Provide the url for your bibucket project repo as well as the commit id for the version which you would like to have assessed. Remember to provide read access to user, ImperialHPSC (go to Settings –> User and group access). Finally, click “Submit”.