# M3C 2017 Final Project

**Due date/time:** 15/12/2017, 11:59pm GMT

**Getting Started:** If needed, update your local copy of the course repo (*sync* and *pull*). There will be a *project* directory with files needed for this assignment.

## Using software version control (2 pts)

You should develop the code in a **private** *git* repository named *m3cproject*. You will submit this
project by providing me with read access to your bitbucket repo. You will also have to provide a commit id specifying
the version of the project that should be marked. You should periodically add/commit updates to your project. The repository should contain three sub-directories, part1, part2, and part3,
containing the relevant files as
described below. The final version of the repository should also contain a readme file listing the new subroutines/functions/modules added
to the template codes, each accompanied by a short one-sentence description.

## Part 1: Global bracket-descent (18 pts)

In Part 1, you will develop a global optimization tool based on the bracket-descent method
from homework 2. The basic approach will be to carry out \(n^2\) bracket-descents
simultaneously (where *n* is a positive integer). After all triangles have completed one
b-d step, the positions of the vertices are shifted towards the centroid with lowest
cost: \(\mathbf{x^{(a)}_i} = \mathbf{x^{(a)}_i} - \Delta \mathbf{x^{(a)}}\). Here, \(a=1,2,...,n^2\)
, *i* =1,2, or 3, and \(\mathbf{x^{(a)}_i}\) is the position of the \(i^{th}\) vertex of the \(a^{th}\) triangle.
The shift for the \(a^{th}\) triangle, \(\mathbf{\Delta x^{(a)}}\), is defined as
\(\mathbf{\Delta x^{(a)}} = \alpha \frac{J^a-J^*}{\max_b(J^b-J^*) + 10^{-12}}(\mathbf{\tilde{x}^{(a)}} - \mathbf{\tilde{x}^*})\), where
\(\mathbf{\tilde{x}^{(a)}}\) is the centroid of triangle *a*, \(\mathbf{\tilde{x}^*}\) is the
centroid with lowest cost, and \(J^a\) and \(J^*\) are the costs evaluated at \(\mathbf{\tilde{x}^{(a)}}\)
and \(\mathbf{\tilde{x}^*}\), respectively. The *max* is taken over the \(n^2\) centroids (the index *b*), and the small constant \(10^{-12}\) prevents division by zero.
Additionally, the parameter \(\alpha\) controls how ‘aggressive’
the shift is and is constrained to satisfy, \(0 \le \alpha \le 1\). So, one iteration of global b-d consists of
a b-d step and a shift for all triangles. At the end of an iteration, each triangle’s centroid has
its cost tested using the convergence test from homework 2. If the criterion is satisfied,
that individual triangle remains frozen for all subsequent iterations. The global method terminates
when either all triangles have frozen or when *itermax* iterations
have been completed, whichever occurs first.
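For orientation, the shift stage described above can be sketched in numpy (an illustrative sketch, not the Fortran you are asked to write; the array layout and the name *shift_triangles* are assumptions):

```python
import numpy as np

def shift_triangles(verts, cost, alpha):
    """One shift stage of global bracket-descent (illustrative sketch).
    verts: (n2, 3, 2) array of vertex positions for the n^2 triangles;
    cost:  function mapping a (2,) centroid to a scalar cost J;
    alpha: shift aggressiveness, 0 <= alpha <= 1."""
    centroids = verts.mean(axis=1)                # (n2, 2) centroids x~^(a)
    J = np.array([cost(c) for c in centroids])    # J^a at each centroid
    best = J.argmin()
    Jstar, cstar = J[best], centroids[best]       # J* and x~*
    scale = alpha * (J - Jstar) / (np.max(J - Jstar) + 1e-12)
    dx = scale[:, None] * (centroids - cstar)     # Delta x^(a)
    return verts - dx[:, None, :]                 # same shift for all 3 vertices
```

Note that the triangle with the lowest cost receives zero shift, so the current best estimate is never displaced.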

1. Implement this method in the subroutine, *bdglobal* in *p11_dev.f90*. The initial \(n^2\) triangles are
arranged on an *n x n* grid centered at *xguess* (so there are in fact \(n^2\) initial guesses). The spacing between the centroids of
adjacent triangles in this grid is *d* in both the \(x_1\) and \(x_2\) directions. The code setting up this configuration
is included in the subroutine. Store the values of \(\mathbf{x^*}\) and \(\mathbf{J^*}\)
at the end of each iteration in the module variables *xpath* and *jpath*. The subroutine
should also store the final values of *xpath* and *jpath* in the output variables, *xf* and *jf*.

2. Analyze the effectiveness of *bdglobal* for the cost function provided in *cost.f90* (in *costj*).
In *p1_dev.py*, add code to *bdg_analyze* to generate 1 or more figures illustrating the most important qualitative
trends;
discuss these trends in the docstring of the function. You should constrain
*d* such that initial centroids satisfy \(0 \le x_1 \le 1000\) and \(0 \le x_2 \le 1000\).

3. In *p12_dev.f90*, develop *bdglobal_mpi* so that it implements a version of this method
that is parallelized with MPI. The initial distribution of the \(n^2\) triangles to *numprocs*
processes has been set up for you. Each process should iterate triangles *istart* to *iend* where
*istart* and *iend* are set for each process by the MPE_DECOMP1D routine which has been provided.
You may assume that the total number of triangles is much larger than the number of processes. You
do not need to set *xpath* and *jpath*. After iterations have terminated, the final output variables
*xf* and *jf* should be collected on process 0, and the main program, *p12*, will write these variables to a
file named, *p12.dat*.
Concisely describe your approach to parallelization in a comment below the header of the subroutine.
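The provided MPE_DECOMP1D routine computes a standard near-equal 1-D block decomposition; a Python analogue (an assumption about its behaviour, useful for checking your *istart*/*iend* logic) is:

```python
def decomp1d(n, numprocs, myid):
    """Split indices 1..n into numprocs near-equal contiguous blocks,
    1-based like Fortran (assumed analogue of MPE_DECOMP1D)."""
    base, rem = divmod(n, numprocs)
    # the first `rem` processes each take one extra index
    istart = myid * base + min(myid, rem) + 1
    iend = istart + base - 1 + (1 if myid < rem else 0)
    return istart, iend
```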

## Part 2: Pedestrian motion (40 pts)

Consider the following model for the motion of *n* 1st-year Imperial students:

\(\mathbf{x}_{i+1} = \mathbf{x}_i + \mathbf{u}_{i+1}, \qquad \mathbf{y}_{i+1} = \mathbf{y}_i + \mathbf{v}_{i+1},\)

where \(\mathbf{x}_i, \mathbf{y}_i, \mathbf{u}_i, \mathbf{v}_i\) are *n*-element
column vectors corresponding to the *(x,y)* positions and *(u,v)* velocities of
each student at time, *i*.

The velocity vector for an individual student, \((u,v)\), has amplitude \(s_0\) (a constant) and direction defined by the polar angle:

\(\theta_{i+1} = \tan^{-1}\left(\overline{v}_i / \overline{u}_i\right) + R_i, \qquad u_{i+1} = s_0\cos\theta_{i+1}, \qquad v_{i+1} = s_0\sin\theta_{i+1},\)

where \(\overline{u}_i\) is the average horizontal component of the velocity at time *i* of all students
within a distance, *d*, of the student (including the student herself); \(\overline{v}_i\)
is the analogous average vertical velocity component. The parameter \(R_i\) is a number sampled from
a uniform distribution in the range \([-\alpha \pi, \alpha \pi]\), and \(\alpha\)
is a parameter between 0 and 1 (inclusive) which controls the strength of the noise.

Simulations are carried out in a square, \(-L \le x \le L\), \(-L \le y \le L\), and periodic boundary conditions are enforced if a student crosses a boundary. For example, if a student’s position at the end of a step is at \(x = L+h\), with h positive, his position is shifted to \(x=-L+h\). Analogous shifts take place at the other three boundaries: \(x = -L-h \rightarrow L-h\), \(y = L+h \rightarrow -L+h\), \(y =-L-h \rightarrow L-h\).
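The four boundary shifts above are equivalent to a single modular wrap; a one-line numpy sketch:

```python
import numpy as np

def wrap(pos, L):
    """Map positions into the periodic square [-L, L): e.g. x = L + h
    becomes x = -L + h, and x = -L - h becomes x = L - h."""
    return (pos + L) % (2 * L) - L
```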

We will assume that the total number of students is \(n = m^2\) for a positive integer, *m*. The students
are initially placed on an *m x m* square grid of size L/4 x L/4 centered at the origin (so there is one student at
each of \(m^2\) gridpoints). The initial direction of motion should be a random number between \(-\pi\) and
\(\pi\) (sampled from a uniform distribution).
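A minimal sketch of this initialization (the template's exact grid spacing may differ, and *initialize* is an illustrative name):

```python
import numpy as np

def initialize(m, L, s0, rng=None):
    """Place m*m students on an m x m grid spanning the L/4 x L/4 square
    centered at the origin, with uniformly random initial directions."""
    rng = rng or np.random.default_rng()
    side = np.linspace(-L / 8, L / 8, m)       # grid of width L/4
    x, y = (a.ravel() for a in np.meshgrid(side, side))
    theta = rng.uniform(-np.pi, np.pi, m * m)  # uniform initial angle
    return x, y, s0 * np.cos(theta), s0 * np.sin(theta)
```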

A useful measure of synchronization at step, *i*, is the parameter, \(\Psi_i = \frac{|\sum_{j=1}^n (u^{(j)}_i, v^{(j)}_i)|}{n s_0}\), where \(|(u,v)| = \sqrt{u^2+v^2}\).
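In code, \(\Psi\) is a few lines (numpy sketch):

```python
import numpy as np

def sync_parameter(u, v, s0):
    """Psi = |sum_j (u_j, v_j)| / (n * s0): equals 1 when all walkers
    move in the same direction, and is near 0 for incoherent motion."""
    return np.hypot(u.sum(), v.sum()) / (u.size * s0)
```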

1) Complete the routine *simulate* in *p2.f90* and simulate this model for *nt* steps and \(m^2\) walkers
moving with speed \(s_0\) with noise level, \(\alpha\). The routine should store the paths of
all walkers in the output variables, *x* and *y* (including the initial condition). Additionally, the synchronization parameter, \(\Psi\), should
be computed and stored at each step (including the initial conditions) in the output variable, *psi*.

2) i) First consider the case with no noise, \(\alpha=0\) , \(s_0=0.1\), and *L=25*. Complete the function *analyze1* in
*p2.py* so that it analyzes how \(\Psi\) depends on time.
The function should generate one or more figures illustrating the most important trends. Add a
discussion of these trends to the docstring. Your discussion and analysis should focus on
general qualitative trends; you are not required to consider explicit functional time dependence(s).
It may be convenient to start with the parameters, *nt=200*, *m=10*, *d=1*.

ii) Analyze the effect of noise on \(\Psi\). You should consider the case, *m=10*, but feel free
to include other population sizes. For some cases, you should find that there is an initial transient where \(\Psi\) varies
rapidly with time which is then followed by relatively *stationary* behavior (\(\Psi\) will still vary with
time). Focus your analysis on \(\Psi\) after this initial transient has passed.
Add code to *analyze2* in *p2.py* generating one or more figures illustrating the most important trends. Add a
discussion of these trends to the docstring.

You are encouraged but not required to generate one or two animations (which can take the place of or complement figures) illustrating the model dynamics.

If you are running with very large problem sizes, you may restructure the code to store x and y once every *isample* steps
rather than every time step.

3) i) Complete the subroutine *simulate_omp* in *p2.f90*. This routine should have the same
functionality as *simulate*; however, it should be carefully parallelized with OpenMP. Parallel
regions should run with *numthreads* threads (*numthreads* is an input variable).

ii) Complete the function, *performance1* in *p2.py* so that it analyzes the performance
of *simulate_omp* as the number of walkers is varied. Concisely describe your approach to
parallelization in a comment below the header of the subroutine.

4) Estimate the number of floating point operations required for distance calculations during one step of your simulation. In the function docstring, discuss both how you have constructed this estimate and its effect on performance. Read about cell lists (e.g. https://en.wikipedia.org/wiki/Cell_lists) and discuss the cost/benefit of using such an approach. You do not need to implement cell lists in your code.
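As an illustration of the kind of estimate being asked for (the per-pair count of 6 is an assumption; your implementation may differ): a naive all-pairs check evaluates roughly \(n^2\) squared distances, each costing about two subtractions, two multiplications, one addition, and one comparison against \(d^2\):

```python
def distance_flops(m, flops_per_pair=6):
    """Rough flop estimate for all-pairs distance checks in one step,
    with n = m*m walkers (assumes ~6 flops per pair)."""
    n = m * m
    return flops_per_pair * n * n
```

For *m=10* this gives 6×10⁴ flops per step; cell lists reduce the \(n^2\) factor by only examining walkers in nearby cells.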

## Part 3: Wave propagation on a fractal network (40 pts)

Consider the fractal networks generated by *rwnet* in *rwnetmod.f90*. This subroutine
is the same as the one used in homework 3. Here, we will interpret these networks as
graphs with nodes and links, and investigate wave propagation on these graphs.
Consider the following model for waves propagating on a graph with *n* nodes,

\(\frac{du_i}{dt} = v_i, \qquad \frac{dv_i}{dt} = \sum_{j=1}^n A_{ij}\left(u_j - u_i\right).\)

Here, \(u_i(t)\) is the wave amplitude at time, *t*, at node, *i*, and
\(A_{ij}\) is the adjacency matrix of the graph. We will set the initial condition
to be \(v_i=0\) for all *i*, and \(u_i=0\) at all *i*, except at *i=n* (the last node
added to the network) where \(u_n=1\). It will be useful to analyze the local and global *wave energies*, \(e_i = u_i^2\) and
\(E = \sum_{i=1}^n e_i\).

In *rwnetmod*, you have been provided with *rwnet* for generating a network with *X-Y* node coordinates.
The module also contains routines for: 1) counting the total number of nodes in the network,
ignoring nodes with zero links; 2) counting the total number of links; 3) generating *X-Y* positions
with zero-link nodes removed; 4) generating the adjacency matrix for input *X-Y* coordinates;
and 5) generating the edge list for input *X-Y* coordinates.
The subroutine *wave_init* in *p31.f90* illustrates how these routines should be used sequentially.

1. Complete the subroutine *wave* in *p31.f90* so that it uses *rk4* time marching (routine provided)
to simulate wave propagation for *nt* time steps of size, *dt*, for a network with height, *h*, and *ntotal* node coordinates.
**Note:** the graph on which the simulation will run will typically have *n < ntotal*.
The subroutine *rk4* calls the Fortran function *RHS* four times. You will also have to complete
this function so that it returns the right-hand sides of the *2n* ODEs described above.
The routine *wave* should store \(u_i(t)\) in the module variable, *u*, and \(E(t)\) in the output variable *energy*.
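For reference, the right-hand side can be sketched in Python, assuming for illustration the graph-Laplacian form \(du_i/dt = v_i\), \(dv_i/dt = \sum_j A_{ij}(u_j-u_i)\) (check this against the handout's model equations):

```python
import numpy as np

def rhs(y, A):
    """RHS of the 2n ODEs for y = [u; v], with A the n x n adjacency
    matrix (illustrative graph-Laplacian wave model; verify against
    the handout's equations)."""
    n = A.shape[0]
    u, v = y[:n], y[n:]
    dv = A @ u - A.sum(axis=1) * u   # sum_j A_ij u_j - degree(i) * u_i
    return np.concatenate([v, dv])   # [du/dt; dv/dt]
```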

2. Complete *wave_analyze* in *p3.py* so that it calls *wave*, and analyzes wave propagation
from node *n* to the highest-degree node in the graph. In cases where multiple nodes
share the highest degree, select the node that is closest to node *n* (in terms of path lengths on the graph).
You should consider both the expected speed of the wave and shortest paths in the graphs. You are only required
to consider cases with *h=100*, *ntotal=200*, *dim=2* though you should feel free to consider other cases to justify your
conclusions.
**Note: In this and all other rwnet calculations, you should choose ntotal so that
nodes are not expected to be added to the network at height, h; networks where such nodes are added should be discarded.**
Generate one or more figures illustrating the most important trends, add a
discussion of these trends to the docstring.
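Selecting the target node amounts to a degree count plus a breadth-first search for path lengths; a stdlib-only sketch (with *networkx*, `G.degree` and `nx.shortest_path_length` do the same job):

```python
from collections import deque

def target_node(adj, source):
    """Highest-degree node, ties broken by shortest unweighted graph
    distance to `source`. adj: dict mapping node -> list of neighbours
    (illustrative; assumes the graph is connected)."""
    dmax = max(len(nbrs) for nbrs in adj.values())
    candidates = [v for v, nbrs in adj.items() if len(nbrs) == dmax]
    dist, q = {source: 0}, deque([source])
    while q:                          # BFS from source for path lengths
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return min(candidates, key=lambda v: dist[v])
```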

3. Now, you will develop a wave simulation code parallelized with MPI. Time marching will use the explicit Euler method
rather than rk4.
Browse through the template code, *p32.f90*. Now, parameters
are read in through the input file, *data.in* and set in
the main program. The subroutine *euler_mpi* will advance the model equations forward in time using the explicit
Euler method. First, the *n* nodes are distributed to *numprocs* processes using *MPE_DECOMP1D*.
Then, each process should solve the model above for nodes *istart* to *iend* using the explicit Euler method.
Due to the coupling across links in the graph, each process must typically exchange information with one or more neighbors during each time step.
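One explicit Euler step can be sketched as follows, assuming for illustration the graph-Laplacian model \(du_i/dt = v_i\), \(dv_i/dt = \sum_j A_{ij}(u_j-u_i)\) (check against the handout's equations); in *euler_mpi* each process performs this update only for its rows *istart:iend*, exchanging the needed boundary values first:

```python
import numpy as np

def euler_step(u, v, A, dt):
    """One explicit Euler step for the 2n ODEs (serial sketch; the MPI
    version applies the same update to a block of rows per process)."""
    dv = A @ u - A.sum(axis=1) * u   # coupling across links of the graph
    return u + dt * v, v + dt * dv
```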

i) Complete *euler_mpi* so the **2n** ODEs are distributed across the *numprocs* processes and solved in parallel.
The sum in the model equations will have to be very carefully constructed and MPI directives will
be needed when components of the sum are on other processes.
*RHS_mpi* will have to be carefully completed as well. You may assume that \(n \gg numprocs > 1\).

ii) At the end of each iteration, compute the total energy, *E*, and store it in the array *energy* on process *0*.
*Note*: During the k-loop, the full solutions, *y* should not be collected on any one process. Here, *y* is a size *2n*
array containing both *u* and *v*.

4. After the k-loop in *euler_mpi*, each process's solution, *ylocal*, should be gathered into the variable
*y* on *myid=0* using *MPI_gather* and *MPI_gatherv*.
The main program will then write this variable out to the file, *fmpi.dat*. This file can be loaded in python with *np.loadtxt*.

**Note 1:** It is only after the k-loop has been completed that the full solution *y* should be assembled. During
the loop, each process should only work with the portion of *y* needed to update *ylocal*.

## Further comments

1. Your Fortran code should work to double precision accuracy.

2. The title of every figure that your python code produces should have your name and the name of the function in which it was generated.

3. Figures generated by your python functions should be saved and included in your repository following the naming convention pXY.png for the Yth figure created for question X.

4. You will not be assessed on how fast your code is, but points may be deducted if it is ludicrously inefficient. Points will be deducted if parallelization is not carried out correctly.

5. Add module variables and sub-programs as needed, but clearly add comments on substantive additions. Do not remove/modify existing code or module variables unless you are explicitly instructed to do so (or consult with the instructor).

6. Parallelization should be implemented for a general (but reasonable) number of threads/processes.

7. Lab 9 will utilize the *rk4* and explicit Euler routines used in question 3.

8. Assessment of code will primarily be based on functionality: does it compile and run, and does it produce the right answer(s)? A moderate portion of the assessment will look at the implementation and efficiency of code. Highly wasteful implementations will lose points. Examples of what to avoid: repeatedly adjusting the sizes of allocatable arrays in loops, carrying out unnecessary calculations, and creating variables that require excessive amounts of memory. The focus here will be on code used during simulations/iterations rather than code used for initialization and post-processing/analysis. You have the option of adding comments explaining/defending your approach.

9. Below the `if __name__ == '__main__'` section of each python file, add code that calls your functions and generates the figures you are submitting.

10. If you are using MLC machines and *mpif90* and/or *mpiexec* are not working, run the following in the unix terminal (the first for *mpif90*, the second for *mpiexec*):

```
$ export PATH=$PATH:/usr/lib64/openmpi/bin/
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/openmpi/lib/
```

11. You may use the following Python modules as needed: *numpy*, *scipy*, *matplotlib*, *networkx*, *time*, *timeit*. Please ask if you would like to use any others.

12. Be sensible when choosing problem sizes. However, at this point in the term, it is fine for calculations to take several minutes. You do not need to run very large problem sizes, though you should consider large problems when developing and parallelizing your codes.

## Submitting the assignment

To submit the assignment for assessment, go to the course blackboard page, click on the “Assignments” link on the left-hand side of the page, and then click on Project. Click on “Write Submission” and add the statement: “This is all my own unaided work unless stated otherwise.” Provide the URL for your bitbucket project repo as well as the commit id for the version which you would like to have assessed. Remember to provide read access to the user, ImperialHPSC (go to Settings –> User and group access). Finally, click “Submit”.