4/5/2022

## Parallel Computing

• Computing where processes are carried out simultaneously

• Break a problem into parts that can be solved separately and recombined

• When figuring out how to parallelize your code, it's important to think about:

  • What portions of your code are taking the most time
  • Dependencies (which parts of the code depend on one another)
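One common way to find the slow portions is Python's built-in cProfile module. A minimal sketch, where `slow_part` and `fast_part` are made-up stand-ins for real code:

```python
import cProfile
import pstats

def slow_part():
    # deliberately heavy loop
    return sum(i * i for i in range(200_000))

def fast_part():
    return sum(range(100))

profiler = cProfile.Profile()
profiler.enable()
slow_part()
fast_part()
profiler.disable()

# Print the five functions with the largest cumulative time
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

The printed table shows which functions dominate the runtime, which tells you where parallelization effort would pay off most.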

## Optimizing different parts of code

A 2x speedup in a slower portion of the code may be more helpful than a 5x speedup in a faster portion

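As a back-of-the-envelope illustration (the 80/20 split here is just an assumed example): if 80% of the runtime is in the slow portion, a 2x speedup there beats a 5x speedup of the rest:

```python
slow, fast = 0.8, 0.2  # hypothetical fractions of total runtime

# 2x speedup on the slow portion
time_a = slow / 2 + fast   # 0.6 of the original runtime (~1.67x overall)

# 5x speedup on the fast portion
time_b = slow + fast / 5   # 0.84 of the original runtime (~1.19x overall)

print(time_a, time_b)
```

This is the intuition behind Amdahl's law: overall speedup is limited by the fraction of the program you can actually speed up.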

## Dependencies are important to figure out what can be parallelized

This function can’t easily be parallelized:

```python
def dependentfun(a, b):
    c = a * b
    d = 2 * c       # depends on c, so it must wait for c to be computed
    return (c, d)
```

This one is easier to parallelize because c and d can be calculated independently:

```python
def nondependentfun(a, b):
    c = 3 * a       # depends only on a
    d = 2 * b       # depends only on b
    return (c, d)
```

## Fine-grained, coarse-grained, and embarrassing parallelism

• Fine-grained parallelism - subtasks must communicate/synchronize often (e.g. many times/second)

• Coarse-grained parallelism - subtasks communicate infrequently

• Embarrassingly parallel - subtasks can be completed largely or entirely independently

• Common for complex systems models! Running a model for multiple different parameter values/initial conditions/etc. is often embarrassingly parallel

## Common parallel computers in scientific computing

• Multi-core computers (e.g. your laptop probably) - have multiple CPUs (central processing units) on the same chip

• Cluster computing (e.g. Great Lakes) - multiple computers networked together so they can share information rapidly (e.g. for parallel tasks that require synchronization or sharing of information)

• Grid computing - computers communicate over the internet to solve parallel problems (e.g. SETI@home) - usually only for embarrassingly parallel problems

• GPU (Graphics Processing Unit) computing - particularly useful if you have extremely parallel computations that don’t need a lot of memory per task, or large matrix computations (particularly if the spatial locality of the memory storage can be taken advantage of)

• CPU - Central Processing Unit, executes computer program instructions

• Multi-core processors - contain multiple CPUs within the same chip

• Hyperthreading - allows each core to run two virtual ‘cores’ - the operating system sees two cores even though there is actually only one physical core.

  • Typically slower than two actual cores but faster than a single core.

## Parallelization in Python: the Multiprocessing Module

• Global interpreter lock (GIL) - allows only one thread to control the Python interpreter at a time. This is helpful for memory management & avoiding memory leaks, but makes it difficult to do parallelization.

• The multiprocessing module gets around this by using multiple processes instead of multiple threads. Essentially, rather than trying to let multiple parallel tasks run within a single Python interpreter, multiprocessing gives each task its own Python interpreter so they won’t interfere.

• However, note that because each process gets its own interpreter, the overhead for parallelization can be large

## Example of two processes with multiprocessing

```python
import multiprocessing as mp

def myfunction(a, b):
    print(a, "and also", b)

# Set up the two processes
# (on Windows/macOS, the lines below should go inside an `if __name__ == "__main__":` block)
p1 = mp.Process(target=myfunction, args=("cats", "dogs"))
p2 = mp.Process(target=myfunction, args=("stuff", "more stuff"))

# Start them
p1.start()
p2.start()

# The join command makes the code wait until the process is done
p1.join()
p2.join()
```

## Setting up a pool of cores

• However, more commonly, we’ll have a list or array of tasks to complete (e.g. run the model for a large list of parameter sets)

• Rather than setting up a process for each, we can set up a pool of cores/processors and give the pool tasks to run

• The apply and map families of functions let you hand the pool a batch of tasks conveniently - the kind of work you would usually use a loop for in a non-parallel setting (e.g. a list/array of tasks to complete)
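For example, pool.map hands each element of a list to the pool and returns the results in input order (`squaremodel` here is a made-up stand-in for a real model run):

```python
import multiprocessing as mp

def squaremodel(param):
    # stand-in for an expensive model evaluation
    return param ** 2

if __name__ == "__main__":
    params = [1, 2, 3, 4, 5]
    with mp.Pool(2) as pool:
        results = pool.map(squaremodel, params)
    print(results)  # results come back in the same order as params
```

The `with` block closes the pool automatically once the work is done.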

## Synchronous & asynchronous parallelization

Many of the apply and map families of functions in multiprocessing can be run synchronously or asynchronously:

• Synchronous - parallel tasks are completed in their original order, and the program is locked/on hold until each piece is complete

  • e.g. if we have a list of 5 parameter sets to run a model for, and we want the results in order, then the next parameter set’s results can’t be written until the one before it is finished.

• Asynchronous - parallel tasks are completed in whatever order is convenient. Processes don’t wait for each other to continue on to the next task.

  • Often faster; however, results may be returned out of order (e.g. a scrambled order of parameter runs rather than the order provided - this can often be resolved by recording the index, see examples below)
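A sketch of the index-recording trick using apply_async (`model_run` is a made-up example function): each task returns its own index alongside its result, so the results can be sorted back into the original order afterwards.

```python
import multiprocessing as mp

def model_run(index, param):
    # return the index with the result so the original order can be recovered
    return (index, param ** 2)

if __name__ == "__main__":
    params = [3, 1, 4, 1, 5]
    with mp.Pool(2) as pool:
        handles = [pool.apply_async(model_run, args=(i, p))
                   for i, p in enumerate(params)]
        results = [h.get() for h in handles]
    # sort by the recorded index, then drop it
    results.sort(key=lambda pair: pair[0])
    ordered = [value for _, value in results]
    print(ordered)
```

Each apply_async call returns immediately with a handle; calling .get() on the handle waits for that task's result.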

## Example

```python
import multiprocessing as mp

def myfunction(a, b):
    print(a, "and also", b)

WordList = ["cats", "dogs", "stuff"]  # example list of inputs

# Check how many cores you have
print("Number of processors:", mp.cpu_count())

poolsize = mp.cpu_count() - 1  # I usually leave one core free if I'm on my laptop

# Initialize our pool of cores
pool = mp.Pool(poolsize)

# Use pool.apply to run myfunction on each word
# (pool.apply is synchronous - it blocks until each call finishes)
results = [pool.apply(myfunction, args=(word, "Marisa")) for word in WordList]

# Close the pool once we're done
pool.close()
pool.join()
```