Lab Objective: The objective of this week’s lab is to consider a matrix compression pro- gram. Matrices are used in a huge variety of computer applications and a key issue is the space they consume and the running time of algorithms on them. We will see how we can reduce both. Here’s the lab summary:

0 A quick introduction to the relationship between matrices and graphs

O ine a well-established format for representing matrices sparsely

Œ Consider our own format that is inspired by the above but, at the expense of slightly higher file sizes, is simpler to write as C++ code

º Write a program, matz, that reads a matrix in the format we have been using and writes it out in a format that ignores all 0-entries

O Submit week06’s lab for marking

In Detail

Please be patient and read the entire lab sheet through to the end before beginning with the work. There are a lot of new ideas in this for you to absorb so please don’t rush in before understand the material and what is being asked of you.

0 One of the most common methods of representing and storing information nowadays is the graph. A graph, in the computer science sense, is a collection of nodes that contain the information and edges that represent links or relationships between pairs of nodes. One common way to represent a graph internally is by means of an n n adjacency matrix of type bool (boolean). A true in entry (i, j) means that node i is connected (adjacent) to node j; the two nodes are “related”.

We could use this way to represent street maps, or street networks, as they are often called: we represent the junction of two streets with a node and if another street junction is directly reachable then we mark the corresponding entry of the matrix with true and false otherwise. So each junction has a unique node number associated with it and the non-zero entries of the matrix for that row are the junctions (nodes) that are immediately reachable from it. We can go one step better and instead of storing true or false values in the matrix

we can store the distance between them. Note that an edge of this network corresponds to a piece of a street

The problem with representing the street network with an array / matrix is that, in reality,

most junctions are directly connected to very few other junctions. That is, the network is very sparse. Just think for a minute about the case of Limerick city. Can you think of a junction that is adjacent (directly connected) to more than 5 other junctions? Asking the same thing differently, can you find a junction where 5 streets meet? The overwhelming majority of the entries in the matrix will be 0, meaning that the junctions are not directly connected.

Whatever about trying to represent Limerick city using an n n matrix (that uses (n2) space) it would be out of the question on a larger scale. Consider the street networks on this page. The files down the left hand column represent the street networks of regions for the world as taken from the OpenStreetMap project. The two right hand columns labelled n and m indicate, respectively, the number of nodes (street junctions) and street segments in the file.

The street network files are compressed using the bz2 compression format. This should not cause a problem on the linux lab machines. After you have read Section O below pick one of the countries and spend a few minutes looking at the type of information in its .graph and .graph.xyz files.

1539797301283020.png

It happens in lots of situations where huge matrices are used to represent relationships that the “interesting” data in the matrix is a very small fraction of the entire thing.

O The data in the .graph file follows the

format. MeTiS is a very successful program

for analysing the huge graphs that arise in DNA sequencing or data mining or a variety of

other applications. This is described in detail in §4.1.1 on p. 9 of the MeTiS user manual.

The street network data that you have been looking at follows the format of Fig 2(a) on p.

11. However, the important case for you to look at is Fig 2(b) which is what we will base our format on.

Œ Our goal is to write a program that, given a matrix with a large number of zeros, only the non-zero values should be outputted but in a way that the original matrix can be recon- structed. You could think of this as a conversion program that converts a matrix file with a lot of 0s into a more efficient format.

From a programming point of view the file format shown on Fig 2(b) on p. 11 of the MeTiS manual has two problems. Imagine you had to write the conversion program that compressed a given matrix. Firstly, since the format provides the number