# Tree decompositions and large-scale computation

## A crash course in 18 lectures

by Arnold Neumaier, University of Vienna

This page, http://www.mat.univie.ac.at/~neum/Tree/TDcourse.html, contains information about my course with the same title held from February 6-20, 2012 at the

Department of Computer Science, University of Szeged, Hungary.

The lectures took place each workday from 14:00 to 16:30, with a 15-minute break in the middle, except on Fridays, when they ran from 14:00 to 15:30 without a break.

## Lecture notes

The current version of the official lecture notes (1032K) may still contain inaccuracies and incomprehensible remarks. Please report misprints, unclear passages, and suggestions for improvement, so that I can improve the text.

## Prerequisites

To have maximal benefit from the course, you should have a good background in graph theory and linear algebra.

## Exams

Examinations will be in writing, a few weeks after the course. (The examination will be organized by Prof. Csendes.) You will be asked to do some multifrontal computations on toy examples similar to those covered in the course or given as exercises, and to answer some questions about the subject matter.

## Program

Lecture 1 (Monday, February 6):
I discussed in general terms the graph structure of large-scale computations, and which graphs are favorable to a domain decomposition approach. Most favorable are paths and trees (a rare case in applications), but graphs that are pathlike or treelike in the sense that they have a small pathwidth or treewidth share the multifrontal structure that makes a domain decomposition approach most efficient. This leads to a join tree structure of a collection of fronts covering the edges of the graph. If the fronts are not large, we may solve decomposable problems in linear time by solving small auxiliary subproblems in a forward sweep through the fronts from the leaves to the root, followed by a backward sweep that assembles the solution, following all branches of the tree from the root to the leaves.

Parallelization is supported if the join tree has lots of leaves. I informally discussed the nested dissection approach for finding a tree decomposition with many leaves, for the example where the graph is a path (where the natural sequential approach is not parallelizable).
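For the path example, nested dissection amounts to a simple recursion: split at the midpoint, order both halves recursively, and eliminate the separator last. The following sketch is my own toy illustration (the function name is invented), not code from the course:

```python
def nested_dissection_order(lo, hi):
    """Elimination order for the path with vertices lo..hi: split at the
    midpoint separator, order both halves recursively, and eliminate the
    separator last.  The recursion tree is balanced and has many leaves,
    so the fronts at each level can be processed in parallel."""
    if lo > hi:
        return []
    mid = (lo + hi) // 2
    return (nested_dissection_order(lo, mid - 1)
            + nested_dissection_order(mid + 1, hi)
            + [mid])

# On a path with 7 vertices, the top separator 3 is eliminated last:
print(nested_dissection_order(0, 6))   # -> [0, 2, 1, 4, 6, 5, 3]
```

In contrast, the natural left-to-right elimination order 0, 1, ..., 6 corresponds to a degenerate tree with a single leaf, which offers no parallelism.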

Lecture 2 (Monday, February 6):
I presented applications from five areas where multifrontal computations play an important role:

• sparse problems in linear algebra (solving linear systems, the sparse inverse, parallelization; fill-in)
• large-scale optimization (linear programming, structured semidefinite programming, discrete dynamic programming, discrete constraint satisfaction problems)
• statistical data analysis (computations of marginals, covariance estimation using conditional independence)
• signal processing (hidden Markov models)
• artificial intelligence (automatic theorem proving, reasoning under aleatoric uncertainty via Bayesian networks)

Lecture 3 (Tuesday, February 7):
I completed the overview of applications, presenting five further areas where multifrontal computations are important:

• database queries (constraint satisfaction)
• computational biology (sequence alignment; plant and animal breeding)
• market models (covariance estimation for mixed model equations)
• statistical mechanics (partition functions; quantum tree tensor networks)
• chemical engineering (multiple solutions in distillation columns)

Lecture 4 (Tuesday, February 7):
I introduced Boolean and more general discrete constraint satisfaction problems (CSPs) and table-based constraint propagation as a basic pruning technique.

I discussed the NP-completeness of the satisfiability problem, and the consequences for solving not sufficiently structured CSPs. I then defined the special case of sequential CSPs, and announced their solvability in linear time by an appropriate form of constraint propagation.
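Table-based constraint propagation prunes, from each variable's domain, every value that no allowed tuple supports. A minimal sketch for binary constraints (my own illustration; all names and the x &lt; y &lt; z toy instance are invented, not from the lecture notes):

```python
def propagate(domains, constraints):
    """Table-based constraint propagation on binary constraints:
    repeatedly remove domain values that no allowed pair supports.

    domains: dict var -> set of values (pruned in place)
    constraints: dict (x, y) -> set of allowed (value_x, value_y) pairs
    """
    changed = True
    while changed:
        changed = False
        for (x, y), table in constraints.items():
            # keep only pairs whose values survive in both domains
            live = {(a, b) for (a, b) in table
                    if a in domains[x] and b in domains[y]}
            supported_x = {a for (a, b) in live}
            supported_y = {b for (a, b) in live}
            if domains[x] - supported_x:
                domains[x] &= supported_x
                changed = True
            if domains[y] - supported_y:
                domains[y] &= supported_y
                changed = True
    return domains

# x < y and y < z over domains {1,2,3}: propagation forces x=1, y=2, z=3
doms = {v: {1, 2, 3} for v in "xyz"}
less = {(a, b) for a in range(1, 4) for b in range(1, 4) if a < b}
propagate(doms, {("x", "y"): set(less), ("y", "z"): set(less)})
print(doms)   # -> {'x': {1}, 'y': {2}, 'z': {3}}
```

Note that such pruning alone does not solve general CSPs; by the NP-completeness of satisfiability it can only shrink the search space for unstructured problems.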

Lecture 5 (Wednesday, February 8):
I discussed how to solve sequential CSPs by calculating and updating tables of possibilities on fronts determined by a moving window, eliminating in a forward sweep through the fronts the variables that no longer appear in subsequent fronts. The frontal table of the last front then contains a list of partial solutions, which can be completed in a backward sweep.

The time is linear in the sequence length but potentially exponential in the window length. It is also linear in the number of solutions computed, but as there may be exponentially many solutions, one may wish to compute only a few. An efficient implementation uses the state space form of a sequential CSP, which reduces the window length to two by encoding the past in terms of states.
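For window length two (constraints between consecutive variables only), the forward and backward sweeps fit in a few lines. This is my own toy sketch, with an invented "differ by exactly one" constraint, not code from the course:

```python
def solve_sequential_csp(n, domain, allowed):
    """Solve a CSP with one constraint allowed(x_i, x_{i+1}) per pair of
    consecutive variables.

    Forward sweep: table[i] holds the values of x_i that extend some
    feasible assignment of x_0..x_{i-1}.  Backward sweep: pick a value
    for the last variable and complete the solution right to left."""
    table = [set(domain)]
    for i in range(1, n):
        table.append({b for b in domain
                      if any(allowed(a, b) for a in table[-1])})
        if not table[-1]:
            return None                    # no solution exists
    sol = [min(table[-1])]                 # backward sweep
    for i in range(n - 2, -1, -1):
        sol.append(min(a for a in table[i] if allowed(a, sol[-1])))
    return sol[::-1]

# consecutive values must differ by exactly 1, over domain {0, 1, 2}
print(solve_sequential_csp(4, {0, 1, 2}, lambda a, b: abs(a - b) == 1))
```

The forward sweep touches each front once, so the cost is linear in n; the table size per front is bounded by the domain size, illustrating the exponential dependence on the window length in general.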

Lecture 6 (Wednesday, February 8):
Slightly generalizing sequential CSPs, I introduced the concept of a path decomposition of the constraint graph, and showed how to create an efficient state space form to solve problems having a path decomposition of small width.

For problems in state space form, we discussed cost functions for which it is possible to efficiently find the best among all solutions of a CSP with a good path decomposition. This leads to a dynamic programming problem, where I derived the main recursive relation.
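Written for a generic state-space problem, the main recursion takes the form V[i+1][t] = min over s of (V[i][s] + cost of the step from s to t). The sketch below is my own toy example (the state set and cost functions are invented), not the one worked in the lecture:

```python
def best_path(states, T, step_cost, init_cost):
    """Dynamic programming in state space form: V[i][s] is the cheapest
    cost of reaching state s after i steps, via the recursion
        V[i+1][t] = min_s ( V[i][s] + step_cost(i, s, t) ).
    Back pointers recover the optimal state sequence."""
    V = [{s: init_cost(s) for s in states}]
    back = []
    for i in range(T):
        layer, ptr = {}, {}
        for t in states:
            best = min(states, key=lambda s: V[-1][s] + step_cost(i, s, t))
            layer[t] = V[-1][best] + step_cost(i, best, t)
            ptr[t] = best
        V.append(layer)
        back.append(ptr)
    s = min(states, key=lambda t: V[-1][t])   # best final state
    cost, path = V[-1][s], [s]
    for ptr in reversed(back):                # backward sweep
        s = ptr[s]
        path.append(s)
    return path[::-1], cost

# two states; switching costs 1, visiting state 1 costs 2 per step
path, cost = best_path(
    [0, 1], 3,
    lambda i, s, t: (1 if s != t else 0) + (2 if t == 1 else 0),
    lambda s: 0)
print(path, cost)   # -> [0, 0, 0, 0] 0
```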

Lecture 7 (Thursday, February 9):
I presented an example of how to execute a dynamic programming algorithm for finding the best solution of CSPs in state space form. In fact we found all (two) optimal solutions.

I then discussed a Boolean CSP in 9 variables, computed its constraint graph and a particular tree decomposition, and illustrated various features of the nonsequential setting.

Lecture 8 (Thursday, February 9):
The Boolean CSP in 9 variables was solved in complete analogy to the sequential case. We found all 7 solutions and discussed ways to answer various questions about the solution without having to generate them all. This illustrates applications to database queries (that will be treated later).

As introduction to Part 2 of the course, which is supposed to cover the mathematical side of the subject, we looked at properties of trees that are not present in general graphs, but play an essential role for the use of trees in multifrontal computations. We then looked at the defining properties of join trees, noting that the join property and the intersection property are in fact equivalent.

Lecture 9 (Friday, February 10):
I discussed how to go from an arbitrary join tree to a minimal one, and proved that every edge of a join tree gives rise to a separator of the associated filled graph. Then I introduced tree decompositions of graphs in a formal way, and deduced that graphs with small treewidth must have many small separators.

This naturally leads to the nested dissection strategy for the construction of good tree decompositions. I ended with a discussion of the principles involved in creating good nested dissection software.

Lecture 10 (Monday, February 13):
I discussed clique properties of filled graphs and proved the chordal graph theorem, the deepest result in the subject.

Lecture 11 (Monday, February 13):
I introduced maximum cardinality search for finding fill-in free tree decompositions, and discussed the role of the clique intersection graph for finding all possible join trees with a given filled graph. I mentioned without deeper discussion minimum fill and minimum degree heuristics for the construction of tree decompositions.
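Maximum cardinality search itself is short enough to sketch. The following is my own minimal implementation (the toy chordal graph and the alphabetical tie-breaking are arbitrary choices of mine):

```python
def maximum_cardinality_search(adj):
    """Order the vertices by repeatedly picking an unnumbered vertex with
    the largest number of already numbered neighbours (ties broken by
    name).  On a chordal graph, the reverse of this order is a perfect
    (fill-in free) elimination ordering."""
    order, weight = [], {v: 0 for v in adj}
    while weight:
        v = max(sorted(weight), key=lambda u: weight[u])
        order.append(v)
        del weight[v]
        for w in adj[v]:
            if w in weight:
                weight[w] += 1
    return order

# a chordal graph: triangle a-b-c with a pendant vertex d attached to c
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(maximum_cardinality_search(adj))   # -> ['a', 'b', 'c', 'd']
```

Here the reverse order d, c, b, a is fill-in free: each vertex's later neighbours form a clique when it is eliminated.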

Lecture 12 (Tuesday, February 14):
I introduced the sparsity graph of a matrix, and showed that Gaussian elimination produces a filled matrix whose sparsity graph is the filled graph in the corresponding elimination ordering. For matrices whose sparsity graph has a good tree decomposition, a multifrontal algorithm is therefore able to solve the associated linear equations using only dense matrix computations on matrices of the size of the fronts. This is very efficient in today's world of parallel vector processing.
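The fill-in can be simulated symbolically on the sparsity graph: eliminating a vertex connects all of its not-yet-eliminated neighbours. A small sketch of my own (the 4-cycle example and all names are invented):

```python
def filled_graph(adj, order):
    """Symbolic Gaussian elimination on the sparsity graph: eliminating
    a vertex makes its remaining neighbours pairwise adjacent.  Returns
    the set of fill edges created by this elimination ordering."""
    nbrs = {v: set(adj[v]) for v in adj}
    fill = set()
    remaining = set(adj)
    for v in order:
        remaining.discard(v)
        later = [w for w in nbrs[v] if w in remaining]
        for i, a in enumerate(later):
            for b in later[i + 1:]:
                if b not in nbrs[a]:           # a new (fill) edge
                    nbrs[a].add(b)
                    nbrs[b].add(a)
                    fill.add(frozenset((a, b)))
    return fill

# a 4-cycle 1-2-3-4-1: eliminating vertex 1 first fills in the edge 2-4
adj = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(filled_graph(adj, [1, 2, 3, 4]))   # -> {frozenset({2, 4})}
```

Since the 4-cycle is not chordal, every elimination ordering creates at least one fill edge here; on a chordal graph a perfect elimination ordering creates none.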

Lecture 13 (Tuesday, February 14):
Extending the previous lecture, I showed how to use an additional backward sweep to efficiently calculate, for symmetric matrices, all entries of the inverse matrix whose indices lie in a common front. In particular, this gives the diagonal of the inverse, relevant for assessing the accuracy of least squares estimators.

It also allows one to efficiently compute the gradient of the logarithm of the determinant of a symmetric matrix dependent on parameters, which is of crucial importance for applications in animal breeding.
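The simplest instance of this forward/backward pattern is the path graph, i.e. a symmetric tridiagonal matrix, where the two sweeps reduce to the classical continuant recurrences. The following sketch is my own toy illustration, not code from the course:

```python
def tridiag_inverse_diagonal(a, b):
    """Diagonal of the inverse of the symmetric tridiagonal matrix with
    diagonal a[0..n-1] and off-diagonals b[0..n-2], via one forward and
    one backward sweep (continuant recurrences); in 1-indexed notation
        (T^{-1})_{ii} = theta_{i-1} * phi_{i+1} / theta_n."""
    n = len(a)
    theta = [1.0, a[0]]                      # forward sweep
    for i in range(1, n):
        theta.append(a[i] * theta[-1] - b[i - 1] ** 2 * theta[-2])
    phi = [1.0, a[-1]]                       # backward sweep
    for i in range(n - 2, -1, -1):
        phi.append(a[i] * phi[-1] - b[i] ** 2 * phi[-2])
    phi.reverse()                            # phi[j] now holds phi_{j+1}
    return [theta[i] * phi[i + 1] / theta[n] for i in range(n)]

# second-difference matrix [[2,-1,0],[-1,2,-1],[0,-1,2]]:
print(tridiag_inverse_diagonal([2.0, 2.0, 2.0], [-1.0, -1.0]))
# -> [0.75, 1.0, 0.75]
```

Both sweeps cost O(n), while forming the full inverse (which is dense) would cost O(n^2) entries; the same economy carries over to general fronts via the multifrontal recursion.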

Lecture 14 (Wednesday, February 15):
To prepare for the discussion of Bayesian networks, I reviewed some concepts from probability theory, in particular conditional probabilities.

Then I defined hidden Markov models, described their use in speech analysis (from sounds to phonemes) and in speech synthesis in cell phones, and showed that the negative loglikelihood function has the structure that makes dynamic programming feasible.

Lecture 15 (Wednesday, February 15):
I introduced Viterbi decoding for finding the most likely state sequence given an HMM and an observation sequence, and showed that it reduces to solving a dynamic programming problem. Then I discussed the training problem: finding the probability tables for an HMM from a large sample of observation sequences. Prior probability tables can be found from expert assignments of state sequences to a subset of the training sample (or just by guessing a prior). The prior tables may then be refined iteratively by Viterbi training, which labels observed sequences on the basis of Viterbi decoding and creates new probability tables by counting in the resulting data.
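Viterbi decoding itself is a few lines of dynamic programming. The sketch below is my own illustration on the common textbook weather toy model (all tables are invented), not code from the course:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence given an HMM and observations:
    delta[s] = probability of the best partial state sequence ending in
    state s after the observations seen so far; back pointers are
    followed in a backward sweep to assemble the answer."""
    delta = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        new, ptr = {}, {}
        for t in states:
            best = max(states, key=lambda s: delta[s] * trans_p[s][t])
            ptr[t] = best
            new[t] = delta[best] * trans_p[best][t] * emit_p[t][o]
        delta, back = new, back + [ptr]
    s = max(states, key=lambda t: delta[t])   # best final state
    path = [s]
    for ptr in reversed(back):                # backward sweep
        s = ptr[s]
        path.append(s)
    return path[::-1]

# toy HMM: hidden weather {rain, sun}, observed activity of a person
states = ["rain", "sun"]
start = {"rain": 0.6, "sun": 0.4}
trans = {"rain": {"rain": 0.7, "sun": 0.3}, "sun": {"rain": 0.4, "sun": 0.6}}
emit = {"rain": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "sun": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(["walk", "shop", "clean"], states, start, trans, emit))
# -> ['sun', 'rain', 'rain']
```

In practice one works with negative log probabilities (the potential) and sums instead of products, to avoid underflow on long sequences.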

An important alternative is Baum-Welch training, which deduces the probability tables directly from the HMM probability measure, using belief propagation. This is a much slower process than Viterbi training, but has the advantage of correctly accounting for cases where competing state assignments with almost the same probabilities exist. Such cases lead to ambiguities in the model predictions, where Viterbi decoding often chooses an inferior alternative when the probability tables are inaccurate.

Lecture 16 (Thursday, February 16):
After introducing the concept of conditional independence, I defined causal networks and proved some basic results about them. From a tree decomposition of the causal graph (which defines a Bayesian network) one gets a product formula for the probability measure in terms of probability tables on fronts. This leads to multifrontal formulas for the potential (negative loglikelihood), which allow an efficient solution of the task of finding the most likely completion of a partial scenario in a causal network, and hence the training of causal networks via Viterbi training.
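For a toy causal chain a -> b -> c, the product formula reads P(a,b,c) = P(a) P(b|a) P(c|b), with one table per front. The sketch below is my own illustration (all tables are invented, and it uses brute-force enumeration for clarity; the multifrontal recursion avoids the exponential loop):

```python
from itertools import product

def most_likely_completion(tables, variables, evidence):
    """Most likely completion of a partial scenario in a toy causal
    chain a -> b -> c, whose probability measure factors over fronts as
        P(a, b, c) = P(a) * P(b|a) * P(c|b).
    Enumerates all Boolean scenarios consistent with the evidence."""
    best, best_p = None, -1.0
    for vals in product([0, 1], repeat=len(variables)):
        scen = dict(zip(variables, vals))
        if any(scen[v] != x for v, x in evidence.items()):
            continue                      # contradicts the partial scenario
        p = (tables["a"][scen["a"]]
             * tables["b|a"][scen["a"]][scen["b"]]
             * tables["c|b"][scen["b"]][scen["c"]])
        if p > best_p:
            best, best_p = scen, p
    return best, best_p

tables = {
    "a":   [0.7, 0.3],                    # P(a)
    "b|a": [[0.9, 0.1], [0.2, 0.8]],      # P(b|a)
    "c|b": [[0.8, 0.2], [0.3, 0.7]],      # P(c|b)
}
print(most_likely_completion(tables, "abc", {"c": 1}))
```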

Lecture 17 (Thursday, February 16):
I derived the recursion formulas for doing belief propagation in causal networks, and for finding probabilities on subtrees. Based on this, it is possible to do the more accurate Baum-Welch training.

Lecture 18 (Friday, February 17):
I discussed the problem of finding all solutions of constrained nonlinear systems of equations in high dimensions. Local methods (e.g., multistart strategies) often become unreliable, while global methods (based on branch and bound) often get stuck in a combinatorial explosion of untreated subproblems.

If the constraint graph has a small treewidth, one may contemplate devising a multifrontal solution procedure based on discretization, which reduces the problem to a discrete CSP. However, for a fine discretization, the tables are likely to become huge.

I described some of the problems arising in this approach applied to nonlinear systems for distillation columns, whose constraint graph indeed has small pathwidth. This is work in progress done by my research group in Vienna. I reported on a particular such problem that we could already solve successfully, while conventional methods produce erratic results.

## Program overview

(from before starting the course)

I intend to begin with a general overview of large-scale computation and of applications where tree decompositions have a decisive advantage. I'll also review basic themes such as Boolean constraint satisfaction and polynomial-time algorithms, which form a background for the remaining course.

The next topic will be dynamic programming techniques for discrete constraint satisfaction problems with a temporal structure: how to decide whether a solution exists, how to find some or all solutions, how to select the best or the k best solutions according to some criterion.

Then I want to discuss some graph theoretic background and its relations to the direct solution of large linear systems of equations. I'll introduce join trees, tree decompositions, and the concept of treewidth, and show how their properties benefit a number of computations in linear algebra.

This will probably be followed by nonserial dynamic programming - the extension of the dynamic programming paradigm to problems whose structure is given by a sparse graph with comparatively small treewidth.

As an application, I intend to consider algorithms for efficient database queries.

Then I'll probably look at algorithms for efficiently finding good tree decompositions for large graphs, based on principles such as minimum degree, nested dissection, and maximum cardinality search.

The next topic will probably be hidden Markov models (HMMs) for analyzing and predicting discrete time series. I'll discuss training, testing, and updating an HMM, topics closely related to dynamic programming.

Bayesian networks generalize HMMs in the same way as nonserial dynamic programming generalizes dynamic programming. They are the most appropriate tool for reasoning under aleatoric uncertainty (belief propagation) and for forming consistent and parsimonious statistical models for large, structured datasets (machine learning).

Towards the end of the course, I'll also say something about recent work in our group on applying the above ideas to the problem of finding all solutions of difficult constrained nonlinear systems of equations arising in chemical engineering.

Other applications that I might treat if there is time and interest are structured semidefinite programming, restricted maximum likelihood methods for plant and animal breeding, sparse inverse covariance selection for large-scale data sets, and applications to statistical mechanics and quantum spin systems.

## Literature

A large collection of links on the topics treated can be found at my page on the

• Multifrontal solution of sparse problems

This list is neither systematic nor comprehensive.

• Probabilistic networks and explanatory coherence (Paul Thagard)