OTSM.jl

OTSM.jl implements algorithms for solving the orthogonal trace sum maximization (OTSM) problem

\[\operatorname{maximize} \sum_{i,j=1}^m \operatorname{tr} (O_i^T S_{ij} O_j)\]

subject to orthogonality constraint $O_i^T O_i = I_r$. Here $S_{ij} \in \mathbb{R}^{d_i \times d_j}$, $1 \le i, j \le m$, are data matrices. $S_{ii}$ are symmetric and $S_{ij} = S_{ji}^T$. Many problems such as canonical correlation analysis (CCA) with $m \ge 2$ data sets, Procrustes analysis with $m \ge 2$ images, orthogonal least squares, and MaxBet are special cases of OSTM.

Details on OTSM are described in paper:

Joong-Ho Won, Hua Zhou, and Kenneth Lange. (2018) Orthogonal trace-sum maximization: applications, local algorithms, and global optimality, arXiv.

Installation

OTSM.jl requires Julia v1.0 or later. The package has not yet been registered and must be installed using the repository location. Start julia and use the ] key to switch to the package manager REPL

(@v1.4) pkg> add https://github.com/Hua-Zhou/OTSM.jl

Use the backspace key to return to the Julia REPL.

versioninfo()

Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4

# for use in this tutorial
using OTSM

Algorithms

Proximal block ascent algorithm

The otsm_pba() function implements an efficient local search algorithm for solving OTSM.

For documentation of the otsm_pba() function, type ?otsm_bpa in Julia REPL.

OTSM.otsm_pba — Function

otsm_pba(S, r)

Maximize the trace sum sum_{i,j} trace(Oi' * S[i, j] * Oj) = trace(O' * S * O) subject to orthogonality constraint Oi' * Oi == I(r) by a proximal block ascent algorithm. Each of S[i, j] for i < j is a di x dj matrix, S[i, i] are symmetric, and S[i, j] = S[j, i]'. Note otsm_pba is mutating in the sense the keyword argument O=O_init is updated with the final solution.

Positional arguments

S :: Matrix{Matrix}: S[i, j] is a di x dj matrix, S[i, i] are

symmetric, and S[j, i] = S[i, j]'.

r :: Integer : rank of solution.

Keyword arguments

αinv :: Number: proximal update constant $1/α$, default is1e-3`.
maxiter :: Integer: maximum number of iterations, default is 50000.
tolfun :: Number: tolerance for objective convergence, default is 1e-10.
tolvar :: Number: tolerance for iterate convergence, default is 1e-8.
verbose :: Bool : verbose display, default is false.
O :: Vector{Matrix}: starting point, default is init_tb(S, r).
log :: Bool: record iterate history or not, defaut is false.
soconstr :: Bool: constrain solutionOito be inSO(d), ie,det(Oi)=1`.

Output

O : result, O[i] has dimension di x r.
tracesum: objective value evaluated at final O.
obj : final objective value from PBA algorithm, should be same as tracesum.
history : iterate history.

source

Semidefinite programming (SDP) relaxation

The otsm_sdp() function implements an SDP relaxation approach for solving OTSM.

For documentation of the otsm_sdp() function, type ?otsm_sdp in Julia REPL.

OTSM.otsm_sdp — Function

otsm_sdp(S, r)

Maximize the trace sum sum_{i,j} trace(Oi' * S[i, j] * Oj) subject to the orthogonality constraint Oi'Oi = I(r) using an SDP relaxation (P-SDP in the manuscript). Each of S[i, j] for i < j is a di x dj matrix and S[i, j] = S[j, i]'.

Output

O : solution.
tracesum: trace sum at solution.
obj : SDP relaxation objective value at solution.
isexaxt : true means global optimality.

source

Start point

Different strategies for starting point are implemented.

Initialize $O_i$ by $I_r$

OTSM.init_eye — Function

init_eye(S)

Initialize O[i] by an di x r diagonal matrix with ones on diagonal.

source

Initialize $O_i$ by a strategy by Ten Berge (default)

This is the default for the proximal block ascent algorithm otsm_pba.

OTSM.init_tb — Function

init_tb(S)

Compute initial point following Ten Berge's second upper bound (https://doi.org/10.1007/BF02294053, p. 273). This is same as the Liu-Wang-Wang starting point strategy 2 (https://doi.org/10.1137/15M100883X, Algorithm 3.2, p. 1495). Take the eigenvectors corresponding to the r largest eigenvalues of S. This is a D x r orthogonal matrix, $D=\sum_{i=1}^m d_i$. For each di x r block, project to the Stiefel manifold. These blocks constitute an initial point.

source

Initialize $O_i$ by a strategy by Liu-Wang-Wang

OTSM.init_lww1 — Function

init_lww1(S, r)

Compute initial point following Liu-Wang-Wang starting point strategy 1 (https://doi.org/10.1137/15M100883X, Algorithm 3.1, p. 1494). Set O[1] to the top r eigenvectors of S[1, 1]. Then Ok = Uk * Qk, where Uk is the top r eigenvectors of S[k, k] and Qk is the Q factor in the QR decomposition of Uk * sum_{j<k} S[k, j] * O[j].

source

Initialize $O_i$ by a strategy by Shapiro-Botha and Won-Zhou-Lange

OTSM.init_sb — Function

init_sb(S, r)

Compute initial point by Shapiro-Botha (https://doi.org/10.1137/0609032, p. 380). Also see the extension by Won-Zhou-Lange paper (Lemma B.1). Replace diagonal blocks S[i, i] by $S_{i,i} - \sum_{j} P_{ij} * D_{ij} * P_{ij}$, where $P_{ij} * D_{ij} * Q_{ij}$ is the SVD of $S_{ij}$. The resulting matrix is negative semidefinite. Take the eigenvectors corresponding to the r largest eigenvalues. This is a D x r orthogonal matrix, $D=\sum_{i=1}^m d_i$. For each di x r block, project to the Stiefel manifold. These blocks constitute an initial point.

source

Example data - Port Wine

The package contains the port wine example data set from the Hanafi and Kiers (2006) paper. It can be retrieved by the portwine_data() function.

A, _, _ = portwine_data();

Data matrices A1, A2, A3, A4 record the ratings (centered at 0) of $m=4$ accessors on 8 port wines in $d_1=4$, $d_2=3$, $d_3=4$, and $d_4=3$ aspects respectively.

for i in 1:4
    display(A[i])
end

8×4 Array{Float64,2}:
  1.25  -5.0  -1.375   3.875
 -0.75   1.0  -0.375  -1.125
  1.25  -3.0  -1.375   0.875
 -0.75   2.0   0.625  -0.125
 -0.75   2.0  -0.375  -0.125
  0.25   3.0  -0.375  -3.125
 -0.75  -1.0   3.625  -1.125
  0.25   1.0  -0.375   0.875



8×3 Array{Float64,2}:
  2.0  -4.375   0.625
  1.0   1.625   0.625
  1.0  -1.375   2.625
 -1.0   1.625  -1.375
  0.0   0.625   0.625
 -1.0   0.625  -0.375
 -2.0  -0.375  -2.375
  0.0   1.625  -0.375



8×4 Array{Float64,2}:
  3.125   3.0  -2.5   0.75
 -1.875  -1.0   1.5   0.75
  2.125   2.0  -0.5   1.75
 -1.875  -1.0   1.5  -1.25
  1.125   0.0   0.5   0.75
 -0.875  -1.0   0.5  -0.25
 -1.875  -1.0  -0.5  -2.25
  0.125  -1.0  -0.5  -0.25



8×3 Array{Float64,2}:
  1.0   0.125   0.375
  0.0  -0.875  -1.625
  2.0  -0.875  -1.625
 -1.0   0.125  -0.625
  0.0   0.125  -0.625
  0.0   1.125   1.375
 -2.0  -1.875   1.375
  0.0   2.125   1.375

MAXDIFF

The MAXDIFF approach for CCA seeks the rotations of $A_i$ that achieve the maximal agreement

\[\operatorname{maximize} 2 \sum_{i < j} \operatorname{tr} (O_i^T A_i^T A_j O_j),\]

subject to constraint $O_i^T O_i = I_r$. This corresponds to an OTSM problem with $S_{ij} = A_i^T A_j$ and $S_{ii} = 0$.

Smaxdiff = [A[i]'A[j] for i in 1:4, j in 1:4]
for i in 1:4
    fill!(Smaxdiff[i, i], 0)
end
display.(Smaxdiff);

4×4 Array{Float64,2}:
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0



3×4 Array{Float64,2}:
  5.0   -15.0  -12.0     13.0
 -9.25   36.0    5.875  -20.375
  5.75  -11.0  -14.125    7.625



4×4 Array{Float64,2}:
  9.75  -26.0  -14.625   21.125
  8.0   -27.0  -10.0     18.0
 -6.0    21.0    2.5    -13.5
  4.5    -8.0  -12.75     6.75



3×4 Array{Float64,2}:
 6.0   -11.0  -12.0     8.0
 1.75    9.0   -6.625   1.125
 0.25    3.0    6.125  -2.625



4×3 Array{Float64,2}:
   5.0   -9.25     5.75
 -15.0   36.0    -11.0
 -12.0    5.875  -14.125
  13.0  -20.375    7.625



3×3 Array{Float64,2}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0



4×3 Array{Float64,2}:
 13.0  -21.625  14.375
 11.0  -21.0    11.0
 -5.0   16.5    -2.5
 10.0   -5.75   13.25



3×3 Array{Float64,2}:
  9.0  -8.0    12.0
  1.0   4.375   0.375
 -6.0  -0.875  -8.875



4×4 Array{Float64,2}:
   9.75     8.0   -6.0    4.5
 -26.0    -27.0   21.0   -8.0
 -14.625  -10.0    2.5  -12.75
  21.125   18.0  -13.5    6.75



3×4 Array{Float64,2}:
  13.0     11.0  -5.0  10.0
 -21.625  -21.0  16.5  -5.75
  14.375   11.0  -2.5  13.25



4×4 Array{Float64,2}:
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0



3×4 Array{Float64,2}:
 13.0    10.0  -4.0  10.0
  2.875  -2.0  -0.5   1.25
 -2.375  -4.0  -4.5  -7.25



4×3 Array{Float64,2}:
   6.0   1.75    0.25
 -11.0   9.0     3.0
 -12.0  -6.625   6.125
   8.0   1.125  -2.625



3×3 Array{Float64,2}:
  9.0  1.0    -6.0
 -8.0  4.375  -0.875
 12.0  0.375  -8.875



4×3 Array{Float64,2}:
 13.0   2.875  -2.375
 10.0  -2.0    -4.0
 -4.0  -0.5    -4.5
 10.0   1.25   -7.25



3×3 Array{Float64,2}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0

Proximal block ascent algorithm for finding a rank $r=2$ solution to MAXDIFF.

Ô_pba, ts_pba, obj, history = otsm_pba(Smaxdiff, 2; verbose = true);

iter = 1, obj = 539.8501989834106
iter = 2, obj = 542.2346791607897
iter = 3, obj = 542.3267553745872
iter = 4, obj = 542.3275270111224
iter = 5, obj = 542.3275503294589
iter = 6, obj = 542.3275506362337
iter = 7, obj = 542.3275506383454
iter = 8, obj = 542.3275506383524

The test_optimality() function attempts to certify whether a local solution O::Vector{Matrix} is a global solution. The first output indicates the solution is :infeasible, :suboptimal, :stationary_point, or :global_optimal.

# proximal block ascent yields the global solution
test_optimality(Ô_pba, Smaxdiff)[1]

:global_optimal

For documentation of the test_optimality() function, type ?test_optimality in Julia REPL.

OTSM.test_optimality — Function

test_optimality(O::Vector{Matrix}, S::Matrix{Matrix}; tol=1e-3, method=:wzl)

Test if the vector of orthogonal matrices O is globally optimal. The O is assumed to satisfy the first-order optimality conditions. Each of O[i] is a di x r matrix. Each of S[i, j] for i < j is a di x dj matrix. Tolerance tol is used to test the positivity of the smallest eigenvalue of the test matrix.

Positional arguments

O :: Vector{Matrix}: A point satisifying O[i]'O[i] = I(r).
S :: Matrix{Matrix}: Data matrix.

Keyword arguments

tol :: Number: Tolerance for testing psd of the test matrix.
method :: Symbol: :wzl (Theorem 3.1 of Won-Zhou-Lange https://arxiv.org/abs/1811.03521) or :lww (Theorem 2.4 of Liu-Wang-Wang https://doi.org/10.1137/15M100883X)

Output

status :: Symbol: A certificate that
- :infeasible: orthogonality constraints violated
- :suboptimal: failed the first-order optimality (stationarity) condition
- :stationary_point: satisfied first-order optimality; may or may not be global optimal
- :nonglobal_stationary_point: satisfied first-order optimality; must not be global optimal
- :global_optimal: certified global optimality
Λ :: Vector{Matrix}: Lagrange multipliers.
C :: Matrix{Matrix}: Certificate matrix corresponding to method.
λmin :: Number: The minimum eigenvalue of the certificate matrix.

source

MAXBET

The MAXBET approach for CCA seeks the rotations of $A_i$ that achieve the maximal agreement

\[\operatorname{maximize} \sum_{i,j} \operatorname{tr} (O_i^T A_i^T A_j O_j),\]

subject to constraint $O_i^T O_i = I_r$. This corresponds to an OTSM problem with $S_{ij} = A_i^T A_j$.

Smaxbet = [A[i]'A[j] for i in 1:4, j in 1:4]
display.(Smaxbet);

4×4 Array{Float64,2}:
   5.5   -12.0  -6.25     7.25
 -12.0    54.0   6.0    -31.0
  -6.25    6.0  17.875   -9.375
   7.25  -31.0  -9.375   28.875



3×4 Array{Float64,2}:
  5.0   -15.0  -12.0     13.0
 -9.25   36.0    5.875  -20.375
  5.75  -11.0  -14.125    7.625



4×4 Array{Float64,2}:
  9.75  -26.0  -14.625   21.125
  8.0   -27.0  -10.0     18.0
 -6.0    21.0    2.5    -13.5
  4.5    -8.0  -12.75     6.75



3×4 Array{Float64,2}:
 6.0   -11.0  -12.0     8.0
 1.75    9.0   -6.625   1.125
 0.25    3.0    6.125  -2.625



4×3 Array{Float64,2}:
   5.0   -9.25     5.75
 -15.0   36.0    -11.0
 -12.0    5.875  -14.125
  13.0  -20.375    7.625



3×3 Array{Float64,2}:
  12.0  -10.0    11.0
 -10.0   29.875  -7.125
  11.0   -7.125  15.875



4×3 Array{Float64,2}:
 13.0  -21.625  14.375
 11.0  -21.0    11.0
 -5.0   16.5    -2.5
 10.0   -5.75   13.25



3×3 Array{Float64,2}:
  9.0  -8.0    12.0
  1.0   4.375   0.375
 -6.0  -0.875  -8.875



4×4 Array{Float64,2}:
   9.75     8.0   -6.0    4.5
 -26.0    -27.0   21.0   -8.0
 -14.625  -10.0    2.5  -12.75
  21.125   18.0  -13.5    6.75



3×4 Array{Float64,2}:
  13.0     11.0  -5.0  10.0
 -21.625  -21.0  16.5  -5.75
  14.375   11.0  -2.5  13.25



4×4 Array{Float64,2}:
  26.875   20.0  -13.5  12.25
  20.0     18.0  -11.0   9.0
 -13.5    -11.0   12.0  -2.0
  12.25     9.0   -2.0  11.5



3×4 Array{Float64,2}:
 13.0    10.0  -4.0  10.0
  2.875  -2.0  -0.5   1.25
 -2.375  -4.0  -4.5  -7.25



4×3 Array{Float64,2}:
   6.0   1.75    0.25
 -11.0   9.0     3.0
 -12.0  -6.625   6.125
   8.0   1.125  -2.625



3×3 Array{Float64,2}:
  9.0  1.0    -6.0
 -8.0  4.375  -0.875
 12.0  0.375  -8.875



4×3 Array{Float64,2}:
 13.0   2.875  -2.375
 10.0  -2.0    -4.0
 -4.0  -0.5    -4.5
 10.0   1.25   -7.25



3×3 Array{Float64,2}:
 10.0   2.0    -5.0
  2.0  10.875   4.625
 -5.0   4.625  11.875

Proximal block ascent algorithm for finding a rank $r=2$ solution to MAXBET.

Ô_pba, ts_pba, obj, history = otsm_pba(Smaxbet, 2; verbose = true);

iter = 1, obj = 769.5257682063867
iter = 2, obj = 778.9896186367976
iter = 3, obj = 779.6705236544664
iter = 4, obj = 779.7414861674775
iter = 5, obj = 779.7533232474001
iter = 6, obj = 779.7561742783591
iter = 7, obj = 779.7569547275212
iter = 8, obj = 779.7571745803211
iter = 9, obj = 779.7572368336032
iter = 10, obj = 779.757254470489
iter = 11, obj = 779.757259465814
iter = 12, obj = 779.7572608802018
iter = 13, obj = 779.7572612805834
iter = 14, obj = 779.7572613939061
iter = 15, obj = 779.7572614259773
iter = 16, obj = 779.7572614350536
iter = 17, obj = 779.7572614376224
iter = 18, obj = 779.757261438349
iter = 19, obj = 779.7572614385546
iter = 20, obj = 779.7572614386131
iter = 21, obj = 779.7572614386294
iter = 22, obj = 779.7572614386343
iter = 23, obj = 779.7572614386352
iter = 24, obj = 779.7572614386353
iter = 25, obj = 779.7572614386356

This solution is certified to be global optimal.

# proximal block ascent yields the global solution
test_optimality(Ô_pba, Smaxbet)[1]

:global_optimal