OTSM.jl
OTSM.jl implements algorithms for solving the orthogonal trace sum maximization (OTSM) problem
\[\operatorname{maximize} \sum_{i,j=1}^m \operatorname{tr} (O_i^T S_{ij} O_j)\]
subject to orthogonality constraint $O_i^T O_i = I_r$. Here $S_{ij} \in \mathbb{R}^{d_i \times d_j}$, $1 \le i, j \le m$, are data matrices. $S_{ii}$ are symmetric and $S_{ij} = S_{ji}^T$. Many problems such as canonical correlation analysis (CCA) with $m \ge 2$ data sets, Procrustes analysis with $m \ge 2$ images, orthogonal least squares, and MaxBet are special cases of OSTM.
Details on OTSM are described in paper:
- Joong-Ho Won, Hua Zhou, and Kenneth Lange. (2018) Orthogonal trace-sum maximization: applications, local algorithms, and global optimality, arXiv.
Installation
OTSM.jl requires Julia v1.0 or later. The package has not yet been registered and must be installed using the repository location. Start julia and use the ]
key to switch to the package manager REPL
(@v1.4) pkg> add https://github.com/Hua-Zhou/OTSM.jl
Use the backspace key to return to the Julia REPL.
versioninfo()
Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.7.0)
CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS = 4
# for use in this tutorial
using OTSM
Algorithms
Proximal block ascent algorithm
The otsm_pba()
function implements an efficient local search algorithm for solving OTSM.
For documentation of the otsm_pba()
function, type ?otsm_bpa
in Julia REPL.
OTSM.otsm_pba
— Functionotsm_pba(S, r)
Maximize the trace sum sum_{i,j} trace(Oi' * S[i, j] * Oj) = trace(O' * S * O)
subject to orthogonality constraint Oi' * Oi == I(r)
by a proximal block ascent algorithm. Each of S[i, j]
for i < j
is a di x dj
matrix, S[i, i]
are symmetric, and S[i, j] = S[j, i]'
. Note otsm_pba
is mutating in the sense the keyword argument O=O_init
is updated with the final solution.
Positional arguments
S :: Matrix{Matrix}
:S[i, j]
is adi x dj
matrix,S[i, i]
are
symmetric, and S[j, i] = S[i, j]'
.
r :: Integer
: rank of solution.
Keyword arguments
αinv :: Number
: proximal update constant $1/α$, default is
1e-3`.maxiter :: Integer
: maximum number of iterations, default is50000
.tolfun :: Number
: tolerance for objective convergence, default is1e-10
.tolvar :: Number
: tolerance for iterate convergence, default is1e-8
.verbose :: Bool
: verbose display, default isfalse
.O :: Vector{Matrix}
: starting point, default isinit_tb(S, r)
.log :: Bool
: record iterate history or not, defaut isfalse
.soconstr
:: Bool: constrain solution
Oito be in
SO(d), ie,
det(Oi)=1`.
Output
O
: result,O[i]
has dimensiondi x r
.tracesum
: objective value evaluated at finalO
.obj
: final objective value from PBA algorithm, should be same astracesum
.history
: iterate history.
Semidefinite programming (SDP) relaxation
The otsm_sdp()
function implements an SDP relaxation approach for solving OTSM.
For documentation of the otsm_sdp()
function, type ?otsm_sdp
in Julia REPL.
OTSM.otsm_sdp
— Functionotsm_sdp(S, r)
Maximize the trace sum sum_{i,j} trace(Oi' * S[i, j] * Oj)
subject to the orthogonality constraint Oi'Oi = I(r)
using an SDP relaxation (P-SDP in the manuscript). Each of S[i, j]
for i < j
is a di x dj
matrix and S[i, j] = S[j, i]'
.
Output
O
: solution.tracesum
: trace sum at solution.obj
: SDP relaxation objective value at solution.isexaxt
:true
means global optimality.
Start point
Different strategies for starting point are implemented.
Initialize $O_i$ by $I_r$
OTSM.init_eye
— Functioninit_eye(S)
Initialize O[i]
by an di x r
diagonal matrix with ones on diagonal.
Initialize $O_i$ by a strategy by Ten Berge (default)
This is the default for the proximal block ascent algorithm otsm_pba
.
OTSM.init_tb
— Functioninit_tb(S)
Compute initial point following Ten Berge's second upper bound (https://doi.org/10.1007/BF02294053, p. 273). This is same as the Liu-Wang-Wang starting point strategy 2 (https://doi.org/10.1137/15M100883X, Algorithm 3.2, p. 1495). Take the eigenvectors corresponding to the r
largest eigenvalues of S
. This is a D x r
orthogonal matrix, $D=\sum_{i=1}^m d_i$. For each di x r
block, project to the Stiefel manifold. These blocks constitute an initial point.
Initialize $O_i$ by a strategy by Liu-Wang-Wang
OTSM.init_lww1
— Functioninit_lww1(S, r)
Compute initial point following Liu-Wang-Wang starting point strategy 1 (https://doi.org/10.1137/15M100883X, Algorithm 3.1, p. 1494). Set O[1]
to the top r
eigenvectors of S[1, 1]
. Then Ok = Uk * Qk
, where Uk
is the top r
eigenvectors of S[k, k]
and Qk
is the Q factor in the QR decomposition of Uk * sum_{j<k} S[k, j] * O[j]
.
Initialize $O_i$ by a strategy by Shapiro-Botha and Won-Zhou-Lange
OTSM.init_sb
— Functioninit_sb(S, r)
Compute initial point by Shapiro-Botha (https://doi.org/10.1137/0609032, p. 380). Also see the extension by Won-Zhou-Lange paper (Lemma B.1). Replace diagonal blocks S[i, i]
by $S_{i,i} - \sum_{j} P_{ij} * D_{ij} * P_{ij}$, where $P_{ij} * D_{ij} * Q_{ij}$ is the SVD of $S_{ij}$. The resulting matrix is negative semidefinite. Take the eigenvectors corresponding to the r
largest eigenvalues. This is a D x r
orthogonal matrix, $D=\sum_{i=1}^m d_i$. For each di x r
block, project to the Stiefel manifold. These blocks constitute an initial point.
Example data - Port Wine
The package contains the port wine example data set from the Hanafi and Kiers (2006) paper. It can be retrieved by the portwine_data()
function.
A, _, _ = portwine_data();
Data matrices A1, A2, A3, A4 record the ratings (centered at 0) of $m=4$ accessors on 8 port wines in $d_1=4$, $d_2=3$, $d_3=4$, and $d_4=3$ aspects respectively.
for i in 1:4
display(A[i])
end
8×4 Array{Float64,2}:
1.25 -5.0 -1.375 3.875
-0.75 1.0 -0.375 -1.125
1.25 -3.0 -1.375 0.875
-0.75 2.0 0.625 -0.125
-0.75 2.0 -0.375 -0.125
0.25 3.0 -0.375 -3.125
-0.75 -1.0 3.625 -1.125
0.25 1.0 -0.375 0.875
8×3 Array{Float64,2}:
2.0 -4.375 0.625
1.0 1.625 0.625
1.0 -1.375 2.625
-1.0 1.625 -1.375
0.0 0.625 0.625
-1.0 0.625 -0.375
-2.0 -0.375 -2.375
0.0 1.625 -0.375
8×4 Array{Float64,2}:
3.125 3.0 -2.5 0.75
-1.875 -1.0 1.5 0.75
2.125 2.0 -0.5 1.75
-1.875 -1.0 1.5 -1.25
1.125 0.0 0.5 0.75
-0.875 -1.0 0.5 -0.25
-1.875 -1.0 -0.5 -2.25
0.125 -1.0 -0.5 -0.25
8×3 Array{Float64,2}:
1.0 0.125 0.375
0.0 -0.875 -1.625
2.0 -0.875 -1.625
-1.0 0.125 -0.625
0.0 0.125 -0.625
0.0 1.125 1.375
-2.0 -1.875 1.375
0.0 2.125 1.375
MAXDIFF
The MAXDIFF approach for CCA seeks the rotations of $A_i$ that achieve the maximal agreement
\[\operatorname{maximize} 2 \sum_{i < j} \operatorname{tr} (O_i^T A_i^T A_j O_j),\]
subject to constraint $O_i^T O_i = I_r$. This corresponds to an OTSM problem with $S_{ij} = A_i^T A_j$ and $S_{ii} = 0$.
Smaxdiff = [A[i]'A[j] for i in 1:4, j in 1:4]
for i in 1:4
fill!(Smaxdiff[i, i], 0)
end
display.(Smaxdiff);
4×4 Array{Float64,2}:
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
3×4 Array{Float64,2}:
5.0 -15.0 -12.0 13.0
-9.25 36.0 5.875 -20.375
5.75 -11.0 -14.125 7.625
4×4 Array{Float64,2}:
9.75 -26.0 -14.625 21.125
8.0 -27.0 -10.0 18.0
-6.0 21.0 2.5 -13.5
4.5 -8.0 -12.75 6.75
3×4 Array{Float64,2}:
6.0 -11.0 -12.0 8.0
1.75 9.0 -6.625 1.125
0.25 3.0 6.125 -2.625
4×3 Array{Float64,2}:
5.0 -9.25 5.75
-15.0 36.0 -11.0
-12.0 5.875 -14.125
13.0 -20.375 7.625
3×3 Array{Float64,2}:
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
4×3 Array{Float64,2}:
13.0 -21.625 14.375
11.0 -21.0 11.0
-5.0 16.5 -2.5
10.0 -5.75 13.25
3×3 Array{Float64,2}:
9.0 -8.0 12.0
1.0 4.375 0.375
-6.0 -0.875 -8.875
4×4 Array{Float64,2}:
9.75 8.0 -6.0 4.5
-26.0 -27.0 21.0 -8.0
-14.625 -10.0 2.5 -12.75
21.125 18.0 -13.5 6.75
3×4 Array{Float64,2}:
13.0 11.0 -5.0 10.0
-21.625 -21.0 16.5 -5.75
14.375 11.0 -2.5 13.25
4×4 Array{Float64,2}:
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
3×4 Array{Float64,2}:
13.0 10.0 -4.0 10.0
2.875 -2.0 -0.5 1.25
-2.375 -4.0 -4.5 -7.25
4×3 Array{Float64,2}:
6.0 1.75 0.25
-11.0 9.0 3.0
-12.0 -6.625 6.125
8.0 1.125 -2.625
3×3 Array{Float64,2}:
9.0 1.0 -6.0
-8.0 4.375 -0.875
12.0 0.375 -8.875
4×3 Array{Float64,2}:
13.0 2.875 -2.375
10.0 -2.0 -4.0
-4.0 -0.5 -4.5
10.0 1.25 -7.25
3×3 Array{Float64,2}:
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
Proximal block ascent algorithm for finding a rank $r=2$ solution to MAXDIFF.
Ô_pba, ts_pba, obj, history = otsm_pba(Smaxdiff, 2; verbose = true);
iter = 1, obj = 539.8501989834106
iter = 2, obj = 542.2346791607897
iter = 3, obj = 542.3267553745872
iter = 4, obj = 542.3275270111224
iter = 5, obj = 542.3275503294589
iter = 6, obj = 542.3275506362337
iter = 7, obj = 542.3275506383454
iter = 8, obj = 542.3275506383524
The test_optimality()
function attempts to certify whether a local solution O::Vector{Matrix}
is a global solution. The first output indicates the solution is :infeasible
, :suboptimal
, :stationary_point
, or :global_optimal
.
# proximal block ascent yields the global solution
test_optimality(Ô_pba, Smaxdiff)[1]
:global_optimal
For documentation of the test_optimality()
function, type ?test_optimality
in Julia REPL.
OTSM.test_optimality
— Functiontest_optimality(O::Vector{Matrix}, S::Matrix{Matrix}; tol=1e-3, method=:wzl)
Test if the vector of orthogonal matrices O
is globally optimal. The O
is assumed to satisfy the first-order optimality conditions. Each of O[i]
is a di x r
matrix. Each of S[i, j]
for i < j
is a di x dj
matrix. Tolerance tol
is used to test the positivity of the smallest eigenvalue of the test matrix.
Positional arguments
O :: Vector{Matrix}
: A point satisifyingO[i]'O[i] = I(r)
.S :: Matrix{Matrix}
: Data matrix.
Keyword arguments
tol :: Number
: Tolerance for testing psd of the test matrix.method :: Symbol
::wzl
(Theorem 3.1 of Won-Zhou-Lange https://arxiv.org/abs/1811.03521) or:lww
(Theorem 2.4 of Liu-Wang-Wang https://doi.org/10.1137/15M100883X)
Output
status :: Symbol
: A certificate that:infeasible
: orthogonality constraints violated:suboptimal
: failed the first-order optimality (stationarity) condition:stationary_point
: satisfied first-order optimality; may or may not be global optimal:nonglobal_stationary_point
: satisfied first-order optimality; must not be global optimal:global_optimal
: certified global optimality
Λ :: Vector{Matrix}
: Lagrange multipliers.C :: Matrix{Matrix}
: Certificate matrix corresponding tomethod
.λmin :: Number
: The minimum eigenvalue of the certificate matrix.
MAXBET
The MAXBET approach for CCA seeks the rotations of $A_i$ that achieve the maximal agreement
\[\operatorname{maximize} \sum_{i,j} \operatorname{tr} (O_i^T A_i^T A_j O_j),\]
subject to constraint $O_i^T O_i = I_r$. This corresponds to an OTSM problem with $S_{ij} = A_i^T A_j$.
Smaxbet = [A[i]'A[j] for i in 1:4, j in 1:4]
display.(Smaxbet);
4×4 Array{Float64,2}:
5.5 -12.0 -6.25 7.25
-12.0 54.0 6.0 -31.0
-6.25 6.0 17.875 -9.375
7.25 -31.0 -9.375 28.875
3×4 Array{Float64,2}:
5.0 -15.0 -12.0 13.0
-9.25 36.0 5.875 -20.375
5.75 -11.0 -14.125 7.625
4×4 Array{Float64,2}:
9.75 -26.0 -14.625 21.125
8.0 -27.0 -10.0 18.0
-6.0 21.0 2.5 -13.5
4.5 -8.0 -12.75 6.75
3×4 Array{Float64,2}:
6.0 -11.0 -12.0 8.0
1.75 9.0 -6.625 1.125
0.25 3.0 6.125 -2.625
4×3 Array{Float64,2}:
5.0 -9.25 5.75
-15.0 36.0 -11.0
-12.0 5.875 -14.125
13.0 -20.375 7.625
3×3 Array{Float64,2}:
12.0 -10.0 11.0
-10.0 29.875 -7.125
11.0 -7.125 15.875
4×3 Array{Float64,2}:
13.0 -21.625 14.375
11.0 -21.0 11.0
-5.0 16.5 -2.5
10.0 -5.75 13.25
3×3 Array{Float64,2}:
9.0 -8.0 12.0
1.0 4.375 0.375
-6.0 -0.875 -8.875
4×4 Array{Float64,2}:
9.75 8.0 -6.0 4.5
-26.0 -27.0 21.0 -8.0
-14.625 -10.0 2.5 -12.75
21.125 18.0 -13.5 6.75
3×4 Array{Float64,2}:
13.0 11.0 -5.0 10.0
-21.625 -21.0 16.5 -5.75
14.375 11.0 -2.5 13.25
4×4 Array{Float64,2}:
26.875 20.0 -13.5 12.25
20.0 18.0 -11.0 9.0
-13.5 -11.0 12.0 -2.0
12.25 9.0 -2.0 11.5
3×4 Array{Float64,2}:
13.0 10.0 -4.0 10.0
2.875 -2.0 -0.5 1.25
-2.375 -4.0 -4.5 -7.25
4×3 Array{Float64,2}:
6.0 1.75 0.25
-11.0 9.0 3.0
-12.0 -6.625 6.125
8.0 1.125 -2.625
3×3 Array{Float64,2}:
9.0 1.0 -6.0
-8.0 4.375 -0.875
12.0 0.375 -8.875
4×3 Array{Float64,2}:
13.0 2.875 -2.375
10.0 -2.0 -4.0
-4.0 -0.5 -4.5
10.0 1.25 -7.25
3×3 Array{Float64,2}:
10.0 2.0 -5.0
2.0 10.875 4.625
-5.0 4.625 11.875
Proximal block ascent algorithm for finding a rank $r=2$ solution to MAXBET.
Ô_pba, ts_pba, obj, history = otsm_pba(Smaxbet, 2; verbose = true);
iter = 1, obj = 769.5257682063867
iter = 2, obj = 778.9896186367976
iter = 3, obj = 779.6705236544664
iter = 4, obj = 779.7414861674775
iter = 5, obj = 779.7533232474001
iter = 6, obj = 779.7561742783591
iter = 7, obj = 779.7569547275212
iter = 8, obj = 779.7571745803211
iter = 9, obj = 779.7572368336032
iter = 10, obj = 779.757254470489
iter = 11, obj = 779.757259465814
iter = 12, obj = 779.7572608802018
iter = 13, obj = 779.7572612805834
iter = 14, obj = 779.7572613939061
iter = 15, obj = 779.7572614259773
iter = 16, obj = 779.7572614350536
iter = 17, obj = 779.7572614376224
iter = 18, obj = 779.757261438349
iter = 19, obj = 779.7572614385546
iter = 20, obj = 779.7572614386131
iter = 21, obj = 779.7572614386294
iter = 22, obj = 779.7572614386343
iter = 23, obj = 779.7572614386352
iter = 24, obj = 779.7572614386353
iter = 25, obj = 779.7572614386356
This solution is certified to be global optimal.
# proximal block ascent yields the global solution
test_optimality(Ô_pba, Smaxbet)[1]
:global_optimal