inequality constrained least square for large matrix

bettertony · Sep 26, 2014

Hi,

I am working on a linear equation (y'=X*m'). To get the solution, I need to add some inequality constraint like, b1'<=A*m'<=b2'. The dimension of X is very high, let's say 10^5 * 10^5. I think I have to use some iterative method.

Anybody can provide a solution for me? Many thanks!

Ishuda · Sep 26, 2014

bettertony said:
Hi,

I am working on a linear equation (y=a*X). To get the solution, I need to add some inequality constraint like, b1<=AX<=b2. The dimension of X is very high, let's say 10^5 * 10^5. I think I have to use some iterative method.

Anybody can provide a solution for me? Many thanks!

The inequality
b1 <= AX <= b2
implies AX is a scalar (one dimensional) so we are going to need more information. If X is multiple dimensional just what does AX mean? If the quantities b1, AX, and b2 are not scalars, what do the inequalities mean?

For example, lets come down to a much smaller dimension. Looking at just the left side of the inequality
AX <= b2
and letting AX be the dot product of two three dimensional vectors A and X, one boundary of the restricted region given by the inequalities is the plane
a_x x + a_y y + a_z z = b2
where A = (a_x, a_y, a_z) and X = (x, y, z). Thus the region is on one side of that plane and on a corresponding side of the plane for the right side of the inequality.

If AX were the dot product for your example, the restricted region would just be the region bounded by two hyper-planes. Although some people do have the ability to visualize multiple dimension regions and immediately tell where a point in N space lies relative to an N space hyper-plane, most of us mortals can only do that for N = 1. That is, all of us can normally immediately tell if 5 is less than 7 but few of us can immediately tell if (19,321) lies below the line y = 17 +92 x (it generally takes maybe a second or so for something like this), and even fewer can visual so distinctly in 3 and higher dimensions. So, if it is the dot product, you are just going to have to do the arithmetic or have someone/a computer/... do it for you.

bettertony · Sep 26, 2014

Hi Ishuda,
Thanks for your reply. Below are some answers in bold font.

Ishuda said:
The inequality
b1 <= AX <= b2
implies AX is a scalar (one dimensional) so we are going to need more information. If X is multiple dimensional just what does AX mean? If the quantities b1, AX, and b2 are not scalars, what do the inequalities mean?
X is a matrix representing the coefficients for the independent variable a (vector). Sorry for that I wrote wrong, it should be y=X*a, and usually we may prefer to use the notation "m" (model) instead of "a".
A is just some extra weighting matrix on X to set the constraint. In many cases we may just ignore A, means A is an diagonal identical matrix. b1,b2 can be scalar or vector. Scalar is just a special case of vector.

For example, lets come down to a much smaller dimension. Looking at just the left side of the inequality
AX <= b2
and letting AX be the dot product of two three dimensional vectors A and X, one boundary of the restricted region given by the inequalities is the plane
a_x x + a_y y + a_z z = b2
where A = (a_x, a_y, a_z) and X = (x, y, z). Thus the region is on one side of that plane and on a corresponding side of the plane for the right side of the inequality.

If AX were the dot product for your example, the restricted region would just be the region bounded by two hyper-planes. Although some people do have the ability to visualize multiple dimension regions and immediately tell where a point in N space lies relative to an N space hyper-plane, most of us mortals can only do that for N = 1. That is, all of us can normally immediately tell if 5 is less than 7 but few of us can immediately tell if (19,321) lies below the line y = 17 +92 x (it generally takes maybe a second or so for something like this), and even fewer can visual so distinctly in 3 and higher dimensions. So, if it is the dot product, you are just going to have to do the arithmetic or have someone/a computer/... do it for you.

What you said about hyper plane makes sense. The main question is how to solve this numerically, especially when the matrix is huge and not sparse. I think for matrix inverse, there is several iterative ways. But for constrained examples, I have no experience.

Ishuda · Sep 26, 2014

bettertony said:
Hi Ishuda,
Thanks for your reply. Below are some answers in bold font.

What you said about hyper plane makes sense. The main question is how to solve this numerically, especially when the matrix is huge and not sparse. I think for matrix inverse, there is several iterative ways. But for constrained examples, I have no experience.

I'm still not understanding the question? What is a? What is X? What is a*X? What is A? Is a the same as A? What is AX? What is b1? What is b2? What is fixed and what is variable.

Normally, if one uses inequality signs, it means the expressions are scalars, that is the expressions evaluate to numbers belonging to the real numbers. Also expressions like a*X (or AX) would be a scalar times a scalar. However, since you have said that the dimension of X is very large in your first post and now mention a huge matrix, further explanations must be made or any questions involving them are meaningless.

bettertony · Sep 26, 2014

Hi Ishuda,

I have explained in the last reply. It was in the middle of your message. There was an error. It should be y=X*a. Just a mormal expression of linear system. Every little case symbol represents vector or scalar. Big case represents matrix. "a" is what we want to estimate (the model). "X" is the coefficients of "a". The row of X represents the dimension for data (or say observations), the column of X represents the dimension of model. It is huge means both data and model are very large, say we have 10^5 data to estimate 10^5 parameters. If "X" is not huge or it is a sparse matrix, I think there are codes in matlab or scipy which can solve the problem. But in case of a large non-sparse matrix, I have no experience of solving it.

Ishuda said:
I am working on a linear equation (y=a*X). To get the solution, I need to add some inequality constraint like, b1<=AX<=b2. The dimension of X is very high, let's say 10^5 * 10^5. I think I have to use some iterative method.

I'm still not understanding the question? What is a? What is X? What is a*X? What is A? Is a the same as A? What is AX? What is b1? What is b2? What is fixed and what is variable.

Normally, if one uses inequality signs, it means the expressions are scalars, that is the expressions evaluate to numbers belonging to the real numbers. Also expressions like a*X (or AX) would be a scalar times a scalar. However, since you have said that the dimension of X is very large in your first post and now mention a huge matrix, further explanations must be made or any questions involving them are meaningless.

tkhunny · Sep 26, 2014

#1 thing you should worry about is that 10^5 is probably ridiculous. I DARE you to show that so many variables are independent. You are embarking on a voyage of Junk Science - at best!!! Cut it down - WAY DOWN - WAY WAY DOWN!!!!! Stick with variables that make sense and have a theoretical basis. One you have a rational number of independent variables, it will not matter how much data you have. The generalized inverse of such a cauldron of data is very likely to exist and you'll be done. Of course, then you've the task of estimating the error.

Ishuda · Sep 26, 2014

bettertony said:
Hi Ishuda,

I have explained in the last reply. It was in the middle of your message. There was an error. It should be y=X*a. Just a mormal expression of linear system. Every little case symbol represents vector or scalar. Big case represents matrix. "a" is what we want to estimate (the model). "X" is the coefficients of "a". The row of X represents the dimension for data (or say observations), the column of X represents the dimension of model. It is huge means both data and model are very large, say we have 10^5 data to estimate 10^5 parameters. If "X" is not huge or it is a sparse matrix, I think there are codes in matlab or scipy which can solve the problem. But in case of a large non-sparse matrix, I have no experience of solving it.

All right, maybe we've gotten this far: X and A are n by n matrices, a and y are column vectors of dimension n. a is the coefficient vector, X and y are observed data. Is that correct?

If that is correct then AX (meaning matrix multiplication of A and X, I would assume) is another n by n matrix so we come back to the question what do you mean by your constraint equation? It still doesn't make sense without some explanation, how can an j dimensional scalar or vector be larger/smaller than a n by n matrix? Is it possible that A was supposed to be a and less/greater than means component pairwise less/greater than the row vectors b1 and b2? However that doesn't make sense unless you tell us why you would expect multiple solutions for a. Do you, for instance, know the determinate of X is zero and there are multiple solutions?

You still haven't answered the questions about:

What is b1 (scalar or vector)?
What is b2 (scalar or vector)?
What is AX (just a single matrix or does it mean the multiplication of some given matrix A and the observed data X)?
What do the inequalities mean (what does a scalar/vector being less than a matrix mean)?
What is fixed and what is variable, i.e. we (may) have a is the coefficient vector, X and y are observed data but are A (a matrix), b1 (a scalar or vector) and b2 (a scalar or vector) given but unspecified at the particular moment?

Edit: Changed to numbered list from bullet list. Actually the answer to (3) doesn't really make any difference as longe as the result is as stated, i.e. AX (capital letters) is a matrix.

bettertony · Sep 27, 2014

Apologize for my bad typing. I modified the original post. the constraint should be b1'<A*a'<b2'. ' means transpose. This is just a common inequality constrained least square system in science or engineering applications. If "X" is not big, there should not be any difficulty. If I am correct, just need to apply a Lagrange multiplier. Anyway I forgot the complete solution. But here the issue is computer memory and computation time when the dimension becomes huge.

PS: to be accuarate b1 and b2 should be vector (say scalar just because in high level programming language, there is usually a process called broadcasting). e.g. if I want to constrain all "a" is larger than "0". Then "A" is a diagonal identical matrix, and b1 = [0, 0, 0 ..., 0].

Ishuda said:
All right, maybe we've gotten this far: X and A are n by n matrices, a and y are column vectors of dimension n. a is the coefficient vector, X and y are observed data. Is that correct?

If that is correct then AX (meaning matrix multiplication of A and X, I would assume) is another n by n matrix so we come back to the question what do you mean by your constraint equation? It still doesn't make sense without some explanation, how can an j dimensional scalar or vector be larger/smaller than a n by n matrix? Is it possible that A was supposed to be a and less/greater than means component pairwise less/greater than the row vectors b1 and b2? However that doesn't make sense unless you tell us why you would expect multiple solutions for a. Do you, for instance, know the determinate of X is zero and there are multiple solutions?

You still haven't answered the questions about:

What is b1 (scalar or vector)?

What is b2 (scalar or vector)?

What is AX (just a single matrix or does it mean the multiplication of some given matrix A and the observed data X)?

What do the inequalities mean (what does a scalar/vector being less than a matrix mean)?

What is fixed and what is variable, i.e. we (may) have a is the coefficient vector, X and y are observed data but are A (a matrix), b1 (a scalar or vector) and b2 (a scalar or vector) given but unspecified at the particular moment?

Edit: Changed to numbered list from bullet list. Actually the answer to (3) doesn't really make any difference as longe as the result is as stated, i.e. AX (capital letters) is a matrix.

Ishuda · Sep 27, 2014

bettertony said:
Apologize for my bad typing. I modified the original post. the constraint should be b1'<A*a'<b2'. ' means transpose. This is just a common inequality constrained least square system in science or engineering applications. If "X" is not big, there should not be any difficulty. If I am correct, just need to apply a Lagrange multiplier. Anyway I forgot the complete solution. But here the issue is computer memory and computation time when the dimension becomes huge.

PS: to be accuarate b1 and b2 should be vector (say scalar just because in high level programming language, there is usually a process called broadcasting). e.g. if I want to constrain all "a" is larger than "0". Then "A" is a diagonal identical matrix, and b1 = [0, 0, 0 ..., 0].

O.K. Solutions may depend on just what X is. For example would it be worth doing a tri-diagonalization of X. If so, there are several ways to solve for that system which are easy and require much less memory (by at least a factor of 10³ I would think and maybe more). One method is to take advantage of the fact that it is a linear system. In general the solution for a tri-diagonal system can go like this (a and y are vectors and the rest, including E are scalars): Create an error function E of u (the first element of the vector a). The error function is computed by given a value of a₁ (= u₁) the value of a₂ is computed from the first row of a tri-diagonal matrix and the first element of (the modified) y (= y₁). Given a₁ and a₂ the value of a₃ can be computed using the second row and y₂, etc. Finally, compute the value v of resulting from the n^th row and let
E(u₁) = y_n - v
Now choose a u₂ and repeat to get an E(u₂). Theoretically you now have the equation for E,
E(u) = E(u₁) + \(\displaystyle \frac{E(u_2) - E(u_1)}{u_2 - u_1}\) (u - u₁)
and can solve for E(u)=0.

LU or LDU decompositions are also relatively fast to implement sometimes although memory requirements are much larger if you want to store the complete matrices (on the order of n² just for the matrices).

You might use Google for
large -sparse matrix solutions

As a final note, I don't know how you are going to constrain the solution but you will just have to perform the required calculation and check whether it meets constraints. If it doesn't then change the conditions and get another solution until you one (many?) that fit.

inequality constrained least square for large matrix

bettertony

New member

Ishuda

Elite Member

bettertony

New member

Ishuda

Elite Member

bettertony

New member

tkhunny

Moderator

Ishuda

Elite Member

bettertony

New member

Ishuda

Elite Member