My understanding is that PCA is a technique for orthogonalizing possibly non-orthogonal features (calling the outputs principal components instead of features). However, in the videos I've watched, as well as this visual tool, the dimensions of the feature space always have an orthogonal basis at the start, and PCA just turns out to be an angle-preserving transformation from one orthogonal basis to another.
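To make my mental model concrete, here is a minimal sketch (with made-up correlated data) of what I mean: the data are expressed in the standard orthogonal basis, and PCA just rotates that basis into another orthonormal one:

```python
import numpy as np

# Hypothetical example: correlated 2-D data expressed in the standard
# orthogonal basis. PCA finds a new orthogonal basis (the principal
# components) aligned with the directions of maximal variance.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.8 * x + 0.3 * rng.normal(size=500)   # strongly correlated with x
X = np.column_stack([x, y])
X = X - X.mean(axis=0)                      # center the data

# PCA via eigendecomposition of the sample covariance matrix
cov = X.T @ X / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)

# The principal components form an orthonormal (hence orthogonal) basis:
print(np.allclose(eigvecs.T @ eigvecs, np.eye(2)))  # True
```

So even though the two features are highly correlated, the basis both before and after is orthogonal; only the data cloud is "tilted."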
If features are said to be non-orthogonal, does that mean the dimensions of the feature space are generated by a non-orthogonal basis, or merely that the data points are correlated over a feature space generated by an orthogonal basis? Is there some obvious isomorphism between these two geometries that lets them be used interchangeably? Or is multicollinearity the former concept, and are multicollinearity and data-point correlation just two unrelated reasons to use PCA?
For example, consider a supervised learning problem of predicting the annual average temperature on some planet at Longitude [imath]A_{test}[/imath], Latitude [imath]B_{test}[/imath] based on the features of the temperatures at Longitude [imath]A_{train 1}[/imath], Latitude [imath]B_{train 1}[/imath] and at Longitude [imath]A_{train 2}[/imath], Latitude [imath]B_{train 2}[/imath]. If the two training points are positioned very close to each other (especially if the testing point is far away), that seems to imply some defect in the dimensions of the feature space itself, not merely in the data points. Is this a case of a highly non-orthogonal basis? The limiting case would be using exactly the same training position for both features. Would that be a feature space generated by two linearly dependent basis vectors, or merely a straight line of data embedded in a feature space generated by two orthogonal basis vectors? Or is the problem that it really should be the former, but you can't know to model it that way unless you know the Longitudes and Latitudes in advance and model accordingly, and the latter is the more generic representation of the data you'll get otherwise?
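Here is a numeric sketch of that limiting case (with hypothetical station data): two features that are nearly the same measurement. The feature space still has two orthogonal axes, but the data collapse onto a line, which shows up as a near-zero second eigenvalue:

```python
import numpy as np

# Hypothetical sketch: two features that are (almost) the same
# measurement, e.g. temperatures at two nearly coincident stations.
# The feature space still has two orthogonal axes; the redundancy
# appears as one near-zero principal-component variance.
rng = np.random.default_rng(1)
t1 = rng.normal(size=300)                  # temperature at station 1
t2 = t1 + 1e-6 * rng.normal(size=300)      # nearly identical station 2
X = np.column_stack([t1, t2])
X = X - X.mean(axis=0)

eigvals = np.linalg.eigvalsh(X.T @ X / (len(X) - 1))
# Essentially all of the variance sits in one principal component:
print(eigvals[1] / eigvals.sum())
```

The ratio printed is essentially 1, i.e. the second component carries almost nothing. My question is whether this should be read as degenerate data in an orthogonal-basis space, or as a sign that the "true" basis is linearly dependent.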