
### Covariance matrix - missing data


```I want to calculate a covariance matrix on a matrix of time series data. The individual series may be of differing lengths; the data matrix contains "NaN" entries for the period before a given time series starts. (Thus, for instance, we may have n values in column 1; column 2 may start with m "NaN" entries followed by an n-m column of values; column 3 may have k "NaN" entries followed by an n-k column of values and so on.)

I currently have some rather clumsy code to calculate each covariance entry separately. For entry (n,m), it extracts columns n and m, takes only the subset with entries in each column, and takes the covariance of those series. The code appears below.

I have two questions:
1. The code doesn't work. I wind up with a "Subscripted assignment dimension mismatch" error at the line "retcov(n, m) = cov(T1,T2);" Does anyone have an idea what I'm doing wrong?
2. This feels like a laboured approach to manage a matrix calculation that Excel performs fairly effortlessly (it automatically disregards nonnumeric entries). Is there a better way to go about calculating a covariance matrix on incomplete data?

Thanks.

sz = size(Returns);
retcov = zeros(sz(2),sz(2));
for n = 1:sz(2)
    tempdata1 = Returns(:,n);
    tempdata1 = tempdata1(isfinite(tempdata1));
    k1 = length(tempdata1);
    for m = 1:sz(2)
        tempdata2 = Returns(:,m);
        tempdata2 = tempdata2(isfinite(tempdata2));
        k2 = length(tempdata2);
        k = min(k1, k2);
        tempdata = [tempdata1(k1 - k + 1:k1) tempdata2(k2 - k + 1:k2)];
        retcov(n, m) = cov(tempdata(:,1),tempdata(:,2));
    end
end
```
Reply William 9/9/2010 6:30:26 AM

```A = [NaN  10, 20, 13, NaN];
B = [NaN NaN,  5,  3,   2];

idx = ~isnan(A) & ~isnan(B);

cov(A(idx), B(idx))

ans =

24.50          7.00
7.00          2.00

Oleg
```
Reply Oleg 9/9/2010 7:15:22 AM
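
The 2-by-2 result above also points at the likely cause of the "Subscripted assignment dimension mismatch" error in the original loop: `cov(x, y)` on two vectors returns a full 2-by-2 matrix, which cannot be stored in the single element `retcov(n, m)`. The following is a minimal sketch of a pairwise loop that keeps only the off-diagonal entry. The sample values in `Returns` are made up for illustration, and the sketch assumes every pair of columns shares at least two non-NaN rows.

```matlab
% Illustrative data (made-up values): NaN marks the period before a series starts.
Returns = [ 1   NaN NaN;
            2   4   NaN;
            4   5   1;
            3   7   3;
            5   6   2];

nCols  = size(Returns, 2);
retcov = zeros(nCols);

for n = 1:nCols
    for m = 1:nCols
        % Keep only the rows where both columns have data.
        idx = ~isnan(Returns(:, n)) & ~isnan(Returns(:, m));
        % cov(x, y) returns a 2-by-2 matrix; the covariance of x with y is element (1,2).
        c = cov(Returns(idx, n), Returns(idx, m));
        retcov(n, m) = c(1, 2);
    end
end
```

Since the result is symmetric, the inner loop could equally run over `m = n:nCols` and mirror each value into `retcov(m, n)`.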

```Also, if you have the Statistics or Financial Toolbox, you can use:

"c = nancov(..., 'pairwise') computes c(i,j) using rows with no NaN values in columns i or j. The result may not be a positive definite matrix. c = nancov(..., 'complete') is the default, and it omits rows with any NaN values, even if they are not in column i or j. The mean is removed from each column before calculating the result."

Be aware that the "pairwise" option (even if self-implemented) can lead to a non-invertible variance-covariance matrix...

Oleg
```
Reply Oleg 9/9/2010 7:55:23 AM
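
The caveat about the pairwise option can be checked numerically. The sketch below uses a made-up symmetric matrix `C` as a stand-in for a pairwise covariance estimate (it is not computed from any data in this thread) and tests its smallest eigenvalue. The tolerance `1e-12` is an arbitrary choice for this illustration.

```matlab
% Illustrative (made-up) matrix standing in for a pairwise covariance estimate.
C = [ 1.0  0.9 -0.9;
      0.9  1.0  0.9;
     -0.9  0.9  1.0];

% Symmetrize to guard against round-off before computing eigenvalues.
C = (C + C') / 2;

% A valid covariance matrix has no negative eigenvalues.
minEig = min(eig(C));
isPSD  = minEig >= -1e-12;   % small tolerance for floating-point error

fprintf('smallest eigenvalue = %g, positive semi-definite = %d\n', minEig, isPSD);
```

A negative smallest eigenvalue means the matrix is not a valid covariance matrix, so routines that rely on positive definiteness, such as a Cholesky factorization, are likely to fail on it.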

```Thanks for that, the nancov approach worked. Under the idx approach I still got the same "Subscripted assignment dimension mismatch" error message. I suspect there's something else going on with the data that's causing it.
```
Reply William 9/10/2010 4:21:04 AM
