f

can MEX codes be slower than native matlab code ?

```Hi all,
I recently started trying to write MEX code for my bottlenecks in my FEM program.
For linear algebra operations like kronecker products, I saw good results.
However, when I tried to  convert my assembly procedure into MEX code -- http://www.mathworks.com/matlabcentral/newsreader/view_thread/290656

I found it fared worse. I gave that up and after some research online, I found that using the sparse matrix properties to perform the assembly process is much faster.
However, when I tried to  write MEX  code for this, again it proved way slower ( mex code attached ).--
http://blogs.mathworks.com/loren/2007/03/01/creating-sparse-finite-element-matrices-in-matlab/

I venture to guess-
1. If I have 19 inputs and 16 outputs as I do here, and they are large arrays, does MEX code become pointless ?
2. Is it that  the interfacing itself  has high overhead for such cases ?
If so, I would like to know so that I dont waste time on MEX henceforth.
%=======================================================
#include "mex.h"
#include "math.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
double *kmat, *r_vec, *c_vec ;
double *I_ff, *I_fp, *I_pf, *I_pp, *J_ff, *J_fp, *J_pf, *J_pp;
double  *X_ff, *X_fp, *X_pf, *X_pp;
int  n_ff,n_fp,n_pf,n_pp ;

int size_r_vec, size_c_vec ;

/*  create a pointer to the input vector r */
kmat = mxGetPr(prhs[0]);
r_vec = mxGetPr(prhs[1]);
c_vec = mxGetPr(prhs[2]);
I_ff = mxGetPr(prhs[3]);
I_fp = mxGetPr(prhs[4]);
I_pf = mxGetPr(prhs[5]);
I_pp = mxGetPr(prhs[6]);
J_ff = mxGetPr(prhs[7]);
J_fp = mxGetPr(prhs[8]);
J_pf = mxGetPr(prhs[9]);
J_pp = mxGetPr(prhs[10]);
X_ff = mxGetPr(prhs[11]);
X_fp = mxGetPr(prhs[12]);
X_pf = mxGetPr(prhs[13]);
X_pp = mxGetPr(prhs[14]);

/* Extract n_ff, n_fp etc from input */
n_ff = (int)mxGetScalar(prhs[15]);
n_fp = (int)mxGetScalar(prhs[16]);
n_pf = (int)mxGetScalar(prhs[17]);
n_pp = (int)mxGetScalar(prhs[18]);

/*  get the dimensions of the matrix input y */

size_r_vec = mxGetM(prhs[1]);
size_c_vec = mxGetM(prhs[2]);

int i, j ;
int r_value, c_value ;

for(i = 0; i  0)
{ n_ff = n_ff +  1;
I_ff[n_ff -1] = r_value;
J_ff[n_ff -1] = c_value;
X_ff[n_ff -1] = kmat[i + size_r_vec*j];
}
else if(r_value> 0  && c_value < 0)
{ n_fp  = n_fp + 1;
I_fp[n_fp -1] = r_value;
J_fp[n_fp -1] = abs(c_value);
X_fp[n_fp -1] = kmat[i + size_r_vec*j];
}
else if(r_value 0)
{ n_pf = n_pf + 1;
I_pf[n_pf -1] = abs(r_value);
J_pf[n_pf -1] = (c_value);
X_pf[n_pf -1] = kmat[i + size_r_vec*j];
}
else if(r_value< 0  && c_value < 0)
{n_pp  = n_pp + 1;
I_pp[n_pp-1]= abs(r_value);
J_pp[n_pp -1] = abs(c_value);
X_pp[n_pp -1] = kmat[i + size_r_vec*j];
}
}
}

/* Assign to output */

plhs[0] = mxDuplicateArray(prhs[3]);
plhs[1] = mxDuplicateArray(prhs[4]);
plhs[2] = mxDuplicateArray(prhs[5]);
plhs[3] = mxDuplicateArray(prhs[6]);
plhs[4] = mxDuplicateArray(prhs[7]);
plhs[5] = mxDuplicateArray(prhs[8]);
plhs[6] = mxDuplicateArray(prhs[9]);
plhs[7] = mxDuplicateArray(prhs[10]);
plhs[8] = mxDuplicateArray(prhs[11]);
plhs[9] = mxDuplicateArray(prhs[12]);
plhs[10] = mxDuplicateArray(prhs[13]);
plhs[11] = mxDuplicateArray(prhs[14]);

plhs[12] = mxCreateDoubleScalar(n_ff);
plhs[13] = mxCreateDoubleScalar(n_fp);
plhs[14] = mxCreateDoubleScalar(n_pf);
plhs[15] = mxCreateDoubleScalar(n_pp);

return;
}
%=============================================================
```
 0
shriram
9/7/2010 3:17:05 AM
comp.soft-sys.matlab 211264 articles. 25 followers. lunamoonmoon (257) is leader.

18 Replies
659 Views

Similar Articles

[PageSpeed] 37

```On 7 Sep, 05:17, "shriram srinivasan" <shrir...@tamu.edu> wrote:
> Hi all,
> I recently started trying to write MEX code for my bottlenecks in my FEM =
program.
> For linear algebra operations like kronecker products, I saw good results=
..
> However, when I tried to =A0convert my assembly procedure into MEX code -=
>
> I found it fared worse. I gave that up and after some research online, I =
found that using the sparse matrix properties to perform the assembly proce=
ss is much faster.
> However, when I tried to =A0write MEX =A0code for this, again it proved w=
ay slower ( mex code attached ).--http://blogs.mathworks.com/loren/2007/03/=
01/creating-sparse-finite-el...
>
> I venture to guess-
> 1. If I have 19 inputs and 16 outputs as I do here, and they are large ar=
rays, does MEX code become pointless ?

No. But you need to know exactly what you are doing to squeeze
any hint of performance out of the C compiler.

> 2. Is it that =A0the interfacing itself =A0has high overhead for such cas=
es ?

No. But you need to know exactly what you are doing to squeeze
any hint of performance out of the C compiler.

> If so, I would like to know so that I dont waste time on MEX henceforth.

Instead of attempting stuff you don't master, spend some time
learning C or C++. The code you posted might be sped up a factor
2, maybe 3 or more, by some simple syntax changes. You would
also need to know how to configure the compiler to produce very
fast executables.

Rune
```
 0
Rune
9/7/2010 4:23:51 AM
```Rune Allnor <allnor@tele.ntnu.no> wrote in message <c3d47692-2d6d-4ad1-89a9-e9b833034924@g17g2000yqe.googlegroups.com>...
> On 7 Sep, 05:17, "shriram srinivasan" <shrir...@tamu.edu> wrote:
> > I venture to guess-
> > 1. If I have 19 inputs and 16 outputs as I do here, and they are large arrays, does MEX code become pointless ?
>
> No. But you need to know exactly what you are doing to squeeze
> any hint of performance out of the C compiler.

Also, you are writing in the input arrays -- which is a big no-no in MATLAB -- and then make a copy. Try making a copy first, and then get the pointers to the output arrays and write in them.

Cris.
```
 0
9/7/2010 8:21:07 AM
```"Cris Luengo" <cris.luengo@google.for.my.name.to.contact.me> wrote in message <i64slj\$h08\$1@fred.mathworks.com>...
> > On 7 Sep, 05:17, "shriram srinivasan" <shrir...@tamu.edu> wrote:
> > > I venture to guess-
> > > 1. If I have 19 inputs and 16 outputs as I do here, and they are large arrays, does MEX code become pointless ?
> >
> > No. But you need to know exactly what you are doing to squeeze
> > any hint of performance out of the C compiler.
>
> Also, you are writing in the input arrays -- which is a big no-no in MATLAB -- and then make a copy. Try making a copy first, and then get the pointers to the output arrays and write in them.
>
> Cris.

A "big non-no"?

If the OP is going to bend the rules and modify their inputs, the subsequent copies are a (literal) waste of time.
```
 0
Steve
9/7/2010 8:34:05 AM
```"shriram srinivasan" <shrirams@tamu.edu> wrote in message <i64arh\$c6q\$1@fred.mathworks.com>...
>
>      plhs[0] = mxDuplicateArray(prhs[3]);
>      plhs[1] = mxDuplicateArray(prhs[4]);
>      plhs[2] = mxDuplicateArray(prhs[5]);
>      plhs[3] = mxDuplicateArray(prhs[6]);
>      plhs[4] = mxDuplicateArray(prhs[7]);
>      plhs[5] = mxDuplicateArray(prhs[8]);
>      plhs[6] = mxDuplicateArray(prhs[9]);
>      plhs[7] = mxDuplicateArray(prhs[10]);
>      plhs[8] = mxDuplicateArray(prhs[11]);
>      plhs[9] = mxDuplicateArray(prhs[12]);
>      plhs[10] = mxDuplicateArray(prhs[13]);
>      plhs[11] = mxDuplicateArray(prhs[14]);

This could be a tremendous waste of time if prhs[3] - prhs[14] are large arrays. If you are just going to return them back to MATLAB with no changes you should be using the undocumented mxCreateSharedDataCopy instead, which will avoid the data copies that you are making above. Just add this prototype at the top of your code:

mxArray *mxCreateSharedDataCopy(const mxArray *pr);

James Tursa
```
 0
James
9/7/2010 1:53:07 PM
```"shriram srinivasan" <shrirams@tamu.edu> wrote in message <i64arh\$c6q\$1@fred.mathworks.com>...
> Hi all,
> I recently started trying to write MEX code for my bottlenecks in my FEM program.
> For linear algebra operations like kronecker products, I saw good results.
> However, when I tried to  convert my assembly procedure into MEX code -- http://www.mathworks.com/matlabcentral/newsreader/view_thread/290656
>
> I found it fared worse. I gave that up and after some research online, I found that using the sparse matrix properties to perform the assembly process is much faster.
> However, when I tried to  write MEX  code for this, again it proved way slower ( mex code attached ).--
> http://blogs.mathworks.com/loren/2007/03/01/creating-sparse-finite-element-matrices-in-matlab/
>
> I venture to guess-
> 1. If I have 19 inputs and 16 outputs as I do here, and they are large arrays, does MEX code become pointless ?
> 2. Is it that  the interfacing itself  has high overhead for such cases ?
> If so, I would like to know so that I dont waste time on MEX henceforth.
>  %=======================================================
> #include "mex.h"
> #include "math.h"
>
>  void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
> {
>   double *kmat, *r_vec, *c_vec ;
>   double *I_ff, *I_fp, *I_pf, *I_pp, *J_ff, *J_fp, *J_pf, *J_pp;
>   double  *X_ff, *X_fp, *X_pf, *X_pp;
>   int  n_ff,n_fp,n_pf,n_pp ;
>
>   int size_r_vec, size_c_vec ;
>
>     /*  create a pointer to the input vector r */
>      kmat = mxGetPr(prhs[0]);
>     r_vec = mxGetPr(prhs[1]);
>     c_vec = mxGetPr(prhs[2]);
>      I_ff = mxGetPr(prhs[3]);
>      I_fp = mxGetPr(prhs[4]);
>      I_pf = mxGetPr(prhs[5]);
>      I_pp = mxGetPr(prhs[6]);
>      J_ff = mxGetPr(prhs[7]);
>      J_fp = mxGetPr(prhs[8]);
>      J_pf = mxGetPr(prhs[9]);
>      J_pp = mxGetPr(prhs[10]);
>      X_ff = mxGetPr(prhs[11]);
>      X_fp = mxGetPr(prhs[12]);
>      X_pf = mxGetPr(prhs[13]);
>      X_pp = mxGetPr(prhs[14]);
>
>      /* Extract n_ff, n_fp etc from input */
>      n_ff = (int)mxGetScalar(prhs[15]);
>      n_fp = (int)mxGetScalar(prhs[16]);
>      n_pf = (int)mxGetScalar(prhs[17]);
>      n_pp = (int)mxGetScalar(prhs[18]);
>
>     /*  get the dimensions of the matrix input y */
>
>     size_r_vec = mxGetM(prhs[1]);
>     size_c_vec = mxGetM(prhs[2]);
>
>      int i, j ;
>      int r_value, c_value ;
>
>     for(i = 0; i  0)
>         { n_ff = n_ff +  1;
>          I_ff[n_ff -1] = r_value;
>          J_ff[n_ff -1] = c_value;
>          X_ff[n_ff -1] = kmat[i + size_r_vec*j];
>         }
>         else if(r_value> 0  && c_value < 0)
>         { n_fp  = n_fp + 1;
>           I_fp[n_fp -1] = r_value;
>           J_fp[n_fp -1] = abs(c_value);
>           X_fp[n_fp -1] = kmat[i + size_r_vec*j];
>         }
>         else if(r_value 0)
>         { n_pf = n_pf + 1;
>           I_pf[n_pf -1] = abs(r_value);
>           J_pf[n_pf -1] = (c_value);
>           X_pf[n_pf -1] = kmat[i + size_r_vec*j];
>         }
>         else if(r_value< 0  && c_value < 0)
>         {n_pp  = n_pp + 1;
>          I_pp[n_pp-1]= abs(r_value);
>          J_pp[n_pp -1] = abs(c_value);
>          X_pp[n_pp -1] = kmat[i + size_r_vec*j];
>         }
>     }
>     }
>
>      /* Assign to output */
>
>      plhs[0] = mxDuplicateArray(prhs[3]);
>      plhs[1] = mxDuplicateArray(prhs[4]);
>      plhs[2] = mxDuplicateArray(prhs[5]);
>      plhs[3] = mxDuplicateArray(prhs[6]);
>      plhs[4] = mxDuplicateArray(prhs[7]);
>      plhs[5] = mxDuplicateArray(prhs[8]);
>      plhs[6] = mxDuplicateArray(prhs[9]);
>      plhs[7] = mxDuplicateArray(prhs[10]);
>      plhs[8] = mxDuplicateArray(prhs[11]);
>      plhs[9] = mxDuplicateArray(prhs[12]);
>      plhs[10] = mxDuplicateArray(prhs[13]);
>      plhs[11] = mxDuplicateArray(prhs[14]);
>
>      plhs[12] = mxCreateDoubleScalar(n_ff);
>      plhs[13] = mxCreateDoubleScalar(n_fp);
>      plhs[14] = mxCreateDoubleScalar(n_pf);
>      plhs[15] = mxCreateDoubleScalar(n_pp);
>
>     return;
> }
> %=============================================================

Now I see what you're doing... you are making the inner part of some loop into a MEX file. Input arguments 4 through 15 are being updated, as are the scalar integer inputs 16 through 19. The actual operation you do on these arrays is very straight-forward and rather simple. In comparison, the work you do in copying the input to the output is HUGE! No wonder the end result is slower... :)

You should consider making the whole loop into a MEX-file. Try to have some data as input, create empty arrays for the output, and fill them with the results of your calculations. Then you will gain speed.

Cris.
```
 0
Cris
9/7/2010 2:21:35 PM
```Many thanks to   Rune, Cris, Steve and James for your observations.

@James:
If I use mxCreateSharedDataCopy, does that mean that in the code I have, everything remains the same except that I dont need the mxDuplicateArray at the end ?
Secondly, how do I avoid modifying the inputs as Steve mentioned .

@Cris:
Your understanding is quite correct. There is a loop over all the elements of my FEM mesh, and the contribution of each  element is assembled using this function.
I cannot extend this to cover the outer loop over the elements because  that would involve essentially rewriting each function I use in my FEM program.
So I am only looking at ways to avoid copying the input to the output in this code.
Any suggestions on that front would be most helpful
```
 0
shriram
9/7/2010 3:32:07 PM
```"shriram srinivasan" <shrirams@tamu.edu> wrote in message <i65ltn\$huu\$1@fred.mathworks.com>...
>
> If I use mxCreateSharedDataCopy, does that mean that in the code I have, everything remains the same except that I dont need the mxDuplicateArray at the end ?

Just replace all the calls that you showed of mxDuplicateArray with mxCreateSharedDataCopy. (This begs the question as to why you do this in the first place, however. Why not do this at the m-code level?)

> Secondly, how do I avoid modifying the inputs as Steve mentioned .

I would suggest posting the m-code that you are trying to speed up. That way we can make a more accurate assessment of how to proceed, and what the best use of a mex file vs m-code would be. Generally, MATLAB is not very good at array slicing inside a loop since that incurs a data copy each time. This can often be avoided in a mex routine.

James Tursa
```
 0
James
9/7/2010 5:32:10 PM
```Hi James,
As per your suggestion, I used the shared copy command. This makes it much better, but the original matlab function is still faster. As per your suggestion, here is my code  (for assembling local matrices into a global matrix)
The outputs are K_ff, K_fp, K_pf, K_pp.
%====================================
[K_ff, K_fp, K_pf, K_pp] = Global_matrix( Nele, Connectivity, etc)

% Initialize
I_ff = zeros(Nele*(size(Connectivity,2)*dofs_per_node)^2,1);
J_ff = zeros(Nele*(size(Connectivity,2)*dofs_per_node)^2,1);
I_fp = zeros(Nele*(size(Connectivity,2)*dofs_per_node)^2,1);
J_fp = zeros(Nele*(size(Connectivity,2)*dofs_per_node)^2,1);
I_pf = zeros(Nele*(size(Connectivity,2)*dofs_per_node)^2,1);
J_pf = zeros(Nele*(size(Connectivity,2)*dofs_per_node)^2,1);
I_pp = zeros(Nele*(size(Connectivity,2)*dofs_per_node)^2,1);
J_pp = zeros(Nele*(size(Connectivity,2)*dofs_per_node)^2,1);
X_ff = zeros(Nele*(size(Connectivity,2)*dofs_per_node)^2,1);
X_fp = zeros(Nele*(size(Connectivity,2)*dofs_per_node)^2,1);
X_pf = zeros(Nele*(size(Connectivity,2)*dofs_per_node)^2,1);
X_pp =zeros(Nele*(size(Connectivity,2)*dofs_per_node)^2,1) ;
n_ff =0;
n_fp =0;
n_pf =0;
n_pp =0;

% Loops over each element
for ele = 1: Nele

% Calculate EleNodes for give element

EleNodes = Connectivity(ele,:);

% Calculate local stiffness matrix and load vector

u_vec = GlobalId(EleNodes(:),1:dofs_per_node)';
u_vec = u_vec(:) ;

%-----------------------------------------------------------------------------------
%  This is the function which am trying to convert to MEX code
%-----------------------------------------------------------------------------------
[I_ff,I_fp,I_pf,I_pp,J_ff,J_fp,J_pf,J_pp,X_ff,X_fp,X_pf,X_pp,...
n_ff,n_fp,n_pf,n_pp] = sparse_data_local_matrix(kmat,u_vec,u_vec,...
I_ff,I_fp,I_pf,I_pp,J_ff,J_fp,J_pf,J_pp,...
X_ff,X_fp,X_pf,X_pp,n_ff,n_fp,n_pf,n_pp);

end

% Assembly
K_ff = sparse(I_ff(1:n_ff),J_ff(1:n_ff),X_ff(1:n_ff),NEqns,NEqns) ;
K_fp = sparse(I_fp(1:n_fp),J_fp(1:n_fp),X_fp(1:n_fp),NEqns,NCons) ;
K_pf = sparse(I_pf(1:n_pf),J_pf(1:n_pf),X_pf(1:n_pf),NCons,NEqns) ;
K_pp = sparse(I_pp(1:n_pp),J_pp(1:n_pp),X_pp(1:n_pp),NCons,NCons) ;

% Clear memory
clear I_ff I_fp I_pf I_pp J_ff J_fp J_pf J_pp X_ff X_fp X_pf X_pp;
clear n_ff n_fp n_pf n_pp ;

%=================================================================
So now perhaps you can give me a better idea about whether it is worthwhile to try to write a MEX for the function at all.
```
 0
shrirams (2)
9/7/2010 6:24:08 PM
```Forgot to include the  mcode for the function:

%=============================================
[I_ff,I_fp,I_pf,I_pp,J_ff,J_fp,J_pf,J_pp,X_ff,X_fp,X_pf,X_pp,...
n_ff,n_fp,n_pf,n_pp] = sparse_data_local_matrix(kmat,r_vec,c_vec,...
I_ff,I_fp,I_pf,I_pp,J_ff,J_fp,J_pf,J_pp,...
X_ff,X_fp,X_pf,X_pp,n_ff,n_fp,n_pf,n_pp)

for i = 1: length(r_vec)
for j = 1: length(c_vec)

if (r_vec(i) > 0  && c_vec(j) > 0 )
n_ff       = n_ff + 1;
I_ff(n_ff) = r_vec(i);
J_ff(n_ff) = c_vec(j);
X_ff(n_ff) = kmat(i,j);

elseif (r_vec(i) > 0  && c_vec(j) < 0 )
n_fp       = n_fp + 1;
I_fp(n_fp) = r_vec(i);
J_fp(n_fp) = abs(c_vec(j));
X_fp(n_fp) = kmat(i,j);

elseif (r_vec(i) < 0  && c_vec(j) > 0 )
n_pf       = n_pf + 1;
I_pf(n_pf) = abs(r_vec(i));
J_pf(n_pf) = c_vec(j);
X_pf(n_pf) = kmat(i,j);

else
n_pp       = n_pp + 1;
I_pp(n_pp) = abs(r_vec(i));
J_pp(n_pp) = abs(c_vec(j));
X_pp(n_pp) = kmat(i,j);

end

end
end
%===============================================================
```
 0
shriram
9/7/2010 6:27:19 PM
```*bounce*
```
 0
shriram
9/8/2010 5:41:05 AM
```"shriram srinivasan" <shrirams@tamu.edu> wrote in message <i65ltn\$huu\$1@fred.mathworks.com>...
> Many thanks to   Rune, Cris, Steve and James for your observations.
>
> @James:
> If I use mxCreateSharedDataCopy, does that mean that in the code I have, everything remains the same except that I dont need the mxDuplicateArray at the end ?
> Secondly, how do I avoid modifying the inputs as Steve mentioned .
>
> @Cris:
> Your understanding is quite correct. There is a loop over all the elements of my FEM mesh, and the contribution of each  element is assembled using this function.
> I cannot extend this to cover the outer loop over the elements because  that would involve essentially rewriting each function I use in my FEM program.
> So I am only looking at ways to avoid copying the input to the output in this code.
> Any suggestions on that front would be most helpful

In a MEX-file, you are not supposed to modify the input arguments. In your case, this means making a copy of the input, then modifying the copies, and pass these copies as output arguments. But this means that your MEX-file might be slower than an M-file that doesn't need to do the copying (MATLAB nowadays has some inplace operations, very nice!).

If you want to avoid the copying of matrices in your MEX-file, you could continue modifying the input, as you are doing. Simply remove the copying of the matrices at the end of your MEX-file. You'd call it with
[counter1,counter2,...] = mexfunction(array1,array2,...,coutner1,counter2,...);
Note that you'd be breaking MATLAB "rules". But it should work as long as you don't keep additional copies of your arrays.

For example, if you do
b = a;
mexfunction(a); % this one modifies its input
then "b" will also be modified. "b" contains only a pointer to the data of "a". When you use MATLAB commands to modify "b", a full copy is made of the data. But your MEX-file is not aware of how many variables use the data in the input array. This is why you're not supposed to modify it.

But in your case you have everything under control. Just make sure that all the input arrays are independent, not copies of each other! ...and don't tell anybody at The MathWorks what you're doing! :)

Another option would be to create the output arrays, and collect them all in a cell array:
[res1{ii},res2{ii},...] = mexfunction(data)
At the end of the loop you can concatenate the results into one big array:
res1 = [res1{:}];
res2 = [res2{:}];
...
This option is a little bit more involved, but a lot more correct. :)

Cheers,
Cris.
```
 0
Cris
9/8/2010 2:09:04 PM
```"Cris Luengo" <cris.luengo@google.for.my.name.to.contact.me> wrote in message <i685e0\$41l\$1@fred.mathworks.com>...
> "shriram srinivasan" <shrirams@tamu.edu> wrote in message <i65ltn\$huu\$1@fred.mathworks.com>...
> > Many thanks to   Rune, Cris, Steve and James for your observations.
> >
> > @James:
> > If I use mxCreateSharedDataCopy, does that mean that in the code I have, everything remains the same except that I dont need the mxDuplicateArray at the end ?
> > Secondly, how do I avoid modifying the inputs as Steve mentioned .
> >
> > @Cris:
> > Your understanding is quite correct. There is a loop over all the elements of my FEM mesh, and the contribution of each  element is assembled using this function.
> > I cannot extend this to cover the outer loop over the elements because  that would involve essentially rewriting each function I use in my FEM program.
> > So I am only looking at ways to avoid copying the input to the output in this code.
> > Any suggestions on that front would be most helpful

> In a MEX-file, you are not supposed to modify the input arguments.

Hi,
This is absolutely NOT TRUE. It is the programmers choice to do it or not.
I do it, for example, when I need to multiply a large array by a scalar and don't need the pre-multiplication version of the variable. Following that "advice" one needs at certain time to have TWO copies of the array with all that implies to memory usage, whilst if we do inplace modifications there is no extra memory need.

> ... (MATLAB nowadays has some inplace operations, very nice!).

Hmm, but TMW can!
```
 0
Budias
9/8/2010 2:32:13 PM
```"Budias Aao" <budiao@hotmail.com> wrote in message <i686pd\$7i9\$1@fred.mathworks.com>...
> "Cris Luengo" <cris.luengo@google.for.my.name.to.contact.me> wrote in message <i685e0\$41l\$1@fred.mathworks.com>...
> > "shriram srinivasan" <shrirams@tamu.edu> wrote in message <i65ltn\$huu\$1@fred.mathworks.com>...
> > > Many thanks to   Rune, Cris, Steve and James for your observations.
> > >
> > > @James:
> > > If I use mxCreateSharedDataCopy, does that mean that in the code I have, everything remains the same except that I dont need the mxDuplicateArray at the end ?
> > > Secondly, how do I avoid modifying the inputs as Steve mentioned .
> > >
> > > @Cris:
> > > Your understanding is quite correct. There is a loop over all the elements of my FEM mesh, and the contribution of each  element is assembled using this function.
> > > I cannot extend this to cover the outer loop over the elements because  that would involve essentially rewriting each function I use in my FEM program.
> > > So I am only looking at ways to avoid copying the input to the output in this code.
> > > Any suggestions on that front would be most helpful
>
> > In a MEX-file, you are not supposed to modify the input arguments.
>
> Hi,
> This is absolutely NOT TRUE. It is the programmers choice to do it or not.
> I do it, for example, when I need to multiply a large array by a scalar and don't need the pre-multiplication version of the variable. Following that "advice" one needs at certain time to have TWO copies of the array with all that implies to memory usage, whilst if we do inplace modifications there is no extra memory need.

You CAN do it, but you SHOULDN'T. I don't know if you noticed, but mexFunction() defines prhs as const. The manual reads: "Do not modify any prhs values in your MEX-file. Changing the data in these read-only mxArrays can produce undesired side effects." ( http://www.mathworks.com/help/techdoc/apiref/mexfunction.html )

> > ... (MATLAB nowadays has some inplace operations, very nice!).
>
> Hmm, but TMW can!

Yes, they can, because they can check to see how many references there are to the data, and make the deep copy if necessary. In-place operations only work when there's only one array pointing at that data.

Try this experiment with your MEX-file, the one that multiplies the large array by a scalar:
data = % your large array
copy = data;
youmexfile(data,3); % this is the one that multiplies data by 3, changing data
any(data~=copy)

This is the "undesired side effect" that the manual warns about.

Of course, if you are aware of this, and control what goes in to your MEX-file, you can avoid this side effect. But I recommend that you don't. Also, considering MATLAB now does some things in-place, you might see that
data = data*3;
is actually faster than your MEX-file. MATLAB parallelizes this sort of operations, and I'm assuming your MEX-file does not.

Cheers,
Cris.
```
 0
9/8/2010 3:55:24 PM
```This MEX coding seems to be a waste of time. A quick look of the matlab file seems that OP should try to first vectorize the double-for loops (which already faster than the MEX), which could improve substantially the speed.

Bruno
```
 0
Bruno
9/8/2010 4:44:06 PM
```"Cris Luengo" <cris.luengo@google.for.my.name.to.contact.me> wrote in message <i68blc\$6cn\$1@fred.mathworks.com>...
> "Budias Aao" <budiao@hotmail.com> wrote in message <i686pd\$7i9\$1@fred.mathworks.com>...
> > "Cris Luengo" <cris.luengo@google.for.my.name.to.contact.me> wrote in message <i685e0\$41l\$1@fred.mathworks.com>...
> > > "shriram srinivasan" <shrirams@tamu.edu> wrote in message <i65ltn\$huu\$1@fred.mathworks.com>...
> > > > Many thanks to   Rune, Cris, Steve and James for your observations.
> > > >
> > > > @James:
> > > > If I use mxCreateSharedDataCopy, does that mean that in the code I have, everything remains the same except that I dont need the mxDuplicateArray at the end ?
> > > > Secondly, how do I avoid modifying the inputs as Steve mentioned .
> > > >
> > > > @Cris:
> > > > Your understanding is quite correct. There is a loop over all the elements of my FEM mesh, and the contribution of each  element is assembled using this function.
> > > > I cannot extend this to cover the outer loop over the elements because  that would involve essentially rewriting each function I use in my FEM program.
> > > > So I am only looking at ways to avoid copying the input to the output in this code.
> > > > Any suggestions on that front would be most helpful
> >
> > > In a MEX-file, you are not supposed to modify the input arguments.
> >
> > Hi,
> > This is absolutely NOT TRUE. It is the programmers choice to do it or not.
> > I do it, for example, when I need to multiply a large array by a scalar and don't need the pre-multiplication version of the variable. Following that "advice" one needs at certain time to have TWO copies of the array with all that implies to memory usage, whilst if we do inplace modifications there is no extra memory need.
>
> You CAN do it, but you SHOULDN'T. I don't know if you noticed, but mexFunction() defines prhs as const. The manual reads: "Do not modify any prhs values in your MEX-file. Changing the data in these read-only mxArrays can produce undesired side effects." ( http://www.mathworks.com/help/techdoc/apiref/mexfunction.html )
>
> > > ... (MATLAB nowadays has some inplace operations, very nice!).
> >
> > Hmm, but TMW can!
>
> Yes, they can, because they can check to see how many references there are to the data, and make the deep copy if necessary. In-place operations only work when there's only one array pointing at that data.
>
> Try this experiment with your MEX-file, the one that multiplies the large array by a scalar:
>    data = % your large array
>    copy = data;
>    youmexfile(data,3); % this is the one that multiplies data by 3, changing data
>    any(data~=copy)
>
> This is the "undesired side effect" that the manual warns about.

I guess that is where we disagree. I don't see it as "undesired side effect" but simply how it has to work. When we change in-place we are changing the contents of the pointer referenced in prhs so naturally "data" and "copy" are still equal after the  " youmexfile(data,3)" operation.

> Of course, if you are aware of this, and control what goes in to your MEX-file, you can avoid this side effect. But I recommend that you don't. Also, considering MATLAB now does some things in-place, you might see that
>    data = data*3;
> is actually faster than your MEX-file. MATLAB parallelizes this sort of operations, and I'm assuming your MEX-file does not.

Maybe is faster, maybe not (for example for slightly older ML versions). On the other hand the difference may boil down to be or not be able to run a program due to memory constraints. And Matlab is very good in consuming large amounts of memory.

chears
Budias
```
 0
budiao (21)
9/8/2010 5:07:20 PM
```"Cris Luengo" <cris.luengo@google.for.my.name.to.contact.me> wrote in message <i68blc\$6cn\$1@fred.mathworks.com>...
>
> Yes, they can, because they can check to see how many references there are to the data, and make the deep copy if necessary.

FYI, you can also make this check in a mex file if desired and have the same behavior as MATLAB for shared variables. I think the undocumented function is mxUnshareArray.

> MATLAB now does some things in-place, you might see that
>    data = data*3;
> is actually faster than your MEX-file. MATLAB parallelizes this sort of operations

Based on my experiences with this, it seems to be highly dependent on MATLAB version and whether or not the underlying BLAS routines used (if any) are multi-threaded. e.g., for (real scalar)*(real array) MATLAB is tough to beat, but for (complex scalar)*(complex array) you can sometimes beat MATLAB with simple loops in a mex routine because you can optimize memory access better than MATLAB does.

James Tursa
```
 0
James
9/8/2010 5:57:20 PM
```Thanks to everyone for  providing your perspective. I have a long way to go before understanding the details of all the technical stuff, but  I have given up this for now.  In this particular instance, it hasn't been worth the effort to write MEX code.
If someone does come up with a suggestion that can be faster than my current mcode, I shall then look into it.
```
 0
shriram
9/9/2010 5:28:03 AM
```"shriram srinivasan" <shrirams@tamu.edu> wrote in message <i69r93\$553\$1@fred.mathworks.com>...

> If someone does come up with a suggestion that can be faster than my current mcode, I shall then look into it.

I wrote earlier that you might try to vectorize - oops sorry call it rather "array-program" convert  [sic] - your code instead of running with double for-loops. Use techniques such as logical indexing, or find() function or accumarray() or whatever you know to carry out the calculation.

Bruno
```
 0
Bruno
9/9/2010 6:53:05 AM