Loading and saving MAT files is slow

Problem

I have to do a lot of loading and saving of MAT files. It's slow, though. What can I do about it?

Solution

From MATLAB version 7.0 onwards (type "version" if you're unsure what you're running), MAT files are stored compressed by default. Compression takes time, so if you can get away without it, things will work a lot faster. Saving with the "-v6" flag writes the file uncompressed. Here is an example of the speed increases you can expect:

>> R=randn(5000); %Make a big array
>> tic, save R R, toc %Save it as normal (compressed)
Elapsed time is 6.687678 seconds.
%This file occupies 184M of disk space.
>> tic, load R, toc %Loading is faster than saving
Elapsed time is 1.458939 seconds.
% * Now we load and save the uncompressed version *
>> tic, save -v6 R R, toc %Much faster!
Elapsed time is 0.815954 seconds.
%The file is not much bigger (191M), since we have stored random data
>> tic, load R, toc %Loading is a little faster too
Elapsed time is 0.166742 seconds.

There are some things to note about this example. Firstly, we've used random data, which takes a while to compress and doesn't compress well. If you repeat the example with R being a matrix of ones, you will get a very different result (try it!). Secondly, although saving is what takes the most time, there are cases where *loading* compressed data can be faster than loading the uncompressed version. So you might want to check what works best for your code if you suspect this may become an issue. Thirdly, we're using just a simple array in the above example. More complicated data types, such as cell arrays or structures, will take longer to compress.
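The "matrix of ones" experiment suggested above can be sketched as follows. This is only an illustration: the array size, the temporary file names and the variable names (data, label, bytesC, bytesU) are arbitrary choices, and the explicit "-v7" flag is used here to request the compressed format that the default save normally produces. Timings will differ from machine to machine.

```matlab
% Sketch: compare compressed (-v7) and uncompressed (-v6) saves for data
% that compresses badly (random) and data that compresses well (all ones).
N = 1000;                        % smaller than the 5000 used above, to keep it quick
data  = {randn(N), ones(N)};
label = {'random', 'ones'};
fileC = [tempname '.mat'];       % temporary file for the compressed save
fileU = [tempname '.mat'];       % temporary file for the uncompressed save
bytesC = zeros(1, 2);            % file sizes, compressed
bytesU = zeros(1, 2);            % file sizes, uncompressed

for k = 1:numel(data)
    R = data{k};
    tic; save(fileC, 'R', '-v7'); tC = toc;   % compressed
    tic; save(fileU, 'R', '-v6'); tU = toc;   % uncompressed
    dC = dir(fileC);
    dU = dir(fileU);
    bytesC(k) = dC.bytes;
    bytesU(k) = dU.bytes;
    fprintf('%-6s  -v7: %.3f s, %d bytes   -v6: %.3f s, %d bytes\n', ...
            label{k}, tC, bytesC(k), tU, bytesU(k));
end

delete(fileC);
delete(fileU);
```

For the matrix of ones the compressed file should come out far smaller than the uncompressed one, whereas for random data the two sizes stay close together.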

Another thing that can help is not using double-precision if you don't need it. By default all MATLAB numeric arrays are constructed as doubles, which provide an accuracy of about 16 significant decimal digits. This may be more than you need, particularly if you're not performing many iterative operations on the data. If this is the case, you can speed up your code and reduce your file sizes by using single-precision (about 7 significant decimal digits). Here's the improvement you would see:

>> R=rand(5000); %Make some random data
>> class(R) %See? It's a double by default
ans =
double
>> tic, save R R, toc %This is how long it takes to save as double
Elapsed time is 6.670131 seconds.
>> R=single(R); %Convert to single-precision
>> tic, save R R, toc %Saves in half the time
Elapsed time is 3.557559 seconds.

In the above case, using single-precision halves the saving time and also the file size. There is, of course, a reason why MATLAB uses doubles by default. Rounding errors may creep in during iterative operations if you're using single-precision. This is less likely if you're using double-precision. Also, some functions (e.g. pcolor) don't work with single precision data. So be careful when you use singles and remember to convert single to double (with the double command) if a function complains. You may also want to look into the integer data types and sparse arrays.
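Before switching to singles it may be worth checking how much precision the conversion actually costs for your data. The following is a rough sketch of such a check; the variable names (Rs, infoD, infoS, maxErr, Rd) are made up for this illustration.

```matlab
% Sketch: what converting to single costs in precision and saves in memory.
R  = rand(1000);                 % double by default
Rs = single(R);                  % single-precision copy

infoD = whos('R');               % struct describing R (name, class, bytes, ...)
infoS = whos('Rs');
fprintf('double: %d bytes in memory, single: %d bytes\n', infoD.bytes, infoS.bytes);

% Largest rounding error introduced by the conversion
maxErr = max(abs(R(:) - double(Rs(:))));
fprintf('largest conversion error: %g (eps for single is %g)\n', ...
        maxErr, eps('single'));

% Convert back to double before calling a function that complains about singles
Rd = double(Rs);
```

Note that converting back with double restores the data type but not the digits that were lost in the conversion to single.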

In the previous examples we are saving just one variable to a MAT file. A MAT file can, however, contain many variables. You could potentially save time by loading only the variable that you want from an existing MAT file:

%Make a large MAT file with several variables
>> R1=rand(3000);
>> R2=rand(3000);
>> R3=rand(3000);
>> save R R1 R2 R3
%Loading everything at once:
>> tic, load R, toc
Elapsed time is 0.214937 seconds.
%Loading only R1 happens in one third of the time (for obvious reasons)
>> tic, load R R1, toc
Elapsed time is 0.074641 seconds.
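If you can't remember which variables a MAT file holds, you can list its contents without loading anything, and then pull in just the one you need. The sketch below assumes a temporary file and made-up names (fname, info, S, R2loaded); it uses the functional forms of whos and load so that the result lands in a structure rather than directly in the workspace.

```matlab
% Sketch: inspect a MAT file, then load a single variable from it.
R1 = rand(300); R2 = rand(300); R3 = rand(300);
fname = [tempname '.mat'];            % temporary file name for this illustration
save(fname, 'R1', 'R2', 'R3');

% List the variables stored in the file without loading them
info = whos('-file', fname);
for k = 1:numel(info)
    fprintf('%s: %s, %d bytes\n', info(k).name, info(k).class, info(k).bytes);
end

% Load only R2; the output is a struct with one field per loaded variable
S = load(fname, 'R2');
R2loaded = S.R2;

delete(fname);
```

Loading into a structure also avoids silently overwriting variables of the same name that already exist in your workspace.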

In many real-world examples, compressed MAT files will take up a lot less space. Furthermore, you often need to load and save only rarely. Thus, in most instances you will be best served by using the default options and compressing all your data. However, if you have a complex analysis pipeline that involves repeatedly loading data from a MAT file, modifying it, then re-saving it, you may considerably speed up your code by compressing only the *last* time you save.
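A pipeline of that kind might look roughly like the sketch below. The processing step (adding a number to the array), the number of steps and the names fname, nSteps and final are all placeholders invented for this illustration; the explicit "-v7" flag is used to request the compressed format for the final save.

```matlab
% Sketch: save uncompressed during intermediate steps, compress only at the end.
fname = [tempname '.mat'];            % temporary file name for this illustration
data = zeros(500);
save(fname, 'data', '-v6');           % initial file, uncompressed

nSteps = 5;
for step = 1:nSteps
    S = load(fname, 'data');
    data = S.data + step;             % stand-in for the real processing
    if step < nSteps
        save(fname, 'data', '-v6');   % intermediate save: fast, uncompressed
    else
        save(fname, 'data', '-v7');   % final save: compressed
    end
end

final = load(fname, 'data');          % read back the finished result
delete(fname);
```

Whether this pays off depends on how large the data are and how many intermediate saves there are, so it is worth timing both variants with tic and toc on your own data.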