As a low-level research assistant in a lab, I was recently assigned the task of translating a chunk of MATLAB code to C. When I asked my brother for guidance, he looked at me blankly: “Why? Who does that?” I never thought code translation was a thing. Why don’t people just write their programs in their desired language from the beginning? Why must they come up with new ways to torture these poor research assistants? Warning: this post contains some nerdy information that you shouldn’t concern yourself with unless you unfortunately have to.
Code translation, especially MATLAB to C/C++, turns out to be quite a common dilemma. Google Trends shows that the search term “matlab to c” is half as popular as “complexity theory” or “RSA encryption”, and about twice as popular as “effective altruism”.
MATLAB is wildly popular in academia. While I can’t find a concrete number of current MATLAB users, various clues lead me to believe that it is somewhere between 3 million and 20 million. In 1996, a study at Cornell reported that MATLAB had 300,000 users and that the number had doubled each year since 1978. By 2004, that number had reached 1 million. Researchers prefer to write their programs in MATLAB because it is fast to write and easy to use, which suits work where you need to see results instantly. However, MATLAB takes up a lot of resources to run. It also doesn’t work on most embedded systems. Therefore, you have to translate your MATLAB code into C/C++ to make it more efficient and versatile. EETimes claimed that “most signal processing and communication projects nowadays at some point require translating MATLAB code into equivalent C code”.
However, this conversion is a frustrating and time-consuming process, given that C and MATLAB are about as different as two languages get. Some differences are quite easy to fix, e.g.:
– C is zero-indexed, which means its array indices start at 0, while MATLAB is one-indexed. You can fix this by subtracting 1 from each MATLAB index.
– MATLAB supports vector-based operations, which means you can do a lot of calculations on vectors, arrays, and matrices with just one operator. When translating to C, you have to replace each of these operators with a for loop.
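As a rough illustration of these two fixes, here is a minimal sketch. The MATLAB lines in the comments and the helper names `vec_add` and `get1` are made up for this example; they don't come from any library.

```c
#include <assert.h>
#include <stddef.h>

/* MATLAB: c = a + b;   (a, b and c are vectors of length n)
 * C has no vector operators, so the addition becomes an explicit loop. */
void vec_add(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* MATLAB: x = a(k);   (k counts from 1)
 * Subtract 1 to get the zero-based C index. */
double get1(const double *a, size_t k)
{
    assert(k >= 1);      /* a valid MATLAB index is never 0 */
    return a[k - 1];
}
```

In real code you would probably shift the loop bounds themselves (run `i` from 0 to `n - 1`) rather than call a helper like `get1` on every access, but the helper makes the off-by-one explicit.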
Other differences are just plain painful and require strategic planning before starting to code.
– MATLAB has no type declarations. You can use a variable without declaring it, and you can use the same variable to store different data types. It’s a nightmare when the variable you thought was an “int” turns out to be an “array” and you have to go back and change everything accordingly.
– MATLAB supports polymorphism while C doesn’t. In MATLAB, you can write a generic function that takes different data types as input parameters. In C, you have to declare the exact data type of each parameter. For example, in MATLAB you can write one function to compare two integers, two doubles or two characters. In C, you would have to write one function to compare two integers, another to compare two doubles, and so on. In practice, you can still write generic functions in C: instead of passing in parameters of a specific type, you declare the parameters to be of type void * and pass their size along. For example, this is a generic function to swap two inputs of unknown type.
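#include <string.h>
void gswap(void *px, void *py, size_t sz)
{
    unsigned char tmp[sz];   /* scratch buffer (C99 variable-length array); sz must be > 0 */
    memcpy(tmp, px, sz);     /* tmp = *px */
    memcpy(px, py, sz);      /* *px = *py */
    memcpy(py, tmp, sz);     /* *py = tmp */
}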
When you call this function, you pass in pointers to the two variables you want to swap.
int a = 10, b = 20;
gswap(&a, &b, sizeof(int));
– In MATLAB, you can just use an array/matrix without thinking about its dimensions or the number of elements in it. It allows users to freely add/remove elements or even dimensions. However, in C, the dimensions and the number of elements have to be known so that the corresponding storage can be allocated. I haven’t found any effective way to deal with this yet, except to use a lot of malloc/realloc/calloc/free and get frustrated when an array changes size.
– MATLAB has a huge library of computation functions that have no equivalent in C. People casually use them in MATLAB without thinking twice, and when you run into them, you have no idea how to translate them into C. Of course, you can read the MATLAB documentation to get an idea of what they do, but MATLAB won’t tell you how they do it, i.e. what steps they take to get there. For complicated functions, you kind of have to reverse-engineer them by trial and error to duplicate the results.
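For the array-resizing problem above, one pattern that takes some of the pain out is to wrap the pointer, the length and the capacity in a struct and grow the storage geometrically, so that realloc isn’t called on every append. This is only a sketch under my own assumptions: the names `dvec`, `dvec_push` and `dvec_free` are hypothetical, and it only handles 1-D arrays of doubles.

```c
#include <assert.h>
#include <stdlib.h>

/* A growable 1-D array of doubles (hypothetical helper type). */
typedef struct {
    double *data;   /* heap storage, NULL while empty */
    size_t  len;    /* number of elements in use */
    size_t  cap;    /* number of elements allocated */
} dvec;

/* MATLAB: v(end+1) = x;
 * Returns 0 on success, -1 if memory could not be allocated. */
int dvec_push(dvec *v, double x)
{
    assert(v != NULL);
    if (v->len == v->cap) {
        /* Full: start at 8 elements, then double each time. */
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        double *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL) {
            return -1;      /* the old buffer is still valid */
        }
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = x;
    return 0;
}

/* Release the storage and reset to the empty state. */
void dvec_free(dvec *v)
{
    assert(v != NULL);
    free(v->data);
    v->data = NULL;
    v->len = 0;
    v->cap = 0;
}
```

Start from an empty `dvec v = {0};`, call `dvec_push(&v, x)` wherever the MATLAB code grows the array, and call `dvec_free(&v)` once at the end. Matrices whose dimensions change would need more bookkeeping than this.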
Googling “MATLAB to C code”, the first result to show up is “MATLAB to C Made Easy – No More Manual Translation”, with a link to MATLAB Coder by MathWorks. That’s a lie. Even with MATLAB Coder, there is still a lot of manual work involved. Using MATLAB Coder to go from MATLAB to C is like translating from Chinese to English using Google Translate. It can handle the simple differences mentioned above quite well. However, when it encounters the painful differences, MATLAB Coder’s translation is even more painful to look at. C code generated by MATLAB Coder is difficult to read and heartbreakingly inefficient. In fact, after trying MATLAB Coder on a few lines of code, I decided it’d be much faster to translate by hand. But it could be just me and my rigidity.
In the meantime, if you are writing MATLAB code with the idea that someone else will translate it into C later on, there are a few things you can do to make that person hate you less.
– When you start using a variable, add a comment about how that variable is going to be used: is it a double or an integer? Don’t use the same variable to store different data types.
– For arrays and matrices, always state their sizes and dimensions. Are the dimensions fixed, or do they depend on something else?
– Write your code in blocks, with each block as a function. Comment on the input and output of each function.
That’s about it for now.