Is R simply faster on Linux? (Part 2)

Photo by Joshua Reddekopp / Unsplash

Apologies for the delay in reporting the findings. Part of the reason for the delay is that there were no useful findings to report. As I mentioned in the previous post, I installed Linux (for those who are guessing, Linux Mint) on my work machine on Tuesday. After installing the necessary dependencies, R, and RStudio, I quickly ran the same RMD file on Linux that I had used on Windows. Remember, this is the same computer. By that, I mean the only change to the machine was the operating system. Everything else, peripherals included, was exactly the same.

So I was surprised when, on Linux, the computer held on a little longer before the program aborted all the same. I was a little upset when this happened. I had been pinning my hopes on finishing the coding part and getting the results so that I could begin revising the manuscript accordingly. It seemed that would have to wait.

In many situations, it can be critical to take a step back from time to time and view the project in its totality. Yesterday afternoon, I experienced one such situation. To explain the beauty of it, I will have to explain a little about what exactly I was trying to accomplish with R. Basically, I am reworking my thesis code to understand what happens when you add one type of retail channel to markets served by other (incumbent) retail channels. One would be correct in pointing out that new retail channels are not added simply by chance (i.e., one does not flip a coin or roll a die to decide where to start a new store or site). Therefore, to address some of the endogeneity concerns, I planned to use matching (specifically, coarsened exact matching; you can learn more about it here) and then apply the new Difference-in-Differences with Multiple Time Periods technique recently proposed by Callaway and Sant’Anna.

You may find it interesting to note that matching and difference-in-differences create an interesting situation here. Specifically, before performing the actual difference-in-differences, I could (1) balance the panel and then match, or (2) match and then balance the remnant. This raises an interesting "is f(g(x)) = g(f(x))?" type of question: does the order of the two steps change the result?
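To make the ordering question concrete, here is a minimal toy sketch in base R. It is not my thesis code: the data, the column names (`id`, `period`, `treated`, `stratum`) and the two helper functions are made up for illustration, and the "matching" step is only a crude stand-in for coarsened exact matching (it keeps a stratum only if it contains both treated and control units).

```r
# Toy unbalanced panel: one row per unit-period observation (hypothetical data).
panel <- data.frame(
  id      = c("A", "A", "B", "C", "C", "D", "D"),
  period  = c(1, 2, 1, 1, 2, 1, 2),
  treated = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
  stratum = c("small", "small", "small", "large", "large", "large", "large"),
  stringsAsFactors = FALSE
)

# Balancing: keep only units observed in every period present in the data.
# Assumes at most one row per unit-period.
balance_panel <- function(df) {
  n_periods <- length(unique(df$period))
  obs_per_unit <- table(df$id)
  keep <- names(obs_per_unit)[obs_per_unit == n_periods]
  df[df$id %in% keep, ]
}

# Matching (simplified stand-in): keep only strata that contain
# at least one treated unit and at least one control unit.
match_strata <- function(df) {
  units <- unique(df[, c("id", "treated", "stratum")])
  has_treated <- tapply(units$treated, units$stratum, any)
  has_control <- tapply(!units$treated, units$stratum, any)
  keep <- names(has_treated)[has_treated & has_control]
  df[df$stratum %in% keep, ]
}

# Apply the two steps in both orders and compare the surviving units.
match_then_balance <- sort(unique(balance_panel(match_strata(panel))$id))
balance_then_match <- sort(unique(match_strata(balance_panel(panel))$id))

print(match_then_balance)
print(balance_then_match)
print(identical(match_then_balance, balance_then_match))
```

In this toy data the two orders disagree. Unit B is the only control in the "small" stratum but is observed in just one period. Matching first keeps A (because B is still around to match it), and balancing afterwards drops B, leaving A, C and D. Balancing first drops B, so A has no control left and is removed by the matching step, leaving only C and D.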

Armed with this fresh (and so obvious) insight, I quickly rewrote the code to do the matching first and then the diff-in-diff. On the plus side, the dataset was now vastly reduced (to about a third; yes, I had to throw away two thirds of my data to get comparable control and treatment groups). On the minus side, the computer still could not handle it, on either Windows or Linux. After facing dejection (albeit reduced) once again, I wrote an email to our IT team requesting some more RAM to complete this task. Hopefully, things will get resolved soon. We shall see, and we shall overcome.

Have a great morning, friends!!

P.S. For those wondering (like I did yesterday): f(g(x)) and g(f(x)) are called composite functions, and when f(g(x)) = g(f(x)) for every x, the two functions are said to commute.