I need some direction on the problem MATRMUL0.
I am wondering whether I need to write an optimized routine for matrix multiplication, or find a way to exploit how matrices A and B are generated and in turn use that to generate matrix C.
Any help would be highly appreciated.
For MATRMUL0, you don’t need to write a fast matrix multiplication program.
However, it should be cache-friendly.
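To illustrate what "cache-friendly" can mean here: one common approach is to keep the plain O(n^3) algorithm but order the loops as i, k, j, so that the innermost loop walks along a row of B and a row of C in memory order instead of jumping down a column of B. The sketch below is only one possible reading of the advice above. The function name `multiply_ikj`, the flat row-major layout, and the wrapping 64-bit arithmetic are assumptions for illustration; the problem's actual input generation and output format are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: multiplies two n x n matrices stored row-major
// in flat vectors. Arithmetic is unsigned 64-bit and wraps on overflow.
std::vector<std::uint64_t> multiply_ikj(const std::vector<std::uint64_t>& a,
                                        const std::vector<std::uint64_t>& b,
                                        std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<std::uint64_t> c(n * n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        std::uint64_t* crow = c.data() + i * n;        // row i of C
        for (std::size_t k = 0; k < n; ++k) {
            const std::uint64_t aik = a[i * n + k];    // reused for the whole inner loop
            const std::uint64_t* brow = b.data() + k * n;  // row k of B
            // Both crow and brow are read sequentially, which is what
            // makes this loop order friendlier to the cache than i, j, k.
            for (std::size_t j = 0; j < n; ++j) {
                crow[j] += aik * brow[j];
            }
        }
    }
    return c;
}
```

With the more familiar i, j, k order, the inner loop reads `b[k * n + j]` with a stride of n elements, which tends to miss the cache on large matrices. Swapping the two inner loops leaves the result unchanged and only changes the access pattern.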
Thank you for your guidance, @min_25.
Does the same trick also apply to Matrmul?