I've been trying to profile my app with JetBrains dotTrace Performance. The problem is that when I profile it with thread cycle time, CacheObliviousMatrixMultiply takes forever. Simple log4net tracing shows that matrix multiplication is not the bottleneck for me.
I am using 2.6.2. My largest multiplication is a 55x50 matrix against a 50x10000 matrix. My matrices are Double.DenseMatrix and Matrix<double>.
How would you recommend profiling an app that uses matrix multiply?