Make a comment to the author.


A Test of the Effects of Rounding Precision Upon a Matrix Problem.

The Purpose.

This purpose of this test was to determine how easy it was to find a realistic problem using 'double' variables which would produce better results under the Linux default 64 bit precision mode than when run under a 53 bit precision mode.

The Problem.

Using recognised matrix code solve the following for x:
            A x = b
where A is a 10 by 10 matrix with all elements equal to 1, except for the diagonal elements which are 1+1.0e-10, i.e. it is 'almost' singular.

b is a column vector with elements 0, 1, 2, ..., 9.

The (arbitrarily) chosen code is from the meschach code of the 'C' package of netlib. Here is the program:

        main() {
          MAT *A, *QR;
          VEC *b, *x, *diag;
          u_int m=10, n=10, i, j;
        
          A = m_get(m, n);
          for ( i = 0; i < m; i++ )
            for ( j = 0; j < n; j++ ) {
                A->me[i][j] = 1.0;
                if ( i == j )
                  A->me[i][j] += 1.0e-12;
            }
          diag = v_get(A->m);
          QR = m_copy(A, MNULL);
          QRfactor(QR, diag);
        
          b = v_get(A->m);
          for ( i = 0; i < b->dim; i++ )
            b->ve[i] = i;
        
          x = QRsolve(QR, diag, b, VNULL);
        
          for ( i = 0; i < m; i++ )
            printf("%.20Lg (err=%.20g)\n", (long double)x->ve[i]);
        
          printf("\n||A*x-b|| = %Lg\n",
             (long double)(v_norm2(v_sub(mv_mlt(A, x, VNULL), b, VNULL))) );
        }

The Results.

The correct answer (computations performed at 128 bit precision) is:

  -4499599982940.764920071048
  -3499688875620.494937833037
  -2499777768300.224955595027
  -1499866660979.954973357016
   -499955553659.6849911190054
    499955553660.5849911190053
   1499866660980.854973357016
   2499777768301.124955595027
   3499688875621.394937833037
   4499599982941.664920071048

Three methods of compiling and running the program and the meschach library were tried:

Default Linux with -O2

The answer differed from the exact answer by the following vector:
 -86739704.29
   9637744.92
   9637744.92
   9637744.92
   9637744.92
   9637744.92
   9637744.92
   9637744.92
   9637744.92
   9637744.92

The error estimate:

       E = ||A*x-b||
was 0.00444047. The correct value is 0.00444017.

Default Linux with --float-store

This method gives a executable which "almost" has IEEE-style-rounding behaviour. The answer vector is almost identical to that produced in the -O2 case. However the error estimate E produced by the program is 9.53674e-06, which is several orders of magnitude too small.

53 bit precision

This gives IEEE-style-rounding operation. The FPU control bits are set to 53 bit precision via the fesetprecision() function of my wmexcep package. All arithmetic results are correctly rounded to 53 bit precision. The answer differed from the exact answer by the following vector:
  2557935794.09
  -284215088.23
  -284215088.23
  -284215088.23
  -284215088.23
  -284215088.23
  -284215088.23
  -284215088.23
  -284215088.23
  -284215088.23
and an error estimate E of 0.0067658. The correct value of E is 0.0038174.

As should be expected, the result does not depend upon the level of optimisation or whether --float-store is used.

Note that the error vector is much larger than in the other two cases.

Conclusions

This program has demonstrated the following:
  1. --float-store is not always sufficient to get IEEE-style-rounding behaviour,
  2. the Linux default behaviour can give significantly better results than IEEE-style-rounding behaviour.