Friday, November 23, 2012

Massive Integer I/O

One of the biggest challenges faced when dealing with massive multiple-precision integers is memory usage. The last thing we want is stack overflows stemming from attempts to read or write a number too big for the space allocated.

Luckily, GMP (MPIR for Windows) seems to render most of those concerns moot, as it dynamically allocates, re-allocates, and eventually frees memory as variables are initialized, operated upon, and subsequently cleared. Of course, it ultimately still comes down to the programmer to remember to properly initialize all variables before their first use, and clear them from memory when done with them.

A more in-depth approach involves manually clearing and re-initializing integer variables as is done in all NCPrime applications. At the end of a single round of testing (dividing n by x), the variables that store(d) the remainder and quotient are cleared and immediately re-initialized. Since GMP/MPIR uses a standard zero-initialization technique, there is no loss of data and no residuals, meaning you will never end up with a partial integer being rounded up or down to give a variable a value it shouldn't actually have.

When the testing thread exits, several things happen. This will be covered in a later post, but behind the scenes, stuff is constantly getting deleted from and (sometimes) re-entered into memory. On completion of the thread, the instance of the "worker" class is scheduled for deletion when safe to do so. This happens as soon as the thread has completely exited. The worker class contains instances of structures for testing data, which are deleted when the worker instance is. Deletion of the structure leads to clearing of the actual variables used in testing.

It is a bit complicated, but efficient nonetheless. However, using multiple-precision integers internally is easy compared to inputting and outputting them. Input is done via strings. That is easy enough, especially considering the data in question already exists as a string (usually somewhere in the GUI).

Output is difficult because it involves creating a massive array of multi-byte characters...this usually, if not always eats up available memory. I have caught applications like GJSieve using well over 2GiB of RAM on some machines. It doesn't even get close to that on my usual testing/programming machine with a total of 4GiB RAM.

Printing the number takes A LOT less memory as it does not ever actually display the number. In fact, if the number is already displayed, it does practically no work at all. If not, it simply calculates the number based on the current number type selected and prints it to a string.

This is why I recommend in applications like MuPuPriNT and GJSieve leaving the "Show Number" box unchecked unless it is a small number (or a Pythagorean number, as those are often much smaller).

I just thought I'd write about that for a while. In the future, expect more posts on the inner workings of NCPrime applications. It really is quite fascinating.

No comments:

Post a Comment