Sunday, 24 January 2016

Memory Mapped Files vs. fstream (C)

Memory Mapped Files vs. fstream (C)
Description:    This code compares the performance of Linux
Memory-mapped files with that of C/C++ <fstream> library.

Memory Mapped Files vs. fstream (C)


The code is divided into following modules:

#1   A simple file containing integers from 0 to (2*1024*1024 - 1) integers is created. The data is sorted.

#2   Then, the file is read using Memory-mapped files. 100 randomly chosen integers (using the rand() function) are read from this file. Total reading time is noted (using the ctime library).

#3   The same process is repeated, but this time using <fstream> library and the total time is noted again.
Results
                             Using <fstream>: 30 secs (approx.)
                             Using Memory Mapped File: 1 sec (approx.)
Report
As it is clear from the above findings that accessing a file by Mapping it on the Memory provides much fast access as compared to that usual and trivial method of accessing (i.e. via <fstream>). The difference is in front of you i.e. contrary to <fstream>, MMF provides almost 30X faster access.

<fstream>
                   When you are accessing integers using <fstream>, you can’t access them randomly. If you at some point in the file and you want to read an integer that is behind the current integer, then you have to close the file, re-open it and then you’ll be able to read that required integer. Also. If you want to read an integer that is placed after the current integer but is placed far away, then you’ve to read the integers in-between for no use. So, a lot of your time is wasted in reading useless data.

MMF
                   Whereas, in Memory Mapped Files, you don’t need to open or close any file. All you have to do is to map it on the memory and after doing that, it appears that like whole of the file is brought up into the RAM. Now you have random access to the integers same as you access the integer when they are placed in an array. For example, if you have integers 0,1,2,……,N, map[0] will give you 0, map[1] will give you 1 and so on, map[i] will give you the integer at position i.



C Code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <fstream>
#include <time.h>

int main() {

     int size = 2 * 1024 * 1024, test, i, j;
     int randomNumbers[100];
     time_t start, end, diff;

     /*****   POPULATING THE FILE   *****/

     FILE *fout;
     fout = fopen("integers.txt", "w");
     start = time(NULL);
     for (i = 0; i < size; i++)
           fprintf(fout, "%d\n", i);

     end = time(NULL);
     diff = difftime(end, start);
     printf("\n\n\t--> Time taken during Populating the file: %ld\n\n\n", diff);
     fclose(fout);


     /*****   100 RANDOM NUMBERS TO BE SEARCHED   *****/

     for (i = 0; i < 100; i++)
           randomNumbers[i] = rand() % (5 * size);


     /*****   USINF <fstream>   *****/

     FILE *fin;

     start = time(NULL);

     for (i = 0; i < 100; i++) {
           fin = fopen("integers.txt", "r");
           for (j = 0; j < size; j++) {
                fscanf(fin, "%d", &test);
                if (randomNumbers[i] == test) {
                     printf("Integer# %d: %d Found\n", i + 1, randomNumbers[i]);
                     break;
                }
           }
           if (j == size)
                printf("Integer# %d: %d Not Found\n", i + 1, randomNumbers[i]);
           fclose(fin);
     }
     end = time(NULL);
     diff = difftime(end, start);
     printf("\n\n\t--> Time taken during <fstream> : %ld\n\n\n", diff);


     /*****   USING MMF   *****/

     int fd = open("/home/haza/integers.txt", O_RDONLY);
     int *map = mmap(0, size*sizeof(int), PROT_READ, MAP_SHARED, fd, 0);
     if (map == MAP_FAILED) {
           close(fd);
           perror("Error mmapping the file");
           exit(EXIT_FAILURE);
     }
     start = time(NULL);
     for (i = 0; i < 100; i++) {
           for (j = 0; j < size; j++) {
                if (randomNumbers[i] == map[j]) {
                     printf("Integer# %d: %d Found\n", i + 1, randomNumbers[i]);
                     break;
                }
           }
           if (j == size)
                printf("Integer# %d: %d Not Found\n", i + 1, randomNumbers[i]);
     }
     end = time(NULL);
     diff = difftime(end, start);
     printf("\n\n\t--> Time taken using MMF: %ld\n\n\n", diff);

     return 0;
}




1 comment

  1. The code is completely wrong, writes integers as text, the fstream part of the code parses it right (with fscanf) and thus is slow but the mmap part reads it as if it were a binary file, doesn't parse anything, produce a completely wrong result and... is fast. Obviously...

    ReplyDelete

Recent Posts