Memory-mapped files compared with the C/C++ <fstream> library
The code is divided into the following modules:
#1 A simple file containing the integers from 0 to (2*1024*1024 - 1) is created. The data is sorted.
#2 Then the file is read using memory-mapped files. 100 randomly chosen integers (using the rand() function) are read from this file. The total reading time is noted (using the ctime library).
#3 The same process is repeated, but this time using the <fstream> library, and the total time is noted again.
Results
Using <fstream>: 30 secs (approx.)
Using Memory Mapped File: 1 sec (approx.)
Report
As is clear from the above findings, accessing a file by mapping it into memory gave much faster access than the usual method of accessing it (i.e. via <fstream>).
In this test, MMF was almost 30X faster than <fstream>.
<fstream>
When you read the integers through the stream as this test does, you do not access them randomly. If you are at some point in the file and want an integer that lies before the current one, the test closes the file, re-opens it and reads forward again until it reaches the required integer. Also, if the integer you want lies far after the current one, you have to read all the integers in between for no use. So a lot of time is wasted reading useless data.
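For comparison, a stdio stream can be repositioned with fseek() when the integers are stored as fixed-size binary values rather than as text. A minimal sketch under that assumption; the binary file layout and the helper names write_binary_ints and read_int_at are hypothetical and are not what the test in this post does:

```c
#include <stdio.h>

/* Writes the integers 0..count-1 to path as raw binary ints.
   Returns 0 on success, -1 on failure. */
static int write_binary_ints(const char *path, int count)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    for (int i = 0; i < count; i++) {
        if (fwrite(&i, sizeof i, 1, f) != 1) {
            fclose(f);
            return -1;
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Reads the integer stored at position index by seeking straight to
   its byte offset. Returns 0 on success, -1 on failure. */
static int read_int_at(FILE *f, long index, int *out)
{
    if (fseek(f, index * (long)sizeof(int), SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof *out, 1, f) == 1 ? 0 : -1;
}
```

Because every record has the same size, the offset of the integer at position i is simply i * sizeof(int), so nothing in between has to be read. With one integer per line of text the records have different lengths, and no such offset can be computed.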
MMF
With memory-mapped files, by contrast, you open the file once and map it into memory; after that, it appears as if the whole file has been brought into RAM. You then have random access to the integers, just as when they are stored in an array. For example, if the file holds the integers 0, 1, 2, ..., N in binary form, map[0] gives you 0, map[1] gives you 1 and so on: map[i] gives you the integer at position i.
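A minimal sketch of this array-style access, assuming a POSIX system and a file that holds raw binary ints rather than text. The helper name map_ints is hypothetical:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps the whole file read-only and returns it as an array of ints.
   On success *count receives the number of ints and *length the mapped
   size in bytes (needed later for munmap). Returns NULL on failure. */
static const int *map_ints(const char *path, size_t *count, size_t *length)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *length = (size_t)st.st_size;
    *count = *length / sizeof(int);
    return p;
}
```

After a successful call, map[i] reads the i-th int directly, and munmap((void *)map, length) releases the mapping. The size is taken from fstat() rather than assumed, so the mapping never extends past the end of the file.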
C Code
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <time.h>
int main() {
    int size = 2 * 1024 * 1024, test, i, j;
    int randomNumbers[100];
    time_t start, end, diff;
    /***** POPULATING THE FILE *****/
    FILE *fout;
    fout = fopen("integers.txt", "w");
    if (fout == NULL) {
        perror("Error creating the file");
        exit(EXIT_FAILURE);
    }
    start = time(NULL);
    for (i = 0; i < size; i++)
        fprintf(fout, "%d\n", i); /* one integer per line, written as text */
    end = time(NULL);
    diff = difftime(end, start);
    printf("\n\n\t--> Time taken during Populating the file: %ld\n\n\n", (long)diff);
    fclose(fout);
    /***** 100 RANDOM NUMBERS TO BE SEARCHED *****/
    /* Values can be as large as 5*size - 1, so a number may not be in the file. */
    for (i = 0; i < 100; i++)
        randomNumbers[i] = rand() % (5 * size);
    /***** USING <fstream> (C stdio: fopen/fscanf) *****/
    FILE *fin;
    start = time(NULL);
    for (i = 0; i < 100; i++) {
        /* Re-open the file and scan from the beginning for every number. */
        fin = fopen("integers.txt", "r");
        for (j = 0; j < size; j++) {
            fscanf(fin, "%d", &test);
            if (randomNumbers[i] == test) {
                printf("Integer# %d: %d Found\n", i + 1, randomNumbers[i]);
                break;
            }
        }
        if (j == size)
            printf("Integer# %d: %d Not Found\n", i + 1, randomNumbers[i]);
        fclose(fin);
    }
    end = time(NULL);
    diff = difftime(end, start);
    printf("\n\n\t--> Time taken during <fstream> : %ld\n\n\n", (long)diff);
    /***** USING MMF *****/
    /* Map the same file that was written above. */
    int fd = open("integers.txt", O_RDONLY);
    if (fd == -1) {
        perror("Error opening the file");
        exit(EXIT_FAILURE);
    }
    /* The file holds text, so each map[j] below is sizeof(int) raw bytes
       of that text reinterpreted as an int, not a parsed integer. */
    int *map = mmap(0, size * sizeof(int), PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        perror("Error mmapping the file");
        exit(EXIT_FAILURE);
    }
    start = time(NULL);
    for (i = 0; i < 100; i++) {
        for (j = 0; j < size; j++) {
            if (randomNumbers[i] == map[j]) {
                printf("Integer# %d: %d Found\n", i + 1, randomNumbers[i]);
                break;
            }
        }
        if (j == size)
            printf("Integer# %d: %d Not Found\n", i + 1, randomNumbers[i]);
    }
    end = time(NULL);
    diff = difftime(end, start);
    printf("\n\n\t--> Time taken using MMF: %ld\n\n\n", (long)diff);
    munmap(map, size * sizeof(int));
    close(fd);
    return 0;
}
The code is completely wrong: it writes the integers as text. The fstream part of the code parses that text correctly (with fscanf) and is therefore slow, but the mmap part reads the file as if it were binary, doesn't parse anything, produces a completely wrong result and... is fast. Obviously...
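The point made in this comment can be checked on a small buffer without any file at all. A minimal sketch, in which the helper names first_int_of and first_parsed_int are hypothetical: one reinterprets the leading bytes of the text as an int, which is what indexing a mapped text file does, and the other parses the text the way fscanf does.

```c
#include <stdio.h>
#include <string.h>

/* Reinterprets the first sizeof(int) bytes of a buffer as an int.
   This mirrors what map[0] yields for a mapped file. */
static int first_int_of(const char *bytes)
{
    int value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Parses the first decimal integer in a text buffer, as fscanf("%d")
   does on the file. Returns 0 if no integer could be parsed. */
static int first_parsed_int(const char *text)
{
    int value = 0;
    sscanf(text, "%d", &value);
    return value;
}
```

For the text "0\n1\n2\n3\n", first_parsed_int() returns 0, while first_int_of() returns the bytes 0x30 0x0a 0x31 0x0a packed into one int, which is 0x0a310a30 on a little-endian machine with 4-byte ints. The two only agree when the buffer holds binary ints.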