Introduction:
In the Universal Media Server project, we recently ran some benchmarks to discover the fastest way to read files, particularly big files like HD movies. We tested four methods using an automatic benchmark script:
FileChannel using File input
FileChannel using Path input
DataInputStream using File input
RandomAccessFile using File input
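To make the comparison concrete, here is a minimal sketch of what each of the four methods listed above can look like when reading the first bytes of a file. The class and method names are made up for illustration and this is not the benchmark code from the project; it only shows the standard JDK calls each approach relies on.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class; names are illustrative only.
class ReadMethods {
    // 1. FileChannel using File input (channel obtained from a FileInputStream).
    static byte[] channelFromFile(File file, int length) throws IOException {
        try (FileInputStream in = new FileInputStream(file);
             FileChannel channel = in.getChannel()) {
            return readFully(channel, length);
        }
    }

    // 2. FileChannel using Path input (channel opened directly from the Path).
    static byte[] channelFromPath(Path path, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            return readFully(channel, length);
        }
    }

    // 3. DataInputStream using File input.
    static byte[] dataInputStream(File file, int length) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            byte[] data = new byte[length];
            in.readFully(data);
            return data;
        }
    }

    // 4. RandomAccessFile using File input.
    static byte[] randomAccessFile(File file, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            byte[] data = new byte[length];
            raf.readFully(data);
            return data;
        }
    }

    // Reads exactly `length` bytes from the channel's current position.
    private static byte[] readFully(FileChannel channel, int length) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        while (buffer.hasRemaining()) {
            if (channel.read(buffer) < 0) {
                throw new EOFException("File is shorter than " + length + " bytes");
            }
        }
        return buffer.array();
    }
}
```

All four return the same bytes for the same file; the difference the benchmark measures is how long each takes to do so.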
We tested these on hard drives with different rotation speeds, with files from 600MB up to 22GB each, and with anywhere from 1 to 100 threads to see how concurrency affected the results.
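For anyone who wants to try the threaded part themselves, a timing harness along these lines is one way to do it. This is a rough sketch and not our benchmark script: the class name, the `run` method and the one-hour timeout are assumptions chosen for illustration.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical harness; not the benchmark script from the project.
class ThreadedBenchmarkSketch {
    // Runs `task` once per item on a fixed pool of `threads` workers
    // and returns the elapsed wall-clock time in nanoseconds.
    static <T> long run(List<T> items, int threads, Consumer<T> task)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (T item : items) {
            pool.execute(() -> task.accept(item));
        }
        // Stop accepting work, then wait for the queued tasks to finish.
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) { // assumed upper bound
            pool.shutdownNow();
            throw new IllegalStateException("Benchmark timed out");
        }
        return System.nanoTime() - start;
    }
}
```

Calling `run(files, 1, hasher)` and then `run(files, 100, hasher)` with the same list gives the two timings to compare. Note that the operating system's file cache can skew repeated runs over the same files.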
Results:
Results varied, but on average for our use case the two FileChannel methods were the best, and we went with the second option since Path is the newer API in Java. DataInputStream and RandomAccessFile had significantly slow outliers that had been causing problems on some hard drives.
My results:
FileChannel using File input:
Benchmarking of hashing 152000 files using 1 thread took 57277 ms (376824 ns average per file)
Benchmarking of hashing 152000 files using 100 threads took 20130 ms (132437 ns average per file)
FileChannel using Path input:
Benchmarking of hashing 152000 files using 1 thread took 56675 ms (372867 ns average per file)
Benchmarking of hashing 152000 files using 100 threads took 21373 ms (140615 ns average per file)
DataInputStream using File input:
Benchmarking of hashing 152000 files using 1 thread took 75716 ms (498133 ns average per file)
Benchmarking of hashing 152000 files using 100 threads took 330825 ms (2176486 ns average per file)
RandomAccessFile using File input:
Benchmarking of hashing 152000 files using 1 thread took 51090 ms (336121 ns average per file)
Benchmarking of hashing 152000 files using 100 threads took 326446 ms (2147671 ns average per file)
For other results and more details, check out the branch with the benchmarking code.
Also note that we were doing the specific type of hashing used by OpenSubtitles, which only reads the beginning and end of each file, so other read patterns (for example reading whole files sequentially) may give different results.
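To show why this workload only touches the two ends of a file, here is a sketch of the OpenSubtitles hash as I understand the published algorithm: the file size plus the sum of the little-endian 64-bit words in the first and last 64 KiB, formatted as 16 hex digits. The class name is hypothetical and this is not the code from our branch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical class; a sketch of the algorithm, not the project's implementation.
class OpenSubtitlesHashSketch {
    private static final int CHUNK_SIZE = 64 * 1024;

    static String hash(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            // Files smaller than 64 KiB are hashed over their full length twice.
            int chunk = (int) Math.min(CHUNK_SIZE, size);
            long head = sumChunk(channel, 0, chunk);
            long tail = sumChunk(channel, Math.max(size - CHUNK_SIZE, 0), chunk);
            // long arithmetic wraps on overflow, which gives the 64-bit checksum.
            return String.format("%016x", size + head + tail);
        }
    }

    // Sums the complete little-endian 64-bit words in `length` bytes at `position`.
    private static long sumChunk(FileChannel channel, long position, int length)
            throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(length).order(ByteOrder.LITTLE_ENDIAN);
        while (buffer.hasRemaining()) {
            // Positional read: does not move the channel's own position.
            if (channel.read(buffer, position + buffer.position()) < 0) {
                break; // file ended earlier than expected
            }
        }
        buffer.flip();
        long sum = 0;
        while (buffer.remaining() >= Long.BYTES) {
            sum += buffer.getLong();
        }
        return sum;
    }
}
```

However large the file is, this reads at most 128 KiB from it, which is why seek behaviour and open/close overhead matter more here than raw sequential throughput.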