Would reading 100 text files into memory be a bad idea? What about 1000 text files? I'm talking mostly about text files in Markdown format, so I'd expect the average size of each one to be at most around 20 KB. This is in JavaScript/Node.js, so I'm a bit unsure whether the size of each file translates 1:1 once it's sitting in a String object. I feel like it would be bigger due to some overhead added by JS. Or maybe not; I'm really unsure how this works.
For context, I'm working on a static site generator in Node.js and I'm trying to figure out whether reading all the files at once and then processing them would be better than reading and processing one file at a time.
In Node.js, you can use Buffer.byteLength(yourstring, 'utf8') to determine a string's size in bytes (provided UTF-8 is your encoding).
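If you want a rough total, you can sum the byte lengths across files (a minimal sketch; totalBytes is just an illustrative helper). Keep in mind that V8 adds some per-object overhead on top of the raw character data, so the actual in-memory footprint will be somewhat larger than this number, but not wildly so.

```js
// Minimal sketch: estimate the raw UTF-8 size of everything you'd hold in memory.
const { Buffer } = require('buffer');

function totalBytes(strings) {
  return strings.reduce((sum, s) => sum + Buffer.byteLength(s, 'utf8'), 0);
}

// 1000 files at ~20 KB each is only ~20 MB of raw text.
console.log(totalBytes(['# Hello\n', 'Some *Markdown* text']));
```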
Loading all the strings first and only then processing them has the drawback that you can't do anything with the results until every file has been loaded. Processing one file at a time lets you start producing results early.
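A minimal sketch of the one-file-at-a-time approach (renderMarkdown here is just a hypothetical stand-in for whatever your real Markdown-to-HTML step is):

```js
const fs = require('fs/promises');
const path = require('path');

// Hypothetical placeholder for your actual Markdown -> HTML conversion.
function renderMarkdown(markdown) {
  return `<article>${markdown}</article>`;
}

// Read, process, and write one file at a time: results show up early and
// you never hold more than one file's content in memory.
async function buildOneByOne(filePaths, outDir) {
  for (const filePath of filePaths) {
    const markdown = await fs.readFile(filePath, 'utf8');
    const html = renderMarkdown(markdown);
    const outName = path.basename(filePath, '.md') + '.html';
    await fs.writeFile(path.join(outDir, outName), html, 'utf8');
  }
}
```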
Cache misses are heavily reduced if you load and then process one file at a time. This might not matter at all; it depends on how you're traversing the loaded data. (You'll have a hard time measuring this directly, but you might observe less processing time and fewer CPU cycles used -- the latter follows from the former: less raw processing time means fewer cycles.)
Preloaded data is harder to debug -- diving into big dictionaries is a pain in the ass.
File reading blocks your program if you're not processing while reading -- it seems insignificant until you're going over lots of files. So use buffering: declare the buffer, traverse the buffer. (Since you're using Node.js, I'm pretty sure you're already buffering. Just sayin'.)
Since you're buffering data, be nice to the buffer: process as early as possible to leave time for file reads. That's much easier if you're not preloading everything.
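One way to get that overlap, sketched below, is to kick off the next read before you start processing the current file; fs.readFile's disk work happens off the main thread, so it can proceed while you process (processFile is a hypothetical callback you'd supply):

```js
const fs = require('fs/promises');

// Overlap I/O with processing: start reading file i+1 before chewing on file i.
async function buildPipelined(filePaths, processFile) {
  let pending = filePaths.length > 0 ? fs.readFile(filePaths[0], 'utf8') : null;
  for (let i = 0; i < filePaths.length; i++) {
    const content = await pending;
    // Queue the next read so the disk stays busy while we process this one.
    pending = i + 1 < filePaths.length ? fs.readFile(filePaths[i + 1], 'utf8') : null;
    await processFile(filePaths[i], content);
  }
}
```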
You'll probably do a directory listing to find the files in some folder (either directly or indirectly). On Linux, a directory listing has to complete before it produces any output, so do THIS LISTING as early as possible and, if you can manage it, asynchronously. It's no big deal with 1000 files, but with, say, 398572 files it turns into a complete stall.
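If the listing itself ever becomes a problem, Node's fs.opendir (available in recent Node versions) hands you directory entries incrementally instead of materializing one giant list up front -- a sketch:

```js
const fs = require('fs/promises');
const path = require('path');

// Yields Markdown file paths entry by entry as the directory is read,
// instead of waiting for one huge readdir() result.
async function* findMarkdownFiles(dir) {
  const dirHandle = await fs.opendir(dir);
  for await (const entry of dirHandle) {
    if (entry.isFile() && entry.name.endsWith('.md')) {
      yield path.join(dir, entry.name);
    }
  }
}
```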
Edit: Oh, if you're on Linux, consider keeping the files in RAM on a RAM FS. It helps if file operations are the bottleneck.
sudo mount -t tmpfs -o size=200M none /your/path/to/poop/
Then copy to that folder.
As soon as you umount it, everything in there is gone though, so treat it as strictly volatile scratch space.
The directory-listing stall can be worked around by splitting the files across multiple bucket directories on the RAM FS, but that's only interesting if you're eating Twitter firehoses to feed a RabbitMQ or whatever. (Probably not what you're doing; you wouldn't be asking otherwise.)