Hi, I've got an exercise to do and was hoping for a little help. Don't worry, I don't want the solution; I'm merely hoping for suggestions on optimization. :)
Basic problem: open a file and determine its length by counting the number of characters (bytes) in it. IOW, not a hard problem at all. It has to be written in C, which I haven't used for a very long time; due to circumstances, I've lately been using Java.
Anyway, at least in Java, reading (counting) the characters one by one is, of course, terribly inefficient. Just for fun, I wrote two short snippets in Java: one that reads character by character, and another that reads into a buffer. For a large file, the former took more than 600 times longer to complete than the latter.
Question #1: Is it safe to assume the same holds for C (meaning that reading into some sort of buffer is the way to go)?
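For reference, here is a minimal C sketch of the two approaches, assuming the file has been opened in binary mode with fopen(path, "rb") (text mode may translate line endings on some platforms, which changes the count). The function names count_bytes_getc and count_bytes_fread are made up for illustration. One difference from Java worth keeping in mind: a stream opened with fopen() is normally buffered by stdio, so the getc() loop reads from an internal buffer rather than making a system call per byte, as an unbuffered Java stream such as FileInputStream does. The gap between the two versions may therefore be much smaller than 600x, though only a measurement will tell.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Count bytes one at a time. getc() normally reads from stdio's internal
   buffer, so this is one getc() per byte, not one system call per byte. */
long long count_bytes_getc(FILE *fp)
{
    long long n = 0;
    while (getc(fp) != EOF)
        n++;
    return ferror(fp) ? -1 : n;          /* -1 signals a read error */
}

/* Count bytes by reading blocks of up to bufsize bytes.
   fread() returns how many bytes it actually stored, which is all we need. */
long long count_bytes_fread(FILE *fp, size_t bufsize)
{
    unsigned char *buf;
    long long n = 0;
    size_t got;

    assert(bufsize > 0);                 /* a zero-size read would stop at once */
    buf = malloc(bufsize);
    if (buf == NULL)
        return -1;                       /* allocation failed */

    while ((got = fread(buf, 1, bufsize, fp)) > 0)
        n += (long long)got;

    free(buf);
    return ferror(fp) ? -1 : n;          /* -1 signals a read error */
}
```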
Question #2: What would typically be a good buffer size? At least in Java, smaller buffers are somewhat less efficient, but beyond some point larger buffers seem to slow the process down rather than speed it up. For example, with a ~107 MB file I got the best performance with a 20K buffer; a 1K buffer took about 50% longer, and so did a 40K buffer. (The buffers were allocated before I started the timer; the only thing I timed was the actual reading.) This suggests that 20K is a good buffer size for this purpose, at least in Java. Opinions?
Question #3: Would it be safe to assume that a similar buffer size will be fairly efficient in C as well, and if not, what would be a better choice?
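A minimal C sketch for repeating the Java measurement: it times only the read loop (the buffer is allocated before the clock starts, as in the Java test) for each buffer size in a list. It assumes a C11 compiler for timespec_get(); the names now_seconds, time_read and compare_buffer_sizes are made up for this example. BUFSIZ from <stdio.h> is one conventional starting size to include in the list. Bear in mind that stdio keeps its own internal buffer underneath fread(), and that after the first run the file is likely to sit in the operating system's cache, so repeated runs are best compared with each other rather than with a cold first run.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock time in seconds, using C11 timespec_get(). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Read the whole stream from the start using a bufsize-byte buffer.
   Stores the byte count in *count and returns the elapsed seconds,
   or -1.0 on allocation or read failure. Only the read loop is timed. */
double time_read(FILE *fp, size_t bufsize, long long *count)
{
    unsigned char *buf;
    size_t got;
    double start, elapsed;

    assert(bufsize > 0);
    buf = malloc(bufsize);               /* allocated before the clock starts */
    if (buf == NULL)
        return -1.0;

    rewind(fp);                          /* every run starts at byte 0 */
    *count = 0;
    start = now_seconds();
    while ((got = fread(buf, 1, bufsize, fp)) > 0)
        *count += (long long)got;
    elapsed = now_seconds() - start;

    free(buf);
    return ferror(fp) ? -1.0 : elapsed;
}

/* Time one full read per buffer size and print the results. */
void compare_buffer_sizes(FILE *fp, const size_t *sizes, size_t nsizes)
{
    for (size_t i = 0; i < nsizes; i++) {
        long long count = 0;
        double secs = time_read(fp, sizes[i], &count);

        if (secs < 0.0)
            printf("%8zu-byte buffer: failed\n", sizes[i]);
        else
            printf("%8zu-byte buffer: %lld bytes in %.4f s\n",
                   sizes[i], count, secs);
    }
}
```

For example, a list such as {1024, BUFSIZ, 20 * 1024, 40 * 1024, 1024 * 1024} can be passed along with a stream opened via fopen(path, "rb").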
Question #4: In Java, since I'm really not interested in what is actually read from the file, is there some sort of "buffer" I could use that would be quicker (essentially, "I don't care about the contents, just tell me how many bytes you've read and drop the bytes")? (This doesn't seem functionally useful to me, so I'd assume the answer is no.)
Question #5: In C, I suppose similar functionality is achieved using fgets (and then "counting" the bytes read with strlen)? Or is there a "better" solution, possibly one that would "drop" the read data (as described under question #4) and just return how many bytes were read? (Again, I assume the answer is no.)
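fgets() is meant for text lines: it stops after each newline and appends a terminating '\0', and strlen() then stops at the first zero byte it finds. On a file that contains zero bytes (most binary files do), fgets() plus strlen() therefore undercounts. fread(), by contrast, returns exactly the number of bytes it read, and the buffer can simply be overwritten on the next call, which is close to the "read and drop" behavior described above. A minimal sketch comparing the two, with made-up function names and a stream assumed to be opened in binary mode:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Line-oriented count: fgets() + strlen(). Wrong for data containing '\0',
   because strlen() stops at the first zero byte in each chunk. */
long long count_with_fgets(FILE *fp)
{
    char line[4096];
    long long n = 0;

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL)
        n += (long long)strlen(line);
    return n;
}

/* Block-oriented count: fread() reports how many bytes it stored,
   and the buffer is simply overwritten on the next iteration. */
long long count_with_fread(FILE *fp)
{
    unsigned char buf[4096];
    long long n = 0;
    size_t got;

    assert(fp != NULL);
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0)
        n += (long long)got;
    return n;
}
```

On a seven-byte stream containing a, b, a zero byte, c, d, a newline and e, count_with_fread() returns 7 while count_with_fgets() returns 3.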