Wednesday 28 January 2015

Difference between write() and writelines()


Actually, you can use a single string together with write() and a sequence of strings together withwritelines().
write(arg) expects a string as argument and writes it to the file. If you provide a list of strings, it will raise an exception (btw, show errors to us!).
writelines(arg) expects an interable as argument (an iterable object can be a tuple, a list, or an iterator in the most general sense). Each item contained in the iterator is expected to be a string. A tuple of strings is what you provided, so things worked.
The nature of the string(s) does not matter to both of the functions, i.e. they just write to the file whatever you provide them. The interesting part is that writelines() does not add newline characters on its own, so the method name can actually be quite confusing. It actually behaves like an imaginary method called write_all_of_these_strings(sequence).
What follows is an idiomatic way in Python to write a list of strings to a file while keeping each string in its own line:
lines = ['line1', 'line2']
with open('filename.txt', 'w') as f:
    f.write('\n'.join(lines))
This takes care of closing the file for you. The construct '\n'.join(lines) concatenates (connects) the strings in the list lines with the character '\n'. It is more efficient than using +operator.
Starting from the same lines sequence, ending up with the same output, but using writelines():
lines = ['line1', 'line2']
with open('filename.txt', 'w') as f:
    f.writelines("%s\n" % l for l in lines)
This makes use of a generator expression and dynamically creates newline-terminated strings. writelines() iterates over this sequence of strings and .
Edit: Another point you should be aware of:
write() and readlines() existed before writelines() was introduced. writelines() was introduced later as a counterpart of readlines(), so that one could easily write the file content that was just read via readlines():
outfile.writelines(infile.readlines())
Really, this is the main reason why writelines has such a confusing name. Also, today, we do not really want to use this method anymore. readlines() reads the entire file to the memory of your machine before writelines() starts to write the data. First of all, this may waste time. Why not start writing parts of data while reading other parts? But, most importantly, this approach can be very memory consuming. In an extreme scenario, where the input file is larger than the memory of your machine, this approach won't even work. The solution to this problem is to use iterators only. A working example:
with open('inputfile') as infile:
    with open('outputfile') as outfile:
        for line in infile:
            outfile.write(line)
This reads the input file line by line. As soon as one line is read, this line is written to the output file. Schematically spoken, there always is only one single line in memory (compared to the entire file content being in memory in case of the readlines/writelines approach).

No comments:

Post a Comment