1. Reading and writing files
Java offers a wide variety of ways to manipulate the contents of files, bringing great power but complexity at the same time.
The reading and writing of files in Java is done through data flows or streams, which may be either oriented to bytes or considered as a set of characters. The concept of file, which is static, must be differentiated from the concept of stream, which has a dynamic character: a file is stored on disk, but from the point of view of the application, we are interested in the transfer of this information from the file to our program. A classic comparison: streams would be like water pipes, while files would be reservoirs. The concept of data flow, apart from the scope of files, will also be applicable to the transfer of information, for example through the network or between processes.
Streams can be classified into:
Input Streams: Those that go from a source (e.g. file) to the program.Output Streams: Those that exit the program to a destination (e.g. file).
When reading and storing data, we must be very careful with the types of data we work with, since the same sequence of bits on disk represents different information depending on the type of data used. Even when dealing with the same type of data, as in the case of characters, special care must be taken, since different systems may use different coding. Java, for example for the char type uses 16-bit Unicode (UTF-16), but we can try to access encoded files, for example with UTF-8 or ASCII.
The parent superclasses for handling byte-oriented streams are InputStream and OutputStream. Others are derived from these, but for file management we are interested in two: FileInputStream and FileOutputStream.
1.1. FileInputStream
The FileInputStream class is used to access files for reading, and has two main constructors:
FileInputStream (File f)→ Receives an object of typeFile, which will reference the object we will read from.FileInputStream (String name)→ Receives a string with the name or path of the location of the file to be read.
The main methods we have for this class are:
int read()→ Abstract sequential reading method: Returns an integer corresponding to the next byte of an input stream (file, byte array...). If it reaches the end of the sequence, it returns-1. On error, it will throw an exception of typeIOException.int read(byte[] buffer)→ Reads a given number of bytes from the input (as long as the buffer), saving them to the buffer. It will return the number of bytes effectively read, which at most will be as large as the buffer. If no bytes are available return-1.int available()→ Indicates the bytes we have available for reading.long skip(long des)→ Skips as many bytes as the parameter indicates. The return value is the number of bytes that have been effectively dumped (they can be less than what we have indicated if we reach the end for example).int close()→ Closes the data stream.
1.2. FileOutputStream
The FileOutputStream class is used to access files for writing. It has the following builders:
FileOutputStream (File f)→ Receives an object of typeFileand opens it in write mode. In case it does not exist, it will be created, and if it exists, it will be overwritten.FileOutputStream (String name)→ Receives a string with the name or path of the location of the file to be written, and opens it in write mode. In case it does not exist, it will be created, and if it exists, it will be overwritten.FileOutputStream (File f, boolean append)→ Gets an object of typeFileand opens it in append mode, to write to the end. If it doesn't exist, it will be created, and if it exists, it will write behind it (only if append is set to True).FileOutputStream (String , boolean append)→ Receives a string with the name or path of the file location to be written and opens it in append mode, to be written at the end. If it doesn't exist, it will be created, and if it exists, it will write behind it (only if append is set to True).
The main methods of the class are:
write(int byte)→ Writes the byte to the output stream. Although this is an integer parameter, only one byte is written. If there is an error, an IOException is thrown.void write(byte[ ] buffer)→ Writes the contents of the buffer (byte vector) to the file. If the buffer is null, it will throw an exception.void write(byte[ ] buffer, int pos, int length)→ Writes the contents of the buffer (byte vector) from position pos, and as many bytes as given in length.void flush()→ Forces the remaining bytes in the write cache to be written to the output file.void close()→Closes the output stream and frees the resources.
1.2.1. Solved Exercise
Create a simple Java program that copies a file to another file byte to byte. Source and destination file will be indicated on the command line.
Solution
Think and try
With the sample program we have seen, we could copy files of every kind: text, audio, video. Try it as your own and test the content is the same.
Challenge
Try to change the last program in order to read data in 32 bytes chunks, using the accurate methods of the classes. Share your solution in the forum.
1.3. Text Files
As stated, Java allows streams to be managed with either a byte orientation or a character orientation. The abstract classes for managing character-oriented streams are Reader and Writer. These are derived from others, but for the management of files we focus on FileReader and FileWriter.
1.3.1. FileReader
The FileReader class serves to access files for reading, and has two constructors:
FileReader (File f)→ Receives an object of typeFile, which will reference the object from which it is to be obtained.FileReader (String name)→ Receives a String with the name or path to the file.
The methods we have for this class are pretty much the same as we have for FileInputStream, with the difference that we now read characters instead of bytes. One aspect to consider is that these characters will be represented in the format of the host operating system (UTF-8 in the case of Linux), unlike the UTF-16 used by other internal storage classes:
int read()→ Reads the next character from the input stream and returns it as an integer. If there are no characters left, it returns -1, and if there is an error, it throws an exception of type IOException.int read(char[] buffer)→ Fills the buffer with as many characters from the input as it has (at most). It returns the number of characters effectively read. This number will be the length of the buffer or less if there are not enough characters to read. If none, return -1.int available()→ Returns the number of characters available to read.long skip(long des)→ Skips as many characters as the parameter indicates. The return value is the number of characters that have been effectively thrown (they can be less than what we have indicated if we reach the end for example)int close()→ Closes the data stream
1.3.2. FileWriter
The FileWriter class would be the equivalent of FileOutputStream in character-oriented streams version. The class constructors are:
FileWriter (File f)→ Opens the file specified byFilefor writing. If the file does not exist, it will be created, and if it exists, it will delete the contents.FileWriter (String name)→ Opens the file specified by a string with name and path in writing mode. If it does not exist, it will be created, and if it exists, it will delete the contentFileWriter (File f, boolean append)→ Receives an object of type File and opens it in append mode, to write at the end. If it doesn't exist, it will be created, and if it exists, it will write behind it.FileWriter (String name_f, boolean append)→ Receives a string with the name or path of the location of the file to be written and opens it in append mode, to write at the end. If it doesn't exist, it will be created, and if it exists, it will write behind it.
The main methods of the FileWriter class are very similar to those of OutputStream:
write(int character)→ Writes the character to the output stream, with the operating system's own encoding. If there is an error, an IOException is thrown.void write(char[] buffer)→ Writes the contents of the buffer (character vector) to the file. If the buffer is null, it will throw an exception.void write(char[] buffer, int pos, int length)→ Writes the contents of the buffer (character array) from position pos, and as many bom bytes are given in length.void flush()→ Forces the remaining bytes in the write cache to be written to the output file.void close()→ Closes the output stream and frees the resources.void write(String text)→ Writes the entire text content to the file.
1.3.3. Solved Exercise
Create a simple Java program that merge all the files inside a folder in a unique file. The source folder and destination file will be indicated in program's call. We suppose all the files inside that folder are text files.
Solution
Improve your code
Try to improve the last code creating a merge() function. We should call it inside the main loop
1.4. Decorators
Decorator classes are those that inherit from a certain class, and they provide added functionality to the original. In the case of input and output streams, we have decorators that allow us to read or write complete lines, instead of byte by byte, or save certain data format. It simplifies our work, adding a more natural and human-friendly way to use the base classes.
The InputStream class has several decorators, but we'll stick with the following:
DataInputStream→ Allows reading data of any type (integer, logical, etc.)ObjectInputStream→ Add the option to reads an entire object
The OutputStream class, on the other hand, also has different decorators, among which we highlight:
DataOutputStream→ Allows writing data of any type (integer, logical, etc.)PrintStream→ Allows writing data of any type, and also acceptsprintfandprintlnmethodsObjectOutputStream→ Allows writing (serializing) objects
Regarding the decorator classes for character-oriented streams, we have, on the one hand, the (most prominent) Reader decorators:
BufferedReader→ Creates an input buffer, allowing for example to read a complete line
And for writer:
BufferedWriter→ Creates an output buffer, allowing for example to write a full linePrintWriterAllows the writing of data of different types, and has methods such as printf and println
The BufferedReader class, among others, has the readLine() method, which allows reading an entire line of the file up to the end of the line, very useful in text files.
On their behalf, the BufferedWriter class provides the newLine() method to enter the carriage return character, and the write(String chain, int start, int length) method to write a string or a specific part.
The other decorator for writing is the PrintWriter, which offers us the methods print(data), println(data) and printf(), for formatting.
Alternatives to...
Obviously, you could :
- concatenate
\nat the end of every String on everyprint()method instead callingnewLine(). - use
println()instead callingnewLine().
1.4.1. Solved Exercise
Create a program to copy a text file adding the number lines at the beginning of each line.
Solution
1.5. Binary Files
In this section and the following we will see how to storage different types of data and objects in files of different formats: binary files, XML or JSON files. Now, we will focus on binary files, both for storing structured data on the one hand, and objects, on the other, a process known as serialization.
1.5.1. Storing structured data in binary files
In the previous section we saw how to work with character and byte files. In the case of byte files, we have seen how to read and write them sequentially, byte by byte to the end of the file. Let's now see how to store structured data in binary files and how to be able to read them.
Let's imagine that we want to save the following table, which combines data of different types. This sample will appear in next sections.
| Module | Hours | Qualification |
|---|---|---|
| Accés a Dades | 6 | 8.45 |
| Programació de serveis i processos | 3 | 9.0 |
| Desenvolupament d'interfícies | 6 | 8.0 |
| Programació Multimèdia i dispositius mòbils | 5 | 7.34 |
| Sistemes de Gestió Empresarial | 5 | 8.2 |
| Empresa i iniciativa emprenedora | 3 | 7.4 |
As we can see, we have text data, integer and floating point numeric data. If we want to maintain the types, character-oriented streams like Reader or Writer will not be useful, so we should use InputStream and OutputStream. The disadvantage is that it would be necessary to know exactly how many bytes each type of data occupies.
In order to efficiently save these structures, we can make use of the DataInputStream and DataOutputStream classes, which are stream decorators and which offer us the following methods to save or retrieve data of different types, without worrying about what they occupy internally each. As you can see, there are reciprocal methods to read and write every base types.
DataInputStream |
DataOutputStream |
Description |
|---|---|---|
byte readByte() |
void writeByte(int) |
a byte |
short readShort() |
void writeShort(short) |
short int |
int readInt() |
void writeInt(int) |
an int |
long readLong() |
void writeLong(long) |
a long int |
float readFloat() |
void writeFloat(float) |
single precision |
double readDouble() |
void writeDouble(double) |
double precision |
char readChar() |
void writeChar(int) |
a Unicode char |
String readUTF() |
void writeUTF(String) |
a UTF-8 string |
Warning
A UTF-8 string is different of a single String. When a String is written, as is evident, the whole characters are storage. Saving it as UTF-8 string, add information about the string's length, and this information is essential to cut this strings when you will read in a future.
Imagine you save two strings, "euro" and "sport". The result will be, at the end "eurosport". When somebody open this file in the future, how do he or she know the number of string stored ("euro", "sport" or simply "eurosport" (TV channel))
Saving as UTF-8, when you save "euro" and "sport", the result is "4euro5sport". When somebody try to read it, first of all see a '4', and read "euro". Then, see a '5', and then read "sport". Notice that if "eurosport" is stored, the result is "9eurosport". Try to write a sample program with this string, opening the resulting file with amn hexadecimal editor.
1.5.2. Solved exercise
Write a program who has data stored in three parallel arrays, storing modules data of DAM. Write two functions, one for write this data on a file and another to read it. Consider to storage data group by module, instead name, hours and grade.
Solució
1.6. Object serialization
Java provides a generic object serialization system: a recursive system that iterates over each object contained in an instance, until it reaches the primitive types, which it stores as an array of bytes. Apart from this information of the primitive types, additional information is also stored, or metadata specific to each class, such as the name or attributes among others. Thanks to this metadata, which describes the objects we save, we can automate serialization in a generic way, ensuring that we can read the objects later.
The disadvantage of this method is that when we change the definition of the class (for example adding one more attribute, or changing its type), the metadata is modified, and we would not be able to read serialized objects with previous versions of the class In addition, it is also necessary to take into account that this is a specific mechanism of Java, and that we will not be able to consult these objects from other languages.
For all of this, other techniques are preferable for the permanent storage of objects, which we will see later, but serialization can be useful for temporary storage, within the same execution of the application.
What are this?
| Text Only | |
|---|---|
1 | |
1.6.1. The Serializable interface and Decorators
If we want a class to be serializable, it must implement the Serializable interface, the purpose of which is simply to act as a marker to indicate to the JVM that the class can be serialized, so this class will have no methods.
It should be said that all the classes corresponding to the basic types already implement the Serializable interface, as well as the String class, containers and Arrays. In the case of collections, it will depend on its contents, if its elements are serializable, the collection will be as well. If the object we want to serialize or those of any of its objects do not implement the Serializable interface, the exception NotSerializableException is thrown.
The ObjectInputStream and ObjectOutputStream decorators offer us the ability to serialize any de-serializable object. In order to write an object, we will make use of the writeObject method of ObjectOutputStream, and to read it we will make use of readObject, of ObjectInputStream.
Sheeps with sheeps
Note that reading objects must be done on instances of the same class that was saved. Otherwise, a ClassCastException is thrown. Also, you must have the compiled code of the class, to avoid the ClassNotFoundException exception.
In addition, readObject return an Object, and we need an object of a specific class. For this reason you have to cast from Object to the needed class. Inheritance concepts are very important to guarantee robust programs.
1.6.2. Solved exercise
Starting with the same base of the Moduls class in the previous exercise, we are going to create a Modul class, to store a single module. This kind of classes are called POJO's (Plain Old Java Objects), and are designed only to storage information. (It will appear later, togethre with BEAN's classes).
Once the Modul class is created, write a program to save on a file objects directly. Later, write the complementary function to read all the stored objects from that file.
Solution
Less work, but the same in the end
You probably will think about it: If all in Java inherits from Object, an ArrayList is an Object...Can I save or load an entire ArrayList in one unique call?. Try it as an improvement of the last exercise.