File Format Basics
A file format has two key parts: the codec and the container format.
A codec, a portmanteau of coder/decoder, is a device or program that encodes or decodes a data stream or signal (simply referred to as "data" hereinafter). The coder converts data into code for transmission or storage; that is, it "encodes" the data, which is why the terms "coder" and "encoder" are used interchangeably. This encoding process may also incorporate encryption for security and compression for efficiency. After transmission or storage, the decoder reverses the encoding process so that the data can be examined or manipulated.
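As a minimal sketch of the encode/decode round trip described above, the following uses run-length encoding, a deliberately simple compression scheme chosen here purely for illustration. The function names `rle_encode` and `rle_decode` are hypothetical and are not taken from any real codec.

```python
def rle_encode(data: bytes) -> bytes:
    """Coder: turn raw bytes into (count, value) byte pairs.

    Runs are capped at 255 so that each count fits in a single byte.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Extend the run while the next byte repeats and the cap allows it.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(code: bytes) -> bytes:
    """Decoder: reverse rle_encode by expanding each (count, value) pair."""
    if len(code) % 2 != 0:
        raise ValueError("encoded data must consist of (count, value) pairs")
    out = bytearray()
    for j in range(0, len(code), 2):
        count, value = code[j], code[j + 1]
        out += bytes([value]) * count
    return bytes(out)


raw = b"aaaabbbcc"
code = rle_encode(raw)           # 9 bytes in, 6 bytes out
assert code == b"\x04a\x03b\x02c"
assert rle_decode(code) == raw   # decoding recovers the original data
```

Real codecs are far more elaborate, but the shape is the same: one function maps data to a stored or transmitted code, and its counterpart maps that code back.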
A container format, also known as a wrapper, defines how one or more data streams or signals, as well as any metadata, are to be stored in a single file. For example, a basic container format designed for streaming videos will define things like:
- which video and audio codecs are used;
- how the video and audio data streams are chunked;
- where video and audio chunks are located in the file (e.g. interleaved so that each audio section sits next to its video frame counterparts); and, among other things,
- where file metadata is kept.
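The list above can be sketched as a toy container, assuming an invented layout: a 4-byte signature, a length-prefixed JSON metadata block, then a sequence of chunks that each carry a stream id and a payload length. The signature `TOYC`, the field sizes and the function names are all hypothetical; real container formats define their own, more involved structures.

```python
import json
import struct

MAGIC = b"TOYC"  # hypothetical 4-byte file signature


def pack_container(metadata: dict, chunks: list) -> bytes:
    """Write metadata, then chunks in the order given (here: interleaved)."""
    meta = json.dumps(metadata).encode("utf-8")
    out = bytearray(MAGIC)
    out += struct.pack(">I", len(meta))  # metadata length, big-endian uint32
    out += meta
    for stream_id, payload in chunks:
        # Chunk header: 1-byte stream id followed by a 4-byte payload length.
        out += struct.pack(">BI", stream_id, len(payload))
        out += payload
    return bytes(out)


def unpack_container(blob: bytes):
    """Read back the metadata and the list of (stream_id, payload) chunks."""
    if blob[:4] != MAGIC:
        raise ValueError("not a toy container")
    (meta_len,) = struct.unpack_from(">I", blob, 4)
    pos = 8
    metadata = json.loads(blob[pos:pos + meta_len].decode("utf-8"))
    pos += meta_len
    chunks = []
    while pos < len(blob):
        stream_id, size = struct.unpack_from(">BI", blob, pos)
        pos += 5  # size of the chunk header
        chunks.append((stream_id, blob[pos:pos + size]))
        pos += size
    return metadata, chunks


# Stream 0 is video and stream 1 is audio; the payloads are placeholder bytes.
metadata = {"video_codec": "h264", "audio_codec": "aac"}
chunks = [(0, b"frame0"), (1, b"audio0"), (0, b"frame1"), (1, b"audio1")]
blob = pack_container(metadata, chunks)
assert unpack_container(blob) == (metadata, chunks)
```

In this sketch the metadata records which codecs are in use, the chunk headers define how each stream is divided, and the order of the chunks in the file determines how audio and video are interleaved.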
The Importance
File formats define how digital data is stored. Because the fundamental action of a computer is to manipulate stored digital data, file formats play an indispensable role in efficient computation. The freedom to use and improve file formats therefore has the potential to benefit tens of billions of devices and, consequently, the billions of people who interact with them every day. At these orders of magnitude, even relatively small improvements in file size, access time or quality have staggering economic, environmental and social consequences.