1. HTML5 defines a standard way to embed video in a web page, using a <video> element.
2. “AVI” and “MP4” are just container formats. Just like a ZIP file can contain any sort of file within it, video container formats only define how to store things within them, not what kinds of data are stored. (It’s a little more complicated than that, because not all video streams are compatible with all container formats.)
3. A video file usually contains multiple tracks — a video track (without audio), plus one or more audio tracks (without video). Tracks are usually interrelated. An audio track contains markers within it to help synchronize the audio with the video. Individual tracks can have metadata, such as the aspect ratio of a video track, or the language of an audio track. Containers can also have metadata, such as the title of the video itself, cover art for the video, episode numbers (for television shows), and so on.
4. There are lots of video container formats. Some of the most popular include:
• MPEG 4, usually with an .mp4 or .m4v extension. The MPEG 4 container is based on Apple’s older QuickTime container (.mov).
• Flash Video, usually with an .flv extension. Flash Video is, unsurprisingly, used by Adobe Flash.
• Ogg, usually with an .ogv extension. Ogg is an open standard, open source–friendly, and unencumbered by any known patents.
• WebM, usually with a .webm extension. WebM is a newer container format, technically similar to another format called Matroska (.mkv).
• Audio Video Interleave, usually with an .avi extension. The AVI container format was invented by Microsoft in a simpler time, when the fact that computers could play video at all was considered pretty amazing. It does not officially support features of more recent container formats like embedded metadata. It does not even officially support most of the modern video and audio codecs in use today.
5. When you “watch a video,” your video player is doing at least three things at once:
• Interpreting the container format to find out which video and audio tracks are available, and how they are stored within the file so that it can find the data it needs to decode next
• Decoding the video stream and displaying a series of images on the screen
• Decoding the audio stream and sending the sound to your speakers
6. A video codec is an algorithm by which a video stream is encoded. (The word “codec” is a portmanteau, a combination of the words “coder” and “decoder.”) Most modern video codecs use all sorts of tricks to minimize the amount of information required to display one frame after the next. There are lossy and lossless video codecs; a lossy codec irretrievably discards information during encoding. There are tons of video codecs, but the three most relevant here are H.264, Theora, and VP8.
7. H.264 is also known as “MPEG-4 Part 10,” a.k.a. “MPEG-4 AVC,” a.k.a. “MPEG-4 Advanced Video Coding.” H.264 was developed by the MPEG group and standardized in 2003. It aims to provide a single codec for low-bandwidth, low-CPU devices (cell phones); high-bandwidth, high-CPU devices (modern desktop computers); and everything in between. The H.264 standard is split into “profiles,” each of which defines a set of optional features that trade complexity for file size. Higher profiles use more optional features, offer better visual quality at smaller file sizes, take longer to encode, and require more CPU power to decode in real time. The H.264 standard is patent-encumbered, and H.264 video can be embedded in most popular container formats, including MP4 (used primarily by Apple’s iTunes Store) and MKV (used primarily by non-commercial video enthusiasts).
8. Theora evolved from the VP3 codec and has subsequently been developed by the Xiph.org Foundation. Theora is a royalty-free codec and is not encumbered by any known patents other than the original VP3 patents, which have been licensed royalty-free. Theora video can be embedded in any container format, although it is most often seen in an Ogg container.
9. VP8 is another video codec from On2, the same company that originally developed VP3 (which later became Theora). Technically, it produces output on par with H.264 High Profile while maintaining decoding complexity on par with H.264 Baseline. VP8 is a modern, royalty-free codec, and Google has open-sourced a sample encoder and decoder.
10. An audio codec is an algorithm by which an audio stream is encoded. There are lossy and lossless audio codecs, but lossless audio is really too big to put on the web, so the codecs that matter here are lossy. Lossy codecs use all sorts of tricks to minimize the amount of information stored in the audio stream. Different audio codecs throw away different things, but they all have the same purpose: to trick your ears into not noticing the parts that are missing.
11. A sound system may have several speakers, and each speaker is fed a particular channel of the original recording. Most general-purpose audio codecs can handle two channels of sound: during recording, the sound is split into left and right channels; during encoding, both channels are stored in the same audio stream; during decoding, both channels are decoded and each is sent to the appropriate speaker. There are gobs and gobs of audio codecs, but on the web, there are really only three you need to know about: MP3, AAC, and Vorbis.
12. MPEG-1 Audio Layer 3 is colloquially known as “MP3.” MP3s can contain up to 2 channels of sound. They can be encoded at different bitrates: 64 kbps, 128 kbps, 192 kbps, and a variety of others from 32 to 320. Higher bitrates mean larger file sizes and better quality audio, although the ratio of audio quality to bitrate is not linear. (128 kbps sounds more than twice as good as 64 kbps, but 256 kbps doesn’t sound twice as good as 128 kbps.) The MP3 format allows for variable bitrate encoding, which means that some parts of the encoded stream are compressed more than others. For example, silence between notes can be encoded at a low bitrate, then the bitrate can spike up a moment later when multiple instruments start playing a complex chord. The MP3 standard doesn’t define exactly how to encode MP3s (although it does define exactly how to decode them).
13. Advanced Audio Coding is affectionately known as “AAC.” Standardized in 1997, it lurched into prominence when Apple chose it as their default format for the iTunes Store. The AAC format is patent-encumbered. AAC was designed to provide better sound quality than MP3 at the same bitrate, and it can encode audio at any bitrate. AAC can encode up to 48 channels of sound. The AAC format also differs from MP3 in defining multiple profiles. The “low-complexity” profile is designed to be playable in real-time on devices with limited CPU power, while higher profiles offer better sound quality at the same bitrate at the expense of slower encoding and decoding.
14. Vorbis is often called “Ogg Vorbis,” although this is technically incorrect. (“Ogg” is just a container format, and Vorbis audio streams can be embedded in other containers.) Vorbis is not encumbered by any known patents. Vorbis audio streams are usually embedded in an Ogg or WebM container, but they can also be embedded in an MP4 or MKV container. Vorbis supports an arbitrary number of sound channels.
15. One <video> element can link to multiple video files, and the browser will choose the first video file it can actually play:
a) Mozilla Firefox (3.5 and later) supports Theora video and Vorbis audio in an Ogg container. Firefox 4 also supports WebM.
b) Opera (10.5 and later) supports Theora video and Vorbis audio in an Ogg container. Opera 10.60 also supports WebM.
c) Google Chrome (3.0 and later) supports Theora video and Vorbis audio in an Ogg container. Google Chrome 6.0 also supports WebM.
d) Safari on Macs and Windows PCs (3.0 and later) will support anything that QuickTime supports. In theory, you could require your users to install third-party QuickTime plug-ins; in practice, you are limited to the formats QuickTime supports by default. That is a long list, but it does not include WebM, Theora, Vorbis, or the Ogg container. However, QuickTime does ship with support for H.264 video (Main Profile) and AAC audio in an MP4 container.
e) Mobile phones like Apple’s iPhone and Google Android phones support H.264 video (baseline profile) and AAC audio (“low complexity” profile) in an MP4 container.
f) Adobe Flash (9.0.60.184 and later) supports H.264 video (all profiles) and AAC audio (all profiles) in an MP4 container.
g) Internet Explorer 9 supports all profiles of H.264 video and either AAC or MP3 audio in an MP4 container. It will also play WebM video if you install a third-party codec, which is not installed by default on any version of Windows.
h) Internet Explorer 8 has no HTML5 video support at all, but virtually all Internet Explorer users will have the Adobe Flash plugin.
16. Here’s what your video workflow will look like:
a) Make one version that uses WebM (VP8 + Vorbis).
b) Make another version that uses H.264 baseline video and AAC “low complexity” audio in an MP4 container.
c) Make another version that uses Theora video and Vorbis audio in an Ogg container.
d) Link to all three video files from a single <video> element, and fall back to a Flash-based video player.
17. HTML5 gives you two ways to include video on your web page. Both of them involve the <video> element. If you only have one video file, you can simply link to it in a src attribute:
<video src="pr6.webm" width="320" height="240" controls></video>
You should always include width and height attributes in your <video> tag. The browser will center the video inside the box defined by the <video> element, so the video won’t ever be smooshed or stretched out of proportion. By default, the <video> element will not expose any sort of player controls. The element has methods like play() and pause() and a read/write property called currentTime; there are also read/write volume and muted properties. You can tell the browser to display a built-in set of controls by including the controls attribute in your <video> tag.
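For instance, here is a minimal sketch of scripted controls built on those methods and properties (the id value and file name are illustrative):

<video id="movie" src="pr6.webm" width="320" height="240"></video>
<button onclick="togglePlay()">play/pause</button>
<button onclick="toggleMute()">mute</button>
<button onclick="rewind()">rewind</button>
<script>
  var video = document.getElementById("movie");
  function togglePlay() {
    // the read-only paused property tells us the current state
    if (video.paused) { video.play(); } else { video.pause(); }
  }
  function toggleMute() {
    video.muted = !video.muted;  // read/write muted property
  }
  function rewind() {
    video.currentTime = 0;  // setting currentTime (in seconds) seeks
  }
</script>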
18. The preload attribute tells the browser that you would like it to start downloading the video file as soon as the page loads. This makes sense if the entire point of the page is to view the video. On the other hand, if it’s just supplementary material that only a few visitors will watch, then you can set preload to none to tell the browser to minimize network traffic.
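Reusing pr6.webm from the earlier snippet, preload="auto" asks for eager downloading, and preload="none" opts out:

<video src="pr6.webm" width="320" height="240" controls preload="auto"></video>
<video src="pr6.webm" width="320" height="240" controls preload="none"></video>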
19. The autoplay attribute does exactly what it sounds like: it tells the browser that you would like it to start downloading the video file as soon as the page loads, and you would like it to start playing the video automatically as soon as possible.
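Again with the same file:

<video src="pr6.webm" width="320" height="240" autoplay></video>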
20. Each <video> element can contain more than one <source> element. Your browser will go down the list of video sources, in order, and play the first one it’s able to play. You’ll save a lot of network traffic if you tell the browser up-front about the format of each video file. You do this with the type attribute on the <source> element:
<video width="320" height="240" controls>
  <source src="pr6.mp4" type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
  <source src="pr6.webm" type='video/webm; codecs="vp8, vorbis"'>
  <source src="pr6.ogv" type='video/ogg; codecs="theora, vorbis"'>
</video>
The type attribute is a combination of three pieces of information: the container format, the video codec, and the audio codec.
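The same strings can also be checked from script with the canPlayType() method, which returns "probably", "maybe", or an empty string (an empty string means the browser is sure it can’t play that type):

<script>
  var video = document.createElement("video");
  video.canPlayType('video/webm; codecs="vp8, vorbis"');  // "probably", "maybe", or ""
</script>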
21. You can nest an <object> element within a <video> element. Browsers that don’t support HTML5 video will ignore the <video> element and simply render the nested <object> instead, which will invoke the Flash plug-in and play the movie through FlowPlayer. Browsers that support HTML5 video will find a video source they can play and play it, ignoring the nested <object> element altogether.
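A sketch of that markup, assuming a local copy of the FlowPlayer SWF (the file names and the FlowPlayer configuration string are illustrative and vary by player version):

<video width="320" height="240" controls>
  <source src="pr6.mp4" type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
  <source src="pr6.webm" type='video/webm; codecs="vp8, vorbis"'>
  <source src="pr6.ogv" type='video/ogg; codecs="theora, vorbis"'>
  <!-- only browsers without HTML5 video support will render this -->
  <object width="320" height="240" type="application/x-shockwave-flash"
      data="flowplayer.swf">
    <param name="movie" value="flowplayer.swf">
    <param name="allowfullscreen" value="true">
    <param name="flashvars"
        value='config={"clip": {"url": "pr6.mp4", "autoPlay": false}}'>
    <!-- last-ditch fallback: a plain download link -->
    <p>Download the <a href="pr6.mp4">MP4 video</a>.</p>
  </object>
</video>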
22. The poster attribute of the <video> element allows you to display a custom image while the video is loading, or until the user presses “play.”
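For example (poster.jpg is a placeholder name for your still image):

<video src="pr6.webm" width="320" height="240" controls poster="poster.jpg"></video>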