From www.xvid.org
Encode:
+--------------------------------------------------------------------+
Short explanation for the XviD data structures and routines
The encoding part
If you have further questions, visit http://www.xvid.org
+--------------------------------------------------------------------+
Document version :
$Id: xvid-encoder.txt,v 1.3 2002/06/27 14:49:05 edgomez Exp $
+--------------------------------------------------------------------+
| Abstract
+--------------------------------------------------------------------+
This document presents the basic structures and API of XviD. It tries
to explain how to use them to obtain a Simple Profile compliant MPEG-4
stream by feeding the encoder with a sequence of frames.
+-------------------------------------------------------------------+
| Document
+-------------------------------------------------------------------+
Chapter 1 : The XviD version
+-----------------------------------------------------------------+
The XviD version is defined at library compilation time using the
constant defined in xvid.h:
#define API_VERSION ((2 << 16) | (1))
Where 2 stands for the major XviD version, and 1 for the minor version
number.
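As a small illustration of this layout, the sketch below splits a
packed version number back into its major and minor parts. The helper
names api_major and api_minor are hypothetical and are not part of
xvid.h; only the bit layout is taken from the constant above.

```c
/* Same layout as the API_VERSION constant shown above:
   major number in the upper 16 bits, minor number in the lower 16. */
#define SKETCH_API_VERSION ((2 << 16) | (1))

/* Hypothetical helpers, not part of xvid.h. */
static int api_major(int version) { return (version >> 16) & 0xffff; }
static int api_minor(int version) { return version & 0xffff; }
```

With the value above, api_major yields 2 and api_minor yields 1.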
The current version of the API is 2.1. It should be incremented each
time a user-defined structure is modified (XVID_INIT_PARAM,
XVID_ENC_PARAM, ...; these are discussed later).
When you're writing a program/library which uses the XviD library, you
must check your XviD API version against the available library
version. We will see how to check the version number in the next
chapter.
Chapter 2 : The XVID_INIT_PARAM
+-----------------------------------------------------------------+
typedef struct
{
int cpu_flags; [in/out]
int api_version; [out]
int core_build; [out]
} XVID_INIT_PARAM;
Used in: xvid_init(NULL, 0, &xinit, NULL);
This structure is used and filled by the xvid_init function depending
on the cpu_flags value.
List of valid flags for the cpu_flags member :
- XVID_CPU_MMX : cpu feature
- XVID_CPU_MMXEXT : cpu feature
- XVID_CPU_SSE : cpu feature
- XVID_CPU_SSE2 : cpu feature
- XVID_CPU_3DNOW : cpu feature
- XVID_CPU_3DNOWEXT : cpu feature
- XVID_CPU_TSC : cpu feature
- XVID_CPU_IA64 : cpu feature
- XVID_CPU_CHKONLY : command
- XVID_CPU_FORCE : command
In order to set a flag : xinit.cpu_flags |= desired_flag_constant;
1st case : you call xvid_init without setting the XVID_CPU_CHKONLY or
the XVID_CPU_FORCE flag. The xvid_init function automatically detects
the host cpu features and fills the cpu_flags member. The xvid_init
function also initializes all internal function pointers according to
the detected features and then returns XVID_ERR_OK.
2nd case : you call xvid_init with the XVID_CPU_CHKONLY flag set. The
xvid_init function will just detect the host cpu features and return
XVID_ERR_OK without initializing the internal function pointers (NB:
The XviD library is not usable after such a call to xvid_init).
3rd case : you call xvid_init with XVID_CPU_FORCE and the desired
feature flags set in cpu_flags (eg : XVID_CPU_SSE | XVID_CPU_MMX). In
this case you force XviD to use the cpu features passed in the
cpu_flags member. Use this only if you know what you're doing.
NB for PowerPC archs : the ppc arch has no automatic detection; the
library must be compiled for a specific ppc target using the right
Makefile (the cpu_flags member is irrelevant for these archs). Use
Makefile.linuxppc for standard ppc optimized functions and
Makefile.linuxppc_altivec for altivec simd optimized functions.
NB for IA64 archs : There are optimized ia64 assembly functions
provided in the library; they must be forced using the
XVID_CPU_FORCE|XVID_CPU_IA64 pair of flags.
To check the XviD library version against your own XviD header file,
you just have to call the xvid_init function (no matter the cpu_flags)
and compare the returned xinit.api_version integer with your
API_VERSION number. The core_build member is not relevant at the
moment but is reserved for future use (once XviD has reached a
certain stability in its API and releases).
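The version check described above could be sketched as follows. This
is only an illustration under stated assumptions: init_param_sketch is
a local mirror of the structure listed in this chapter, the init_fn
type is inferred from the call xvid_init(NULL, 0, &xinit, NULL), and
fake_init is a stand-in so the sketch can be tried without linking
against the library. None of these names come from xvid.h.

```c
#include <stddef.h>

/* Local mirror of the XVID_INIT_PARAM layout listed above, so the
   sketch compiles without xvid.h. */
typedef struct {
    int cpu_flags;   /* [in/out] */
    int api_version; /* [out] */
    int core_build;  /* [out] */
} init_param_sketch;

/* Assumed shape of xvid_init, inferred from the call shown above. */
typedef int (*init_fn)(void *handle, int opt, void *param1, void *param2);

/* Calls the init routine with cpu_flags = 0 (automatic detection) and
   returns 1 if the reported api_version equals the one the caller was
   compiled against, 0 otherwise. The return code of the init call is
   not inspected in this sketch. */
static int api_version_matches(init_fn init, int header_api_version)
{
    init_param_sketch xinit = { 0, 0, 0 };

    init(NULL, 0, &xinit, NULL);
    return xinit.api_version == header_api_version;
}

/* Stand-in for the library, only for trying the sketch without
   linking against XviD: it always reports API version 2.1. */
static int fake_init(void *handle, int opt, void *param1, void *param2)
{
    init_param_sketch *p = param1;

    (void)handle; (void)opt; (void)param2;
    p->api_version = (2 << 16) | 1;
    return 0; /* arbitrary here; the sketch ignores it */
}
```

In a real program one would pass xvid_init and API_VERSION from xvid.h
instead of fake_init and a hand-written constant.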
Chapter 3 : XVID_ENC_PARAM structure
+-----------------------------------------------------------------+
typedef struct
{
int width, height; [in]
int fincr, fbase; [in]
int rc_bitrate; [in]
int rc_reaction_delay_factor; [in]
int rc_averaging_period; [in]
int rc_buffer; [in]
int max_quantizer; [in]
int min_quantizer; [in]
int max_key_interval; [in]
void *handle; [out]
}
XVID_ENC_PARAM;
Used in: xerr = xvid_encore(NULL, XVID_ENC_CREATE, &xparam, NULL);
This structure has to be filled to create a new encoding instance:
- width and height.
They have to be set to the size of the image to be encoded.
- fincr and fbase (<0 forces default value 25fps - [25,1]).
They are the MPEG-way of defining the framerate. If you have an
integer framerate, say 24, 25 or 30fps, use fincr=1, fbase=framerate.
However, if framerate is non-integer, like 23.996fps you can
e.g. multiply with 1000, getting fincr=1000 and fbase=23996, giving
you integer values again.
- rc_bitrate (<0 forces default value : 900000).
This is the desired target bitrate. XviD will try to do its best to
respect this setting but keep in mind XviD is still in development and
it has not been tuned for very low bitrates.
- All other rc_xxxx parameters are used by the bitrate controller to
respect your rc_bitrate setting as closely as it can (<0 forces
default values).
The defaults are good enough and you should not change them.
ToDo : describe briefly their impact on the bitrate variations and
on how closely the rc_bitrate setting is respected.
- min_quantizer and max_quantizer (<0 forces default values : 1,31).
These 2 members limit the range of allowed quantizers. Normally, the
quantizer range is [1..31], so min=1 and max=31.
NB : the HIGHER the quantizer, the LOWER the quality.
the HIGHER the quantizer, the HIGHER the compression ratio.
min_quant=1 is somewhat overkill, min_quant=2 is good enough.
max_quant depends on what you encode: leave it at 31 or lower it to
something like 15 or 10 for better quality (but encoding at a very low
bitrate might fail then).
- max_key_interval (<0 forces default value : 10*framerate == 10s)
This is the maximum number of frames between two keyframes
(I-frames). Keyframes are also inserted dynamically at scene breaks.
It is important to have some keyframes, even in longer scenes, if you
want to seek in the resulting file, because seeking is only possible
from one keyframe to the next. However, keyframes are much larger than
non-keyframes, so do not use too many of them. A value of framerate*10
is normally a good choice.
- handle
This is the returned internal encoder instance.
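To make the member descriptions above more concrete, here is a minimal
sketch that fills such a structure. enc_param_sketch is a local mirror
of the layout listed in this chapter, and set_framerate and
make_enc_param are hypothetical helpers, not library functions. The
choices made (defaults for all rate control members, min_quantizer=2,
a keyframe every 10 seconds) simply follow the advice in the text.

```c
#include <stddef.h>

/* Local mirror of the XVID_ENC_PARAM layout listed above, so the
   sketch compiles without xvid.h. */
typedef struct {
    int width, height;
    int fincr, fbase;
    int rc_bitrate;
    int rc_reaction_delay_factor;
    int rc_averaging_period;
    int rc_buffer;
    int max_quantizer;
    int min_quantizer;
    int max_key_interval;
    void *handle;
} enc_param_sketch;

/* Hypothetical helper: express a frame rate as fincr/fbase.
   Integer rates give fincr = 1; other rates are scaled by 1000. */
static void set_framerate(enc_param_sketch *p, double fps)
{
    if (fps == (double)(int)fps) {
        p->fincr = 1;
        p->fbase = (int)fps;
    } else {
        p->fincr = 1000;
        p->fbase = (int)(fps * 1000.0 + 0.5); /* round to nearest */
    }
}

/* Hypothetical helper: fill a parameter block for a given picture
   size and frame rate, leaving rate control at its defaults. */
static enc_param_sketch make_enc_param(int width, int height, double fps)
{
    enc_param_sketch p;

    p.width = width;
    p.height = height;
    set_framerate(&p, fps);
    p.rc_bitrate = -1;               /* <0 : default bitrate */
    p.rc_reaction_delay_factor = -1; /* <0 : default */
    p.rc_averaging_period = -1;      /* <0 : default */
    p.rc_buffer = -1;                /* <0 : default */
    p.min_quantizer = 2;             /* the text calls 1 overkill */
    p.max_quantizer = 31;
    p.max_key_interval = (int)(fps * 10.0 + 0.5); /* about 10 seconds */
    p.handle = NULL;                 /* filled in by the encoder */
    return p;
}
```

For example, make_enc_param(640, 480, 25.0) gives fincr=1, fbase=25
and max_key_interval=250.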
Chapter 4 : the XVID_ENC_FRAME structure.
+-----------------------------------------------------------------+
typedef struct
{
int general; [in]
int motion; [in]
void *bitstream; [in]
int length; [out]
void *image; [in]
int colorspace; [in]
unsigned char *quant_intra_matrix; [in]
unsigned char *quant_inter_matrix; [in]
int quant; [in]
int intra; [in/out]
HINTINFO hint; [in/out]
}
XVID_ENC_FRAME;
Used in:
xerr = xvid_encore(enchandle, XVID_ENC_ENCODE, &xframe, &xstats);
This is the main structure to encode a frame. It gives hints to the
encoder on how to process an image.
- general flag member.
The general flag member informs XviD on general algorithm choices made
by the library client.
Valid flags :
- XVID_CUSTOM_QMATRIX : informs xvid to use the custom user
matrices.
- XVID_H263QUANT : informs xvid to use H263 quantization
algorithm.
- XVID_MPEGQUANT : informs xvid to use MPEG quantization
algorithm.
- XVID_HALFPEL : informs xvid to perform a half pixel motion
estimation.
- XVID_ADAPTIVEQUANT : informs xvid to perform an adaptive
quantization.
- XVID_LUMIMASKING : informs xvid to use a lumimasking algorithm.
- XVID_LATEINTRA : ???
- XVID_INTERLACING : informs xvid to use the MPEG4 interlaced
mode.
- XVID_TOPFIELDFIRST : ???
- XVID_ALTERNATESCAN : ???
- XVID_HINTEDME_GET : informs xvid to return Motion Estimation
vectors from the ME encoder algorithm. Used during a first pass.
- XVID_HINTEDME_SET : informs xvid to use the user given motion
estimation vectors as hints for the encoder ME algorithms. Used
during a 2nd pass.
- XVID_INTER4V : forces XviD to search a vector for each 8x8 block
within the 16x16 Macro Block. This mode should be used only if
the XVID_HALFPEL mode is activated (this could change in the
future).
- XVID_ME_ZERO : forces XviD to use the zero ME algorithm.
- XVID_ME_LOGARITHMIC : forces XviD to use the logarithmic
ME algorithm.
- XVID_ME_FULLSEARCH : forces XviD to use the full search ME
algorithm.
- XVID_ME_PMVFAST : forces XviD to use the PMVFAST ME algorithm.
- XVID_ME_EPZS : forces XviD to use the EPZS ME algorithm.
ToDo : fill the void entries in flags, and describe briefly each ME
algorithm.
- motion member.
Valid flags for 16x16 motion estimation (no XVID_INTER4V flag in the
general flag).
- PMV_ADVANCEDDIAMOND16 : XviD has a modified diamond algorithm
that performs a bit faster than the original one. Use this flag
if you want to use the speed optimized diamond search. The
quality loss is not big (better quality than square search but
less than the normal diamond search).
- PMV_HALFPELDIAMOND16 : switches the search algorithm from 1 or 2
full pixels precision to 1 or 2 half pixel precision.
- PMV_HALFPELREFINE16 : After normal diamond search, an extra
halfpel refinement step is performed. Should always be used if
XVID_HALFPEL is on, because it gives a rather big increase in
quality.
- PMV_EXTSEARCH16 : Normal PMVfast predicts one start vector and
does diamond search around this position. EXTSEARCH means that 2
more start vectors are used: (0,0) and median predictor and
diamond search is done for those, too. Makes search slightly
slower, but quality sometimes gets better.
- PMV_EARLYSTOP16 : PMVfast and EPZS stop search if current best
is below some dynamic threshold. No diamond search is done,
only halfpel refinement (if active). Without EARLYSTOP diamond
search is always done. That would be much slower, but would not
really lead to better quality.
- PMV_QUICKSTOP16 : like EARLYSTOP, but not even halfpel
refinement is done. Normally worse quality, so it defaults to
off. Might be removed, too.
- PMV_UNRESTRICTED16 : "unrestricted ME" is a feature of
MPEG4. It's not implemented, so this flag is ignored (not even
checked).
- PMV_OVERLAPPING16 : same as unrestricted. Not implemented, nor
checked.
- PMV_USESQUARES16 : Replace the diamond search with a square
search.
Valid flags when using 4 vectors mode prediction. They have the same
meaning as their 16x16 counterparts, so we only give the list :
- PMV_ADVANCEDDIAMOND8
- PMV_HALFPELDIAMOND8
- PMV_HALFPELREFINE8
- PMV_EXTSEARCH8
- PMV_EARLYSTOP8
- PMV_QUICKSTOP8
- PMV_UNRESTRICTED8
- PMV_OVERLAPPING8
- PMV_USESQUARES8
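The recommendations above (XVID_INTER4V only together with
XVID_HALFPEL, and halfpel refinement whenever XVID_HALFPEL is on) can
be sketched as a small consistency helper. The numeric flag values
below are placeholders chosen only so the sketch compiles on its own;
they are not the values from xvid.h. normalize_flags is a hypothetical
helper, and applying the refinement advice to the 8x8 flag as well is
an assumption based on the statement that the 8x8 flags mean the same
as their 16x16 counterparts.

```c
/* Placeholder bit values so the sketch compiles without xvid.h.
   They are NOT the real values; if the real header is included
   first, its own definitions are used instead. */
#ifndef XVID_HALFPEL
#define XVID_HALFPEL        0x01
#endif
#ifndef XVID_INTER4V
#define XVID_INTER4V        0x02
#endif
#ifndef PMV_HALFPELREFINE16
#define PMV_HALFPELREFINE16 0x04
#endif
#ifndef PMV_HALFPELREFINE8
#define PMV_HALFPELREFINE8  0x08
#endif

/* Hypothetical helper: adjust a general/motion flag pair so that it
   follows the recommendations given in the text. */
static void normalize_flags(int *general, int *motion)
{
    if (!(*general & XVID_HALFPEL)) {
        /* INTER4V should only be used together with HALFPEL. */
        *general &= ~XVID_INTER4V;
        return;
    }
    /* Halfpel refinement is recommended whenever HALFPEL is on. */
    *motion |= PMV_HALFPELREFINE16;
    if (*general & XVID_INTER4V)
        *motion |= PMV_HALFPELREFINE8;
}
```

A caller would run this once on its general and motion values before
storing them in the frame structure.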
- quant member.
The quantizer value is used when the DCT coefficients are divided, to
zero the coefficients that are not important (according to the target
bitrate, not the image quality :-).
Valid values :
- 0 (zero) : Then the rate controller chooses the right quantizer
for you. Typically used in ABR encoding or first pass of a VBR
encoding session.
- != 0 : Then you force the encoder to use this specific
quantizer value. It is clamped in the interval
[1..31]. Typically used during the 2nd pass of a VBR encoding
session.
- intra member.
[in usage]
The intra value decides whether the frame is going to be a keyframe or
not.
Valid values :
- 1 : forces the encoder to create a keyframe. Mainly used during
a VBR 2nd pass.
- 0 : forces the encoder not to create a keyframe. Mainly used
during a VBR 2nd pass.
- -1 : let the encoder decide (based on contents and
max_key_interval). Mainly used in ABR mode and during a 1st
VBR pass.
[out usage]
When first set to -1, the encoder returns the effective keyframe state
of the frame.
- 0 : the resulting frame is not a keyframe
- 1 : the resulting frame is a keyframe (scene change).
- 2 : the resulting frame is a keyframe (max_key_interval
reached)
- quant_intra_matrix and quant_inter_matrix members.
These are pointers to a pair of user quantization matrices. You
must set the general XVID_CUSTOM_QMATRIX flag to make sure XviD uses
them.
When set to NULL, the default XviD matrices are used.
NB : each time the matrices change, XviD must write a header into the
bitstream, so try not to change these matrices very often. This will
save space.
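The value ranges of the quant and intra members described above can be
summarized in code. Both helpers below are hypothetical and only
restate the rules from the text; they are not part of the XviD API.

```c
/* Hypothetical helper: the quantizer that results from a requested
   value. 0 leaves the choice to the rate controller; any other value
   is clamped to the interval [1..31]. */
static int effective_quant(int requested)
{
    if (requested == 0)
        return 0;
    if (requested < 1)
        return 1;
    if (requested > 31)
        return 31;
    return requested;
}

/* Hypothetical helper: describe the intra value the encoder returns
   when the member was set to -1 before the call. */
static const char *intra_result_name(int intra)
{
    switch (intra) {
    case 0:  return "not a keyframe";
    case 1:  return "keyframe (scene change)";
    case 2:  return "keyframe (max_key_interval reached)";
    default: return "unknown";
    }
}
```

A first pass would typically keep quant at 0 and intra at -1, then log
the returned intra value for use in the second pass.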
Chapter 5 : The XVID_ENC_STATS structure
+-----------------------------------------------------------------+
typedef struct
{
int quant; // [out] frame quantizer
int hlength; // [out] header length (bytes)
int kblks, mblks, ublks; // [out]
} XVID_ENC_STATS;
Used in:
xerr = xvid_encore(enchandle, XVID_ENC_ENCODE, &xframe, &xstats);
In this structure the encoder returns statistical data about the
encoding process, e.g. to be saved for two-pass encoding. quant is
the quantizer chosen for this frame (if you let the rate control
choose it). hlength is the length of the frame's header, including
motion information etc. kblks, mblks and ublks are unused at the
moment.
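Saving these statistics for a later second pass might look like the
sketch below. The structure is a local mirror of the one listed above,
format_stats_line is a hypothetical helper, and the log line format
(frame number, keyframe state, quantizer, header length, total length)
is an arbitrary choice made for this example, not a format defined by
XviD.

```c
#include <stdio.h>

/* Local mirror of the XVID_ENC_STATS layout listed above, so the
   sketch compiles without xvid.h. */
typedef struct {
    int quant;               /* frame quantizer */
    int hlength;             /* header length (bytes) */
    int kblks, mblks, ublks; /* unused at the moment */
} enc_stats_sketch;

/* Hypothetical helper: format one first-pass log line into buf.
   Returns the number of characters snprintf would have written. */
static int format_stats_line(char *buf, size_t size, int frame_no,
                             int is_key, int total_length,
                             const enc_stats_sketch *st)
{
    return snprintf(buf, size, "%d %d %d %d %d\n",
                    frame_no, is_key, st->quant, st->hlength,
                    total_length);
}
```

The is_key and total_length arguments would come from the intra and
length members of the frame structure after the encode call.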
Chapter 6 : The xvid_encore function
+-----------------------------------------------------------------+
int xvid_encore(void * handle,
int opt,
void * param1,
void * param2);
XviD uses a single-function API, so everything you want to do is done
by this routine. The opt parameter chooses the behaviour of the
routine:
XVID_ENC_CREATE: create a new encoder, XVID_ENC_PARAM in param1, a
handle to the new encoder is returned in the handle member of
XVID_ENC_PARAM.
XVID_ENC_ENCODE: encode one frame, XVID_ENC_FRAME-structure in param1,
XVID_ENC_STATS in param2 (or NULL, if you are not interested in
statistical data).
XVID_ENC_DESTROY: shut down this encoder, do not use handle afterwards.
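The create, encode, destroy sequence can be sketched as below. To keep
the example self-contained, the encoder entry point is passed in as a
function pointer with the same parameter list as xvid_encore, and
fake_encore stands in for the library. The numeric values of the
constants are placeholders, not the values from xvid.h, and it is an
assumption of this sketch that every successful call returns
XVID_ERR_OK. create_param_sketch and encode_frames are hypothetical
names.

```c
#include <stddef.h>

/* Placeholder values so the sketch compiles without xvid.h. They are
   NOT the real values; include the real header first to use those. */
#ifndef XVID_ENC_CREATE
#define XVID_ENC_CREATE  1
#endif
#ifndef XVID_ENC_ENCODE
#define XVID_ENC_ENCODE  2
#endif
#ifndef XVID_ENC_DESTROY
#define XVID_ENC_DESTROY 3
#endif
#ifndef XVID_ERR_OK
#define XVID_ERR_OK      0
#endif

/* Same parameter list as xvid_encore above. */
typedef int (*encore_fn)(void *handle, int opt, void *param1, void *param2);

/* Stands in for XVID_ENC_PARAM, reduced to the one member used here. */
typedef struct { void *handle; } create_param_sketch;

/* Creates an encoder, encodes nframes frames from the same frame
   description, then destroys the encoder. Returns the number of
   frames encoded, or -1 if the encoder could not be created. */
static int encode_frames(encore_fn encore, create_param_sketch *param,
                         void *frame, int nframes)
{
    int i, encoded = 0;

    if (encore(NULL, XVID_ENC_CREATE, param, NULL) != XVID_ERR_OK)
        return -1;

    for (i = 0; i < nframes; i++) {
        /* param2 is NULL: no statistics requested */
        if (encore(param->handle, XVID_ENC_ENCODE, frame, NULL)
            != XVID_ERR_OK)
            break;
        encoded++;
    }

    encore(param->handle, XVID_ENC_DESTROY, NULL, NULL);
    return encoded;
}

/* Stand-in for the library, only for trying the sketch without
   linking against XviD: it records the order of the opt values. */
static int fake_log[8];
static int fake_calls;
static int fake_instance;

static int fake_encore(void *handle, int opt, void *param1, void *param2)
{
    (void)handle; (void)param2;
    if (fake_calls < 8)
        fake_log[fake_calls] = opt;
    fake_calls++;
    if (opt == XVID_ENC_CREATE)
        ((create_param_sketch *)param1)->handle = &fake_instance;
    return XVID_ERR_OK;
}
```

With the real library, the function pointer would be xvid_encore and
the two structures would be a fully filled XVID_ENC_PARAM and
XVID_ENC_FRAME.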
Decode:
XviD core API overview: Decoding
+-----------------------------------------------------------------+
* Short explanation for the XviD data structures and routines
*
* decoding part
*
* if you have further questions, visit http://www.xvid.org
*
+-----------------------------------------------------------------+
/* these are the structures/routines from xvid.h needed for decoding */
+-----------------------------------------------------------------+
#define API_VERSION ((1 << 16) | (0))
This is the revision of the xvid.h file that you have in front of you.
Check it against the library's version.
+-----------------------------------------------------------------+
typedef struct
{
int cpu_flags; [in/out]
int api_version; [out]
int core_build; [out]
} XVID_INIT_PARAM;
This is filled by xvid_init with the correct CPU flags for initialization
(auto-detect), unless you pass flags to it (cpu_flags!=0). Do not do that
unless you really know what you are doing.
api_version can (should) be checked against API_VERSION, to see if you
have the right core library.
Used in: xvid_init(NULL, 0, &xinit, NULL);
+-----------------------------------------------------------------+
typedef struct
{
int width; [in] (should be a multiple of 16, max is )
int height; [in] (should be a multiple of 16, max is )
void *handle; [out]
} XVID_DEC_PARAM;
When creating a decoder, you have to provide it with the height and width
of the image to decode (this is _not_ in the bytestream itself!).
In handle a unique handle is given back, which has to be used to identify
this decoder instance.
Used in: xerr = xvid_decore(NULL, XVID_DEC_CREATE, &xparam, NULL);
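Since width and height should be multiples of 16, a caller may want to
test a size or compute the next suitable one. The two helpers below
are hypothetical and only illustrate the arithmetic; whether padding a
picture to the rounded size is appropriate depends on the application.

```c
/* Hypothetical helper: 1 if v is a multiple of 16, 0 otherwise. */
static int is_multiple_of_16(int v)
{
    return (v & 15) == 0;
}

/* Hypothetical helper: round a non-negative size up to the next
   multiple of 16. */
static int align16(int v)
{
    return (v + 15) & ~15;
}
```

For instance, 720 is already a multiple of 16, while 1080 is not and
rounds up to 1088.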
+-----------------------------------------------------------------+
typedef struct
{
void * bitstream; [in]
int length; [in]
void * image; [in]
int stride; [in]
int colorspace; [in]
} XVID_DEC_FRAME;
This is the main structure for decoding itself. You provide the
MPEG4-bitstream and its length.
image is the position where the decoded picture should be stored.
stride is the difference between the memory address of the first pixel of
a row in the image and the first pixel of the next row. If the image is
going to be one big block, then stride=width, but by making it larger you
can create an "edged" picture.
By colorspace the output format for the image is given; XVID_CSP_RGB24 or
XVID_CSP_YV12 might be the most common.
A special case is XVID_CSP_USER. If you use this, then *image will not
be filled with the image but with a structure that contains pointers to
the decoder's internal representation of it. That's faster, because no
memcopy is involved, but don't use it if you don't know what you're doing.
Used in: xerr = xvid_decore(dechandle, XVID_DEC_DECODE, &xframe, NULL);
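The stride definition above amounts to simple row arithmetic, sketched
here. pixel_index is a hypothetical helper, and it assumes stride is
counted in pixels, which follows from the statement that stride=width
for an image stored as one big block; for a packed format such as
RGB24 the byte offset would additionally depend on the bytes per
pixel.

```c
#include <stddef.h>

/* Hypothetical helper: index of pixel (x, y) in a plane whose rows
   start stride pixels apart. */
static size_t pixel_index(int x, int y, int stride)
{
    return (size_t)y * (size_t)stride + (size_t)x;
}
```

With stride equal to the width the rows are contiguous; with a larger
stride the extra pixels at the end of each row form the edge.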
+-----------------------------------------------------------------+
int xvid_decore(void * handle, [in/out]
int opt, [in]
void * param1, [in]
void * param2); [in]
XviD uses a single-function API, so everything you want to do is done by
this routine. The opt parameter chooses the behaviour of the routine:
XVID_DEC_CREATE: create a new decoder, XVID_DEC_PARAM in param1,
a handle to the new decoder is returned in handle
XVID_DEC_DECODE: decode one frame, XVID_DEC_FRAME-structure in param1
XVID_DEC_DESTROY: shut down this decoder, do not use handle afterwards