ColorPaper

xml

The Extensible Markup Language (XML) is a general-purpose specification for creating custom markup languages.^[1] It is classified as an extensible language because it allows its users to define their own elements. Its primary purpose is to facilitate the sharing of structured data across different information systems, particularly via the Internet,^[2] and it is used both to encode documents and to serialize data. In the latter context, it is comparable with other text-based serialization languages such as JSON and YAML.^[3]

It started as a simplified subset of the Standard Generalized Markup Language (SGML), and is designed to be relatively human-legible. By adding semantic constraints, application languages can be implemented in XML. These include XHTML,^[4] RSS, MathML, GraphML, Scalable Vector Graphics, MusicXML, and thousands of others. Moreover, XML is sometimes used as the specification language for such application languages.

XML is recommended by the World Wide Web Consortium. It is a fee-free open standard. The W3C recommendation specifies both the lexical grammar and the requirements for parsing.

Well-formed and valid XML documents

There are two levels of correctness of an XML document:

Well-formed. A well-formed document conforms to all of XML's syntax rules. For example, if a start-tag appears without a corresponding end-tag, it is not well-formed. A document that is not well-formed is not considered to be XML; a conforming parser is not allowed to process it.
Valid. A valid document additionally conforms to some semantic rules. These rules are either user-defined, or included as an XML schema or DTD. For example, if a document contains an undefined element, then it is not valid; a validating parser is not allowed to process it.

Well-formed documents: XML syntax

As long as only well-formedness is required, XML is a generic framework for storing any amount of text or any data whose structure can be represented as a tree. The only indispensable syntactical requirement is that the document has exactly one root element (alternatively called the document element). This means that the text must be enclosed between a root start-tag and a corresponding end-tag. The following is a "well-formed" XML document:

<book>This is a book.... </book>

The root element can be preceded by an optional XML declaration. This declaration states what version of XML is in use (normally 1.0); it may also contain information about character encoding and external dependencies.

<?xml version="1.0" encoding="UTF-8"?>

The specification requires that processors of XML support the pan-Unicode character encodings UTF-8 and UTF-16 (UTF-32 is not mandatory). The use of more limited encodings, such as those based on ISO/IEC 8859, is acknowledged and is widely used and supported.

Comments can be placed anywhere in the tree, including in the text if the content of the element is text or #PCDATA.

XML comments start with <!-- and end with -->. Two dashes (--) may not appear anywhere in the text of the comment.

In any meaningful application, additional markup is used to structure the contents of the XML document. The text enclosed by the root tags may contain an arbitrary number of XML elements. The basic syntax for one element is:

<name attribute="value">content</name>

The two instances of »name« are referred to as the start-tag and end-tag, respectively. Here, »content« is some text which may again contain XML elements. So, a generic XML document contains a tree-based data structure. Here is an example of a structured XML document:

<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
  <ingredient amount="1.5" unit="cups" state="warm">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <instructions>
    <step>Mix all ingredients together.</step>
    <step>Knead thoroughly.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Knead again.</step>
    <step>Place in a bread baking tin.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Bake in the oven at 350°F for 30 minutes.</step>
  </instructions>
</recipe>

Attribute values must always be quoted, using single or double quotes; and each attribute name should appear only once in any element.
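
The tree structure described above can be made concrete by parsing a small document and walking it. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree; the shortened recipe document, its element names, and the describe helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative document; the element names are assumptions for this sketch.
RECIPE_XML = """
<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
  <instructions>
    <step>Mix all ingredients together.</step>
    <step>Knead thoroughly.</step>
  </instructions>
</recipe>
"""


def describe(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    text = (element.text or "").strip()
    attrs = " ".join(f'{k}="{v}"' for k, v in element.attrib.items())
    line = "  " * depth + element.tag
    if attrs:
        line += f" [{attrs}]"
    if text:
        line += f": {text}"
    lines = [line]
    for child in element:  # iterating an element yields its child elements
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(RECIPE_XML)
outline = describe(root)
print("\n".join(outline))
```

Each element appears exactly once in the outline, indented under its parent, which is the sense in which a generic XML document is a tree.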

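XML requires that elements be properly nested; elements may never overlap. For example, the code below is not well-formed XML, because the title and author elements overlap:

<title>Book on Logic<author>Aristotle<title>Another Book on Logic<author>Boole</title></author></title></author>

The elements can instead be written one after another, which is well-formed:

<title>Book on Logic</title> <author>Aristotle</author> <title>Another Book on Logic</title> <author>Boole</author>

Alternatively, they can be properly nested:

<title>Book on Logic</title> <author>Aristotle<title>Another Book on Logic<author>Boole</author></title></author>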

XML provides special syntax for representing an element with empty content. Instead of writing a start-tag followed immediately by an end-tag, a document may contain an empty-element tag. An empty-element tag resembles a start-tag but contains a slash just before the closing angle bracket. The following three examples are equivalent in XML:

<foo></foo>
<foo />
<foo/>

An empty-element may contain attributes:

<info author="John Smith" genre="science-fiction" date="2009-Jan-01" />
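
The equivalence of the empty-element forms can be checked with a parser. This is a small Python sketch using xml.etree.ElementTree; the element name foo is a placeholder chosen for the example.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same empty element (the name "foo" is a placeholder).
forms = ["<foo></foo>", "<foo />", "<foo/>"]

parsed = [ET.fromstring(form) for form in forms]

# Summarize each result as (tag name, text content, number of children).
summaries = [(element.tag, element.text, len(element)) for element in parsed]
print(summaries)

# An empty-element tag can still carry attributes.
info = ET.fromstring('<foo author="John Smith" genre="science-fiction" />')
print(info.attrib)
```

All three forms produce an element with the same name, no text and no children, so an application reading the parsed tree cannot tell them apart.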

Entity references

An entity in XML is a named body of data, usually text. Entities are often used to represent single characters that cannot easily be entered on the keyboard; they are also used to represent pieces of standard ("boilerplate") text that occur in many documents, especially if there is a need to allow such text to be changed in one place only.

Special characters can be represented either using entity references, or by means of numeric character references. An example of a numeric character reference is "&#x20AC;", which refers to the Euro symbol by means of its Unicode codepoint in hexadecimal.

An entity reference is a placeholder that represents that entity. It consists of the entity's name preceded by an ampersand ("&") and followed by a semicolon (";"). XML has five predeclared entities:

`&amp;`	&	ampersand
`&lt;`	<	less than
`&gt;`	>	greater than
`&apos;`	'	apostrophe
`&quot;`	"	quotation mark

Here is an example using a predeclared XML entity to represent the ampersand in the name "AT&T":

<company_name>AT&amp;T</company_name>

Additional entities (beyond the predefined ones) can be declared in the document's Document Type Definition (DTD). A basic example of doing so in a minimal internal DTD follows. Declared entities can describe single characters or pieces of text, and can reference each other.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example [
    <!ENTITY copy "&#xA9;">
    <!ENTITY copyright-notice "Copyright &copy; 2006, XYZ Enterprises">
]>
<example>
    &copyright-notice;
</example>

When viewed in a suitable browser, the XML document above appears as:

 Copyright © 2006, XYZ Enterprises
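
Entity expansion can also be observed outside a browser. The sketch below is a Python illustration using xml.etree.ElementTree, which expands entities declared in an internal DTD subset. The document is a variant of the example above, assuming a root element named example and an entity named copy defined as the character reference &#xA9;.

```python
import xml.etree.ElementTree as ET

# An internal DTD subset declaring two entities; the second refers to the first.
# The root element name "example" is an assumption made for this sketch.
DOCUMENT = """<!DOCTYPE example [
    <!ENTITY copy "&#xA9;">
    <!ENTITY copyright-notice "Copyright &copy; 2006, XYZ Enterprises">
]>
<example>&copyright-notice;</example>"""

root = ET.fromstring(DOCUMENT)

# The parser replaces the entity reference with its declared replacement text.
print(root.text)
```

Because the notice is declared once in the DTD, changing the declaration changes every place where the entity is referenced.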

Numeric character references

Numeric character references look like entity references, but instead of a name, they contain the "#" character followed by a number. The number (in decimal or "x"-prefixed hexadecimal) represents a Unicode code point. Unlike entity references, they are neither predeclared nor do they need to be declared in the document's DTD. They have typically been used to represent characters that are not easily encodable, such as an Arabic character in a document produced on a European computer. The ampersand in the "AT&T" example could also be escaped like this (decimal 38 and hexadecimal 26 both represent the Unicode code point for the "&" character):

<company_name>AT&#38;T</company_name>
<company_name>AT&#x26;T</company_name>

Similarly, in the previous example, notice that "&#xA9;" is used to generate the "©" symbol.
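
The three ways of escaping the ampersand can be compared directly. This is a minimal Python sketch using xml.etree.ElementTree for parsing and xml.sax.saxutils.escape for the reverse direction. The element names company_name and price are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The same text written with a predeclared entity, a decimal character
# reference, and a hexadecimal character reference.
variants = [
    "<company_name>AT&amp;T</company_name>",
    "<company_name>AT&#38;T</company_name>",
    "<company_name>AT&#x26;T</company_name>",
]
texts = [ET.fromstring(variant).text for variant in variants]
print(texts)

# A hexadecimal reference to the Euro sign, U+20AC.
euro = ET.fromstring("<price>&#x20AC;5</price>").text
print(euro)

# Going the other way: escape() replaces &, < and > with entity references
# so that arbitrary text can be placed safely in element content.
escaped = escape("AT&T <Labs>")
print(escaped)
```

After parsing, the application sees the plain character in every case; the choice of escape is purely a matter of how the document is written.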

Well-formed documents

In XML, a well-formed document must conform to the following rules, among others:

Non-empty elements are delimited by both a start-tag and an end-tag.
Empty elements may be marked with an empty-element (self-closing) tag, such as <foo />. This is equal to <foo></foo>.
All attribute values are quoted with either single (') or double (") quotes. Single quotes close a single quote and double quotes close a double quote.
Tags may be nested but must not overlap. Each non-root element must be completely contained in another element.
The document complies with its declared character encoding. The encoding may be declared or implied externally, such as in "Content-Type" headers when a document is transported via HTTP, or internally, using explicit markup at the very beginning of the document. When no such declaration exists, a Unicode encoding is assumed, as defined by a Unicode Byte Order Mark before the document's first character. If the mark does not exist, UTF-8 encoding is assumed.

Element names are case-sensitive. For example, the following is a well-formed matching pair:

<Step> ... </Step>

whereas this is not:

<Step> ... </step>
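
Several of the rules above can be exercised with a conforming parser, which must reject any input that is not well-formed. The following Python sketch uses xml.etree.ElementTree. The helper is_well_formed and the sample fragments are illustrative assumptions, not a complete test of the specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# One conforming sample followed by one violation per rule.
samples = {
    "matching pair": "<Step>knead</Step>",
    "case mismatch": "<Step>knead</step>",
    "missing end-tag": "<a><b>text</a>",
    "overlapping elements": "<a><b>x<c>y</b>z</c></a>",
    "unquoted attribute": "<a id=1/>",
    "mismatched quotes": "<a id=\"1'/>",
    "two root elements": "<a/><b/>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'not well-formed'}")
```

Only the first sample is accepted; each of the others breaks exactly one rule and is rejected as a whole rather than repaired.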

By carefully choosing the names of the XML elements one may convey the meaning of the data in the markup. This increases human readability while retaining the rigor needed for software parsing.

Choosing meaningful names implies the semantics of elements and attributes to a human reader without reference to external documentation. However, this can lead to verbosity, which complicates authoring and increases file size.

Automatic verification

It is relatively simple to verify that a document is well-formed or validated XML, because the rules of well-formedness and validation of XML are designed for portability of tools. The idea is that any tool designed to work with XML files will be able to work with XML files written in any XML language (or XML application). Some ways of checking a document with an independent tool follow:

load it into an XML-capable browser, such as Firefox or Internet Explorer
use a tool like xmlwf (usually bundled with expat)
parse the document, for instance in Ruby:

irb> require "rexml/document"
irb> include REXML
irb> doc = Document.new(File.new("test.xml")).root

Valid documents: XML semantics

By leaving the names, allowable hierarchy, and meanings of the elements and attributes open and definable by a customizable schema or DTD, XML provides a syntactic foundation for the creation of purpose-specific, XML-based markup languages. The general syntax of such languages is rigid — documents must adhere to the general rules of XML, ensuring that all XML-aware software can at least read and understand the relative arrangement of information within them. The schema merely supplements the syntax rules with a set of constraints. Schemas typically restrict element and attribute names and their allowable containment hierarchies, such as only allowing an element named 'birthday' to contain one element named 'month' and one element named 'day', each of which has to contain only character data. The constraints in a schema may also include data type assignments that affect how information is processed; for example, the 'month' element's character data may be defined as being a month according to a particular schema language's conventions, perhaps meaning that it must not only be formatted a certain way, but also must not be processed as if it were some other type of data.
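
To make the 'birthday' constraints concrete, the sketch below hand-codes them in Python on top of xml.etree.ElementTree. It is not a schema language or a validating parser, only an illustration of the kind of rule a schema expresses. The function check_birthday and the convention that a month is a number from 1 to 12 are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def check_birthday(element):
    """Return a list of constraint violations; an empty list means none found."""
    problems = []
    if element.tag != "birthday":
        return [f"expected 'birthday', found '{element.tag}'"]

    # Containment: exactly one 'month' and one 'day', nothing else.
    names = [child.tag for child in element]
    if sorted(names) != ["day", "month"]:
        problems.append(f"expected one 'month' and one 'day', found {names}")

    # Content model: children hold character data only, no nested elements.
    for child in element:
        if len(child) > 0:
            problems.append(f"'{child.tag}' must contain only character data")

    # Data type (assumed convention): month is a number from 1 to 12.
    month = element.findtext("month")
    if month is not None:
        value = month.strip()
        if not (value.isdecimal() and 1 <= int(value) <= 12):
            problems.append(f"'month' value {month!r} is not in the range 1-12")
    return problems


good = ET.fromstring("<birthday><month>7</month><day>14</day></birthday>")
bad = ET.fromstring("<birthday><month>13</month><year>1990</year></birthday>")

good_problems = check_birthday(good)
bad_problems = check_birthday(bad)
print(good_problems)
print(bad_problems)
```

Both documents are well-formed, but only the first satisfies the constraints. A schema language states such rules declaratively so that a generic validator can enforce them.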

An XML document that complies with a particular schema/DTD, in addition to being well-formed, is said to be valid.

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic constraints imposed by XML itself. A number of standard and proprietary XML schema languages have emerged for the purpose of formally expressing such schemas, and some of these languages are XML-based, themselves.

Before the advent of generalised data description languages such as SGML and XML, software designers had to define special file formats or small languages to share data between programs. This required writing detailed specifications and special-purpose parsers and writers.

XML's regular structure and strict parsing rules allow software designers to leave parsing to standard tools, and since XML provides a general, data model-oriented framework for the development of application-specific languages, software designers need only concentrate on the development of rules for their data, at relatively high levels of abstraction.

Well-tested tools exist to validate an XML document "against" a schema: the tool automatically verifies whether the document conforms to constraints expressed in the schema. Some of these validation tools are included in XML parsers, and some are packaged separately.

Other uses of schemas exist: XML editors, for instance, can use schemas to support the editing process (by suggesting valid element and attribute names, etc.).

DTD

The oldest schema format for XML is the Document Type Definition (DTD), inherited from SGML. While DTD support is ubiquitous due to its inclusion in the XML 1.0 standard, it is seen as limited for the following reasons:

It has no support for newer features of XML, most importantly namespaces.
It lacks expressiveness. Certain formal aspects of an XML document cannot be captured in a DTD.
It uses a custom non-XML syntax, inherited from SGML, to describe the schema.

DTD is still used in many applications because it is considered the easiest to read and write.

XML Schema

A newer XML schema language, described by the W3C as the successor of DTDs, is XML Schema, or more informally referred to by the initialism for XML Schema instances, XSD (XML Schema Definition). XSDs are far more powerful than DTDs in describing XML languages. They use a rich datatyping system, allow for more detailed constraints on an XML document's logical structure, and must be processed in a more robust validation framework. XSDs also use an XML-based format, which makes it possible to use ordinary XML tools to help process them, although XSD implementations require much more than just the ability to read XML.

Criticisms of XSD include the following:

The specification is very large, which makes it difficult to understand and implement.
The XML-based syntax leads to verbosity in schema descriptions, which makes XSDs harder to read and write.
Schema validation can be an expensive addition to XML parsing, especially for high volume systems.
The modeling capabilities are very limited, with no ability to allow attributes to influence content models.
The type derivation model is very limited, in particular that derivation by extension is rarely useful.
Database-related data transfer is supported with arcane ideas such as nillability, but the requirements of industrial publishing are under-supported.
The key/keyref/uniqueness mechanisms are not type-aware.
The PSVI concept (Post Schema Validation Infoset) does not have a standard XML representation or Application Programming Interface, thus it works against vendor independence unless revalidation is performed.

RELAX NG

Another popular schema language for XML is RELAX NG. Initially specified by OASIS, RELAX NG is now also an ISO international standard (as part of DSDL). It has two formats: an XML-based syntax and a non-XML compact syntax. The compact syntax aims to increase readability and writability but, since there is a well-defined way to translate the compact syntax to the XML syntax and back again by means of James Clark's Trang conversion tool, the advantage of using standard XML tools is not lost. RELAX NG has a simpler definition and validation framework than XML Schema, making it easier to use and implement. It also has the ability to use datatype framework plug-ins; a RELAX NG schema author, for example, can require values in an XML document to conform to definitions in XML Schema Datatypes.

ISO DSDL and other schema languages

The ISO DSDL (Document Schema Description Languages) standard brings together a comprehensive set of small schema languages, each targeted at specific problems. DSDL includes RELAX NG full and compact syntax, Schematron assertion language, and languages for defining datatypes, character repertoire constraints, renaming and entity expansion, and namespace-based routing of document fragments to different validators. DSDL schema languages do not have the vendor support of XML Schemas yet, and are to some extent a grassroots reaction of industrial publishers to the lack of utility of XML Schemas for publishing.

Some schema languages not only describe the structure of a particular XML format but also offer limited facilities to influence processing of individual XML files that conform to this format. DTDs and XSDs both have this ability; they can for instance provide attribute defaults. RELAX NG and Schematron intentionally do not provide such facilities, for example infoset augmentation.

International use

XML supports the direct use of almost any Unicode character in element names, attributes, comments, character data, and processing instructions (other than the ones that have special symbolic meaning in XML itself, such as the opening angle bracket, "<"). Therefore, the following is a well-formed XML document, even though it includes both Chinese and Cyrillic characters:

<?xml version="1.0" encoding="UTF-8"?>
<俄語>Данные</俄語>
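
A parser handles such a document like any other. The Python sketch below uses xml.etree.ElementTree to parse the document from UTF-8 bytes and again from UTF-16 bytes, the two encodings every XML processor is required to support. Re-encoding the same text as UTF-16 is an addition made for the example.

```python
import xml.etree.ElementTree as ET

document = '<?xml version="1.0" encoding="UTF-8"?>\n<俄語>Данные</俄語>'

# Parse from UTF-8 bytes, as the document would be stored on disk.
root_utf8 = ET.fromstring(document.encode("utf-8"))
print(root_utf8.tag, root_utf8.text)

# The same content declared and encoded as UTF-16 (Python adds a byte order mark).
utf16_bytes = document.replace('encoding="UTF-8"', 'encoding="UTF-16"').encode("utf-16")
root_utf16 = ET.fromstring(utf16_bytes)
print(root_utf16.tag, root_utf16.text)
```

Both parses yield the same element name and character data, since the encoding affects only the byte representation and not the parsed content.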

Displaying XML on the web

XML documents do not carry information about how to display the data. Without using CSS or XSL, a generic XML document is rendered as raw XML text by most web browsers. Some display it with 'handles' (e.g. + and - signs in the margin) that allow parts of the structure to be expanded or collapsed with mouse-clicks.

In order to style the rendering in a browser with CSS, the XML document must include a reference to the stylesheet:

<?xml-stylesheet type="text/css" href="myStyleSheet.css"?>

Note that this is different from specifying such a stylesheet in HTML, which uses the <link> element.

Extensible Stylesheet Language (XSL) can be used to alter the format of XML data, either into HTML or other formats that are suitable for a browser to display.

To specify client-side XSL Transformation (XSLT), the following processing instruction is required in the XML:

<?xml-stylesheet type="text/xsl" href="myTransform.xslt"?>

Client-side XSLT is supported by many web browsers. Alternatively, one may use XSL to convert XML into a displayable format on the server rather than being dependent on the end-user's browser capabilities. The end-user is not aware of what has gone on 'behind the scenes'; all they see is well-formatted, displayable data.

See the XSLT article for an example of server-side XSLT in action.

XML extensions

XPath makes it possible to refer to individual parts of an XML document. This provides random access to XML data for other technologies, including XSLT, XSL-FO, XQuery etc. XPath expressions can refer to all or part of the text, data and values in XML elements, attributes, processing instructions, comments etc. They can also access the names of elements and attributes. XPaths can be used in both valid and well-formed XML, with and without defined namespaces.
XInclude defines the ability for XML files to include all or part of an external file. When processing is complete, the final XML infoset has no XInclude elements, but instead has copied the documents or parts thereof into the final infoset. It uses XPath to refer to a portion of the document for partial inclusions.
XQuery is to XML what SQL and PL/SQL are to relational databases: ways to access, manipulate and return XML.
XML Namespaces enable the same document to contain XML elements and attributes taken from different vocabularies, without any naming collisions occurring.
XML Signature defines the syntax and processing rules for creating digital signatures on XML content.
XML Encryption defines the syntax and processing rules for encrypting XML content.
XPointer is a system for addressing components of XML-based internet media.
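
As an illustration of path expressions and namespaces working together, here is a Python sketch using xml.etree.ElementTree, which implements only a limited subset of XPath. The namespace URIs, the prefixes b and dc, and the library document are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Two vocabularies in one document: a default namespace for the catalogue
# and a prefixed namespace for titles. Both URIs are made up for this sketch.
DOCUMENT = """
<library xmlns="http://example.org/books" xmlns:dc="http://example.org/meta">
  <book id="b1"><dc:title>Book on Logic</dc:title><author>Aristotle</author></book>
  <book id="b2"><dc:title>Another Book on Logic</dc:title><author>Boole</author></book>
</library>
"""

# Prefixes used in path expressions are local to this mapping; they need not
# match the prefixes used inside the document.
NAMESPACES = {"b": "http://example.org/books", "dc": "http://example.org/meta"}

root = ET.fromstring(DOCUMENT)

# Select every title that is a child of a book.
titles = [t.text for t in root.findall("./b:book/dc:title", NAMESPACES)]
print(titles)

# Select one book anywhere in the tree by attribute value.
second = root.find(".//b:book[@id='b2']", NAMESPACES)
author = second.find("b:author", NAMESPACES).text
print(author)

# Element names are stored in expanded form: {namespace URI}local-name.
print(root.tag)
```

The expanded names show how namespaces prevent collisions: a title in one vocabulary and a title in another have different URIs and therefore different names.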

XML files may be served with a variety of media types. RFC 3023 defines the types "application/xml" and "text/xml", which say only that the data is in XML, and nothing about its semantics. The use of "text/xml" has been criticized as a potential source of encoding problems and is now in the process of being deprecated.^[5] RFC 3023 also recommends that XML-based languages be given media types beginning in "application/" and ending in "+xml"; for example "application/atom+xml" for Atom.

Processing XML files

Three traditional techniques for processing XML files are:

Using a programming language and the SAX API.
Using a programming language and the DOM API.
Using a transformation engine and a filter.

More recent and emerging techniques for processing XML files are:

Pull Parsing
Data binding

Simple API for XML (SAX)

SAX is a lexical, event-driven interface in which a document is read serially and its contents are reported as "callbacks" to various methods on a handler object of the user's design. SAX is fast and efficient to implement, but difficult to use for extracting information at random from the XML, since it tends to burden the application author with keeping track of what part of the document is being processed. It is better suited to situations in which certain types of information are always handled the same way, no matter where they occur in the document.
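
The callback style, and the bookkeeping it pushes onto the application, can be seen in a short example. This is a Python sketch using the standard library's xml.sax module. The handler class IngredientHandler and the sample document are assumptions made for illustration.

```python
import xml.sax

DOCUMENT = b"""<recipe name="bread">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
</recipe>"""


class IngredientHandler(xml.sax.ContentHandler):
    """Collects (amount, name) pairs from <ingredient> elements."""

    def __init__(self):
        super().__init__()
        self.path = []          # stack of open element names, kept by hand
        self.ingredients = []
        self._amount = None
        self._buffer = []

    def startElement(self, name, attrs):
        self.path.append(name)
        if name == "ingredient":
            self._amount = attrs.get("amount")
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so it is accumulated.
        if self.path and self.path[-1] == "ingredient":
            self._buffer.append(content)

    def endElement(self, name):
        if name == "ingredient":
            self.ingredients.append((self._amount, "".join(self._buffer)))
        self.path.pop()


handler = IngredientHandler()
xml.sax.parseString(DOCUMENT, handler)
print(handler.ingredients)
```

The handler never sees the document as a whole. It has to maintain its own stack and text buffer to know where it is, which is the burden described above.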

DOM

DOM is an interface-oriented Application Programming Interface that allows for navigation of the entire document as if it were a tree of "Node" objects representing the document's contents. A DOM document can be created by a parser, or can be generated manually by users (with limitations). Data types in DOM Nodes are abstract; implementations provide their own programming language-specific bindings. DOM implementations tend to be memory intensive, as they generally require the entire document to be loaded into memory and constructed as a tree of objects before access is allowed. DOM is supported in Java by several packages that usually come with the standard libraries. As the DOM specification is regulated by the World Wide Web Consortium, the main interfaces (Node, Document, etc.) are in the package org.w3c.dom.*, as well as some of the events and interfaces for other capabilities like serialization (output). The package com.sun.org.apache.xml.internal.serialize.* provides the serialization (output) capabilities by implementing the appropriate interfaces, while the javax.xml.parsers.* package parses data to create DOM XML documents for manipulation.
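
For comparison with the Java packages named above, the same DOM interfaces are available in Python through the standard library's xml.dom.minidom, which is used here as a stand-in. The sample document is an assumption made for illustration.

```python
from xml.dom import minidom

# The whole document is parsed into an in-memory tree of Node objects.
dom = minidom.parseString(
    '<recipe name="bread"><title>Basic bread</title><step>Mix.</step></recipe>'
)
root = dom.documentElement

# Navigate: find the title element and read its text node.
title_node = root.getElementsByTagName("title")[0]
title_text = title_node.firstChild.data
print(title_text)

# Modify: build a new element by hand and attach it to the tree.
new_step = dom.createElement("step")
new_step.appendChild(dom.createTextNode("Knead thoroughly."))
root.appendChild(new_step)

# Serialize the modified tree back to markup.
serialized = root.toxml()
print(serialized)
```

Random access and in-place modification are straightforward because the full tree is held in memory, which is also the source of the memory cost.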

Transformation engines and filters

A filter in the Extensible Stylesheet Language (XSL) family can transform an XML file for displaying or printing.

XSL-FO is a declarative, XML-based page layout language. An XSL-FO processor can be used to convert an XSL-FO document into another non-XML format, such as PDF.
XSLT is a declarative, XML-based document transformation language. An XSLT processor can use an XSLT stylesheet as a guide for the conversion of the data tree represented by one XML document into another tree that can then be serialized as XML, HTML, plain text, or any other format supported by the processor.
XQuery is a W3C language for querying, constructing and transforming XML data.
XPath is a DOM-like node tree data model and path expression language for selecting data within XML documents. XSL-FO, XSLT and XQuery all make use of XPath. XPath also includes a useful function library.

Pull parsing

Pull parsing ^[6] treats the document as a series of items which are read in sequence using the Iterator design pattern. This allows for writing of recursive-descent parsers in which the structure of the code performing the parsing mirrors the structure of the XML being parsed, and intermediate parsed results can be used and accessed as local variables within the methods performing the parsing, or passed down (as method parameters) into lower-level methods, or returned (as method return values) to higher-level methods. Examples of pull parsers include StAX in the Java programming language, SimpleXML in PHP and System.Xml.XmlReader in .NET.

A pull parser creates an iterator that sequentially visits the various elements, attributes, and data in an XML document. Code which uses this 'iterator' can test the current item (to tell, for example, whether it is a start or end element, or text), and inspect its attributes (local name, namespace, values of XML attributes, value of text, etc.), and can also move the iterator to the 'next' item. The code can thus extract information from the document as it traverses it. The recursive-descent approach tends to lend itself to keeping data as typed local variables in the code doing the parsing, while SAX, for instance, typically requires a parser to manually maintain intermediate data within a stack of elements which are parent elements of the element being parsed. Pull-parsing code can be more straightforward to understand and maintain than SAX parsing code.
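
The recursive-descent style can be sketched in Python with xml.etree.ElementTree.iterparse, used here as a stand-in for pull parsers such as StAX. It returns an iterator of (event, element) pairs that the code advances on demand. The document and the function names parse_recipe, parse_instructions and parse_step are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

DOCUMENT = b"""<recipe name="bread"><title>Basic bread</title>
<instructions><step>Mix.</step><step>Knead.</step></instructions></recipe>"""


def parse_step(events):
    """Called after the 'start' of <step>; pulls its 'end' and returns the text."""
    _event, element = next(events)  # steps hold only text, so this is their 'end'
    return element.text


def parse_instructions(events):
    """Called after the 'start' of <instructions>; returns the list of steps."""
    steps = []  # intermediate result kept as an ordinary local variable
    for event, element in events:
        if event == "start" and element.tag == "step":
            steps.append(parse_step(events))
        elif event == "end" and element.tag == "instructions":
            break
    return steps


def parse_recipe(events):
    """Top-level rule: consumes the whole document and returns a dict."""
    _event, root = next(events)  # 'start' of <recipe>; attributes are available
    recipe = {"name": root.get("name"), "title": None, "steps": []}
    for event, element in events:
        if event == "start" and element.tag == "instructions":
            recipe["steps"] = parse_instructions(events)
        elif event == "end" and element.tag == "title":
            recipe["title"] = element.text
    return recipe


events = ET.iterparse(io.BytesIO(DOCUMENT), events=("start", "end"))
recipe = parse_recipe(events)
print(recipe)
```

Each function mirrors one level of the document and returns its result to the caller, so no hand-maintained stack of open elements is needed.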

Data binding

Another form of XML Processing API is data binding, where XML data is made available as a custom, strongly typed programming language data structure, in contrast to the interface-oriented DOM. Example data binding systems include the Java Architecture for XML Binding (JAXB)^[7].
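
The idea can be illustrated without a binding framework. The Python sketch below maps a document onto dataclasses by hand. It is not JAXB and generates nothing from a schema, and the classes Recipe and Ingredient and the function bind_recipe are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Ingredient:
    name: str
    amount: float  # converted from the attribute's string value
    unit: str


@dataclass
class Recipe:
    title: str
    ingredients: List[Ingredient]


def bind_recipe(xml_text: str) -> Recipe:
    """Map a recipe document onto typed objects (a hand-written binding)."""
    root = ET.fromstring(xml_text)
    return Recipe(
        title=root.findtext("title", default=""),
        ingredients=[
            Ingredient(
                name=(element.text or "").strip(),
                amount=float(element.get("amount", "0")),
                unit=element.get("unit", ""),
            )
            for element in root.findall("ingredient")
        ],
    )


recipe = bind_recipe(
    "<recipe>"
    "<title>Basic bread</title>"
    '<ingredient amount="3" unit="cups">Flour</ingredient>'
    '<ingredient amount="0.25" unit="ounce">Yeast</ingredient>'
    "</recipe>"
)
print(recipe.title, [(i.name, i.amount, i.unit) for i in recipe.ingredients])
```

Application code then works with recipe.ingredients[0].amount as a number rather than navigating generic nodes and converting strings at every use.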

Non-extractive XML Processing API

Non-extractive XML Processing API is a new and emerging category of parsers that aim to overcome the fundamental limitations of DOM and SAX. The most representative is VTD-XML, which abolishes the object-oriented modeling of XML hierarchy and instead uses 64-bit Virtual Token Descriptors (encoding offsets, lengths, depths, and types) of XML tokens. VTD-XML's approach enables a number of interesting features/enhancements, such as high performance, low memory usage ^[8], ASIC implementation ^[9], incremental update ^[10], and native XML indexing ^[11] ^[12].

Specific XML applications and editors

The native file format of OpenOffice.org, AbiWord, and Apple's iWork applications is XML. Some parts of Microsoft Office 2007 are also able to edit XML files with a user-supplied schema (but not a DTD), and Microsoft has released a file format compatibility kit for Office 2003 that allows previous versions of Office to save in the new XML based format. There are dozens of other XML editors available.

History

The versatility of SGML for dynamic information display was understood by early digital media publishers in the late 1980s prior to the rise of the Internet.^[13]^[14] By the mid-1990s some practitioners of SGML had gained experience with the then-new World Wide Web, and believed that SGML offered solutions to some of the problems the Web was likely to face as it grew. Dan Connolly added SGML to the list of W3C's activities when he joined the staff in 1995; work began in mid-1996 when Jon Bosak developed a charter and recruited collaborators. Bosak was well connected in the small community of people who had experience both in SGML and the Web. He received support in his efforts from Microsoft.

XML was compiled by a working group of eleven members,^[15] supported by an (approximately) 150-member Interest Group. Technical debate took place on the Interest Group mailing list and issues were resolved by consensus or, when that failed, majority vote of the Working Group. A record of design decisions and their rationales was compiled by Michael Sperberg-McQueen on December 4, 1997.^[16] James Clark served as Technical Lead of the Working Group, notably contributing the empty-element "<empty/>" syntax and the name "XML". Other names that had been put forward for consideration included "MAGMA" (Minimal Architecture for Generalized Markup Applications), "SLIM" (Structured Language for Internet Markup) and "MGML" (Minimal Generalized Markup Language). The co-editors of the specification were originally Tim Bray and Michael Sperberg-McQueen. Halfway through the project Bray accepted a consulting engagement with Netscape, provoking vociferous protests from Microsoft. Bray was temporarily asked to resign the editorship. This led to intense dispute in the Working Group, eventually solved by the appointment of Microsoft's Jean Paoli as a third co-editor.

The XML Working Group never met face-to-face; the design was accomplished using a combination of email and weekly teleconferences. The major design decisions were reached in twenty weeks of intense work between July and November of 1996, when the first Working Draft of an XML specification was published.^[17] Further design work continued through 1997, and XML 1.0 became a W3C Recommendation on February 10, 1998.

XML 1.0 achieved the Working Group's goals of Internet usability, general-purpose usability, SGML compatibility, facilitation of easy development of processing software, minimization of optional features, legibility, formality, conciseness, and ease of authoring. Like its antecedent SGML, XML allows for some redundant syntactic constructs and includes repetition of element identifiers. In these respects, terseness was not considered essential in its structure.

Sources

XML is a profile of an ISO standard SGML, and most of XML comes from SGML unchanged. From SGML comes the separation of logical and physical structures (elements and entities), the availability of grammar-based validation (DTDs), the separation of data and metadata (elements and attributes), mixed content, the separation of processing from representation (processing instructions), and the default angle-bracket syntax. Removed were the SGML Declaration (XML has a fixed delimiter set and adopts Unicode as the document character set).

Other sources of technology for XML were the Text Encoding Initiative (TEI), which defined a profile of SGML for use as a 'transfer syntax'; HTML, in which elements were synchronous with their resource, the separation of document character set from resource encoding, the xml:lang attribute, and the HTTP notion that metadata accompanied the resource rather than being needed at the declaration of a link; and the Extended Reference Concrete Syntax (ERCS), from which XML 1.0's naming rules were taken, and which had introduced hexadecimal numeric character references and the concept of references to make available all Unicode characters.

Ideas that developed during discussion and were novel in XML were the algorithm for encoding detection and the encoding header, the processing instruction target, the xml:space attribute, and the new close delimiter for empty-element tags.

Versions

There are two current versions of XML. The first, XML 1.0, was initially defined in 1998. It has undergone minor revisions since then, without being given a new version number, and is currently in its fourth edition, as published on August 16, 2006. It is widely implemented and still recommended for general use. The second, XML 1.1, was initially published on February 4, 2004, the same day as XML 1.0 Third Edition, and is currently in its second edition, as published on August 16, 2006. It contains features (some contentious) that are intended to make XML easier to use in certain cases,^[18] mainly enabling the use of line-ending characters used on EBCDIC platforms, and the use of scripts and characters absent from Unicode 2.0. XML 1.1 is not very widely implemented and is recommended for use only by those who need its unique features.^[19]

XML 1.0 and XML 1.1 differ in the requirements of characters used for element and attribute names: XML 1.0 only allows characters which are defined in Unicode 2.0, which includes most world scripts, but excludes those which were added in later Unicode versions. Among the excluded scripts are Mongolian, Cambodian, Amharic, Burmese, and others.

Almost any Unicode character can be used in the character data and attribute values of an XML 1.1 document, even if the character is not defined, aside from having a code point, in the current version of Unicode. The approach in XML 1.1 is that only certain characters are forbidden, and everything else is allowed, whereas in XML 1.0, only certain characters are explicitly allowed, thus XML 1.0 cannot accommodate the addition of characters in future versions of Unicode.

In character data and attribute values, XML 1.1 allows the use of more control characters than XML 1.0, but, for "robustness", most of the control characters introduced in XML 1.1 must be expressed as numeric character references. Among the supported control characters in XML 1.1 are two line break codes that must be treated as whitespace. Whitespace characters are the only control codes that can be written directly.

There are also discussions of an XML 2.0, although it is unclear whether such a version will ever be produced. XML-SW (SW for skunk works), written by one of the original developers of XML, contains some proposals for what an XML 2.0 might look like: elimination of DTDs from the syntax, and integration of namespaces, XML Base and the XML Information Set (infoset) into the base standard.

The World Wide Web Consortium also has an XML Binary Characterization Working Group doing preliminary research into use cases and properties for a binary encoding of the XML infoset. The working group is not chartered to produce any official standards. Since XML is by definition text-based, ITU-T and ISO are using the name Fast Infoset for their own binary infoset to avoid confusion (see ITU-T Rec. X.891 | ISO/IEC 24824-1).

Patent claims

In October 2005 the small company Scientigo publicly asserted that two of its patents, U.S. Patent 5,842,213 and U.S. Patent 6,393,426, apply to the use of XML. The patents cover the "modeling, storage and transfer [of data] in a particular non-hierarchical, non-integrated neutral form", according to their applications, which were filed in 1997 and 1999. Scientigo CEO Doyal Bryant expressed a desire to "monetize" the patents but stated that the company was "not interested in having us against the world." He said that Scientigo was discussing the patents with several large corporations.^[20]

XML users and independent experts responded to Scientigo's claims with widespread skepticism and criticism. Some derided the company as a patent troll. Tim Bray described any claims that the patents covered XML as "ridiculous on the face of it".^[21]

Because a large amount of prior art relating to XML exists, including SGML, some legal experts believed it would be difficult for Scientigo to enforce its patents through litigation.

Critique of XML

Commentators have offered various critiques of XML, suggesting circumstances where XML provides both advantages and potential disadvantages.^[22]

Advantages of XML

It is text-based.
It supports Unicode, allowing almost any information in any written human language to be communicated.
It can represent common computer science data structures: records, lists and trees.
Its self-documenting format describes structure and field names as well as specific values.
The strict syntax and parsing requirements make the necessary parsing algorithms extremely simple, efficient, and consistent.
XML is heavily used as a format for document storage and processing, both online and offline.
It is based on international standards.
It can be updated incrementally.
It allows validation using schema languages such as XSD and Schematron, which makes effective unit-testing, firewalls, acceptance testing, contractual specification and software construction easier.
The hierarchical structure is suitable for most (but not all) types of documents.
It is platform-independent, thus relatively immune to changes in technology.
Forward and backward compatibility are relatively easy to maintain despite changes in DTD or Schema.
Its predecessor, SGML, has been in use since 1986, so there is extensive experience and software available.
An element fragment of a well-formed XML document is also a well-formed XML document.
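
The last point in the list can be checked with a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document and its element names (library, book, title) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A small well-formed document (invented sample data).
document = (
    "<library>"
    "<book id='1'><title>XML Basics</title></book>"
    "<book id='2'><title>More XML</title></book>"
    "</library>"
)

root = ET.fromstring(document)

# Take one element out of the tree and serialize it on its own.
first_book = root.find("book")
fragment = ET.tostring(first_book, encoding="unicode")

# The serialized fragment has a single root element, so it parses
# as a document in its own right.
reparsed = ET.fromstring(fragment)
```

A fragment that depends on entities or namespace prefixes declared elsewhere in the original document may need those declarations carried along before it can stand alone.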

Disadvantages of XML

XML syntax is redundant or large relative to binary representations of similar data,^[23] especially with tabular data.
The redundancy may affect application efficiency through higher storage, transmission and processing costs.^[24]^[25]
XML syntax is verbose, especially for human readers, relative to alternative text-based data transmission formats.^[26]^[27]
The hierarchical model for representation is limited in comparison to an object oriented graph.^[28]^[29]
Expressing overlapping (non-hierarchical) node relationships requires extra effort.^[30]
XML namespaces are problematic to use and namespace support can be difficult to correctly implement in an XML parser.^[31]
XML is commonly depicted as "self-documenting" but this depiction ignores critical ambiguities.^[32]^[33]
The distinction between content and attributes in XML seems unnatural to some and makes designing XML data structures harder.^[34]
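
The verbosity point for tabular data can be seen in a rough sketch that serializes the same rows as XML and as JSON. The sample rows, the element names (rows, row) and the helper rows_to_xml are all invented for this example, and the size difference depends heavily on encoding choices such as using attributes instead of child elements.

```python
import json
import xml.etree.ElementTree as ET

# Invented sample data: three rows of a small table.
rows = [{"id": i, "name": f"item{i}", "price": i * 1.5} for i in range(1, 4)]


def rows_to_xml(table_rows):
    """Encode each row as a <row> element with one child element per field."""
    table = ET.Element("rows")
    for row in table_rows:
        row_element = ET.SubElement(table, "row")
        for field, value in row.items():
            ET.SubElement(row_element, field).text = str(value)
    return ET.tostring(table, encoding="unicode")


xml_text = rows_to_xml(rows)
json_text = json.dumps(rows, separators=(",", ":"))

# Every field name appears twice per value in the XML encoding
# (start-tag and end-tag), which is where most of the extra size comes from.
print(len(xml_text), len(json_text))
```

With this element-per-field encoding the XML text is longer than the compact JSON text for the same rows; a binary format would typically be smaller than either.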

Standardization

In addition to the ISO standards mentioned above, other related documents include:

ISO/IEC 8825-4:2002 Information technology -- ASN.1 encoding rules: XML Encoding Rules (XER)
ISO/IEC 8825-5:2004 Information technology -- ASN.1 encoding rules: Mapping W3C XML schema definitions into ASN.1
ISO/IEC 9075-14:2006 Information technology -- Database languages -- SQL -- Part 14: XML-Related Specifications (SQL/XML)

ISO 10303-28:2007 Industrial automation systems and integration -- Product data representation and exchange -- Part 28: Implementation methods: XML representations of EXPRESS schemas and data, using XML schemas
ISO/IEC 13250-3:2007 Information technology -- Topic Maps -- Part 3: XML syntax
ISO/IEC 13522-5:1997 Information technology -- Coding of multimedia and hypermedia information -- Part 5: Support for base-level interactive applications
ISO/IEC 13522-8:2001 Information technology -- Coding of multimedia and hypermedia information -- Part 8: XML notation for ISO/IEC 13522-5
ISO/IEC 18056:2007 Information technology -- Telecommunications and information exchange between systems -- XML Protocol for Computer Supported Telecommunications Applications (CSTA) Phase III
ISO/IEC 19503:2005 Information technology -- XML Metadata Interchange (XMI)
ISO/IEC 19776-1:2005 Information technology -- Computer graphics, image processing and environmental data representation -- Extensible 3D (X3D) encodings -- Part 1: Extensible Markup Language (XML) encoding

ISO/IEC 22537:2006 Information technology -- ECMAScript for XML (E4X) specification
ISO 22643:2003 Space data and information transfer systems -- Data entity dictionary specification language (DEDSL) -- XML/DTD Syntax
ISO/IEC 23001-1:2006 Information technology -- MPEG systems technologies -- Part 1: Binary MPEG format for XML
ISO 24531:2007 Intelligent transport systems -- System architecture, taxonomy and terminology -- Using XML in ITS standards, data registries and data dictionaries

Notes and references

[1] It is often said to be a markup language itself. This is incorrect.
[2] Bray, Tim; Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau (September 2006). Extensible Markup Language (XML) 1.0 (Fourth Edition) - Origin and Goals. World Wide Web Consortium. Retrieved on October 29, 2006.
[3] JSON and YAML are among other alternative text-based formats commonly described as lighter-weight and less verbose in comparison to XML. See Critique of XML in this article.
[4] XHTML is an attempt to simplify and improve the consistency of HTML, which was based on SGML.
[5] http://lists.xml.org/archives/xml-dev/200407/msg00208.html
[6] Push, Pull, Next! by Bob DuCharme, at XML.com
[7] http://java.sun.com/xml/jaxb/
[8] http://www.javaworld.com/javaworld/jw-03-2006/jw-0327-simplify.html
[9] http://www.ximpleware.com/wp_SUN.pdf
[10] http://www.javaworld.com/javaworld/jw-07-2006/jw-0724-vtdxml.html
[11] VTD+XML format spec
[12] Index XML documents with VTD-XML
[13] Bray, Tim (February 2005). A conversation with Tim Bray: Searching for ways to tame the world's vast stores of information. Association for Computing Machinery's "Queue site". Retrieved on April 16, 2006.
[14] (1988) "Publishers, multimedia, and interactivity", Interactive multimedia. Cobb Group. ISBN 1-55615-124-1.
[15] The working group was originally called the "Editorial Review Board." The original members and seven who were added before the first edition was complete are listed at the end of the first edition of the XML Recommendation, at http://www.w3.org/TR/1998/REC-xml-19980210.
[16] Reports From the W3C SGML ERB to the SGML WG And from the W3C XML ERB to the XML SIG
[17] http://www.w3.org/TR/WD-xml-961114.html
[18] Extensible Markup Language (XML) 1.1 (Second Edition) - Rationale and list of changes for XML 1.1. W3C. Retrieved on 2006-12-21.
[19] Harold, Elliotte Rusty (2004). Effective XML. Addison-Wesley, 10-19. ISBN 0321150406.
[20] http://news.com.com/Small+company+makes+big+claims+on+XML+patents/2100-1014_3-5905949.html
[21] http://blogs.zdnet.com/BTL/?p=2052
[22] (See e.g., XML-QL Proposal discussing XML benefits, When to use XML, "XML Sucks" on c2.com, Daring to Do Less with XML)
[23] Harold, Elliotte Rusty (2002). Processing XML with Java(tm): a guide to SAX, DOM, JDOM, JAXP, and TrAX. Addison-Wesley. ISBN 0201771861. XML documents are too verbose compared with binary equivalents.
[24] Harold, Elliotte Rusty (2002). XML in a Nutshell: A Desktop Quick Reference. O'Reilly. ISBN 0596002920. XML documents are very verbose and searching is inefficient for high-performance large-scale database applications.
[25] However, the Binary XML effort strives to alleviate these problems by using a binary representation for the XML document. For example, the parsing speed of the Java reference implementation of the Fast Infoset standard is better by a factor of 10 than that of Java Xerces, and by a factor of 4 than that of the Piccolo driver, one of the fastest Java-based XML parsers.
[26] Bierman, Gavin (2005). Database Programming Languages: 10th international symposium, DBPL 2005 Trondheim, Norway. Springer. ISBN 3540309519. XML syntax is too verbose for human readers for certain applications. Proposes a dual syntax for human readability.
[27] Although many purportedly "less verbose" text formats actually cite XML as both inspiration and prior art. See e.g., http://yaml.org/spec/current.html, http://innig.net/software/sweetxml/index.html, http://www.json.org/xml.html.
[28] A hierarchical model only gives a fixed, monolithic view of the tree structure. For example, either actors under movies, or movies under actors, but not both.
[29] Lim, Ee-Peng (2002). Digital Libraries: People, Knowledge, and Technology. Springer. ISBN 3540002618. Discusses some of the limitations of a fixed hierarchy. Proceedings of the 5th International Conference on Asian Digital Libraries, ICADL 2002, held in Singapore in December 2002.
[30] Searle, Leroy F. (2004). Voice, text, hypertext: emerging practices in textual studies. University of Washington Press. ISBN 0295983051. Proposes an alternative system for encoding overlapping elements.
[31] (See e.g., http://www-128.ibm.com/developerworks/library/x-abolns.html )
[32] The Myth of Self-Describing XML. Retrieved on 2007-05-12.
[33] (See e.g., Use–mention distinction, Naming collision, Polysemy)
[34] Does XML Suck?. Retrieved on 2007-12-15. (See "8. Complexity: Attributes and Content")

External links

Specifications

W3C XML homepage
The XML 1.0 specification
The XML 1.1 specification

Parsers

Xerces, a parser implemented in Java.
Expat free stream-oriented XML 1.0 parser library, written in C.
Libxml2 free XML C parser and toolkit.
RomXML Embedded XML commercial toolkit written in ANSI-C.
XDOM open-source XML parser (and DOM and XPath implementation) in Delphi/Kylix.
XML resources at the Open Directory Project
TinyXml Simple and small C++ XML parser.

Conversion tools

Altova
Stylusstudio
Navicat
XRay XML Editor
EditiX XML Editor

Sources

Introduction to Generalized Markup by Charles Goldfarb
Annex A of ISO 8879:1986 (SGML)
The Multilingual WWW by Gavin Nicol
Retrospective on Extended Reference Concrete Syntax by Rick Jelliffe
XML Based languages
XML, Java and the Future of the Web by Jon Bosak
XML tutorials in w3schools

Retrospectives

Thinking XML: The XML decade by Uche Ogbuji
XML: Ten year anniversary by Elliot Kimber
Closing Keynote, XML 2006 by Jon Bosak
Five years later, XML... by Simon St. Laurent
23 XML fallacies to watch out for by Sean McGrath
W3C XML is Ten!, XML 10 years press release

Papers

Lawrence A. Cunningham (2005). "Language, Deals and Standards: The Future of XML Contracts". Washington University Law Review. SSRN 900616.

