Source: http://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt
X.509 Style Guide
=================

Peter Gutmann, [email protected]
October 2000

[This file is frequently cited as a reference on PKI issues, when in fact it
was really intended as X.509 implementation notes meant mostly for
developers, to tell them all the things the standards leave out. If you're
looking for a general overview of PKI that includes most of what's in here
but presented in a more accessible manner, you should use "Everything you
never wanted to know about PKI but have been forced to find out",
http://www.cs.auckland.ac.nz/~pgut001/pubs/pkitutorial.pdf, a less technical
overview aimed at people charged with implementing and deploying the
technology. If you need to know what you're in for when you work with PKI,
this is definitely the one to read. Further PKI information and material can
be found on my home page, http://www.cs.auckland.ac.nz/~pgut001/].

There seems to be a lot of confusion about how to implement and work with
X.509 certificates, either because of ASN.1 encoding issues, or because
vagueness in the relevant standards means people end up taking guesses at
what some of the fields are supposed to look like. For this reason I've put
together these guidelines to help in creating software to work with X.509
certificates, PKCS #10 certification requests, CRLs, and other ASN.1-encoded
data types.

  I knew a guy who set up his own digital ID heirarchy, could issue his own
  certificates, sign his own controls, ran SSL on his servers, etc. I don't
  need to pay Verisign a million bucks a year for keys that expire and
  expire. I just need to turn off the friggen [browser warning] messages.
    -- Mark Bondurant, "Creating My Own Digital ID", in
       alt.computer.security

In addition, anyone who has had to work with X.509 has probably experienced
what can best be described as ISO water torture, which involves ploughing
through all sorts of ISO, ANSI, ITU, and IETF standards, amendments, meeting
notes, draft standards, committee drafts, working drafts, and other
work-in-progress documents, some of which are best understood when held
upside-down in front of a mirror (this has led to people trading
hard-to-find object identifiers and ASN.1 definitions like baseball cards -
"I'll swap you the OID for triple DES in exchange for the latest CRL
extensions"). This document is an attempt at providing a cookbook for
certificates which should give you everything that you can't easily find
anywhere else, as well as comments on what you'd typically expect to find in
certificates.

  Given humanity's track record with languages, you wonder why we bother
  with standards committies
    -- Marcus Leech

Since the original X.509 spec is somewhat vague and open-ended, every
non-trivial group which has any reason to work with certificates has to
produce an X.509 profile which nails down many features which are left
undefined in X.509.

  You can't be a real country unless you have a beer and an airline. It
  helps if you have some kind of a football team, or some nuclear weapons,
  but at the very least you need a beer.
    -- Frank Zappa

  And an X.509 profile.
    -- Me

The difference between a specification (X.509) and a profile is that a
specification doesn't generally set any limitations on combinations of what
can and can't appear in various certificate types, while a profile sets
various limitations, for example by requiring that signing and
confidentiality keys be different (the Swedish profile requires this, and
the German profile specifies exclusive use of certificates for digital
signatures).
The major profiles in use today are:

  PKIX - Internet PKI profile.
  FPKI - (US) Federal PKI profile.
  MISSI - US DoD profile.
  ISO 15782 - Banking - Certificate Management Part 1: Public Key
    Certificates.
  TeleTrusT/MailTrusT - German MailTrusT profile for TeleTrusT (it really is
    capitalised that way).
  German SigG Profile - Profile to implement the German digital signature
    law (the certificate profile SigI is particularly good, providing not
    just the usual specification but also examples of each certificate field
    and extension including the encoded forms).
  ISIS Profile - Another German profile.
  Australian Profile - Profile for the Australian PKAF (this may be the same
    as DR 98410, which I haven't seen yet).
  SS 61 43 31 Electronic ID Certificate - Swedish profile.
  FINEID S3 - Finnish profile.
  ANX Profile - Automotive Network Exchange profile.
  Microsoft Profile - This isn't a real profile, but the software is
    widespread enough and nonstandard enough that it constitutes a
    significant de facto profile.

  No standard or clause in a standard has a divine right of existence
    -- A Microsoft PKI architect explaining Microsoft's position on
       standards compliance

Unfortunately the official profiles tend to work like various monotheistic
religions where you either do what we say or burn in hell (that is,
conforming to one profile generally excludes you from claiming conformance
with any others unless they happen to match exactly). This means that you
need to either create a chameleon-like implementation which can change its
behaviour at a whim, or restrict yourself to a single profile which may not
be accepted in some locales. There is (currently) no way to mark a
certificate to indicate that it should be processed in a manner conformant
to a particular profile, which makes it difficult for a relying party to
know how their certificate will be processed by a particular implementation.

  Interoperability Testing. Conclusion: It doesn't work
    -- Richard Lampard, CESG, talking about UK government PKI experiences

Although I've tried to take into account the various "Use of this feature
will result in the immediate demise of all small furry animals in an
eight-block radius"-type warnings contained in various standards documents
to find a lowest-common-denominator set of rules which should result in the
least pain for all concerned if they're adhered to, the existence of
conflicting profiles makes this a bit difficult. The idea behind the guide
is to at least try to present a "If you do this, you should be OK" set of
guidelines, rather than a "You're theoretically allowed to do this if you
can find an implementation which supports it" feature list.

Finally, the guide contains a (rather lengthy) list of implementation
errors, bugs, and problems to look out for with various certificates and the
related software in order to allow implementors to create workarounds.

The conventions used in the text are:

- All encodings follow the DER unless otherwise noted.
- Most of the formats are ASN.1, or close enough to it to be understandable
  (the goal was to make it easily understandable, not perfectly
  grammatically correct). Occasionally 15 levels of indirection are cut out
  to make things easier to understand.
  The resulting type and value of an instance of use of the new value
  notation is determined by the value (and the type of the value) finally
  assigned to the distinguished local reference identified by the keyword
  VALUE, according to the processing of the macrodefinition for the new type
  notation followed by that for the new value notation.
    -- ISO 8824:1988, Annex A

Certificate
-----------

Certificate ::= SEQUENCE {
    tbsCertificate          TBSCertificate,
    signatureAlgorithm      AlgorithmIdentifier,
    signature               BIT STRING
    }

  The goal of a cert is to identify the holder of the corresponding private
  key, in a fashion meaningful to relying parties.
    -- Stephen Kent

  By the power vested in me, I now declare this text string and this bit
  string 'name' and 'key'. What RSA has joined, let no man put asunder.
    -- Bob Blakley

The encoding of the Certificate may follow the BER rather than the DER. At
least one implementation uses the indefinite-length encoding form for the
SEQUENCE.
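To make the above concrete, here's a minimal decoding sketch in Python (the
helper names are my own invention, not from any standard or library). It
splits a DER-encoded Certificate into its three top-level fields, handling
only definite-length encodings; as just noted, BER indefinite lengths do
turn up in the wild and production code needs to deal with them too:

  def read_tlv(data, offset):
      """Read one definite-length, single-octet-tag TLV from a byte string,
      returning (tag, contents, offset past the end of the TLV)."""
      tag = data[offset]
      length = data[offset + 1]
      offset += 2
      if length & 0x80:                 # long form: low bits give the number
          nbytes = length & 0x7F        # of length octets which follow
          length = int.from_bytes(data[offset:offset + nbytes], 'big')
          offset += nbytes
      return tag, data[offset:offset + length], offset + length

  def split_certificate(cert):
      """Split a DER Certificate into tbsCertificate, signatureAlgorithm,
      and signature, each returned with its tag and length intact so that
      it can be hashed or re-encoded as-is."""
      tag, body, _ = read_tlv(cert, 0)  # the outer SEQUENCE
      assert tag == 0x30
      fields, offset = [], 0
      while offset < len(body):
          start = offset
          _, _, offset = read_tlv(body, offset)
          fields.append(body[start:offset])
      tbs_certificate, signature_algorithm, signature = fields
      return tbs_certificate, signature_algorithm, signature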
TBSCertificate
--------------

The default tagging for certificates varies depending on which standard
you're using. The original X.509v1 definition used the ASN.1 default of
explicit tags, with X.509v3 extensions in a separate module with implicit
tags. The PKIX definition is quite confusing because the ASN.1 definitions
in the appendices use IMPLICIT TAGS but mix in X.509v3 definitions which use
explicit tags. Appendix A has such a mixture of implied implicit and implied
explicit tags that it's not really possible to tell what tagging you're
supposed to use. Appendix B (which first appeared in draft 7, March 1998) is
slightly better, but still confusing in that it starts with IMPLICIT TAGS
but tries to partially switch to EXPLICIT TAGS for some sections (for
example the TBSCertificate has an 'EXPLICIT' keyword in the definition which
is probably intended to signify that everything within it has explicit
tagging, except that it's not valid ASN.1). The definitions given in the
body of the document use implicit tags, and the definitions of
TBSCertificate and TBSCertList have both EXPLICIT and IMPLICIT tags present.
To resolve this, you can either rely entirely on Appendix B with the
X.509v1 sections moved into a separate section declared without 'IMPLICIT
TAGS', or use the X.509v3 definitions. The SET definitions consistently use
implicit tags.

  Zaphod felt he was teetering on the edge of madness and wondered whether
  he shouldn't just jump over and have done with it.
    -- Douglas Adams, "The Restaurant at the End of the Universe"

TBSCertificate ::= SEQUENCE {
    version                 [ 0 ] Version DEFAULT v1(0),
    serialNumber            CertificateSerialNumber,
    signature               AlgorithmIdentifier,
    issuer                  Name,
    validity                Validity,
    subject                 Name,
    subjectPublicKeyInfo    SubjectPublicKeyInfo,
    issuerUniqueID          [ 1 ] IMPLICIT UniqueIdentifier OPTIONAL,
    subjectUniqueID         [ 2 ] IMPLICIT UniqueIdentifier OPTIONAL,
    extensions              [ 3 ] Extensions OPTIONAL
    }

Version
-------

Version ::= INTEGER { v1(0), v2(1), v3(2) }

This field is used mainly for marketing purposes to claim that software is
X.509v3 compliant (even when it isn't). The default version is v1(0); if the
issuerUniqueID or subjectUniqueID are present then the version must be v2(1)
or v3(2). If extensions are present then the version must be v3(2). An
implementation should target v3 certificates, which is what everyone is
moving towards.

  I was to learn later in life that we tend to meet any new situation by
  reorganizing: and a wonderful method it can be for creating the illusion
  of progress, while producing confusion, inefficiency and demoralization
    -- Petronius Arbiter, ~60 A.D.

Note that the version numbers are one less than the actual X.509 version
because in the ASN.1 world you start counting from 0, not 1 (although it's
not necessary to use sequences of integers for version numbers. X.420, for
example, is under the impression that 2 is followed by 22 rather than the
more generally accepted 3).

If your software generates v1 certificates, it's a good idea to actually
mark them as such and not just mark everything as v3 whether it is or not.
Although no standard actually forbids marking a v1 certificate as v3,
backwards-compatibility (as well as truth-in-advertising) considerations
would indicate that a v1 certificate should be marked as such.

SerialNumber
------------

CertificateSerialNumber ::= INTEGER

This should be unique for each certificate issued by a CA (typically a CA
will keep a counter in persistent store somewhere, perhaps a config file
under Unix and in the registry under Windows). A better way is to take the
current time in seconds and subtract some base time like the first time you
ran the software, to keep the numbers manageable. This has the further
advantage over a simple sequential numbering scheme that it doesn't allow
tracking of the number of certificates which have been signed by a CA, which
can have nasty consequences both if various braindamaged government
regulation attempts ever come to fruition, and because by using sequential
numbers a CA ends up revealing just how few certs it's actually signing (at
the cost of a cert per week, the competition can find out exactly how many
certs are being issued each week). Although this is never mentioned in any
standards document, using negative serial numbers is probably a bit silly
(note the caveat about encoding INTEGER values in the section on
SubjectPublicKeyInfo).

Serial numbers aren't necessarily restricted to 32-bit quantities. For
example the RSADSI Commercial Certification Authority serial number is
0x0241000016, which is larger than 32 bits, and Verisign seem to like using
128 or 160-bit hashes as serial numbers. If you're writing
certificate-handling code, just treat the serial number as a blob which
happens to be an encoded integer (this is particularly important because of
the vendors who have forgotten that the high bit of an integer is the sign
bit, and generate negative serial numbers for their certificates).
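To illustrate the treat-it-as-a-blob advice, here's a sketch (using the same
hypothetical Python conventions as the earlier example) which decodes a
serial number's contents octets and flags the negative values which buggy CA
software produces:

  def decode_serial(contents):
      """Decode the contents octets of a serial number INTEGER. ASN.1
      integers are two's-complement, so a set high bit in the first octet
      means a negative number -- almost certainly a CA which forgot to
      prepend the leading zero octet."""
      value = int.from_bytes(contents, 'big', signed=True)
      if value < 0:
          # Keep the original blob for matching and only use the decoded
          # form for display; "fixing" and re-encoding it would change the
          # certificate and invalidate the signature.
          print('warning: negative serial number %d' % value)
      return value

  # The RSADSI Commercial CA serial number mentioned above, > 32 bits:
  print(hex(decode_serial(bytes([0x02, 0x41, 0x00, 0x00, 0x16]))))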
Signature
---------

This rather misnamed field contains the algorithm identifier for the
signature algorithm used by the CA to sign the certificate. There doesn't
seem to be much use for this field, although you should check that the
algorithm identifier matches the one of the signature on the cert (if
someone can forge the signature on the cert then they can also change the
inner algorithm identifier). It's possible that this was included because of
some obscure attack in which someone who could convince a (broken) signature
algorithm A to produce the same signature value as a (secure) algorithm B
could change the outer, unprotected algorithm identifier from B to A, but
couldn't change the inner identifier without invalidating the signature.
What this would achieve is unclear.

Be very careful with your use of Object Identifiers. In many cases there are
a great many OIDs available for the same algorithm, but the exact OID you're
supposed to use varies somewhat.

  You see, the conditional modifers depend on certain variables like the day
  of the week, the number of players, chair positions, things like that.
  [...] There can't be more than a dozen or two that are pertinent.
    -- Robert Asprin, "Little Myth Marker"

Your best bet is to copy the OIDs everyone else uses and/or use the RSADSI
or X9 OIDs (rather than the OSI or OIW or any other type of OID). OTOH if
you want to be proprietary while still pretending to follow a standard, use
OSI OIDs, which are often underspecified, so you can do pretty much whatever
you want with things like block formatting and padding.

Another pitfall to be aware of is that algorithms which have no parameters
have this specified as a NULL value rather than omitting the parameters
field entirely. The reason for this is that when the 1988 syntax for
AlgorithmIdentifier was translated into the 1997 syntax, the OPTIONAL
associated with the AlgorithmIdentifier parameters got lost. Later it was
recovered via a defect report, but by then everyone thought that algorithm
parameters were mandatory. Because of this the algorithm parameters should
be specified as NULL, regardless of what you read elsewhere.

  The trouble is that things *never* get better, they just stay the same,
  only more so
    -- Terry Pratchett, "Eric"

Name
----

Name ::= SEQUENCE OF RelativeDistinguishedName

RelativeDistinguishedName ::= SET OF AttributeValueAssertion

AttributeValueAssertion ::= SEQUENCE {
    attributeType           OBJECT IDENTIFIER,
    attributeValue          ANY
    }

This is used to encode that wonderful ISO creation, the Distinguished Name
(DN), a path through an X.500 directory information tree (DIT) which
uniquely identifies everything on earth. Although the
RelativeDistinguishedName (RDN) is given as a SET OF AttributeValueAssertion
(AVA), each SET should contain only one element. However you should be
prepared to encounter other people's certs which contain more than one AVA
per SET (there has been at least one reported sighting of such a
certificate).

  When the X.500 revolution comes, your name will be lined up against the
  wall and shot
    -- John Gilmore

  They can't be read, written, assigned, or routed. Other than that, they're
  perfect
    -- Marshall Rose

When encoding sets with cardinality > 1, you need to take care to follow the
DER rules which say that they should be ordered by their encoded values
(although ASN.1 says a SET is unordered, the DER adds ordering rules to
ensure it can be encoded in an unambiguous manner). What you need to do is
encode each value in the set, sort them by the encoded values, and output
them wrapped up in the SET OF encoding,

  First things first, but not necessarily in that order.
    -- Dr.Who

however your software really shouldn't be producing these sorts of RDN
entries.
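If you do have to encode a SET with more than one element, the DER sorting
rule comes down to a byte-wise sort of the encoded elements; a sketch, with
a hypothetical helper for the length encoding:

  def encode_der_length(length):
      """Encode a definite-form DER length."""
      if length < 0x80:
          return bytes([length])
      encoded = length.to_bytes((length.bit_length() + 7) // 8, 'big')
      return bytes([0x80 | len(encoded)]) + encoded

  def encode_set_of(encoded_elements):
      """Wrap already-encoded elements up as a DER SET OF (tag 0x31),
      sorting them by their encoded values as the DER requires so that the
      result is unambiguous."""
      body = b''.join(sorted(encoded_elements))
      return bytes([0x31]) + encode_der_length(len(body)) + body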
In theory you don't have to use a Name for the subject name if you don't
want to; there is a subjectAltName extension which allows the use of email
addresses or URLs. In theory if you want to do this you can make the Name an
empty sequence and include a subjectAltName extension and mark it critical,
but this will break a lot of implementations. Because it is possible to do
this, you should be prepared to accept a zero-length sequence for the
subject name in version 3 certificates.

Since the DN is supposed to encode the location of the certificate in a DIT,
having a null issuer name would mean you couldn't actually locate the
certificate, so CAs will need to use proper DNs. The S/MIME certificate spec
codifies this by requiring that all issuer DNs be non-null (so only an
end-user certificate can have a null DN, and even then it's not really
recommended), and this requirement was back-ported to the PKIX profile
shortly before it was finalised. The reason for requiring issuer DNs is that
S/MIME v2 and several related standards identify certificates by issuer and
serial number, so all CA certificates must contain an issuer DN (S/MIME v3
allows subjectKeyIdentifiers, but they're almost never used).

SET provides an eminently sensible definition for DNs:

Name ::= SEQUENCE SIZE(1..5) OF RelativeDistinguishedName

RelativeDistinguishedName ::= SET SIZE(1) OF AttributeTypeAndValue

AttributeTypeAndValue ::= { OID, C | O | OU | CN }

This means that when you see a SET DN it'll be in a fixed, consistent, and
easy-to-process format (note in particular the fixed maximum size, the
requirement for a single element per AVA, and the restriction to sensible
element types).

Note that the (issuer name, serialNumber (with a possible side order of
issuerUniqueID, issuerAltName, and keyUsage extension)) tuple uniquely
identifies a certificate and can be used as a key to retrieve certificates
from an information store. The subject name alone does not uniquely identify
a certificate because a subject can own multiple certificates.

You would normally expect to find the following types of AVAs in an X.509
certificate, starting from the top:

countryName ::= SEQUENCE {
    { 2 5 4 6 }, StringType( SIZE( 2 ) ) }

organization ::= SEQUENCE {
    { 2 5 4 10 }, StringType( SIZE( 1..64 ) ) }

organizationalUnitName ::= SEQUENCE {
    { 2 5 4 11 }, StringType( SIZE( 1..64 ) ) }

commonName ::= SEQUENCE {
    { 2 5 4 3 }, StringType( SIZE( 1..64 ) ) }

You might also find:

localityName ::= SEQUENCE {
    { 2 5 4 7 }, StringType( SIZE( 1..64 ) ) }

stateOrProvinceName ::= SEQUENCE {
    { 2 5 4 8 }, StringType( SIZE( 1..64 ) ) }

Some profiles require at least some of these AVAs to be present, for example
the German profile requires at least a countryName and commonName, and in
some cases also an organization name. This is a reasonable requirement: as a
minimum you should always include the country and common name.

Finally, you'll frequently also run into:

emailAddress ::= SEQUENCE {
    { 1 2 840 113549 1 9 1 }, IA5String }

from PKCS #9, although this shouldn't be there.

  I can't afford to make exceptions. Once word leaks out that a pirate has
  gone soft, people begin to disobey you and it's nothing but work, work,
  work all the time
    -- The Dread Pirate Roberts, "The Princess Bride"

The reason why oddball components like the emailAddress have no place in a
DN created as per the original X.500 vision is that the whole DN is intended
to be a strictly hierarchical construction specifying a path through a DIT.
Unfortunately the practice adopted by many CAs of tacking on an
emailAddress, an element which has no subordinate relationship to the other
components of the DN, creates a meaningless mishmash which doesn't follow
this hierarchical model. For this reason the ITU defined the GeneralName,
which specifically allows for components such as email addresses, URLs, and
other non-DN items. GeneralNames are discussed in "Extensions" below.
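In code, the attribute types listed above reduce to a small OID-to-name
table; the following sketch (the table and function names are mine) renders
a decoded DN, given as a list of (OID, value) AVAs in top-down order, in the
usual one-AVA-per-RDN case:

  # The attribute type OIDs from above, in dotted-decimal form.
  ATTRIBUTE_NAMES = {
      '2.5.4.6':  'C',                  # countryName
      '2.5.4.10': 'O',                  # organization
      '2.5.4.11': 'OU',                 # organizationalUnitName
      '2.5.4.3':  'CN',                 # commonName
      '2.5.4.7':  'L',                  # localityName
      '2.5.4.8':  'ST',                 # stateOrProvinceName
      '1.2.840.113549.1.9.1': 'emailAddress',   # PKCS #9, shouldn't be there
      }

  def dn_to_string(avas):
      """Render [(oid, value), ...] in ISO 9594 (top-down) order, falling
      back to the dotted OID for unrecognised attribute types."""
      return ', '.join('%s=%s' % (ATTRIBUTE_NAMES.get(oid, oid), value)
                       for oid, value in avas)

  print(dn_to_string([('2.5.4.6', 'US'), ('2.5.4.10', 'Hanger 18'),
                      ('2.5.4.3', 'John Doe')]))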
Since the GeneralName provides a proper means of specifying information like
email addresses, your software shouldn't populate DNs with these components,
however for compatibility with legacy implementations you need to be able to
accept existing certificates which contain odd things in the DN. Currently
all mailers appear to be able to handle an rfc822Name in an altName, so
storing it in the correct location shouldn't present any interoperability
problems.

One problem with email address handling is that many mailers will accept not
only a bare name like 'J.Random Luser' (with no actual address) as a valid
emailAddress/rfc822Name, but will be equally happy with something like
'President William Jefferson Clinton <jrandom@isp.com>'. The former is
simply invalid, but the latter can be downright dangerous because it
completely bypasses the stated purpose of email certificates, which is to
identify the other party in an email exchange. Both PKIX and S/MIME
explicitly require that an rfc822Name only contain an RFC 822 addr-spec,
which is defined as local-part@domain, so the mailbox form 'Personal Name
<local-part@domain>' isn't allowed (many S/MIME implementations don't
enforce this though). Unfortunately X.509v3 just requires "an Internet
electronic mail address defined in accordance with Internet RFC 822" without
tying it down any further, so it could be either an addr-spec or a mailbox.
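A sketch of the PKIX/S/MIME check just described (deliberately minimal; the
full RFC 822 address grammar is considerably hairier than this): accept only
a bare addr-spec and reject the mailbox form:

  def is_bare_addr_spec(name):
      """Accept only 'local-part@domain', per the PKIX/S/MIME rules,
      rejecting the 'Personal Name <local-part@domain>' mailbox form which
      X.509v3's looser wording would allow."""
      if '<' in name or '>' in name or ' ' in name:
          return False
      local, at, domain = name.partition('@')
      return bool(at) and bool(local) and bool(domain) and '@' not in domain

  assert is_bare_addr_spec('jrandom@isp.com')
  assert not is_bare_addr_spec('President <jrandom@isp.com>')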
  Okay, I'm going home to drink moderately and then pass out.
    -- Steve Rhoades, "Married with Children"

The countryName is the ISO 3166 code for the country. No one seems to know
how to specify non-country-aligned organisations; it's possible that 'EU'
will be used at some point, but there isn't any way to encode a non-country
code, although some organisations have tried using 'INT'. Actually no one
really even knows what a countryName is supposed to refer to (let alone
something as ambiguous as "locality"), for example it could be your place of
birth, country of citizenship, country of current residence, country of
incorporation, country where corporate HQ is located, country of choice for
tax and/or jurisdictional issues, or a number of other possibilities (moving
from countryName to stateOrProvinceName, people in the US military can
choose a state as their official "residence" for tax purposes even if they
don't own any property in that state, and politicians are allowed to run for
office in one state while their wives claim residence and run for office in
another state).

The details of the StringType are discussed further down. It's a good idea
to actually limit the string lengths to 64 characters as required by X.520
because, although many implementations will accept longer encoded strings in
certs, some can't manipulate them once they've been decoded by the software,
and you'll run into problems with LDAP as well. This means residents of
places like
Taumatawhakatangihangakoauotamateaturipukakapikimaungahoronukupokaiwhenuakitanataha
are out of luck when it comes to getting X.509 certs.

Comparing two DNs has its own special problems, and is dealt with in the
rather lengthy "Comparing DNs" section below.

There appears to be some confusion about what format a Name in a certificate
should take.

  Insufficient facts always invite danger
    -- Spock, "Space Seed"

In theory it should be a full, proper DN, which traces a path through the
X.500 DIT, eg:

  C=US, L=Area 51, O=Hanger 18, OU=X.500 Standards Designers, CN=John Doe

but since the DITs usually don't exist, exactly what format the DN should
take seems open to debate.

A good guideline to follow is to organize the namespace around the C, O, OU,
and CN attribute types, but this is directed primarily at corporate
structures. You may also need to use ST(ate) and L(ocality) RDNs. Some
implementations seem to let you stuff anything with an OID into a DN, which
is not good.

  There is nothing in any of these standards that would prevent me from
  including a 1 gigabit MPEG movie of me playing with my cat as one of the
  RDN components of the DN in my certificate.
    -- Bob Jueneman on IETF-PKIX

(There is a certificate of this form available from
http://www.cs.auckland.ac.nz/~pgut001/pubs/{dave_ca|dave}.der, although the
MPEG is limited to just over 1MB.)

With a number of organisations moving towards the use of LDAP-based
directory services, it may be that we'll actually see X.500 directories in
our lifetime,

  Well, it just so happens that your friend here is only mostly dead.
  There's a big difference between mostly dead and all dead. Now, mostly
  dead is slightly alive.
    -- Miracle Max, "The Princess Bride"

which means you should make an attempt to have a valid DN in the
certificate. LDAP uses the RFC 1779 form of DN, which is the opposite
endianness to the ISO 9594 form used above:

  CN=John Doe, OU=X.500 Standards Designers, O=Hanger 18, L=Area 51, C=US

  There are always alternatives
    -- Spock, "The Galileo Seven"

In order to work with LDAP implementations, you should ensure you only have
a single AVA per RDN (which also avoids the abovementioned DER-encoding
hassle).
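Converting between the two orderings is just a list reversal, but it's easy
to emit the wrong one; a sketch building on the hypothetical dn_to_string()
helper from the Name section above:

  def dn_to_ldap_string(avas):
      """Render the same AVA list in RFC 1779 (LDAP) order, ie CN first and
      C last -- the opposite endianness to the ISO 9594 form."""
      return dn_to_string(list(reversed(avas)))

  # C=US, O=Hanger 18, CN=John Doe becomes CN=John Doe, O=Hanger 18, C=US
  print(dn_to_ldap_string([('2.5.4.6', 'US'), ('2.5.4.10', 'Hanger 18'),
                           ('2.5.4.3', 'John Doe')]))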
As the above text has probably indicated, DNs don't really work - there's no
clear idea of what they should look like, most users don't know about (and
don't want to know about) X.500 and its naming conventions, and as a
consequence of this the DN can end up containing just about anything. At the
moment they seem to be heading in several main directions:

- Public CAs typically set C=CA country, O=CA name, OU=certificate type,
  CN=user name.

- A small subset of CAs in Europe which issue certs in accordance with
  various signature laws and profiles with their own peculiar requirements
  can have all sorts of oddities in the DN. You won't run into many of these
  in the wild.

- A small subset of CAs will modify the DN by adding a unique ID value to
  the CN to make it a truly Distinguished Name. See the Bugs and
  Peculiarities sections for more information on this.

- Private CAs (mostly people or organisations signing their own certs)
  typically set any DN fields supported by their software to whatever makes
  sense for them (some software requires all fields in the set
  {C,O,OU,SP,L,CN} to be filled in, leading to strange or meaningless
  entries as people try and guess what a Locality is supposed to be).

Generally you'll only run into certs from public CAs, for which the general
rule is that the cert is identified by the CN and/or email address. Some CAs
issue certs with identical CNs and use the email address to disambiguate
them, others modify the CN to make it unique. The accepted user interface
seems to be to let users search on the CN and/or email address (and
sometimes also the serial number, which doesn't seem terribly useful),
display a list of matches, and let the user pick the cert they want.
Probably the best strategy for a user interface which handles certs is:

  if( email address known )
    get a cert which matches the email address (any one should do);
  elseif( name known )
    search for all certs with CN=name;
    if( multiple matches )
      display email addresses for matched certs to user, let them choose;
  else
    error;

If you need something unique to use as an identifier (for example for a
database key) and you know your own software (or more generally software
which can do something useful with the identifier) will be used, use an
X.500 serialNumber in a subjectAltName directoryName or use a subjectAltName
otherName (which was explicitly created to allow user-defined identifiers).
For internal cert lookups, encode the cert issuer and serial number as a
PKCS #7 issuerAndSerialNumber, hash it down to a fixed size with SHA-1 (you
can either use the full 20 bytes or some convenient truncated form like 64
bits), and use that to identify the cert. This works because the internal
structure of the DN is irrelevant anyway, and having a fixed-size unique
value makes it very easy to perform a lookup in various data structures (for
example the random hash value generated leads to probabilistically balanced
search trees without requiring any extra effort).
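The issuerAndSerialNumber hashing suggested above is only a few lines (a
sketch; the input is assumed to be the DER-encoded PKCS #7
issuerAndSerialNumber blob):

  import hashlib

  def cert_lookup_key(issuer_and_serial, truncate_to=8):
      """Hash a DER-encoded issuerAndSerialNumber down to a fixed-size
      lookup key; 8 bytes (64 bits) is plenty for an internal database key,
      or use the full 20-byte digest."""
      return hashlib.sha1(issuer_and_serial).digest()[:truncate_to]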
Validity
--------

Validity ::= SEQUENCE {
    notBefore               UTCTime,
    notAfter                UTCTime
    }

This field denotes the time at which you have to pay your CA a renewal fee
to get the certificate reissued. The IETF originally recommended that all
times be expressed in GMT and seconds not be encoded, giving:

  YYMMDDHHMMZ

as the time encoding. This provided an unambiguous encoding because a value
of 00 seconds was never encoded, which meant that if you read a UTCTime
value generated by an implementation which didn't use seconds and wrote it
out again with an implementation which did, it would have the same encoding
because the 00 wouldn't be encoded. However newer standards (starting with
the Defence Messaging System (DMS), SDN.706) require the format to be:

  YYMMDDHHMMSSZ

even if the seconds are 00. The ASN.1 encoding rules were amended in late
1996 so that seconds are always encoded, with a special note that midnight
is encoded as ...000000Z and not ...240000Z. You should therefore be
prepared to encounter UTCTimes with and without the final 00 seconds field,
however all newer certificates encode 00 seconds. If you read and then write
out an existing object you may need to remember whether the seconds were
encoded or not in the original, because adding the 00 will invalidate the
signature (this problem is slowly disappearing as pre-00 certificates
expire). A good workaround for this problem when generating certificates is
to ensure that you never generate a certificate with the seconds set to 00,
which means that even if other software re-encodes your certificate, it
can't get the encoding wrong.

At least one widely-used product generated incorrect non-GMT encodings, so
you may want to consider handling the "+/-xxxx" time offset format, but you
should flag it as a decoding error nonetheless.

In coming up with the world's least efficient machine-readable time encoding
format, the ISO nevertheless decided to forgo the encoding of centuries, a
problem which has been kludged around by redefining the time as UTCTime if
the date is 2049 or earlier, and GeneralizedTime if the date is 2050 or
later (the original plan was to cut over in 2015, but it was felt that
moving it back to 2050 would ensure that the designers were either retired
or dead by the time the issue became a serious problem, leaving someone else
to take the blame). To decode a date, if it's UTCTime and the year is less
than or equal to 49 it's 20xx, if it's UTCTime and the year is greater than
or equal to 50 it's 19xx, and if it's GeneralizedTime it's encoded properly
(but shouldn't really be used for dates before 2050 because you could run
into interoperability problems with existing software). Yuck. To make this
even more fun, another spec at one time gave the cutover date as 2050/2051
rather than 2049/2050, and allowed GeneralizedTime to be used before 2050 if
you felt you could get away with it. It's likely that a lot of conforming
systems will briefly become nonconforming systems in about half a century's
time, in a kind of security-standards equivalent of the age-old paradox in
which Christians and Moslems will end up in the other side's version of
hell.

  Confusion now hath made his masterpiece.
    -- Macduff, "Macbeth", II.iii
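Pulling the various time rules together (GMT only, the optional seconds
field, and the 49/50 century window), a decoding sketch for the two time
types; the tag values and the function name are the only things here not
taken from the text above:

  import re
  from datetime import datetime, timezone

  def decode_cert_time(tag, contents):
      """Decode a UTCTime (tag 0x17) or GeneralizedTime (tag 0x18) value.
      UTCTime years 00..49 map to 20xx and 50..99 to 19xx; the seconds may
      be absent in older encodings. Only the trailing-Z (GMT) forms are
      accepted, so the broken "+/-xxxx" offset forms are flagged here."""
      text = contents.decode('ascii')
      if tag == 0x18:                   # GeneralizedTime: YYYYMMDDHHMMSSZ
          stamp = text
      elif tag == 0x17 and re.fullmatch(r'\d{10}(\d{2})?Z', text):
          if len(text) == 11:           # no seconds field, implied 00
              text = text[:10] + '00Z'
          century = '20' if int(text[:2]) <= 49 else '19'
          stamp = century + text
      else:
          raise ValueError('bad time encoding ' + repr(text))
      return datetime.strptime(stamp, '%Y%m%d%H%M%SZ').replace(
          tzinfo=timezone.utc)

  print(decode_cert_time(0x17, b'000101123100Z'))   # 2000-01-01 12:31 GMT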
Another issue to be aware of is the problem of issuer certificates which
have a different validity time than the subject certificates they are used
to sign. Although this isn't specified in any standard, some software
requires validity period nesting, in which the subject validity period lies
inside the issuer validity period. Most software however performs simple
pointwise checking in which it checks whether a cert chain is valid at a
certain point in time (typically the current time). Maintaining the validity
nesting requires that a certain amount of care be used in designing
overlapping validity periods between successive generations of certificates
in a hierarchy. Further complications arise when an existing CA is re-rooted
or re-parented (for example a divisional CA is subordinated to a corporate
CA). Australian and New Zealand readers will appreciate the significance of
using the term "re-rooted" to describe this operation.

Finally, CAs are handling the problem of expiring certificates by reissuing
current ones with the same name and key but different validity periods. In
some cases even CA roots have been reissued with the only difference being
the extended validity periods. This can result in multiple
identical-seeming certificates being valid at one time (in one case three
certificates with the same DN and key were valid at once). The semantics of
these certificates/keys are unknown. Perhaps Validity could simply be
renamed to RenewalFeeDueDate to reflect its actual usage.

An alternative way to avoid expiry problems is to give the certificate an
expiry date several decades in the future. This is popular for CA certs
which don't require an annual renewal fee.

SubjectPublicKeyInfo
--------------------

This contains the public key, either a SEQUENCE of values or a single
INTEGER. Keep in mind that ASN.1 integers are signed, so if any integers you
want to encode have the high bit set you need to add a single zero octet to
the start of the encoded value to ensure that the high bit isn't mistaken
for a sign bit. In addition you are allowed at most a single 0 byte at the
start of an encoded value (and that only when the high bit is set); if the
internal representation you use contains zero bytes at the start, you have
to remove them on encoding. This is a bit of a nuisance when encoding
signatures which contain INTEGER values, since you can't tell how big the
encoded signature will be without actually generating it.
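Both INTEGER rules - add a zero octet when the high bit is set, and never
emit redundant leading zeroes - fall out naturally from a minimal-length
encoding; a sketch:

  def encode_der_integer(value):
      """Encode a non-negative integer as the contents octets of a DER
      INTEGER: minimal length, plus a single leading zero octet when the
      high bit of the first octet would otherwise be read as a sign bit."""
      if value < 0:
          raise ValueError('only non-negative values handled here')
      contents = value.to_bytes(max(1, (value.bit_length() + 7) // 8), 'big')
      if contents[0] & 0x80:            # would be mistaken for a sign bit
          contents = b'\x00' + contents
      return contents

  assert encode_der_integer(0x0241000016) == bytes([0x02, 0x41, 0, 0, 0x16])
  assert encode_der_integer(0x80) == b'\x00\x80'    # note the zero octet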
UniqueIdentifier
----------------

UniqueIdentifier ::= BIT STRING

These were added in X.509v2 to handle the possible reuse of subject and/or
issuer names over time. Their use is deprecated by the IETF, so you
shouldn't generate these in your certificates. If you're writing
certificate-handling code, just treat them as a blob which happens to be an
encoded bit string.

Extensions
----------

Extensions ::= SEQUENCE OF Extension

Extension ::= SEQUENCE {
    extnid                  OBJECT IDENTIFIER,
    critical                BOOLEAN DEFAULT FALSE,
    extnValue               OCTET STRING
    }

X.509 certificate extensions are like a LISP property list: an
ISO-standardised place to store crufties. Extensions can consist of key and
policy information, certificate subject and issuer attributes, certificate
path constraints, CRL distribution points, and private extensions. X.509v3
and the X.509v4 draft contain the ASN.1 formats for the standard v3
Certificate, v2 CRL, and v2 CRLEntry extensions. In theory you should be
able to handle all of these, but there are large numbers of them and some
may not be in active use, or may be meaningless in some contexts.

  'It's called a shovel,' said the Senior Wrangler. 'I've seen the gardeners
  use them. You stick the sharp end in the ground. Then it gets a bit
  technical'
    -- Terry Pratchett, "Reaper Man"

The extensions are encoded with IMPLICIT tags; it's traditional to specify
this in some other part of the standard which is at least 20 pages away from
the section where the extension is actually defined (but see the comments
above about the mixture of explicit and implicit tags in ASN.1 definitions).
There are a whole series of superseded and deprecated OIDs for extensions,
often going back through several generations. Older software and
certificates (and buggy newer software) will still use obsolete OIDs; any
new software should try and emit attributes tagged with the OID du jour
rather than using deprecated OIDs.

We can break extensions into two types, constraint extensions and
informational extensions. Constraint extensions limit the way in which the
key in a certificate, or the certificate itself, can be used. For example
they may limit the key usage to digital signatures only, or limit the DNs
for which a CA may issue certificates. The most common constraint extensions
are basic constraints, key usage and extended key usage, certificate
policies (modified by policy mappings and policy constraints), and name
constraints. In contrast, informational extensions contain information which
may or may not be useful for certificate users, but which doesn't limit the
certificate use in any way. For example an informational extension may
contain additional information which identifies the CA which issued it. The
most common informational extensions are key identifiers and alternative
names. The processing of these extensions is mostly specified in three
different standards, which means that there are three often subtly
incompatible ways to handle them.

In theory, constraint extensions should be enforced religiously, however the
three standards which cover certificates sometimes differ both in how they
specify the interpretation of the critical flag and in how they require
constraint extensions to be enforced.

  We could not get it out of our minds that some subtle but profoundly alien
  element had been added to the aesthetic feeling behind the technique.
    -- H.P.Lovecraft, "At the Mountains of Madness"

The general idea behind the critical flag is that it is used to protect the
issuing CA against any assumptions made by software which doesn't implement
support for a particular extension (none of the X.509-related standards
provide much of a definition for what a minimally, averagely, or fully
compliant implementation needs to support, so it's something of a hit and
miss proposition for an implementation to rely on the correct handling of a
particular extension). One commentator has compared the various certificate
constraints as serving as the equivalent of a Miranda warning ("You have the
right to remain silent, you have the right to an attorney, ...") to anyone
using the certificate. Without the critical flag, an issuer who believes
that the information contained in an extension is particularly important has
no real defence if the end user's software chooses to ignore the extension.

The original X.509v3 specification requires that a certificate be regarded
as invalid if an unrecognised critical extension is encountered. As for the
extension itself, if it's non-critical you can use whatever interpretation
you choose (that is, the extension is specified as being of an advisory
nature only). This means that if you encounter constraints which require
that a key be used only for digital signatures, you're free to use it for
encryption anyway. If you encounter a key which is marked as being a non-CA
key, you can use it as a CA key anyway. The X.509v3 interpretation of
extensions is a bit like the recommended 130 km/h speed limit on autobahns:
the theoretical limit is 130, you're sitting there doing 180, and you're
getting overtaken by Porsches doing about 250.

The problem with the original X.509v3 definitions is that although they
specify the action to take when you don't recognise an extension, they don't
really define the action when you do recognise it. Using this
interpretation, it's mostly pointless including non-critical extensions
because everyone is free to ignore them (for example the text for the
keyUsage extension says that "it is an advisory field and does not imply
that usage of the key is restricted to the purpose indicated", which means
that the main message it conveys is "I want to bloat up the certificate
unnecessarily").

The second interpretation of extensions comes from the IETF PKIX profile.
Like X.509v3, this also requires that a certificate be regarded as invalid
if an unrecognised critical extension is encountered. However it seems to
imply that a critical extension must be processed, and probably considers
non-critical extensions to be advisory only. Unfortunately the wording is
ambiguous enough that a number of interpretations exist. Section 4.2 says
that "CAs are required to support" certain extensions, but the degree of
support is left open, and what non-CAs are supposed to do isn't specified.
The paragraph which follows this says that implementations "shall recognise
extensions", which doesn't imply any requirement to actually act on what you
recognise.
Even the term "process" is somewhat vague, since processing an extension can
consist of popping up a warning dialog with a message which may or may not
make sense to the user, with an optional "Don't display this warning again"
checkbox. In this case the application certainly recognised the extension
and arguably even processed it, but it didn't force compliance with the
intent of the extension, which was probably what was intended by the terms
"recognise" and "process".

The third interpretation comes from S/MIME, which requires that
implementations correctly handle a subset of the constraint and
informational extensions. However, as with PKIX, "correctly handle" isn't
specified, so it's possible to "correctly handle" an extension as per
X.509v3, as per PKIX (choose the interpretation you prefer), or as per
S/MIME, which leaves the issue open (it specifies that implementations may
include various bits and pieces in their extensions, but not how they should
be enforced). S/MIME seems to place a slightly different interpretation on
the critical flag, limiting its use to the small subset of extensions which
are mentioned in the S/MIME spec, so it's not possible to add other critical
extensions to an S/MIME certificate.

  "But it izz written!" bellowed Beelzebub.
  "But it might be written differently somewhere else" said Crowley. "Where
  you can't read it".
  "In bigger letters" said Aziraphale.
  "Underlined" Crowley added.
  "Twice" suggested Aziraphale.
    -- Neil Gaiman and Terry Pratchett, "Good Omens"

Finally, the waters are further muddied by CA policies, which can add their
own spin to the above interpretations. For example the Verisign CPS, section
2.4.3, says that "all persons shall process the extension [...] or else
ignore the extension", which would seem to cover all the bases. Other
policies are somewhat more specific, for example Netscape's certificate
extension specification says that the keyUsage extension can be ignored if
it's not marked critical, but Netscape Navigator does appear to enforce the
basicConstraints extension in most cases.

The whole issue is complicated by the fact that implementations from a large
vendor will reject a certificate which contains critical constraint
extensions, so that even if you interpret the critical flag to mean "this
extension must be enforced" (rather than just "reject this certificate if
you don't recognise the extension"), you can't use it because it will render
the certificate unusable. These implementations provide yet another
interpretation of the critical flag, "reject this certificate if you
encounter a critical extension". The same vendor also has software which
ignores the critical flag entirely, making the software essentially useless
to relying parties who can't rely on it to perform as required (the exact
behaviour depends on the software and version, so one version might reject a
certificate with a critical extension while another would ignore a critical
extension).

  Zaphod stared at him as if expecting a cuckoo to leap out of his forehead
  on a small spring.
    -- Douglas Adams, "The Restaurant at the End of the Universe"

Because of this confusion, it's probably impossible to include a set of
constraint extensions in a certificate which will be handled properly by
different implementations.
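The one point on which the standards do mostly agree - an unrecognised
critical extension invalidates the certificate, an unrecognised non-critical
one can be ignored - looks like this in code. A sketch only: the extensions
are assumed to be already decoded into (oid, critical, value) tuples, the
OID set is hypothetical, and what process_extension() should actually
enforce is, as the above discussion shows, anyone's guess:

  # Extensions this (hypothetical) implementation knows how to process.
  RECOGNISED_EXTENSIONS = {
      '2.5.29.19',                      # basicConstraints
      '2.5.29.15',                      # keyUsage
      '2.5.29.17',                      # subjectAltName
      }

  def process_extension(oid, value):
      pass                              # enforcement policy goes here

  def check_extensions(extensions):
      """Apply the X.509v3/PKIX rule for the critical flag."""
      for oid, critical, value in extensions:
          if oid in RECOGNISED_EXTENSIONS:
              process_extension(oid, value)
          elif critical:
              raise ValueError('unrecognised critical extension ' + oid)
          # unrecognised and non-critical: free to ignore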
Because of problems like this, the digital signature laws of some countries
require certification of the software being used as part of compliance with
the law, so that you can't just claim that your software "supports X.509v3
certificates" (everyone claims this whether they actually do or not), you
actually have to prove that it supports what's required by the particular
country's laws. If you're in a country which has digital signature
legislation, make sure the software you're using has been certified to
conform to the legal requirements.

The best interpretation of constraint extensions is that if a certificate is
marked as an X.509v3 certificate, constraints should always be enforced.
This includes enforcing implied settings if the extension is missing, so
that a certificate being used in a CA role which has no basicConstraints
extension present should be regarded as being invalid (note however the
problem with PKIX-compliant certificates described later on). However even
if one of the standards is reworded to precisely define extension handling,
there are still plenty of other standards and interpretations which can be
used. The only solution to this would be to include a critical policy
extension which requires that all constraint extensions up and down the cert
chain be enforced. Going back to the autobahn analogy, this mirrors the
situation at the Austrian border, where everyone slows down to the strictly
enforced speed limit as soon as they cross the border.

Currently the only way to include a constraint enforcement extension is to
make it a critical policy extension. This is somewhat unfortunate since
including some other random policy may make the extension unrecognisable,
causing it, and the entire certificate, to be rejected (as usual, what
constitutes an unrecognisable extension is open to debate: if you can
process all the fields in an extension but don't recognise the contents of
one of the fields, it's up to you whether you count this as being
unrecognisable or not). A better alternative would be to define a new
extension, enforceConstraints:

enforceConstraints EXTENSION ::= {
    SYNTAX EnforceConstraintsSyntax
    IDENTIFIED BY id-ce-enforceConstraints
    }

EnforceConstraintsSyntax ::= BOOLEAN DEFAULT FALSE

This makes the default setting compatible with the current "do whatever you
feel like" enforcement of extensions. Enforcing constraints is defined as
enforcing all constraints contained in constraint extensions, including
implied settings if the extension is missing, as part of the certificate
chain validation process (which means that they should be enforced up and
down the cert chain). Recognising/supporting/handling an extension is
defined as processing and acting on all components of all fields of an
extension in a manner which is compliant with the semantic intent of the
extension.

  'Where was I?' said Zaphod Beeblebrox the Fourth.
  'Pontificating' said Zaphod Beeblebrox.
  'Oh yes'.
    -- Douglas Adams, "The Restaurant at the End of the Universe"

Just to mention a further complication with critical extensions, there are
instances in which it's possible to create certificates which are always
regarded as being invalid due to conflicts with extensions. For example a
generation n-1 critical extension might be replaced by a generation n
critical extension, resulting in a mixture of certs with generation n-1
extensions, generation n-1 and generation n extensions (for compatibility)
and (eventually) generation n extensions only.
However until every piece of software is upgraded, generation n-1 software
will be forced to reject all certs with generation n extensions, even the
(supposedly) backwards-compatible certs with both generations of extension
in them.

  'Mr.Beeblebrox, sir', said the insect in awed wonder, 'you're so weird you
  should be in movies'.
    -- Douglas Adams, "The Restaurant at the End of the Universe"

Key Usage, Extended Key Usage, and Netscape cert-type

X.509 and PKIX use keyUsage and extKeyUsage to select the key to use from a
selection of keys unless the extension is marked critical, in which case
it's treated as a usage restriction. Microsoft claims to support key usage
enforcement, although experimentation with implementations has shown that
it's mostly ignored (see the entry on Microsoft bugs further on). In
addition if an extKeyUsage extension is present, all certificates in the
chain up to the CA root must also support the same extKeyUsage (so that, for
example, a general-purpose CA can't sign a server gated crypto certificate -
the reasoning behind this is obvious). As it turns out though, extKeyUsage
seems to be mostly ignored just like keyUsage.

Netscape uses keyUsage as a key selection mechanism, and uses the Netscape
cert-type extension in a complex manner described in the Netscape
certificate extension specification. Since the cert-type extension includes
the equivalent of the basicConstraints CA flag, it's possible to specify
some types of CA with the cert-type extension. If you do this, you should be
careful to synchronise the basicConstraints CA flag with the setting of the
cert-type extension, because some implementations (you can probably guess
which one) will allow a Netscape CA-like usage to override a non-CA keyUsage
value, treating the certificate as if it were a CA certificate. In addition
Netscape also enforces the same extKeyUsage chaining as Microsoft.
Unfortunately the extKeyUsage chaining interpretation is wrong according to
PKIX, since the settings apply to the key in the certificate (ie the CA's
key) rather than the keys in the certificates it issues. In other words an
extKeyUsage of emailProtection would indicate that the CA's certificate is
intended for S/MIME encryption, not that the CA can issue S/MIME
certificates. Both of the major implementors of certificate-handling
software use the chaining interpretation, but there also exist
implementations which use the PKIX interpretation, so the two main camps
will fail to verify the other side's cert chains unless they're in the
(smaller) third camp which just ignores extKeyUsage.

For keyUsage there is much disagreement over the use of the digitalSignature
and nonRepudiation bits, since there was no clear definition in X.509 of
when the nonRepudiation flag should be used alongside or in place of the
digitalSignature flag. One school of thought holds that digitalSignature
should be used for ephemeral authentication (something which occurs
automatically and frequently) and nonRepudiation for legally binding
long-term signatures (something which is performed consciously and less
frequently). Another school of thought holds that nonRepudiation should act
as an additional function for the digitalSignature mechanism, with
digitalSignature being a mechanism bit and nonRepudiation being a service
bit. The different profiles are split roughly 50:50 on this, with some
complicating things by specifying that both bits should be set but the
certificate not be used for one or the other purpose.
Probably the best usage is to use digitalSignature for "digital signature
for authentication purposes" and nonRepudiation for "digital signature for
nonrepudiation purposes".

  "I think" said the Metatron, "that I shall need to seek further
  instructions".
  "I alzzo" said Beelzebub.
    -- Neil Gaiman and Terry Pratchett, "Good Omens"

In terms of profiles, MISSI and FPKI follow the above recommendation, PKIX
uses nonRepudiation strictly for nonrepudiation and digitalSignature for
everything else, ISO uses digitalSignature for entity authentication and
nonRepudiation strictly for nonrepudiation (leaving digital signatures for
data authentication without nonrepudiation hanging), and others use
something in between. When this issue was debated on PKI lists in mid-1998,
over 100 messages were exchanged without anyone really being able to
uncontestably define what digitalSignature and nonRepudiation really
signified. The issue is further confused by the fact that no one can agree
on what the term "nonRepudiation" actually means, exemplified by a
~200-message debate in mid-1999 which couldn't reach any useful conclusion.

  He had attached the correct colour-coded wires to the correct pins; he'd
  checked that it was the right amperage fuse; he'd screwed it all back
  together. So far, no problems. He plugged it into the socket. Then he
  switched the socket on. Every light in the house went out.
    -- Neil Gaiman and Terry Pratchett, "Good Omens"

Although everyone has their own interpretation, a good practical definition
is "Nonrepudiation is anything which fails to go away when you stop
believing in it". Put another way, if you can convince a user that it isn't
worth trying to repudiate a signature then you have nonrepudiation. This can
take the form of having them sign a legal agreement saying they won't try to
repudiate any of their signatures, giving them a smart card and convincing
them that it's so secure that any attempt to repudiate a signature generated
with it would be futile, threatening to kill their kids, or any other method
which has the desired effect. One advantage (for vendors) is that you can
advertise just about anything as providing nonrepudiation, since there's
sure to be some definition which matches whatever it is you're doing (there
are "nonrepudiation" schemes in use today which employ a MAC using a secret
shared between the signer and the verifier, which must be relying on a
particularly creative definition of nonrepudiation).

  Somewhere on your server there must be a key lying around which I
  presumably generated with Netscape. If my name is in it, then that'll be
  the one. Could you certify it for me?
    -- endergone Zwiebeltuete (translated from the German)

  One might as well add a "crimeFree" (CF) bit with usage specified as 'The
  crimeFree bit is asserted when subject public key is used to verify
  digital signatures for transactions that are not a perpetration of fraud
  or other illegal activities'
    -- Tony Bartoletti on ietf-pkix

  I did have the idea that we mandate that CAs MUST set this bit randomly
  whenever a keyUsage extension is present, just to stop people who argue
  that its absence has a meaning.
    -- Stephen Farrell on ietf-pkix
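For reference, here's a sketch of decoding the keyUsage flags, plus the kind
of CA-flag consistency check that the cert-type discussion above suggests;
the bit assignments are from X.509, everything else is my own naming:

  # keyUsage bit assignments from X.509, bit 0 first.
  KEY_USAGE_BITS = ['digitalSignature', 'nonRepudiation', 'keyEncipherment',
                    'dataEncipherment', 'keyAgreement', 'keyCertSign',
                    'cRLSign', 'encipherOnly', 'decipherOnly']

  def decode_key_usage(contents):
      """Decode the keyUsage BIT STRING contents octets (the first octet is
      the count of unused bits in the final octet) into a set of names."""
      unused = contents[0]
      bits = int.from_bytes(contents[1:], 'big') >> unused
      nbits = (len(contents) - 1) * 8 - unused
      return {KEY_USAGE_BITS[i] for i in range(nbits)
              if bits & (1 << (nbits - 1 - i))}

  def check_ca_consistency(is_ca, key_usage):
      """The basicConstraints cA flag and keyUsage keyCertSign bit should
      agree, since some implementations check one and some the other."""
      if is_ca != ('keyCertSign' in key_usage):
          raise ValueError('basicConstraints/keyUsage mismatch')

  # digitalSignature + keyEncipherment: bits 0 and 2, 5 unused bits.
  print(decode_key_usage(bytes([0x05, 0xA0])))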
Basic Constraints

This is used to specify whether a certificate is a CA certificate or not.
You should always mark this critical, because otherwise some implementations
will ignore it and allow a non-CA certificate to act as a CA.

Alternative Names

The subject and issuer alternative names are used to specify all the things
which aren't suitable for a DN, which for most purposes means practically
everything of any use on the Internet (X.509v3 defines the alternative names
(or, more specifically, the GeneralName type) for use in specifying
identifying information which isn't suited for, or part of, a DN). This
includes email addresses, URLs for web pages, server addresses, and other
odds and ends like X.400 and EDI addresses. There's also a facility to
include your postal address, physical address, phone, fax and pager numbers,
and of course the essential MPEG of your cat.

The alternative names can be used for certificate identification in place of
the DNs, however the exact usage is somewhat unclear. In particular if an
altName is used for certificate chaining purposes, there's a roughly 50/50
split of opinion as to whether all the items in the altName must match or
any one of the items must match. For example if an altName contains three
URLs in the issuer and one in the client (which matches one of the issuer
URLs), no one really knows whether this counts as a valid altName match or
not. Eventually someone will make a decision one way or the other, probably
the S/MIME standards group, who are rather reliant on altNames for some
things (the S/MIME group have requested that the PKIX group make DNs
mandatory in order to allow proper certificate chaining, and the S/MIME
specs themselves require DNs for CAs). Until this is sorted out, it's not a
good idea to rely on altNames for chaining.
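The 50/50 matching ambiguity described above comes down to a one-line
difference in code; a sketch of both readings, using the example from the
text:

  def names_match_all(client_names, issuer_names):
      """'All' interpretation: the altNames only match if every component
      matches."""
      return set(client_names) == set(issuer_names)

  def names_match_any(client_names, issuer_names):
      """'Any' interpretation: a single common component is enough."""
      return bool(set(client_names) & set(issuer_names))

  # Three URLs in the issuer, one (matching) URL in the client:
  issuer_urls = ['http://a.example/', 'http://b.example/',
                 'http://c.example/']
  client_urls = ['http://b.example/']
  print(names_match_all(client_urls, issuer_urls))   # False
  print(names_match_any(client_urls, issuer_urls))   # True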
The solution to this problem would be to split the altName into two new extensions: a true altName which provides a single alternative to the subjectName (for example a single dNSName or rfc822Name) and which is used only when the subject DN is empty, and a collection of other information about the subject which follows the current altName syntax but which is used strictly for informational purposes. The true altName provides a single alternative for the subjectName, and the informational altName provides any extra identification information which the subject may want to include with their certificate.

A different (and much uglier) solution is to try and stuff everything imaginable into a DN. The problem with this approach is that it completely destroys any hope of interoperability with directories, especially X.500 directories which rely on search arguments being predefined as a type of filter. Unless you have this predefined filter, you can't easily search the directory for a match, so it's necessary to have some limits placed on the types of names (or schemas in X.500-speak) which are acceptable. Unfortunately the "stuff everything into a DN" approach violates this principle, making the result un-searchable within a directory, which voids the reason for having the DN in the first place.

Certificate Policies and Policy Mappings and Constraints

The general idea behind the certificate policies extension is simple enough: it provides a means for a CA to identify which policy a certificate was issued under. This means that when you check a certificate, you can ensure that each certificate in the chain was issued under a policy you feel comfortable with (certain security precautions taken, vetting of employees, physical security of the premises, no loud ties, that sort of thing). The certificatePolicies extension (in its minimal form) provides a means of identifying the policy a certificate was issued under, and the policyMappings extension provides a means of mapping one policy to another (that is, for a CA to indicate that policy A, under which it is issuing a certificate, is equivalent to policy B, which is required by the certificate user). Unfortunately on top of this there are qualifiers for the certificatePolicies and the policyConstraints extension to muddy the waters. As a result, a certificate policy often consists of a sequence of things identified by unknown object identifiers, each with another sequence of things identified by partially known, partially unknown object identifiers, with optional extra attachments containing references to other things which software is expected to know about by magic or possibly some form of quantum tunnelling.

  Marx Brothers fans will possibly recall the scene in "A Day at the Races" in which Groucho, intending to put his money on Sun-up, is induced instead to buy a coded tip from Chico and is able to establish the identity of the horse only, at some cost in terms of time and money, by successive purchases from Chico of the code book, the master code book, the breeders' guide and various other works of reference, by the end of which the race is over, Sun-up having won.
    -- Unknown, forwarded by Ed Gerck to cert-talk

This makes it rather difficult to make validity decisions for a certificate which has anything more complex than a basic policyIdentifier present. Because of this, you should only use a single policyIdentifier in a certificate, and forgo the use of policy qualifiers and other odds and ends.
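Following that advice, a certificate policy pared down to a single bare policyIdentifier really is tiny. The following sketch shows the complete DER-encoded extension value; the policy OID 1.3.6.1.4.1.99999.1 is purely hypothetical, so substitute your own:

  /* DER encoding of a certificatePolicies extension value containing a
     single PolicyInformation with a bare policyIdentifier and no
     qualifiers.  The OID 1.3.6.1.4.1.99999.1 is a made-up example. */
  static const unsigned char certPolicies[] = {
      0x30, 0x0D,                   /* SEQUENCE OF PolicyInformation */
      0x30, 0x0B,                   /*   PolicyInformation */
      0x06, 0x09,                   /*     policyIdentifier */
      0x2B, 0x06, 0x01, 0x04, 0x01, /*       1.3.6.1.4.1 ... */
      0x86, 0x8D, 0x1F, 0x01        /*       ... .99999.1 */
  };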
Currently no one but Verisign appears to use these; the presence of these qualifiers in the PKIX profile may be related to the presence of Verisign in the PKIX profiling process.

Name Constraints

The name constraints are used to constrain certificates' DNs to lie inside or outside a particular subtree, with the excludedSubtrees field taking precedence over the permittedSubtrees field. The principal use for this extension is to allow balkanization of the certificate namespace, so that a CA can restrict the ability of any CAs it certifies to issue certificates outside a very restricted domain. Unfortunately if the X.500 string encoding rules are followed, it's always possible to avoid the excludedSubtrees by choosing unusual (but perfectly valid) string encodings which don't appear to match the excludedSubtrees (see the section on string encodings for more details on this problem). Although PKIX constrains the string encodings to allow the nameConstraints to function, it's a good idea to rely on permittedSubtrees rather than excludedSubtrees for constraint enforcement (actually since virtually nothing supports this extension, it's probably not a good idea to rely too much on either type of constraint, but when it is supported, use permitted rather than excluded subtrees).

Subject and Authority Key Identifiers

These are used to provide additional identification information for a certificate. Unfortunately it's specified in a somewhat complex manner which requires additional ASN.1 constraints or text to explain it, so you should treat it as if it were specified with the almost-ASN.1 of:

  AuthorityKeyIdentifier ::= CHOICE {
    keyIdentifier           [ 0 ] OCTET STRING,
    authorityCert           [ 1 ] GeneralNames,
    authoritySerialNumber   [ 2 ] INTEGER
    }

X.509v3 allows you to use both types of identifier, but other standards and profiles either recommend against this or explicitly disallow it, allowing only the keyIdentifier. Various profiles have at various times required the use of the SHA-1 hash of the public key (whatever that constitutes), the SHA-1 hash of the subjectPublicKeyInfo data (for some reason this has to be done *without* the tag and length at the start), the SHA-1 hash of the subjectPublicKey (again without the tag, length, and unused bits portion of the BIT STRING, which leaves just the raw public key data but omits the algorithm identifier and parameters, so that two keys for different algorithms with different parameters which happen to share the same public key field end up with the same hash), a 64-bit hash of the subjectPublicKeyInfo (presumably with the tag and length), a 60-bit hash of the subjectPublicKey (again with tag and length) with the first four bits set to various values to indicate MISSI key types, and some sort of unique value such as a monotonically increasing sequence. Several newer profiles have pretty much given up on this and simply specify "a unique value". RFC 2459 also allows "a monotonically increasing sequence of integers", which is a rather bad move since the overall supply of unique small integers is somewhat limited and this scheme will break as soon as a second CA decides to issue a cert with a "unique" subjectKeyIdentifier of the same value.
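If you have to generate this field yourself, the least troublesome of the choices above is probably the SHA-1 hash of the raw subjectPublicKey data, which is one of the methods RFC 2459 describes (with the caveat about shared public key fields already noted). A minimal sketch, assuming OpenSSL's SHA1() is available - any SHA-1 implementation will do:

  #include <stddef.h>
  #include <openssl/sha.h>

  /* Derive a keyIdentifier as the SHA-1 hash of the raw public key
     data, ie the contents of the subjectPublicKey BIT STRING minus the
     tag, length, and unused-bits octet.  pubKey/pubKeyLen are assumed
     to point at that raw data, however you extracted it. */
  static void makeKeyIdentifier(const unsigned char *pubKey,
                                size_t pubKeyLen,
                                unsigned char keyID[SHA_DIGEST_LENGTH])
  {
      SHA1(pubKey, pubKeyLen, keyID);   /* 20-byte identifier */
  }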
To balance the problems caused by this mess of conflicting and incompatible standards, it appears that most implementations either ignore the keyIdentifier entirely or don't bother decoding it, because in 1997 and 1998 a widely-used CA accidentally issued certificates with an incorrect encoding of the keyIdentifier (it wasn't even valid ASN.1 data, let alone X.509-conformant ASN.1) without anyone noticing. Although a few standards require that a keyIdentifier be used, its absence doesn't seem to cause any problems for current implementations.

Recommendation: Don't even try to decode the authorityKeyIdentifier field, just treat everything inside the OCTET STRING hole as an opaque blob. Given that most current implementations seem to ignore this extension, don't create certificate chains which require it to be processed in order for the chaining to work.

The claimed reason for using the keyIdentifier rather than the issuerAndSerialNumber is that it allows a certificate chain to be re-rooted when an intermediate CA changes in some manner (for example when its responsibilities are handed off from one department in an organisation to another). If the certificate is identified through the keyIdentifier, no nameConstraints are present, the certificate policies are identical or mapped from one to the other, the altNames chain correctly, and no policyConstraints are present, then this type of re-rooting is possible (in case anyone missed the sarcasm in there, the gist is that it's highly unlikely to work).

Other Extensions

There are a wide variety of other extensions defined in various profiles. These are in effect proprietary extensions, because unless you can track down something which recognises them (typically a single-vendor or small-group-of-vendors product), you won't be able to do much with them - most software will either ignore them completely or reject the certificate if the extension is marked critical and the software behaves as required. Unless you can mandate the use of a given piece of certificate-processing software which you've carefully tested to ensure it processes the extension correctly, and you can block the use of all other software, you shouldn't rely on these extensions. Obviously if you're in a closed, carefully controlled environment (for example a closed shop EDI environment which requires the use of certain extensions such as reliance limits) the use of specialised extensions isn't a problem, but otherwise you're on your own.

In addition to the use of other people's extensions, you may feel the need to define your own. In short, if you're someone other than Microsoft (who can enforce the acceptance of whatever extensions they feel like), don't do it. Not only is it going to be practically impossible to find anything to support them (unless you write it yourself), but it's also very easy to stuff all sorts of irrelevant, unnecessary, and often downright dangerous information into certificates without thinking about it. The canonical example of something which has no place in a certificate is Microsoft's cAKeyCertIndexPair extension, which records the state of the CA software running on a Windows 2000 machine at the time the certificate was generated (in other words it offloads the CA backup task from the machine's administrator to anyone using one of the certificates).

  Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it.
    -- Linus Torvalds

The canonical example of a dangerous certificate extension is one which indicates whether the owner is of legal age for some purpose (buying alcohol/driving/entry into nightclubs/whatever). Using something like a drivers license for this creates a booming demand for forged licenses which, by their very nature, are difficult to create and tied to an individual through a photo ID. Doing the same thing with a certificate creates a demand for those over the age limit to make their keys (trivially copied en masse and not tied to an individual) available to those under the age limit, or for those under the age limit to avail themselves of the keys in a surreptitious manner. The fact that the borrowed key which is being used to buy beer or rent a porn movie can also be used to sign a legally binding contract or empty a bank account probably won't be of concern until it's too late. This is a good example of the law of unintended consequences in action.

  If scientists can be counted on for anything, it's for creating unintended consequences.
    -- Michael Dougan

A related concern about age indicators in certificates, which was one of the many nails in X.500's coffin, is the fact that giving a third party the information needed to query a certificate directory in order to locate, for example, all teenage girls in your localityName, probably wouldn't be seen as a feature by most certificate holders. Similar objections were made to the use of titles in DNs, for example a search on a title of "Ms" would have allowed someone to locate all single women in their localityName, and full-blown X.500 would have provided their home addresses and probably phone numbers to boot. Until early 1999 this type of extension only existed as a hypothetical case, but it's now present as a mandatory requirement in at least one digital signature law, which also has as a requirement that all CAs publish their certificates in some form of openly-accessible directory.

  I saw, and I heard an eagle, flying in mid heaven, saying with a loud voice, "Woe! Woe! Woe for those who dwell on the earth"
    -- Revelations 8:15

Character Sets
--------------

Character strings are used in various places (most notably in DNs), and are encumbered by the fact that ASN.1 defines a whole series of odd subsets of ASCII/ISO 646 as character string types, but only provides a few peculiar and strange oddball character encodings for anything outside this limited character range.

  The protruding upper halves of the letters now appear to read, in the local language, "Go stick your head in a pig", and are no longer illuminated, except at times of special celebration.
    -- Douglas Adams, "The Restaurant at the End of the Universe"

To use the example of DNs, the allowed string types are:

  DirectoryString ::= CHOICE {
    teletexString     TeletexString (SIZE (1..maxSize)),
    printableString   PrintableString (SIZE (1..maxSize)),
    bmpString         BMPString (SIZE (1..maxSize)),
    universalString   UniversalString (SIZE (1..maxSize))
    }

The easiest one to use, if you can get away with it, is IA5String, which is basically 7-bit ASCII (including all the control codes), but with the dollar sign potentially replaced with a "currency symbol". A more sensible alternative is VisibleString (aka ISO646String), which is IA5String without the control codes (this has the advantage that you can't use it to construct macro viruses using ANSI control sequences).
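Since the question of what will and won't fit in a PrintableString comes up constantly in what follows, it's worth noting that the repertoire is small enough to test in a few lines of code. A sketch:

  #include <string.h>

  /* Return nonzero if ch falls within the PrintableString repertoire:
     A-Z, a-z, 0-9, space, and ' ( ) + , - . / : = ?
     Note the absence of '@', which is why you can't put an email
     address in a PrintableString. */
  static int isPrintableStringChar(int ch)
  {
      return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') ||
             (ch >= '0' && ch <= '9') ||
             (ch != '\0' && strchr(" '()+,-./:=?", ch) != NULL);
  }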
In the DirectoryString case, you have to make do with PrintableString, which is one of the odd ASCII/ISO 646 subsets (for example you can't encode an '@', which makes it rather challenging to encode email addresses). Beyond that there is the T.61/TeletexString, which can select different character sets using escape codes (this is one of the aforementioned "peculiar and strange oddball encodings"). The character sets are Japanese Kanji (JIS C 6226-1983, set No. 87), Chinese (GB 2312-80, set No. 58), and Greek, using shifting codes specified in T.61 or the international version, ISO 6937 (strictly speaking T61String isn't defined in T.61 but in X.680, which defines it by profiling ISO 2022 character set switching). Some of the characters have a variable-length encoding (so it takes 2 bytes to encode a character, with the interpretation being done in a context-specific manner). The problem isn't helped by the fact that the T.61 specification has changed over the years as new character sets were added, and since the T.61 spec has now been withdrawn by the ITU there's no real way to find out exactly what is supposed to be in there (but see the previous comment on T.61 vs T61String - a T61String isn't really a T.61 string). Even using straight 8859-1 in a T61String doesn't always work; for example the 8859-1 character code for the Norwegian OE (slashed O) is defined using a T.61 escape sequence which, if present in a certificate, may cause a directory to reject the certificate.

  And then there came the crowning horror of all - the unbelievable, unthinkable, almost unmentionable thing.
    -- H.P.Lovecraft, "The Statement of Randolph Carter"

For those who haven't reached for a sick bag yet, one definition of T61String is given in ISO 1990 X.208, which indicates that it contains registered character sets 87, 102 (a minimalist version of ASCII), 103 (a character set with the infamous "floating diacritics", which means things like accented characters are encoded as "accent + letter" rather than with a single character code), 106 and 107 (two useless sets containing control characters which no one would put in a name), SPACE + DELETE. The newer ITU-T 1997 and ISO 1998 X.680 adds the character sets 6, 126, 144, 150, 153, 156, 164, 165, and 168 (the reason for some of these additions is that once a character set is registered it can never change except by "clarifying" it, which produces a completely new character set with a new number (as with sex, once you make a mistake you end up having to support it for the rest of your life)). In fact there are even more definitions of T61String than that: the original CCITT 1984 ASN.1 spec defined T61String by reference to a real T.61 recommendation (from which finding the actual permitted characters is challenging, to put it mildly), then the ISO 1986 spec defined them by reference to the international register, then the CCITT 1988 spec changed them again (the ISO 1990 spec described above may be identical to the CCITT 1988 one), and finally they were changed again for ISO/ITU-T 1994 (this 1994 spec may again be the same as ITU-T 1997 and ISO 1998). I'm not making this up!

  The disciples came to him privately, saying, "Tell us, what is the sign of your coming, and of the end of the world?" [...]
"You will hear of wars and rumors of wars; there will be famines, plagues, and earthquakes in various places; the sun will be darkened, the moon will not give her light, the stars will fall from the sky, the powers of the heavens will be shaken; certificates will be issued with floating diacritics in their DNs; and then the end will come". -- Matthew 24 (mostly) The encoding for this mess is specified in X.209 which indicates that the default character sets at the start of a string are 102, 106 and 107, although in theory you can't really make this assumption without the appropriate escape sequences to invoke the correct character set. The general consensus amoung the X.500/ISODE directory crowd is that you assume that set 103 is used by default, although Microsoft and Netscape had other ideas for their LDAPv2 products. In certificates, the common practice seems to be to use straight latin-1, which is set numbers 6 and 100, the latter not even being an allowed T61String set. There are those who will say Danforth and I were utterly mad not to flee for our lives after that; since our conclusions were now completely fixed, and of a nature I need not even mention to those who have read my account so far. -- H.P.Lovecraft, "At the Mountains of Madness" Next are the BMPString and UniversalString, with BMPString having 16-bit characters (UCS-2) and UniversalString having 32-bit characters (UCS-4), both encoded in big-endian format. BMPString is a subset of UniversalString, being the 16-bit character range in the 0/0 plane (ie the UniversalString characters in which the 16 high bits are 0), corresponding to straight ISO 10646/Unicode characters. The ASN.1 standard says that UniversalString should only be used if the encoding possibilities are constrained, it's better to avoid it entirely and only use BMPString/ISO 10646/Unicode. However, there is a problem with this: at the moment few implementors know how to handle or encode BMPStrings, and people have made all sorts of guesses as to how Unicode strings should be encoded: with or without Unicode byte order marks (BOMs), possibly with a fixed endianness, and with or without the terminating null character. I might as well be frank in stating what we saw; though at the time we felt that it was not to be admitted even to each other. The words reaching the reader can never even suggest the awfulness of the sight itself. -- H.P.Lovecraft, "At the Mountains of Madness" The correct format for BMPStrings is: big-endian 16-bit characters, no Unicode byte order marks (BOMs), and no terminating null character (ISO 8825-1 section 8.20). An exception to this is PFX/PKCS #12, where the passwords are converted to a Unicode BMPString before being hashed. However both Netscape and Microsoft's early implementations treated the terminating null characters as being part of the string, so the PKCS #12 standard was retroengineered to specify that the null characters be included in the string. A final string type which is presently only in the PKIX profile but which should eventually appear elsewhere is UTF-8, which provides a means of encoding 7, 8, 16, and 32-bit characters into a single character string. 
A final string type which is presently only in the PKIX profile but which should eventually appear elsewhere is UTF-8, which provides a means of encoding 7, 8, 16, and 32-bit characters into a single character string. Since ASN.1 already provides character string types which cover everything except some of the really weird 32-bit characters which no one ever uses,

  It was covered in symbols that only eight other people in the world would have been able to comprehend; two of them had won Nobel prizes, and one of the other six dribbled a lot and wasn't allowed anything sharp because of what he might do with it.
    -- Neil Gaiman and Terry Pratchett, "Good Omens"

the least general encoding rule means that UTF-8 strings will practically never be used. The original reason they were present in the PKIX profile is because of an IETF rule which required that all new IETF standards support UTF-8, but a much more compelling argument which recently emerged is that, since most of the other ASN.1 character sets are completely unusable, UTF-8 would finally breathe a bit of sanity into the ASN.1 character set nightmare. Unfortunately, because it's quite a task to find ASN.1 compilers (let alone certificate handling software) which support UTF-8, you should avoid this string type for now. PKIX realised the problems which would arise and specified a cutover date of 1 January 2004 for UTF-8 use.

Some drafts have appeared which recommend the use of RFC 2482 language tags, but these should be avoided since they have little value (they're only needed for machine processing; if they appear in a text string intended to be read by a human they'll either understand it or they won't, and a language tag won't help). In addition UTF-8 language tags are huge (about 30 bytes) due to the fact that they're located out in plane 14 in the character set (although I don't have the appropriate reference to hand, plane 14 is probably either Gehenna or Acheron), so the tag would be much larger than the string being tagged in most cases.

One final problem with UTF-8 is that it shares some of the T.61 string problems, in which it's possible for a malicious encoder to evade checks on strings either by using different code points which produce identical-looking characters when displayed or by using suboptimal encodings (in ASN.1 terms, non-distinguished encodings) of a code point. They are aided in this by the standard, which says (page 47, section 3.8 of the Unicode 3.0 standard) that "when converting from UTF-8 to a Unicode scalar value, implementations do not need to check that the shortest encoding is being used. This simplifies the conversion algorithm". What this means is that it's possible to encode a particular character in a dozen different ways in order to evade a check which uses a straight byte-by-byte comparison as specified in RFC 2459. Although some libraries such as glibc 2.2 use "safe" UTF-8 decoders which will reject non-distinguished encodings, it's not a good idea to assume that everyone does this.
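If you do end up handling UTF-8, the most blatant overlong encodings can at least be caught with a simple lead-byte check. The following sketch is deliberately partial - a real validator also needs to verify continuation bytes, range limits, and surrogates - but it catches the classic tricks:

  #include <stddef.h>

  /* Flag the most common non-shortest-form ("overlong") UTF-8
     encodings, the kind used to sneak a second spelling of a character
     past a byte-by-byte comparison.  Deliberately incomplete: a real
     validator needs to check much more than this. */
  static int hasOverlongUTF8(const unsigned char *str, size_t len)
  {
      size_t i;

      for (i = 0; i < len; i++) {
          if (str[i] == 0xC0 || str[i] == 0xC1)
              return 1;   /* Two-byte encoding of a 7-bit value */
          if (str[i] == 0xE0 && i + 1 < len && str[i + 1] < 0xA0)
              return 1;   /* Three-byte encoding of an 11-bit value */
          if (str[i] == 0xF0 && i + 1 < len && str[i + 1] < 0x90)
              return 1;   /* Four-byte encoding of a 16-bit value */
      }
      return 0;
  }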
Because of these problems, the SET designers produced their own alternative, SETString, for places where DNs weren't required for compatibility purposes. The design goals for the SETString were to both provide the best coverage of ASCII and national-language character sets, and also to minimise implementation pain. The SETString type is defined as:

  SETString ::= CHOICE {
    visibleString   VisibleString (SIZE (1..maxSIZE)),
    bmpString       BMPString (SIZE (1..maxSIZE))
    }

This provides complete ASCII/ISO 646 support using single byte characters, and national language support through Unicode, which is in common use by industry.

In addition the SET designers decided to create their own version of the DirectoryString which is a proper subset of the X.500 version. The initial version was just an X.500 DirectoryString with a number of constraints applied to it, but just before publication this was changed to:

  DirectoryString ::= CHOICE {
    printableString   PrintableString (SIZE (1..maxSIZE)),
    bmpString         BMPString (SIZE (1..maxSIZE))
    }

  You must unlearn what you have learned.
    -- Yoda

It was felt that this improved readability and interoperability (and sanity). T61String was never seriously considered in the design, and UniversalString with its four-byte characters had no identifiable industry support and required too much overhead. If you want to produce certs which work for both generic X.509 and SET, then using the SET version of the DirectoryString is a good idea. It's trivial to convert an ISO 8859-1 T61String to a BMPString and back (just add/subtract a 0 byte every other byte). MISSI also subsets the string types, allowing only PrintableString and T61String in DNs.

When dealing with these character sets you should use the "least inclusive" set when trying to determine which encoding to use. This means trying to encode as PrintableString first, then T61String, and finally BMPString/UniversalString. SET requires that either PrintableStrings or BMPStrings be used, with TeletexStrings and UniversalStrings being forbidden. From this we can build the following set of recommendations:

- Use PrintableString if possible (or VisibleString or IA5String if this is allowed, because it's rather more useful than PrintableString).

- If you use a T61String (and assuming you don't require SET compliance), avoid the use of anything involving shifting and escape codes at any cost and just treat it as a pure ISO 8859-1 string. If you need anything other than 8859-1, use a BMPString.

- If it won't go into one of the above, try for a BMPString.

- Avoid UniversalStrings.

Version 7 of the PKIX draft dropped the use of T61String altogether (probably in response to this writeup :-), but this may be a bit extreme, since the extremely limited character range allowed by PrintableString will result in many simple strings blowing out to BMPStrings, which causes problems on a number of systems which have little Unicode support. In 2004, you can switch to UTF-8 strings and forget about this entire section of the guide.

  I saw coming out of the mouth of the dragon, and out of the mouth of the beast, and out of the mouth of the false prophet, three unclean spirits, something like frogs; for they are spirits of demons, performing signs
    -- Biblical explanation of the origins of character set problems, Revelations 16:13-14, recommended rendition: Diamanda Galas, The Plague Mass.

Comparing DNs
-------------

This is an issue which is so problematic that it requires a section of its own to cover it fully. According to X.500, to compare a DN:

- The number of RDNs must match.

- RDNs must have the same number of AVAs.

- Corresponding AVAs must match for equality:
  - Leading and trailing spaces are ignored.
  - Multiple internal spaces are treated as a single internal space.
  - Characters (not code points, which are a particular encoding of a character) are matched in a case-insensitive manner.

As it turns out, this matching is virtually impossible to perform (more specifically, it is virtually impossible to accurately compare two DNs for equivalence).
  This, many claim, is not merely impossible but clearly insane, which is why the advertising executives of the star system of Bastablon came up with this quote: 'If you've done six impossible things this morning, why not round it off with breakfast at Milliways, the Restaurant at the End of the Universe?'.
    -- Douglas Adams, "The Restaurant at the End of the Universe"

The reason for this is that, with the vast number of character sets, encodings, and alternative encodings (code points) for the same character, and the often very limited support for non-ASCII character sets available on many systems, it isn't possible to accurately and portably compare any RDNs other than those containing one of the basic ASCII string types. The best you can probably do is to use the strategy outlined below.

First, check whether the number of RDNs is equal. If they match, break up the DNs into RDNs and check that the RDN types match. If they also match, you need to compare the text in each RDN in turn. This is where it gets tricky.

  He walked across to the window and suddenly stumbled because at that moment his Joo-Janta 200 Super-Chromatic Peril Sensitive sunglasses had turned utterly black.
    -- Douglas Adams, "The Restaurant at the End of the Universe"

First, take both strings and convert them to either ASCII (ISO646String) or Unicode (BMPString) using the "least inclusive" rule. This is quite a task in itself, because several implementations aren't too picky about what they'll put into a string, and will stuff T61String characters into a PrintableString, or (more commonly) Unicode characters into a T61String, or anything into a BMPString. Finding a T61String in a PrintableString or an 8-bit character set string in a BMPString is relatively easy, but the best guess you can take at detecting a Unicode string in a T61String is to check whether either the string length is odd or all the characters are ASCII or ASCII with the high bit set. If neither is true, it might be a Unicode string disguised as a T61String.

Once this is done, you need to canonicalise the strings into a format in which a comparison can be done, either to compare strings of different types (eg an 8-bit character set or DBCS string to a BMPString) or strings with the same type but different encodings (eg two T61Strings using alternative encodings). To convert ASCII-as-Unicode to ASCII, just drop the odd-numbered bytes. Converting a T61String to Unicode is a bit more tricky. Under Windows 95 and NT, you can use MultiByteToWideChar(), although the conversion will depend on the current code page in use. On systems with widechar support, you can use mbstowcs() directly if the widechar size is the same as the BMPString char size (which it generally isn't), otherwise you need to first convert the string to a widechar string with mbstowcs() and then back down again to a BMPString, taking the machine endianness into account. Again, the behaviour of mbstowcs() will depend on the current locale in use. If the system doesn't have widechar support, the best you can do is a brute-force conversion to Unicode by hoping it's ISO 8859-1 and adding a zero byte every other byte.
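To illustrate the guesswork involved, here's a sketch of one reading of the disguised-Unicode heuristic just described, together with the brute-force ASCII-as-Unicode to 8-bit conversion. Both are heuristics rather than reliable tests:

  #include <stddef.h>

  /* One reading of the heuristic in the text: a string claiming to be
     a T61String might really be a big-endian BMPString if its length
     is even and every high (even-indexed) byte is zero - the giveaway
     being the zero high bytes which Unicode-encoded ASCII produces. */
  static int looksLikeDisguisedBMPString(const unsigned char *str,
                                         size_t len)
  {
      size_t i;

      if (len == 0 || (len & 1))
          return 0;             /* Odd length can't be 16-bit characters */
      for (i = 0; i < len; i += 2)
          if (str[i] != 0)
              return 0;         /* Nonzero high bytes: assume 8-bit text */
      return 1;
  }

  /* The ASCII-as-Unicode to ASCII conversion: drop the zero high
     bytes.  Returns the number of output bytes (always srcLen / 2). */
  static size_t bmpStringToLatin1(const unsigned char *src, size_t srcLen,
                                  unsigned char *dest)
  {
      size_t i;

      for (i = 0; i < srcLen / 2; i++)
          dest[i] = src[i * 2 + 1];   /* Keep the low byte of each pair */
      return srcLen / 2;
  }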
Now that you might have the strings in a format where they can be compared, you can actually try and compare them. Again, this often ends up as pure guesswork if the system doesn't support the necessary character sets, or if the characters use weird encodings which result in identical characters being located at different code points.

First, check the converted character sets: if one is Unicode and the other isn't, then the strings probably differ (depending on how well the canonicalisation step went). If the types are the same, strip leading, trailing, and repeated internal spaces from the string, which isn't as easy as it sounds, since there are several possible code points allocated to a space. Once you've had a go at stripping spaces, you can try to compare the strings. If the string is a BMPString then under Windows NT (but not Windows 95) you can use CompareString(), with the usual caveat that the comparison depends on the current locale. On systems which support towlower() you can extract the characters from the string into widechars (taking machine endianness into account) and compare the towlower() forms, with the usual caveat about locale and the added problem that towlower() implementations vary from bare-bones (8859-1 only under Solaris, HPUX, AIX) to vague (Unicode under Win95, OSF/1). If there's no support for towlower(), the best you can do is use the normal tolower() if the characters have a high byte of zero (some systems will support 8859-1 for tolower(), and the worst that can happen is that the characters will be returned unchanged), and compare the code points directly if it isn't an 8-bit value.

  Zaphod's skin was crawling all over his body as if it was trying to get off.
    -- Douglas Adams, "The Restaurant at the End of the Universe"

Finally, if it's an ASCII string, you can just use a standard case-insensitive string comparison function. As you can see, there's almost no way to reliably compare two RDN elements. In particular, no matter what you do:

- Some malicious users will deliberately pass DN checks with weird encodings.

- Some normal users will accidentally fail DN checks with weird encodings.

This becomes an issue when certain security checks depend on a comparison of DNs (for example with excluded subtrees in the Name Constraints extension), because it's possible to create multiple strings which are displayed identically to the user (especially if you know which platform and/or software to target), assuming they get displayed at all, but which compare as different strings. If you want to be absolutely certain about DN comparisons, you might need to set a certificate policy of only allowing PrintableStrings in DNs, because they're the only ones which can be reliably compared.

PKCS #10
--------

According to the PKCS #10 spec, the attributes field is mandatory, so if it's empty it's encoded as a zero-length field. The example however assumes that if there are no attributes, the field shouldn't be present, treating it like an OPTIONAL field. A number of vendors conform to the example rather than the specification, but just to confuse the issue RSADSI, who produced PKCS #10, regard things the other way around, with the spec being right and the example being wrong. The most obvious effect of this is that TIPEM (which was the only available toolkit for PKCS #10 for a long time) follows the spec and does it "wrong (sense #2)", whereas more recent independent implementations follow the example and do it "wrong (sense #1)". Unfortunately it's difficult to handle certificate requests correctly and be lenient on decoding. Because a request could be reencoded into DER before checking the signature, funny things can happen to your request at the remote end if the two interpretations of PKCS #10 differ.
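To see how small the difference between the two interpretations actually is, here's what they look like on the wire (the rest of the certificationRequestInfo is omitted):

  /* "Follow the spec" reading: the attributes field is mandatory, so
     an empty attribute set is still encoded, as a zero-length
     [0] IMPLICIT SET OF Attribute tacked onto the end of the
     certificationRequestInfo: */
  static const unsigned char attrsSpecStyle[] = {
      0xA0, 0x00        /* [0] IMPLICIT SET OF Attribute, empty */
  };

  /* "Follow the example" reading: the field is treated as if it were
     OPTIONAL and simply omitted, so the certificationRequestInfo ends
     after the subjectPublicKeyInfo with no trailing bytes at all. */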
Because of this confusion, you should be prepared to accept either version when decoding, but at the moment it's not possible to provide any recommendation for encoding. When encountering a particularly fascist parser which isn't aware of the PKCS #10 duality, it may be necessary to submit two versions of the request to determine which one works.

  No, no. Yes. No, I tried that. Yes, both ways. No, I don't know. No again. Are there any more questions?
    -- Xena, "Been There, Done That"

PKCS #10 also dates from the days of X.509v1 and includes references to obsolete and deprecated objects and data formats. PKCS #6 extended certificates are a workaround for the absence of certificate extensions in X.509v1 and shouldn't be used at all, and it's probably a good idea to avoid the use of PKCS #9 extended attributes as well (some certification request protocols bypass the use of PKCS #9 by wrapping extra protocol layers containing PKCS #9 elements around the outside of PKCS #10). Instead, you should use the CMMF draft, which defines a new attribute identified by the OID {pkcs-9 14}, with a value of SEQUENCE OF Extension, which allows X.509v3 attributes to be encoded into a PKCS #10 certification request. The complete encoding used to encode X.509v3 extensions into a PKCS #10 certification request is therefore:

  [0] IMPLICIT SET OF {         -- attributes from PKCS #10
    SEQUENCE {                  -- Attribute from X.501
      OBJECT IDENTIFIER,        -- type, {pkcs-9 14}
      SET OF {                  -- values
        SEQUENCE OF {           -- ExtensionReq from CMMF draft
          }
        }
      }
    }

As of late 1998, virtually all CAs ignore this information and at best add a few predefined extensions based on the options selected on the web page which was used to trigger the certification process. There are one or two implementations which do support it, and these provide a convenient means of specifying attributes for a certificate which don't involve kludges via HTML requests. Microsoft started supporting something like it in mid-1999, although they used their own incompatible OID in place of the PKCS #9 one to ensure non-compatibility with any other implementation (the extensions are encoded in the standard format, they're just identified in a way which means nothing else can read them).

Unfortunately since PKCS #10 doesn't mention X.509v3 at all, there's no clear indication of what is and isn't valid as an attribute for X.509v3, but common sense should indicate what you can and can't use. For example a subjectAltName should be treated as a valid attribute, a basicConstraint may or may not be treated as valid depending on the CA's certification policy, and an authorityKeyIdentifier would definitely be an invalid attribute. SET provides its own version of PKCS #10 which uses a slightly different encoding to the above and handles the X.509v3 extensions keyUsage, privateKeyUsagePeriod (whose use is deprecated by PKIX for some reason), subjectAltName, and the SET extensions certificateType, tunneling, and additionalPolicy. Correctly handling the SET extensions while at the same time avoiding Microsoft's broken extensions which look very similar (see the "Known Bugs/Peculiarities" section) provides a particularly entertaining exercise for implementors.

ASN.1 Design Guidelines
-----------------------

This section contains some guidelines on what I consider good ASN.1 style. This was motivated both by the apparent lack of such guidelines in existing documents covering ASN.1, and by my seeing the ASN.1 definition of the X.400 ORAddress (Originator/Recipient Address).
Although there are a number of documents which explain how to use ASN.1, there doesn't seem to be much around on ASN.1 style, or at least nothing which is easily accessible. Because of this I've noted down a few guidelines on good ASN.1 style, tuned towards the kind of ASN.1 elements which are used in certificate-related work. In most cases I'll use the X.400 ORAddress as an example of bad style (I usually use PFX for this since it's such a target-rich environment, but in this case I'll make an exception). The whole ORAddress definition is far too long to include here (it requires pages and pages of definitions just to allow the encoding of the equivalent of an email address), but I'll include excerpts where required.

  If you can't be a good example, then you'll just have to be a horrible warning.
    -- Catherine Aird

Addendum: Recently I discovered a source of ASN.1 even worse than PFX and X.400, even worse than the pathological ASN.1 I created for the April 1 GUPP RFC, which was meant to be the most awful I could come up with. It can be found in the NIST "Government Smart Card Interoperability Specification", in case anyone's interested (look at sections 6 and 7). Truly impressive.

To start off, keep your structure as simple as possible. Everyone always says this, but when working with ASN.1 it's particularly important because the notation gives you the freedom to design incredibly complicated structures which are almost impossible to work with.

  Bud, if dynamite was dangerous do you think they'd sell it to an idiot like me?
    -- Al Bundy, "Married with Children"

Look at the whole ORAddress for an example.

  What we did see was something altogether different, and immeasurably more hideous and detestable. It was the utter, objective embodiment of the fantastic novelist's 'thing that should not be'.
    -- H.P.Lovecraft, "At the Mountains of Madness"

This includes provisions for every imaginable type of field and element which anyone could conceivably want to include in an address. Now although it's easy enough to feed the whole thing into an ASN.1 compiler and produce an enormous chunk of code which encodes and decodes the whole mess, it's still necessary to manually generate the code to interpret the semantic intent of the content. This is a complex and error-prone process which isn't helped by the fact that the structure contains dozens of little-understood and rarely-used fields, all of which need to be handled correctly by a compliant implementation. Given the difficulty of even agreeing on the usage of common fields in certificate extensions, the problems which will be raised by obscure fields buried fifteen levels deep in some definition aren't hard to imagine.

ASN.1 *WHAM* is not *WHAM* COBOL *WHAM* *WHAM* *WHAM*. The whole point of an abstract syntax notation is that it's not tied to any particular representation of the underlying data types. An extreme example of reverse-engineering data type dependency back into ASN.1 is X9.42's:

  OCTET STRING SIZE(4)    -- (Big-endian) Integer

Artificially restricting an ASN.1 element to fall within the particular limitations of the hardware you're using creates all sorts of problems for other users, and for the future when people decide that 640K isn't all the memory anyone will ever need. The entire point of ASN.1 is that it not be tied to a particular implementation, and that users not have to worry about the underlying data types.
It can also create ambiguous encodings which void the DER guarantee of identical encodings for identical values: although the ANSI/SET provision for handling currencies which may be present in amounts greater than 10e300 (requiring the amtExp10 field to extend the range of the ASN.1 INTEGER in the amount field) is laudable, it leads to nondeterministic encodings in which a single value can be represented in a multitude of ways, making it impossible to produce a guaranteed, repeatable encoding.

Careful with that tagging, Eugene! In recent ASN.1 work it seems to have become fashionable to madly tag everything which isn't nailed down, sometimes two or three times recursively for good measure (see the next point).

  The entire set of PDU's are defined using an incredible amount of gratuitous and unnecessary tagging. Were the authors being paid by the tag for this?
    -- Peter Gutmann on ietf-pkix

For example consider the following ORAddress ExtensionAttribute:

  ExtensionAttribute ::= SEQUENCE {
    extension-attribute-type    [0] INTEGER,
    extension-attribute-value   [1] ANY DEFINED BY extension-attribute-type
    }

(this uses the 1988 ASN.1 syntax; more recent versions change this somewhat). Both of the tags are completely unnecessary, and do nothing apart from complicating the encoding and decoding process. Another example of this problem are extensions like authorityKeyIdentifier, cRLDistributionPoints, and issuingDistributionPoint which, after several levels of nesting, have every element in a sequence individually tagged even though, since they're all distinct, there's no need for any of the tags. Another type of tagging is used for the ORAddress BuiltInStandardAttributes:

  BuiltInStandardAttributes ::= SEQUENCE {
    countryName   [APPLICATION 1] CHOICE { ... } OPTIONAL,
    ...
    }

Note the strange nonstandard tagging - even if there's a need to tag this element (there isn't), the tag should be a context-specific tag and not an application-specific one (this particular definition mixes context-specific and application-specific tags apparently at random). For tagging fields in sequences or sets, you should always use context-specific tags.

Speaking of sequences and sets, if you want to specify a collection of items in data which will be signed or otherwise authenticated, use a SEQUENCE rather than a SET, since the encoding of sets causes serious problems under the DER. You can see the effect of this in newer PKCS #7 revisions, which substitute SEQUENCE almost everywhere the older versions used a SET, because it's far easier to work with the former even though what's actually being represented is really a SET and not a SEQUENCE.

If you have optional elements in a sequence, it's always possible to eliminate the tag on the first element (provided it's not itself tagged), since it can be uniquely decoded without the tag. For example consider privateKeyUsagePeriod:

  PrivateKeyUsagePeriod ::= SEQUENCE {
    notBefore   [ 0 ] GeneralizedTime OPTIONAL,
    notAfter    [ 1 ] GeneralizedTime OPTIONAL
    }

The first tag is unnecessary since it isn't required for the decoding, so it could be rewritten:

  PrivateKeyUsagePeriod ::= SEQUENCE {
    notBefore   GeneralizedTime OPTIONAL,
    notAfter    [ 0 ] GeneralizedTime OPTIONAL
    }

saving an unneeded tag. Because of the ability to specify arbitrarily nested and redefined elements in ASN.1, some of the redundancy built into a definition may not be immediately obvious.
For example consider the use of a DN in an IssuingDistributionPoint extension, which begins:

  IssuingDistributionPoint ::= SEQUENCE {
    distributionPoint   [0] DistributionPointName OPTIONAL,
    ...
    }

  DistributionPointName ::= CHOICE {
    fullName    [0] GeneralNames,
    ...
    }

  GeneralNames ::= SEQUENCE OF GeneralName

  GeneralName ::= CHOICE {
    ...
    directoryName   [4] Name,
    ...
    }

  Name ::= CHOICE {
    rdnSequence   RDNSequence
    }

  RDNSequence ::= SEQUENCE OF RelativeDistinguishedName

  RelativeDistinguishedName ::= SET OF AttributeTypeAndValue

  [It] was of a baroque monstrosity not often seen outside the Maximegalon Museum of Diseased Imaginings.
    -- Douglas Adams, "The Restaurant at the End of the Universe"

Once we reach AttributeTypeAndValue we finally get to something which contains actual data - everything before that point is just wrapping. Now consider a typical use of this extension, in which you encode the URL at which CA information is to be found. This is encoded as:

  SEQUENCE {
    [0] {
      [0] {
        SEQUENCE {
          [6] "http://.."
          }
        }
      }
    }

All this just to specify a URL!

  It looks like they were trying to stress-test their ASN.1 compilers.
    -- Roger Schlafly on stds-p1363

  It smelled like slow death in there, malaria, nightmares. This was the end of the river alright.
    -- Captain Willard, "Apocalypse Now"

Unfortunately because of the extremely broad definition used (a SEQUENCE OF GeneralName can encode arbitrary quantities of almost anything imaginable, for example you could include the contents of an entire X.500 directory in this extension), producing the code to correctly process every type of field and item which could occur is virtually impossible, and indeed the semantics for many types of usage are undefined (consider how you would use a physical delivery address or a fax number to access a web server). Because of the potential for creating over-general definitions, once you've written down the definition in its full form, also write it out in the compressed form I've used above, and alongside this write down the encoded form of some typical data. This will very quickly show up areas in which there's unnecessary tagging, nesting, and generality, as the above example does.

An extreme example of the misuse of nesting, tagging, and generality is the ORName, which when fully un-nested is:

  ORName ::= [APPLICATION 0] SEQUENCE {
    [0] {
      SEQUENCE OF SET OF AttributeTypeAndValue OPTIONAL
      }
    }

(it's not even possible to write all of this on a single line). This uses unnecessary tagging, nonstandard tagging, and unnecessary nesting all in a single definition.

  It will founder upon the rocks of iniquity and sink headfirst to vanish without trace into the seas of oblivion.
    -- Neil Gaiman and Terry Pratchett, "Good Omens"

The actual effect of the above is pretty close to:

  ORName = Anything

Another warning sign that you've constructed something which is too complex to be practical is the manner in which implementations handle its encoding. If you (or others) are treating portions of an object as a blob (without bothering to encode or decode the individual fields in it) then that's a sign that it's too complex to work with. An example of this is the policyQualifiers portion of the CertificatePolicies extension which, in the two implementations which have so far come to light which actually produce qualifiers, is treated as a fixed, opaque blob with none of the fields within it actually being encoded or decoded.
In this case the entire collection of qualifiers could just as easily be replaced by a BOOLEAN DEFAULT FALSE to indicate whether they were there or not. Another warning sign that something is too complex is when your definition requires dozens of paragraphs of accompanying text and/or extra constraint specifications to explain how the whole thing works or to constrain the usage to a subset of what's specified. If it requires four pages of explanatory text to indicate how something is meant to work, it's probably too complex for practical use.

  No matter how grandiose, how well-planned, how apparently foolproof an evil plan, the inherent sinfulness will by definition rebound upon its instigators.
    -- Neil Gaiman and Terry Pratchett, "Good Omens"

Finally, stick to standard elements and don't reinvent your own way of doing things. Taking the ORAddress again, it provides no less than three different incompatible ways of encoding a type-and-value combination for use in different parts of the ORAddress. The standard way of encoding this (again using the simpler 1988 syntax) is:

  Attribute ::= SEQUENCE {
    type    OBJECT IDENTIFIER,
    value   ANY DEFINED BY type
    }

The standard syntax for field names is to use biCapitalised words, with the first letter in lowercase, for example:

  md5WithRSAEncryption
  certificateHold
  permittedSubtrees

Let's take an example. Say you wanted to design an extension for yet another online certificate validation protocol which specifies a means of submitting a certificate validity check request. This is used so a certificate user can query the certificate issuer about the status of the certificate they're using. A first attempt at this might be:

  StatusCheck ::= SEQUENCE {
    statusCheckLocations   [0] GeneralNames
    }

Eliminating the unnecessary nesting and tagging we get:

  StatusCheck ::= GeneralNames

However taking a typical encoding (a URL) we see that it comes out as:

  StatusCheck ::= SEQUENCE {
    [6] "http://..."
    }

In addition the use of a SEQUENCE OF GeneralName makes the whole thing far too vague to be useful (someone would be perfectly within their rights to specify a pigeon post address using this definition, and I don't even want to get into what it would require for an implementation to claim it could "process" this extension). Since it's an online check it only really makes sense to do it via HTTP (or at least something which can be specified through a URL), so we simplify it down again to:

  StatusCheck ::= SEQUENCE OF IA5String    -- Contains a URL

We've now reached an optimal way of specifying the status check which is easily understandable by anyone reading the definition, and doesn't require enormous amounts of additional explanatory text (what to do with the URL and how to handle the presence of multiple URLs is presumably specified as part of the status-check protocol - all we're interested in is how to specify the location(s) at which the status check is performed).

base64 Encoding
---------------

Many programs allow certificate objects to be encoded using the base64 format popularised in PEM and MIME for transmission of binary data over text-only channels. The format for this is:

-----BEGIN