JSON RFC 8259: subtle details that trip up production parsers

JSON is smaller than you think

The JSON grammar in RFC 8259 is tiny: objects, arrays, strings, numbers, and the three literals true, false, and null. That simplicity is why it is everywhere — and also why two surprises catch people off guard. First, any value can be the whole document: 42, "hello", and true are each complete, valid JSON. The older RFC 4627 required the top level to be an object or array, so some legacy tools still reject a bare value.
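As a quick illustration, here is a minimal sketch using Python's standard json module, which follows the RFC 8259 rule on top-level values. The helper is_rfc4627_document is a hypothetical name, added only to show what a legacy-style check might look like.

```python
import json

# Each of these is a complete JSON text under RFC 8259.
for text in ['42', '"hello"', 'true', 'null']:
    print(repr(text), "->", repr(json.loads(text)))


def is_rfc4627_document(text: str) -> bool:
    """Hypothetical legacy-style check: only an object or array may be the top level."""
    return isinstance(json.loads(text), (dict, list))


print(is_rfc4627_document('{"a": 1}'))  # True
print(is_rfc4627_document('42'))        # False: valid RFC 8259, rejected by the older rule
```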

Second, the spec deliberately leaves several behaviours unspecified. It defines the syntax precisely but says little about duplicate keys, number precision, or ordering. Production parsers fill those gaps differently, and that is exactly where interoperability bugs live.

Duplicate keys: undefined behaviour

RFC 8259 says object names SHOULD be unique, not MUST. When a key repeats, the result is unspecified, so parsers diverge: most take the last value, some keep the first, a few raise an error, and a rare one keeps both. None of them is wrong by the spec.

This is a real security hazard. If one service validates {"role":"user","role":"admin"} by reading the first role and another service acts on the last, an attacker can slip a privilege past validation. Treat duplicate keys as an error anywhere the decision matters, and never depend on which value 'wins'.
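One way to enforce this in Python is the object_pairs_hook parameter of json.loads, which hands every name/value pair of an object to a callback before a dict is built. The sketch below assumes you want a hard failure; DuplicateKeyError, reject_duplicates, and strict_loads are illustrative names, not part of the standard library.

```python
import json


class DuplicateKeyError(ValueError):
    """Raised when a JSON object repeats a member name."""


def reject_duplicates(pairs):
    # pairs holds every (name, value) of one object, in document order.
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise DuplicateKeyError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def strict_loads(text):
    # The hook runs for every object in the document, including nested ones.
    return json.loads(text, object_pairs_hook=reject_duplicates)


payload = '{"role":"user","role":"admin"}'

# Default behaviour in Python: the later value silently replaces the earlier one.
print(json.loads(payload))  # {'role': 'admin'}

try:
    strict_loads(payload)
except DuplicateKeyError as exc:
    print("rejected:", exc)
```

The same name appearing at different nesting levels is not a duplicate, so a document such as {"a": {"b": 1}, "b": 2} still parses.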

Numbers: no integers, just trouble

JSON has a single number type. The RFC does not define a precision or a range; it only notes that good interoperability can be achieved by implementations that expect no more precision or range than IEEE 754 double precision (binary64) provides. In practice, integers with a magnitude above 2^53 (9007199254740992) can lose precision.
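Python's json module parses integers exactly, so it does not show the problem by default. The sketch below simulates a double-only parser by passing parse_int=float; loads_as_doubles is a hypothetical helper name used only for this illustration.

```python
import json

LIMIT = 2 ** 53  # 9007199254740992, the last point where every integer is a distinct double


def loads_as_doubles(text):
    """Parse JSON the way a double-only parser would: every integer becomes a float."""
    return json.loads(text, parse_int=float)


exact = json.loads("9007199254740993")        # Python keeps arbitrary-precision ints
lossy = loads_as_doubles("9007199254740993")  # simulated double-precision parser

print(exact)       # 9007199254740993
print(int(lossy))  # 9007199254740992

# Adjacent integers above the limit collapse to the same double.
print(float(LIMIT) == float(LIMIT + 1))  # True
```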

The classic bug: a 64-bit ID like 9007199254740993 becomes 9007199254740992 after JSON.parse in JavaScript, because numbers are doubles. The fix is to carry large integers as strings. A few more rules people forget: no leading zeros (007 is invalid), no leading +, no trailing decimal point (1. is invalid), and NaN and Infinity are not valid JSON — serialise them as null or a string.
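These rules can be checked with a short sketch. Note that Python's json.loads accepts NaN and Infinity by default, so the example installs a parse_constant callback to reject them; strict_number_loads, is_valid, and reject_constant are illustrative names, and the sample record is made up.

```python
import json


def reject_constant(name):
    # json.loads calls this for the non-standard tokens NaN, Infinity, and -Infinity.
    raise ValueError(f"{name} is not valid JSON")


def strict_number_loads(text):
    return json.loads(text, parse_constant=reject_constant)


def is_valid(text):
    try:
        strict_number_loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


for text in ["007", "+1", "1.", ".5", "NaN", "Infinity", "-0", "1.0e+2"]:
    print(f"{text!r:12} valid={is_valid(text)}")

# Carry a large ID as a string so it survives double-precision parsers.
record = json.dumps({"id": "9007199254740993"})
print(record)  # {"id": "9007199254740993"}

# allow_nan=False makes the encoder refuse NaN instead of emitting invalid JSON.
try:
    json.dumps(float("nan"), allow_nan=False)
except ValueError as exc:
    print("refused:", exc)
```

Only the last two samples in the loop ("-0" and "1.0e+2") are valid; every other form is rejected.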

Strings, Unicode, and encoding

Strings must use double quotes. Inside a string, the quotation mark, the backslash, and the control characters U+0000 through U+001F must be escaped. The forward slash may optionally be escaped as \/, which is why some encoders emit <\/script>: a legal way to avoid breaking an HTML script tag. RFC 8259 requires UTF-8 for JSON text exchanged between systems that are not part of a closed ecosystem.
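A small sketch of these escaping rules with Python's json module follows. Python does not escape the forward slash itself, so dumps_for_script_tag is a hypothetical helper that adds the optional \/ escape; it is an illustration only, not a complete defence for embedding data in HTML.

```python
import json

# Control characters are escaped on output.
print(json.dumps("tab\there\x01"))  # "tab\there\u0001"

# A raw (unescaped) control character inside a string is rejected on input.
try:
    json.loads('"a\tb"')  # the Python literal contains a real tab character
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)

# \/ is an optional escape: both spellings decode to the same string.
print(json.loads(r'"<\/script>"') == json.loads('"</script>"'))  # True


def dumps_for_script_tag(value):
    """Hypothetical helper: serialise, then escape '</' so it cannot close a script tag."""
    # '<' never appears outside strings in valid JSON, so this replace only touches string contents.
    return json.dumps(value).replace("</", "<\\/")


print(dumps_for_script_tag({"html": "</script>"}))  # {"html": "<\/script>"}

# Encode as UTF-8 for exchange; ensure_ascii=False keeps non-ASCII characters literal.
wire_bytes = json.dumps({"name": "café"}, ensure_ascii=False).encode("utf-8")
```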

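Characters outside the Basic Multilingual Plane (BMP), such as most emoji, can appear literally as UTF-8 or, when escaped, as a surrogate pair of \u escapes rather than a single one. Lone, unpaired surrogates are a grey area: the RFC's grammar permits an escape such as \uDEAD on its own, but the RFC warns that the behaviour of software receiving it is unpredictable. Some parsers accept lone surrogates and others reject them, a portability trap that surfaces only when your data crosses into a stricter implementation.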
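The sketch below shows both cases in Python. CPython's json module happens to be one of the parsers that accepts a lone surrogate escape; is_encodable is a hypothetical helper showing where the resulting string fails later.

```python
import json

emoji = "\U0001F600"  # a code point outside the BMP

escaped = json.dumps(emoji)                      # ensure_ascii=True is the default
print(escaped)                                   # "\ud83d\ude00"
literal = json.dumps(emoji, ensure_ascii=False)  # the same character, written literally

# Both spellings decode to the same single code point.
print(json.loads(escaped) == json.loads(literal) == emoji)  # True

# A lone high surrogate: accepted here, but the result is not valid Unicode text.
lone = json.loads('"\\ud83d"')


def is_encodable(text):
    """Hypothetical check: can this string be written out as UTF-8?"""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


print(is_encodable(emoji))  # True
print(is_encodable(lone))   # False
```

Running a check like is_encodable at the boundary is one way to catch the problem before the data reaches a stricter consumer.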

Things that look like JSON but are not

The most common mistakes come from treating JSON like JavaScript. Trailing commas are invalid. Comments do not exist in JSON — JSON5 and JSONC add them, but a strict RFC 8259 parser will reject them. Single-quoted strings and unquoted object keys are also invalid, however familiar they look.
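To see these rejections in practice, the following sketch feeds each JavaScript-style construct to Python's json.loads, which is strict about all of them. The samples dictionary and is_strict_json are illustrative names.

```python
import json

samples = {
    "trailing comma in array": '[1, 2, 3,]',
    "trailing comma in object": '{"a": 1,}',
    "line comment": '{"a": 1} // note',
    "block comment": '/* note */ {"a": 1}',
    "single-quoted string": "{'a': 1}",
    "unquoted key": '{a: 1}',
}


def is_strict_json(text):
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


for label, text in samples.items():
    print(f"{label:26} accepted={is_strict_json(text)}")

print(is_strict_json('{"a": 1}'))  # True: the plain form is fine
```

Every entry in samples is reported as not accepted.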

A subtler one is the byte-order mark (BOM). RFC 8259 says producers MUST NOT add a BOM and consumers MAY ignore one, but the BOM is not part of JSON, so a UTF-8 BOM at the start of a file can make a strict parser fail on the very first byte. Insignificant whitespace is limited to space, tab, line feed, and carriage return — nothing else.
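A short sketch of both points in Python follows. When given a str that starts with U+FEFF, json.loads raises an error, and decoding the bytes with the utf-8-sig codec strips the BOM first; loads_tolerating_bom and is_accepted are hypothetical helper names.

```python
import codecs
import json

raw = codecs.BOM_UTF8 + b'{"ok": true}'

# Decoding as plain UTF-8 keeps the BOM as U+FEFF, which the parser refuses.
try:
    json.loads(raw.decode("utf-8"))
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)


def loads_tolerating_bom(data: bytes):
    """Hypothetical helper: strip an optional UTF-8 BOM, then parse."""
    return json.loads(data.decode("utf-8-sig"))


print(loads_tolerating_bom(raw))  # {'ok': True}


def is_accepted(text):
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Only space, tab, line feed, and carriage return count as whitespace.
print(is_accepted(" \t\r\n[1]"))  # True
print(is_accepted("\u00a0[1]"))   # False: no-break space
print(is_accepted("\x0c[1]"))     # False: form feed
```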

Takeaways

Reject duplicate keys anywhere a security or correctness decision depends on the value.

Carry 64-bit integers as strings so they survive double-precision parsers like JavaScript's.

Do not rely on object key order — the spec defines objects as unordered, even if most parsers preserve insertion order.

Remember any value is valid top-level JSON under RFC 8259, not just objects and arrays.

If you want comments or trailing commas, you want JSON5/JSONC — convert before a strict parser sees the data. The JSON formatter on this site validates against the strict RFC 8259 grammar, so these issues surface immediately.