rfc8785-0.1.4/LICENSE

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the 
Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. 
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. 
END OF TERMS AND CONDITIONS

rfc8785-0.1.4/README.md

# rfc8785.py

[![CI](https://github.com/trailofbits/rfc8785.py/actions/workflows/tests.yml/badge.svg)](https://github.com/trailofbits/rfc8785.py/actions/workflows/tests.yml)
[![PyPI version](https://badge.fury.io/py/rfc8785.svg)](https://pypi.org/project/rfc8785)
[![Packaging status](https://repology.org/badge/tiny-repos/python:rfc8785.svg)](https://repology.org/project/python:rfc8785/versions)

A pure-Python, no-dependency implementation of [RFC 8785], a.k.a. JSON Canonicalization Scheme or JCS.

This implementation should be behaviorally comparable to [Andrew Rundgren's reference implementation], with the following added constraints:

1. This implementation does not transparently convert non-`str` dictionary keys into strings. Users must explicitly perform this conversion.
2. No support for indentation, pretty-printing, etc. is provided. The output is always minimally encoded.
3. All APIs produce UTF-8-encoded `bytes` objects or `bytes` I/O.

## Installation

```bash
python -m pip install rfc8785
```

## Usage

See the full API documentation [here].

```python
import rfc8785

foo = {
    "key": "value",
    "another-key": 2,
    "a-third": [1, 2, 3, [4], (5, 6, "this works too")],
    "more": [None, True, False],
}

rfc8785.dumps(foo)
```

yields:

```python
b'{"a-third":[1,2,3,[4],[5,6,"this works too"]],"another-key":2,"key":"value","more":[null,true,false]}'
```

For direct serialization to an I/O sink, use `rfc8785.dump` instead:

```python
import rfc8785

with open("/some/file", mode="wb") as io:
    rfc8785.dump([1, 2, 3, 4], io)
```

All APIs raise `rfc8785.CanonicalizationError` or a subclass on serialization failures.

## Licensing

Apache License, Version 2.0.
Where noted, parts of this implementation are adapted from [Andrew Rundgren's reference implementation], which is also licensed under the Apache License, Version 2.0.

[RFC 8785]: https://datatracker.ietf.org/doc/html/rfc8785
[Andrew Rundgren's reference implementation]: https://github.com/cyberphone/json-canonicalization/tree/master/python3
[here]: https://trailofbits.github.io/rfc8785.py

rfc8785-0.1.4/pyproject.toml

[build-system]
requires = ["flit_core >=3.5,<4"]
build-backend = "flit_core.buildapi"

[project]
name = "rfc8785"
dynamic = ["version"]
description = "A pure-Python implementation of RFC 8785 (JSON Canonicalization Scheme)"
readme = "README.md"
license = { file = "LICENSE" }
authors = [{ name = "Trail of Bits", email = "opensource@trailofbits.com" }]
classifiers = [
    "Development Status :: 4 - Beta",
    "License :: OSI Approved :: Apache Software License",
    "Programming Language :: Python :: 3",
    "Topic :: File Formats :: JSON",
    "Topic :: Security :: Cryptography",
]
dependencies = []
requires-python = ">=3.8"

[project.optional-dependencies]
doc = ["pdoc"]
test = ["pytest", "pytest-cov", "coverage"]
lint = [
    # NOTE: ruff is under active development, so we pin conservatively here
    # and let Dependabot periodically perform this update.
    "ruff ~= 0.3",
    "mypy >= 1.0",
    "interrogate",
]
dev = ["rfc8785[doc,test,lint]", "build"]

[project.urls]
Homepage = "https://pypi.org/project/rfc8785"
Documentation = "https://trailofbits.github.io/rfc8785.py/"
Issues = "https://github.com/trailofbits/rfc8785.py/issues"
Source = "https://github.com/trailofbits/rfc8785.py"

[tool.flit.module]
name = "rfc8785"

[tool.flit.sdist]
include = ["test"]

[tool.mypy]
mypy_path = "src"
packages = "rfc8785"
allow_redefinition = true
check_untyped_defs = true
disallow_incomplete_defs = true
disallow_untyped_defs = true
ignore_missing_imports = true
no_implicit_optional = true
show_error_codes = true
sqlite_cache = true
strict_equality = true
warn_no_return = true
warn_redundant_casts = true
warn_return_any = true
warn_unreachable = true
warn_unused_configs = true
warn_unused_ignores = true

[tool.ruff]
line-length = 100

[tool.ruff.lint]
select = ["E", "F", "I", "W", "UP"]

[tool.ruff.lint.per-file-ignores]
"test/**/*.py" = [
    "D",    # no docstrings in tests
    "S101", # asserts are expected in tests
]

[tool.interrogate]
# don't enforce documentation coverage for packaging, testing, the virtual
# environment, or the CLI (which is documented separately).
exclude = ["env", "test"]
ignore-semiprivate = true
fail-under = 100

rfc8785-0.1.4/src/rfc8785/__init__.py

"""
The `rfc8785` APIs.

See [RFC 8785](https://datatracker.ietf.org/doc/html/rfc8785) for a full
definition of the JSON Canonicalization Scheme.
## Quick start

```python
import rfc8785

rfc8785.dumps({"anything that can be json serialized": "here"})
```
"""

__version__ = "0.1.4"

from ._impl import CanonicalizationError, FloatDomainError, IntegerDomainError, dump, dumps

__all__ = [
    "CanonicalizationError",
    "IntegerDomainError",
    "FloatDomainError",
    "dump",
    "dumps",
]

rfc8785-0.1.4/src/rfc8785/_impl.py

"""
Internal implementation module for `rfc8785`.

This module is NOT a public API, and is not considered stable.
"""

from __future__ import annotations

import math
import re
import typing
from io import BytesIO

_Scalar = typing.Union[bool, int, str, float, None]
_Value = typing.Union[
    _Scalar,
    typing.Sequence["_Value"],
    typing.Tuple["_Value"],
    typing.Mapping[str, "_Value"],
]

_INT_MAX = 2**53 - 1
_INT_MIN = -(2**53) + 1

# These are adapted from Andrew Rundgren's reference implementation,
# which is licensed under the Apache License, version 2.0.
# See:
# See:
_ESCAPE = re.compile(r'[\x00-\x1f\\"\b\f\n\r\t]')
_ESCAPE_DCT = {
    "\\": "\\\\",
    '"': '\\"',
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}
for i in range(0x20):
    _ESCAPE_DCT.setdefault(chr(i), f"\\u{i:04x}")


class CanonicalizationError(ValueError):
    """
    The base error for all errors during canonicalization.
    """

    pass


class IntegerDomainError(CanonicalizationError):
    """
    The given integer exceeds the true integer precision of an IEEE 754
    double-precision float, which is what JSON uses.
    """

    def __init__(self, n: int) -> None:
        """
        Initialize an `IntegerDomainError`.
        """
        super().__init__(f"{n} exceeds safe integer domain for JSON floats")


class FloatDomainError(CanonicalizationError):
    """
    The given float cannot be represented in JCS, typically because it's
    infinite, NaN, or an invalid representation.
    """

    def __init__(self, f: float) -> None:
        """
        Initialize a `FloatDomainError`.
        """
        super().__init__(f"{f} is not representable in JCS")


def _serialize_str(s: str, sink: typing.IO[bytes]) -> None:
    """
    Serialize a string as a JSON string, per RFC 8785 3.2.2.2.
    """

    def _replace(match: re.Match) -> str:
        return _ESCAPE_DCT[match.group(0)]

    sink.write(b'"')
    try:
        # Encoding to UTF-8 means that we'll reject surrogates and other
        # non-UTF-8-isms.
        sink.write(_ESCAPE.sub(_replace, s).encode("utf-8"))
    except UnicodeEncodeError as e:
        raise CanonicalizationError("input contains non-UTF-8 codepoints") from e
    sink.write(b'"')


def _serialize_float(f: float, sink: typing.IO[bytes]) -> None:
    """
    Serialize a floating point number to a stable string format, as
    defined in ECMA 262 7.1.12.1 and amended by RFC 8785 3.2.2.3.
    """

    # NaN and infinite forms are prohibited.
    if math.isnan(f) or math.isinf(f):
        raise FloatDomainError(f)

    # In Python, -0.0 == 0 is true, so this branch handles both +0 and -0;
    # ECMAScript (and therefore JCS) serializes both as "0".
    if f == 0:
        sink.write(b"0")
        return

    # Negatives get serialized by prepending the sign marker and serializing
    # the positive form.
    if f < 0:
        sink.write(b"-")
        _serialize_float(-f, sink)
        return

    # The remainder of this implementation is adapted from
    # Andrew Rundgren's reference implementation.

    # Now we should only have valid non-zero values
    stringified = str(f)

    exponent_str = ""
    exponent_value = 0
    q = stringified.find("e")
    if q > 0:
        # Grab the exponent and remove it from the number
        exponent_str = stringified[q:]
        if exponent_str[2:3] == "0":
            # Suppress leading zero on exponents
            exponent_str = exponent_str[:2] + exponent_str[3:]
        stringified = stringified[0:q]
        exponent_value = int(exponent_str[1:])

    # Split number in first + dot + last
    first = stringified
    dot = ""
    last = ""
    q = stringified.find(".")
    if q > 0:
        dot = "."
        first = stringified[:q]
        last = stringified[q + 1 :]

    # Now the string is split into: first + dot + last + exponent_str
    if last == "0":
        # Always remove trailing .0
        dot = ""
        last = ""

    if exponent_value > 0 and exponent_value < 21:
        # Integers are shown as is with up to 21 digits
        first += last
        last = ""
        dot = ""
        exponent_str = ""
        q = exponent_value - len(first)
        while q >= 0:
            q -= 1
            first += "0"
    elif exponent_value < 0 and exponent_value > -7:
        # Small numbers are shown as 0.etc with e-6 as lower limit
        last = first + last
        first = "0"
        dot = "."
        exponent_str = ""
        q = exponent_value
        while q < -1:
            q += 1
            last = "0" + last

    sink.write(f"{first}{dot}{last}{exponent_str}".encode())


def dumps(obj: _Value) -> bytes:
    """
    Perform JCS serialization of `obj`, returning the canonical serialization
    as `bytes`.
    """
    # TODO: Optimize this?
    sink = BytesIO()
    dump(obj, sink)
    return sink.getvalue()


def dump(obj: _Value, sink: typing.IO[bytes]) -> None:
    """
    Perform JCS serialization of `obj` into `sink`.
    """

    if obj is None:
        sink.write(b"null")
    elif isinstance(obj, bool):
        obj = bool(obj)
        if obj is True:
            sink.write(b"true")
        else:
            sink.write(b"false")
    elif isinstance(obj, int):
        obj = int(obj)
        if obj < _INT_MIN or obj > _INT_MAX:
            raise IntegerDomainError(obj)
        sink.write(str(obj).encode("utf-8"))
    elif isinstance(obj, str):
        # NOTE: We don't coerce with `str(...)` here, since that will do
        # the wrong thing for `(str, Enum)` subtypes where `__str__` is
        # `Enum.__str__`.
        _serialize_str(obj, sink)
    elif isinstance(obj, float):
        obj = float(obj)
        _serialize_float(obj, sink)
    elif isinstance(obj, (list, tuple)):
        obj = list(obj)
        if not obj:
            # Optimization for empty lists.
            sink.write(b"[]")
            return

        sink.write(b"[")
        for idx, elem in enumerate(obj):
            if idx > 0:
                sink.write(b",")
            dump(elem, sink)
        sink.write(b"]")
    elif isinstance(obj, dict):
        obj = dict(obj)
        if not obj:
            # Optimization for empty dicts.
            sink.write(b"{}")
            return

        # RFC 8785 3.2.3: Objects are sorted by key; keys are ordered
        # by their UTF-16 encoding.
        # The RFC compares keys as arrays of UTF-16 code units; encoding them
        # as big-endian UTF-16 makes a plain bytewise comparison yield that order.
        try:
            obj_sorted = sorted(obj.items(), key=lambda kv: kv[0].encode("utf-16be"))
        except AttributeError:
            # Failing to call `encode()` indicates that a key isn't a string.
            raise CanonicalizationError("object keys must be strings")

        sink.write(b"{")
        for idx, (key, value) in enumerate(obj_sorted):
            if idx > 0:
                sink.write(b",")

            _serialize_str(key, sink)
            sink.write(b":")
            dump(value, sink)

        sink.write(b"}")
    else:
        raise CanonicalizationError(f"unsupported type: {type(obj)}")

rfc8785-0.1.4/src/rfc8785/py.typed

rfc8785-0.1.4/test/README.md

# Testsuite for `rfc8785`

This directory contains unit tests for the `rfc8785` package.

`numgen.go` and the pre-computed test inputs and known answers are taken
verbatim from [the reference implementation], which is licensed under the
Apache License, version 2.0.
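Each vector exists in three parallel forms. `input/NAME.json` holds the input, with arbitrary whitespace and escapes. `output/NAME.json` holds the expected canonical bytes. `outhex/NAME.txt` holds those same bytes as a hex dump, which is easier to read when the output contains invisible or non-ASCII characters. `conftest.py` loads all three, and `test_roundtrip` checks that `rfc8785.dumps` reproduces them.

The sketch below is an illustration only and is not part of the suite; it uses just the standard library. It checks that the `arrays` hex dump decodes to the `arrays` output. It then shows why the keys in the `weird` vector come out in UTF-16 order rather than in Python's default code-point order. The constant names are hypothetical.

```python
# Illustrative sketch: standard library only, not the rfc8785 package.
# ARRAYS_OUTHEX and ARRAYS_OUTPUT are hypothetical names; their values are
# copied from outhex/arrays.txt and output/arrays.json.
ARRAYS_OUTHEX = (
    "5b 35 36 2c 7b 22 31 22 3a 5b 5d 2c 22 31 30 22 3a 6e 75 6c 6c "
    "2c 22 64 22 3a 74 72 75 65 7d 5d"
)
ARRAYS_OUTPUT = b'[56,{"1":[],"10":null,"d":true}]'

# bytes.fromhex() skips the spaces between byte pairs, so the hex dump
# decodes directly to the expected canonical bytes.
assert bytes.fromhex(ARRAYS_OUTHEX) == ARRAYS_OUTPUT

# Three keys from input/weird.json: U+20AC, U+1F602 (a surrogate pair
# in UTF-16), and U+FB33.
keys = ["\u20ac", "\U0001f602", "\ufb33"]

# Python's default str ordering compares code points.
by_code_point = sorted(keys)

# RFC 8785 compares UTF-16 code units; sorting on the big-endian UTF-16
# bytes gives that order (the same sort key _impl.dump uses).
by_utf16 = sorted(keys, key=lambda k: k.encode("utf-16be"))

print([f"U+{ord(k):04X}" for k in by_code_point])  # ['U+20AC', 'U+FB33', 'U+1F602']
print([f"U+{ord(k):04X}" for k in by_utf16])  # ['U+20AC', 'U+1F602', 'U+FB33']
```

The UTF-16 order matches the key order in `output/weird.json` (`€`, then `😂`, then `דּ`). Plain code-point sorting, which is what `json.dumps(..., sort_keys=True)` does, would place U+FB33 before U+1F602.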
[the reference implementation]: https://github.com/cyberphone/json-canonicalization

rfc8785-0.1.4/test/__init__.py

rfc8785-0.1.4/test/assets/input/arrays.json

[ 56, { "d": true, "10": null, "1": [ ] } ]

rfc8785-0.1.4/test/assets/input/french.json

{ "peach": "This sorting order", "péché": "is wrong according to French", "pêche": "but canonicalization MUST", "sin": "ignore locale" }

rfc8785-0.1.4/test/assets/input/structures.json

{ "1": {"f": {"f": "hi","F": 5} ,"\n": 56.0}, "10": { }, "": "empty", "a": { }, "111": [ {"e": "yes","E": "no" } ], "A": { } }

rfc8785-0.1.4/test/assets/input/unicode.json

{ "Unnormalized Unicode":"A\u030a" }

rfc8785-0.1.4/test/assets/input/values.json

{ "numbers": [333333333.33333329, 1E30, 4.50, 2e-3, 0.000000000000000000000000001], "string": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/", "literals": [null, true, false] }

rfc8785-0.1.4/test/assets/input/weird.json
{ "\u20ac": "Euro Sign", "\r": "Carriage Return", "\u000a": "Newline", "1": "One", "\u0080": "Control\u007f", "\ud83d\ude02": "Smiley", "\u00f6": "Latin Small Letter O With Diaeresis", "\ufb33": "Hebrew Letter Dalet With Dagesh", "</script>": "Browser Challenge" }

rfc8785-0.1.4/test/assets/outhex/arrays.txt

5b 35 36 2c 7b 22 31 22 3a 5b 5d 2c 22 31 30 22 3a 6e 75 6c 6c 2c 22 64 22 3a 74 72 75 65 7d 5d

rfc8785-0.1.4/test/assets/outhex/french.txt

7b 22 70 65 61 63 68 22 3a 22 54 68 69 73 20 73 6f 72 74 69 6e 67 20 6f 72 64 65 72 22 2c 22 70 c3 a9 63 68 c3 a9 22 3a 22 69 73 20 77 72 6f 6e 67 20 61 63 63 6f 72 64 69 6e 67 20 74 6f 20 46 72 65 6e 63 68 22 2c 22 70 c3 aa 63 68 65 22 3a 22 62 75 74 20 63 61 6e 6f 6e 69 63 61 6c 69 7a 61 74 69 6f 6e 20 4d 55 53 54 22 2c 22 73 69 6e 22 3a 22 69 67 6e 6f 72 65 20 6c 6f 63 61 6c 65 22 7d

rfc8785-0.1.4/test/assets/outhex/structures.txt

7b 22 22 3a 22 65 6d 70 74 79 22 2c 22 31 22 3a 7b 22 5c 6e 22 3a 35 36 2c 22 66 22 3a 7b 22 46 22 3a 35 2c 22 66 22 3a 22 68 69 22 7d 7d 2c 22 31 30 22 3a 7b 7d 2c 22 31 31 31 22 3a 5b 7b 22 45 22 3a 22 6e 6f 22 2c 22 65 22 3a 22 79 65 73 22 7d 5d 2c 22 41 22 3a 7b 7d 2c 22 61 22 3a 7b 7d 7d

rfc8785-0.1.4/test/assets/outhex/unicode.txt

7b 22 55 6e 6e 6f 72 6d 61 6c 69 7a 65 64 20 55 6e 69 63 6f 64 65 22 3a 22 41 cc 8a 22 7d
rfc8785-0.1.4/test/assets/outhex/values.txt

7b 22 6c 69 74 65 72 61 6c 73 22 3a 5b 6e 75 6c 6c 2c 74 72 75 65 2c 66 61 6c 73 65 5d 2c 22 6e 75 6d 62 65 72 73 22 3a 5b 33 33 33 33 33 33 33 33 33 2e 33 33 33 33 33 33 33 2c 31 65 2b 33 30 2c 34 2e 35 2c 30 2e 30 30 32 2c 31 65 2d 32 37 5d 2c 22 73 74 72 69 6e 67 22 3a 22 e2 82 ac 24 5c 75 30 30 30 66 5c 6e 41 27 42 5c 22 5c 5c 5c 5c 5c 22 2f 22 7d

rfc8785-0.1.4/test/assets/outhex/weird.txt

7b 22 5c 6e 22 3a 22 4e 65 77 6c 69 6e 65 22 2c 22 5c 72 22 3a 22 43 61 72 72 69 61 67 65 20 52 65 74 75 72 6e 22 2c 22 31 22 3a 22 4f 6e 65 22 2c 22 3c 2f 73 63 72 69 70 74 3e 22 3a 22 42 72 6f 77 73 65 72 20 43 68 61 6c 6c 65 6e 67 65 22 2c 22 c2 80 22 3a 22 43 6f 6e 74 72 6f 6c 7f 22 2c 22 c3 b6 22 3a 22 4c 61 74 69 6e 20 53 6d 61 6c 6c 20 4c 65 74 74 65 72 20 4f 20 57 69 74 68 20 44 69 61 65 72 65 73 69 73 22 2c 22 e2 82 ac 22 3a 22 45 75 72 6f 20 53 69 67 6e 22 2c 22 f0 9f 98 82 22 3a 22 53 6d 69 6c 65 79 22 2c 22 ef ac b3 22 3a 22 48 65 62 72 65 77 20 4c 65 74 74 65 72 20 44 61 6c 65 74 20 57 69 74 68 20 44 61 67 65 73 68 22 7d

rfc8785-0.1.4/test/assets/output/arrays.json

[56,{"1":[],"10":null,"d":true}]

rfc8785-0.1.4/test/assets/output/french.json

{"peach":"This sorting order","péché":"is wrong according to French","pêche":"but canonicalization MUST","sin":"ignore locale"}

rfc8785-0.1.4/test/assets/output/structures.json

{"":"empty","1":{"\n":56,"f":{"F":5,"f":"hi"}},"10":{},"111":[{"E":"no","e":"yes"}],"A":{},"a":{}}

rfc8785-0.1.4/test/assets/output/unicode.json

{"Unnormalized Unicode":"Å"}

rfc8785-0.1.4/test/assets/output/values.json

{"literals":[null,true,false],"numbers":[333333333.3333333,1e+30,4.5,0.002,1e-27],"string":"€$\u000f\nA'B\"\\\\\"/"}

rfc8785-0.1.4/test/assets/output/weird.json

{"\n":"Newline","\r":"Carriage Return","1":"One","</script>":"Browser Challenge","€":"Control","ö":"Latin Small Letter O With Diaeresis","€":"Euro Sign","😂":"Smiley","דּ":"Hebrew Letter Dalet With Dagesh"}

rfc8785-0.1.4/test/conftest.py

from __future__ import annotations

from collections.abc import Callable
from pathlib import Path

import pytest

_ASSETS = Path(__file__).parent / "assets"
assert _ASSETS.is_dir()


@pytest.fixture
def vector() -> Callable[[str], tuple[bytes, bytes, bytes]]:
    def _vector(name: str) -> tuple[bytes, bytes, bytes]:
        input = _ASSETS / f"input/{name}.json"
        output = _ASSETS / f"output/{name}.json"
        outhex = _ASSETS / f"outhex/{name}.txt"

        return (input.read_bytes(), output.read_bytes(), bytearray.fromhex(outhex.read_text()))

    return _vector


@pytest.fixture
def es6_test_file() -> Path:
    return _ASSETS / "es6testfile100m.txt.gz"

rfc8785-0.1.4/test/test_impl.py

"""
Internal implementation tests.
"""

import gzip
import json
import struct
import sys
from enum import Enum, IntEnum
from io import BytesIO

import pytest

import rfc8785._impl as impl


@pytest.mark.parametrize(
    ("hex_ieee", "expected"),
    [
        # hex_ieee is the raw 64-bit IEEE 754 float, in big-endian order.
        ("0000000000000000", b"0"),
        ("8000000000000000", b"0"),
        ("0000000000000001", b"5e-324"),
        ("8000000000000001", b"-5e-324"),
        ("7fefffffffffffff", b"1.7976931348623157e+308"),
        ("ffefffffffffffff", b"-1.7976931348623157e+308"),
        ("4340000000000000", b"9007199254740992"),
        ("c340000000000000", b"-9007199254740992"),
        ("4430000000000000", b"295147905179352830000"),
        ("7fffffffffffffff", None),
        ("7ff0000000000000", None),
        ("44b52d02c7e14af5", b"9.999999999999997e+22"),
        ("44b52d02c7e14af6", b"1e+23"),
        ("44b52d02c7e14af7", b"1.0000000000000001e+23"),
        ("444b1ae4d6e2ef4e", b"999999999999999700000"),
        ("444b1ae4d6e2ef4f", b"999999999999999900000"),
        ("444b1ae4d6e2ef50", b"1e+21"),
        ("3eb0c6f7a0b5ed8c", b"9.999999999999997e-7"),
        ("3eb0c6f7a0b5ed8d", b"0.000001"),
        ("41b3de4355555553", b"333333333.3333332"),
        ("41b3de4355555554", b"333333333.33333325"),
        ("41b3de4355555555", b"333333333.3333333"),
        ("41b3de4355555556", b"333333333.3333334"),
        ("41b3de4355555557", b"333333333.33333343"),
        ("becbf647612f3696", b"-0.0000033333333333333333"),
        ("43143ff3c1cb0959", b"1424953923781206.2"),
    ],
)
def test_es6_float_stringification(hex_ieee, expected):
    bytes_ieee = bytearray.fromhex(hex_ieee)
    (float_ieee,) = struct.unpack(">d", bytes_ieee)

    sink = BytesIO()
    if expected is None:
        with pytest.raises(impl.FloatDomainError):
            impl._serialize_float(float_ieee, sink)
    else:
        impl._serialize_float(float_ieee, sink)
        actual = sink.getvalue()
        assert actual == expected


def test_es6_float_stringification_full(es6_test_file):
    if not es6_test_file.is_file():
        pytest.skip(f"no {es6_test_file}, skipping")

    # TODO: Thread or otherwise chunk this; it's ridiculously slow for
    # 100M testcases.
    with gzip.open(es6_test_file, mode="rt") as io:
        for line in io:
            line = line.rstrip()
            hex_ieee, expected = line.split(",", 1)

            # `hex_ieee` is not consistently padded, so we have to do
            # things the annoying way: convert it into an int, pack the int
            # as u64be, and then unpack into a float64be.
            (float_ieee,) = struct.unpack(">d", struct.pack(">Q", int(hex_ieee, 16)))

            sink = BytesIO()
            impl._serialize_float(float_ieee, sink)
            actual = sink.getvalue().decode()
            assert actual == expected


def test_integer_domain():
    impl.dumps(impl._INT_MAX)
    with pytest.raises(impl.IntegerDomainError):
        impl.dumps(impl._INT_MAX + 1)

    impl.dumps(impl._INT_MIN)
    with pytest.raises(impl.IntegerDomainError):
        impl.dumps(impl._INT_MIN - 1)


def test_string_invalid_utf8():
    # escaped surrogate is fine
    impl.dumps("\\udead")

    with pytest.raises(impl.CanonicalizationError):
        impl.dumps("\udead")


def test_dumps_invalid_type():
    with pytest.raises(impl.CanonicalizationError):
        # set is not serializable
        impl.dumps({1, 2, 3})


def test_dumps_intenum():
    # IntEnum is a subclass of int, so this should work transparently.
    class X(IntEnum):
        A = 1
        B = 2
        C = 9001

    raw = impl.dumps([X.A, X.B, X.C])
    assert json.loads(raw) == [1, 2, 9001]


@pytest.mark.skipif(sys.version_info < (3, 11), reason="StrEnum added in 3.11+")
def test_dumps_strenum():
    from enum import StrEnum

    # StrEnum is a subclass of str, so this should work transparently.
    class X(StrEnum):
        A = "foo"
        B = "bar"
        C = "baz"

    raw = impl.dumps([X.A, X.B, X.C])
    assert json.loads(raw) == ["foo", "bar", "baz"]


def test_dumps_enum_multiple_inheritance():
    # Manually inheriting str, Enum should also work.
    class X(str, Enum):
        A = "foo"
        B = "bar"
        C = "baz"

    raw = impl.dumps([X.A, X.B, X.C])
    assert json.loads(raw) == ["foo", "bar", "baz"]

    # Same for other JSON-able enum types.
    class Y(dict, Enum):
        A = {"A": "foo"}
        B = {"B": "bar"}
        C = {"C": "baz"}

    raw = impl.dumps([Y.A, Y.B, Y.C])
    assert json.loads(raw) == [{"A": "foo"}, {"B": "bar"}, {"C": "baz"}]

    class Z(int, Enum):
        A = 1
        B = 2
        C = 3

    raw = impl.dumps([Z.A, Z.B, Z.C])
    assert json.loads(raw) == [1, 2, 3]


def test_dumps_bare_enum_fails():
    class X(Enum):
        A = "1"
        B = 2
        C = 3.0

    # Python's json doesn't allow this, so we don't either.
    with pytest.raises(impl.CanonicalizationError, match="unsupported type"):
        impl.dumps([X.A, X.B, X.C])


def test_dumps_nonstring_key():
    with pytest.raises(impl.CanonicalizationError, match="object keys must be strings"):
        impl.dumps({1: 2, None: 3})


"""Initial testing module."""

import json

import pytest

import rfc8785


def test_version() -> None:
    version = getattr(rfc8785, "__version__", None)
    assert version is not None
    assert isinstance(version, str)


@pytest.mark.parametrize("name", ["arrays", "french", "structures", "unicode", "values", "weird"])
def test_roundtrip(vector, name):
    input, output, outhex = vector(name)
    py_input = json.loads(input)

    # Each input, when canonicalized, matches the exact bytes expected.
    actual_output = rfc8785.dumps(py_input)
    assert output == actual_output == outhex

    actual_deserialized = json.loads(actual_output)
    assert actual_deserialized == py_input


def test_exception_hierarchy():
    assert issubclass(rfc8785.CanonicalizationError, ValueError)
    assert issubclass(rfc8785.IntegerDomainError, rfc8785.CanonicalizationError)
    assert issubclass(rfc8785.FloatDomainError, rfc8785.CanonicalizationError)
    assert not issubclass(rfc8785.IntegerDomainError, rfc8785.FloatDomainError)
    assert not issubclass(rfc8785.FloatDomainError, rfc8785.IntegerDomainError)


Metadata-Version: 2.1
Name: rfc8785
Version: 0.1.4
Summary: A pure-Python implementation of RFC 8785 (JSON Canonicalization Scheme)
Author-email: Trail of Bits
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: File Formats :: JSON
Classifier: Topic :: Security :: Cryptography
Requires-Dist: rfc8785[doc,test,lint] ; extra == "dev"
Requires-Dist: build ; extra == "dev"
Requires-Dist: pdoc ; extra == "doc"
Requires-Dist: ruff ~= 0.3 ; extra == "lint"
Requires-Dist: mypy >= 1.0 ; extra == "lint"
Requires-Dist: interrogate ; extra == "lint"
Requires-Dist: pytest ; extra == "test"
Requires-Dist: pytest-cov ; extra == "test"
Requires-Dist: coverage ; extra == "test"
Project-URL: Documentation, https://trailofbits.github.io/rfc8785.py/
Project-URL: Homepage, https://pypi.org/project/rfc8785
Project-URL: Issues, https://github.com/trailofbits/rfc8785.py/issues
Project-URL: Source, https://github.com/trailofbits/rfc8785.py
Provides-Extra: dev
Provides-Extra: doc
Provides-Extra: lint
Provides-Extra: test

# rfc8785.py

[![CI](https://github.com/trailofbits/rfc8785.py/actions/workflows/tests.yml/badge.svg)](https://github.com/trailofbits/rfc8785.py/actions/workflows/tests.yml)
[![PyPI version](https://badge.fury.io/py/rfc8785.svg)](https://pypi.org/project/rfc8785)
[![Packaging status](https://repology.org/badge/tiny-repos/python:rfc8785.svg)](https://repology.org/project/python:rfc8785/versions)

A pure-Python, no-dependency implementation of [RFC 8785], a.k.a. JSON Canonicalization Scheme or JCS.

This implementation should be behaviorally comparable to [Andrew Rundgren's reference implementation], with the following added constraints:

1. This implementation does not transparently convert non-`str` dictionary keys into strings. Users must explicitly perform this conversion.
2. No support for indentation, pretty-printing, etc. is provided. The output is always minimally encoded.
3. All APIs produce UTF-8-encoded `bytes` objects or `bytes` I/O.

## Installation

```bash
python -m pip install rfc8785
```

## Usage

See the full API documentation [here].

```python
import rfc8785

foo = {
    "key": "value",
    "another-key": 2,
    "a-third": [1, 2, 3, [4], (5, 6, "this works too")],
    "more": [None, True, False],
}

rfc8785.dumps(foo)
```

yields:

```python
b'{"a-third":[1,2,3,[4],[5,6,"this works too"]],"another-key":2,"key":"value","more":[null,true,false]}'
```

For direct serialization to an I/O sink, use `rfc8785.dump` instead:

```python
import rfc8785

with open("/some/file", mode="wb") as io:
    rfc8785.dump([1, 2, 3, 4], io)
```

All APIs raise `rfc8785.CanonicalizationError` or a subclass on serialization failures.

## Licensing

Apache License, Version 2.0.

Where noted, parts of this implementation are adapted from [Andrew Rundgren's reference implementation], which is also licensed under the Apache License, Version 2.0.
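
The first constraint listed above leaves key conversion to the caller: `rfc8785.dumps` raises `CanonicalizationError` for a dict such as `{1: 2}`. One way to prepare such data is sketched below. `stringify_keys` is a hypothetical helper, not part of `rfc8785`, and it assumes `str()` is an acceptable conversion for your keys. Note that `str(None)` gives `"None"`, whereas the standard library's `json.dumps` would write `"null"`; pick whatever policy your application needs.

```python
from typing import Any, Dict


def stringify_keys(value: Any) -> Any:
    """Return a copy of `value` in which every dict key is a str.

    Hypothetical helper, not part of rfc8785. Dicts nested inside dicts,
    lists, and tuples are converted too; other values pass through unchanged.
    """
    if isinstance(value, dict):
        converted: Dict[str, Any] = {}
        for key, item in value.items():
            # Keep str keys (including str-based enums) as they are and apply
            # str() to everything else. This is an assumed policy, not JCS's.
            new_key = key if isinstance(key, str) else str(key)
            # Refuse to silently merge keys such as 1 and "1".
            if new_key in converted:
                raise ValueError(f"duplicate key after conversion: {new_key!r}")
            converted[new_key] = stringify_keys(item)
        return converted
    if isinstance(value, (list, tuple)):
        # Tuples come back as lists; both serialize as JSON arrays.
        return [stringify_keys(item) for item in value]
    return value
```

The converted value can then be passed to `rfc8785.dumps` or `rfc8785.dump` as usual. Raising on collisions is a deliberate choice here: in a canonical encoding, silently dropping one of two colliding keys would be hard to notice later.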
[RFC 8785]: https://datatracker.ietf.org/doc/html/rfc8785
[Andrew Rundgren's reference implementation]: https://github.com/cyberphone/json-canonicalization/tree/master/python3
[here]: https://trailofbits.github.io/rfc8785.py
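
In the `dumps` example under Usage, the output keys appear in a different order than they were written, because JCS sorts object properties. RFC 8785 compares property names by their UTF-16 code units rather than by Unicode code points. The sketch below models only that ordering rule. `jcs_key_order` is a hypothetical name, not an `rfc8785` API, and this is not necessarily how the library implements sorting internally.

```python
from typing import Iterable, List


def jcs_key_order(keys: Iterable[str]) -> List[str]:
    # Encoding as UTF-16BE turns each UTF-16 code unit into exactly two
    # big-endian bytes, so comparing the encoded bytes lexicographically
    # compares the code-unit sequences. Lone surrogates raise
    # UnicodeEncodeError here (JCS rejects them as well).
    return sorted(keys, key=lambda key: key.encode("utf-16-be"))


# Same order as the README's dumps() output.
print(jcs_key_order(["key", "another-key", "a-third", "more"]))
# ['a-third', 'another-key', 'key', 'more']

# U+1F600 is stored as the surrogate pair D83D DE00, so it sorts before
# U+FFFD in UTF-16 order even though its code point is larger.
mixed = ["\ufffd", "\U0001f600"]
print(jcs_key_order(mixed) == sorted(mixed))  # False: the two orders disagree
```

For keys made only of Basic Multilingual Plane characters, this order coincides with Python's default `sorted()`; the two diverge only once characters outside the BMP appear.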