pax_global_header00006660000000000000000000000064144743546530014531gustar00rootroot0000000000000052 comment=77cb45eb8bebb5a01b590b935e2b3093754b98d3 jsonlines-4.0.0/000077500000000000000000000000001447435465300135365ustar00rootroot00000000000000jsonlines-4.0.0/.editorconfig000066400000000000000000000001661447435465300162160ustar00rootroot00000000000000root = true [*.py] charset = utf-8 indent_style = space indent_size = 4 insert_final_newline = true end_of_line = lf jsonlines-4.0.0/.gitignore000066400000000000000000000002401447435465300155220ustar00rootroot00000000000000# Python cruft *.py[co] __pycache__/ # Testing /.coverage /coverage.xml /htmlcov/ /.tox/ # Packaging /.cache/ /*.egg-info/ /build/ /dist/ # Docs /doc/build/ jsonlines-4.0.0/.readthedocs.yaml000066400000000000000000000002221447435465300167610ustar00rootroot00000000000000--- version: "2" build: os: "ubuntu-22.04" tools: python: "3.11" python: install: - path: . sphinx: configuration: doc/conf.py jsonlines-4.0.0/LICENSE.rst000066400000000000000000000030071447435465300153520ustar00rootroot00000000000000*(This is the OSI approved 3-clause "New BSD License".)* Copyright © 2016, wouter bolsterlee All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of the author nor the names of the contributors may be used to endorse or promote products derived from this software without specific prior written permission. 
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. jsonlines-4.0.0/MANIFEST.in000066400000000000000000000000161447435465300152710ustar00rootroot00000000000000include *.rst jsonlines-4.0.0/README.rst000066400000000000000000000013051447435465300152240ustar00rootroot00000000000000.. image:: https://pepy.tech/badge/jsonlines :target: https://pepy.tech/project/jsonlines .. image:: https://pepy.tech/badge/jsonlines/month :target: https://pepy.tech/project/jsonlines .. image:: https://anaconda.org/anaconda/anaconda/badges/installer/conda.svg :target: https://anaconda.org/anaconda/jsonlines ========= jsonlines ========= ``jsonlines`` is a Python library to simplify working with jsonlines_ and ndjson_ data. .. _jsonlines: http://jsonlines.org/ .. 
_ndjson: http://ndjson.org/ * Documentation: https://jsonlines.readthedocs.io/ * Python Package Index (PyPI): https://pypi.python.org/pypi/jsonlines/ * Source code and issue tracker: https://github.com/wbolster/jsonlines jsonlines-4.0.0/doc/000077500000000000000000000000001447435465300143035ustar00rootroot00000000000000jsonlines-4.0.0/doc/conf.py000066400000000000000000000001671447435465300156060ustar00rootroot00000000000000extensions = [ "sphinx.ext.autodoc", ] master_doc = "index" project = "jsonlines" copyright = "wouter bolsterlee" jsonlines-4.0.0/doc/index.rst000066400000000000000000000172741447435465300161570ustar00rootroot00000000000000========= jsonlines ========= .. py:currentmodule:: jsonlines ``jsonlines`` is a Python library to simplify working with jsonlines_ and ndjson_ data. .. _jsonlines: http://jsonlines.org/ .. _ndjson: http://ndjson.org/ This data format is straight-forward: it is simply one valid JSON value per line, encoded using UTF-8. While code to consume and create such data is not that complex, it quickly becomes non-trivial enough to warrant a dedicated library when adding data validation, error handling, support for both binary and text streams, and so on. This small library implements all that (and more!) so that applications using this format do not have to reinvent the wheel. Features ======== * Sensible behaviour for most use cases * transparently handles ``str`` and ``bytes``, both for input and output * supports multiple JSON libraries, e.g. ``json`` (standard library), ``orjson``, ``ujson`` * transparently handles UTF-8 BOM (if present) * useful error messages * prevents gotchas, e.g. uses standard-compliant line breaking, unlike `str.splitlines`_ .. 
_str.splitlines: https://docs.python.org/3/library/stdtypes.html#str.splitlines * Convenient :py:func:`~jsonlines.open()` function * makes simple cases trivial to write * takes a file name and a mode * returns either a :py:class:`~jsonlines.Reader` or :py:class:`~jsonlines.Writer` instance * can be used as a context manager * Flexible :py:class:`~jsonlines.Reader` * wraps a file-like object or any other iterable yielding lines * can read lines directly via the :py:meth:`~jsonlines.Reader.read()` method * can be used as an iterator, either directly or via the :py:meth:`~jsonlines.Reader.iter()` method * can validate data types, including `None` checks * can skip invalid lines during iteration * provides decent error messages * can be used as a context manager * allows complete control over decoding using a custom ``loads`` callable * Flexible :py:class:`~jsonlines.Writer` * wraps a file-like object * can produce compact output * can sort keys (deterministic output) * can flush the underlying stream after each write * can be used as a context manager * allows complete control over encoding using a custom ``dumps`` callable Installation ============ :: pip install jsonlines The supported Python versions are 3.8+. User guide ========== Import the ``jsonlines`` module to get started: .. code-block:: python import jsonlines The convenience function :py:func:`jsonlines.open()` takes a file name and returns either a reader or writer, making simple cases extremely simple:: with jsonlines.open('input.jsonl') as reader: for obj in reader: ... with jsonlines.open('output.jsonl', mode='w') as writer: writer.write(...) A :py:class:`Reader` typically wraps a file-like object:: fp = io.BytesIO(...) 
# readable file-like object reader = jsonlines.Reader(fp) first = reader.read() second = reader.read() reader.close() fp.close() Instead of a file-like object, any iterable yielding JSON encoded strings can be provided:: lines = ['1', '2', '3'] reader = jsonlines.Reader(lines) While the :py:meth:`Reader.read` method can be used directly, it is often more convenient to use iteration:: for obj in reader: ... Custom iteration flags, such as type checks, can be specified by calling :py:meth:`Reader.iter()` instead:: for obj in reader.iter(type=dict, skip_invalid=True): ... A :py:class:`Writer` wraps a file-like object, and can write a single object, or multiple objects at once:: fp = io.BytesIO() # writable file-like object writer = jsonlines.Writer(fp) writer.write(...) writer.write_all([ ..., ..., ..., ]) writer.close() fp.close() Both readers and writers can be used as a context manager, in which case they will be closed automatically. Note that this will not close a passed-in file-like object since that object’s life span is controlled by the calling code. Example:: fp = io.BytesIO() # file-like object with jsonlines.Writer(fp) as writer: writer.write(...) fp.close() Note that the :py:func:`jsonlines.open()` function *does* close the opened file, since the open file is not explicitly opened by the calling code. That means no ``.close()`` is needed there:: with jsonlines.open('input.jsonl') as reader: ... This should be enough to get started. See the API docs below for more details. API === .. autofunction:: jsonlines.open .. autoclass:: jsonlines.Reader :members: :inherited-members: .. autoclass:: jsonlines.Writer :members: :inherited-members: .. autoclass:: jsonlines.Error :members: .. 
autoclass:: jsonlines.InvalidLineError :members: Contributing ============ The source code and issue tracker for this package can be found on GitHub: https://github.com/wbolster/jsonlines Version history =============== * 4.0.0, released at 2023-09-01 * use ‘orjson’ or ‘ujson’ for reading if available (`#81 `_) * drop support for end-of-life Python versions; this package is now Python 3.8+ only. (`#80 `_, `#80 `_) * 3.1.0, released at 2022-07-01 * Return number of chars/bytes written by :py:meth:`Writer.write()` and :py:meth:`~Writer.write_all()` (`#73 `_) * allow ``mode='x'`` in :py:func:`~jsonlines.open()` to open a file for exclusive creation (`#74 `_) * 3.0.0, released at 2021-12-04 * add type annotations; adopt mypy in strict mode (`#58 `_, `#62 `_) * ignore UTF-8 BOM sequences in various scenarios (`#69 `_) * support ``dumps()`` callables returning bytes again (`#64 `_) * add basic support for rfc7464 text sequences (`#61 `_) * drop support for ``numbers.Number`` in ``type=`` arguments (`#63 `_) * 2.0.0, released at 2021-01-04 * drop support for end-of-life Python versions; this package is now Python 3.6+ only. 
(`#54 `_, `#51 `_) * 1.2.0, released at 2017-08-17 * allow ``mode='a'`` in :py:func:`~jsonlines.open()` to allow appending to an existing file (`#31 `_) * 1.1.3, released at 2017-07-19 * fix incomplete iteration when given list containing empty strings (`#30 `_) * 1.1.2, released at 2017-06-26 * documentation tweaks * enable building universal wheels * 1.1.1, released at 2017-06-04 * include licensing information in sdist (`#27 `_) * doc tweaks * 1.1.0, released at 2016-10-07 * rename first argument to :py:class:`Reader` since it is not required to be a file-like object * actually check that the reader/writer is not closed when performing operations * improved `repr()` output * doc tweaks * 1.0.0, released at 2016-10-05 * minimum Python versions are Python 3.4+ and Python 2.7+ * implemented lots of configuration options * add proper exceptions handling * add proper documentation * switch to semver * 0.0.1, released at 2015-03-02 * initial release with basic functionality License ======= .. include:: ../LICENSE.rst jsonlines-4.0.0/jsonlines/000077500000000000000000000000001447435465300155425ustar00rootroot00000000000000jsonlines-4.0.0/jsonlines/__init__.py000066400000000000000000000004021447435465300176470ustar00rootroot00000000000000""" Module for the jsonlines data format. 
""" # expose only public api from .jsonlines import ( Error, InvalidLineError, Reader, Writer, open, ) __all__ = [ "Error", "InvalidLineError", "Reader", "Writer", "open", ] jsonlines-4.0.0/jsonlines/jsonlines.py000066400000000000000000000466671447435465300201430ustar00rootroot00000000000000""" jsonlines implementation """ import builtins import codecs import enum import io import json import os import types import typing from typing import ( Any, Callable, Dict, Iterable, Iterator, List, Literal, Optional, Tuple, Type, TypeVar, Union, cast, overload, ) import attr orjson: Optional[types.ModuleType] try: import orjson except ImportError: orjson = None ujson: Optional[types.ModuleType] try: import ujson except ImportError: ujson = None VALID_TYPES = { bool, dict, float, int, list, str, } # Characters to skip at the beginning of a line. Note: at most one such # character is skipped per line. SKIPPABLE_SINGLE_INITIAL_CHARS = ( "\x1e", # RFC7464 text sequence codecs.BOM_UTF8.decode(), ) class DumpsResultConversion(enum.Enum): LeaveAsIs = enum.auto() EncodeToBytes = enum.auto() DecodeToString = enum.auto() # https://docs.python.org/3/library/functions.html#open Openable = Union[str, bytes, int, os.PathLike] LoadsCallable = Callable[[Union[str, bytes]], Any] DumpsCallable = Callable[[Any], Union[str, bytes]] # Currently, JSON structures cannot be typed properly: # - https://github.com/python/typing/issues/182 # - https://github.com/python/mypy/issues/731 JSONCollection = Union[Dict[str, Any], List[Any]] JSONScalar = Union[bool, float, int, str] JSONValue = Union[JSONCollection, JSONScalar] TJSONValue = TypeVar("TJSONValue", bound=JSONValue) TRW = TypeVar("TRW", bound="ReaderWriterBase") # Default to using the fastest JSON library for reading, falling back to the # standard library (always available) if none are installed. 
if orjson is not None: default_loads = orjson.loads elif ujson is not None: default_loads = ujson.loads else: default_loads = json.loads # For writing, use the stdlib. Other packages may be faster but their behaviour # (supported types etc.) and output (whitespace etc.) are not the same as the # stdlib json module, so this should be opt-in via the ‘dumps=’ arg. def default_dumps(obj: Any) -> str: """ Fake ``dumps()`` function to use as a default marker. """ raise NotImplementedError # pragma: no cover @attr.s(auto_exc=True, auto_attribs=True) class Error(Exception): """ Base error class. """ message: str @attr.s(auto_exc=True, auto_attribs=True, init=False) class InvalidLineError(Error, ValueError): """ Error raised when an invalid line is encountered. This happens when the line does not contain valid JSON, or if a specific data type has been requested, and the line contained a different data type. The original line itself is stored on the exception instance as the ``.line`` attribute, and the line number as ``.lineno``. This class subclasses both ``jsonlines.Error`` and the built-in ``ValueError``. """ #: The invalid line line: Union[str, bytes] #: The line number lineno: int def __init__(self, message: str, line: Union[str, bytes], lineno: int) -> None: self.line = line.rstrip() self.lineno = lineno super().__init__(f"{message} (line {lineno})") @attr.s(auto_attribs=True, repr=False) class ReaderWriterBase: """ Base class with shared behaviour for both the reader and writer. """ _fp: Union[typing.IO[str], typing.IO[bytes], None] = attr.ib( default=None, init=False ) _closed: bool = attr.ib(default=False, init=False) _should_close_fp: bool = attr.ib(default=False, init=False) def close(self) -> None: """ Close this reader/writer. This closes the underlying file if that file has been opened by this reader/writer. When an already opened file-like object was provided, the caller is responsible for closing it. 
""" if self._closed: return self._closed = True if self._fp is not None and self._should_close_fp: self._fp.close() def __repr__(self) -> str: cls_name = type(self).__name__ wrapped = self._repr_for_wrapped() return f"<jsonlines.{cls_name} at 0x{id(self):x} wrapping {wrapped}>" def _repr_for_wrapped(self) -> str: raise NotImplementedError # pragma: no cover def __enter__(self: TRW) -> TRW: return self def __exit__( self, exc_type: Optional[Type[BaseException]], exc_val: Optional[BaseException], exc_tb: Optional[types.TracebackType], ) -> None: self.close() @attr.s(auto_attribs=True, repr=False) class Reader(ReaderWriterBase): """ Reader for the jsonlines format. The first argument must be an iterable that yields JSON encoded strings. Usually this will be a readable file-like object, such as an open file or an ``io.TextIO`` instance, but it can also be something else as long as it yields strings when iterated over. Instances are iterable and can be used as a context manager. The `loads` argument can be used to replace the standard json decoder. If specified, it must be a callable that accepts a (unicode) string and returns the decoded object. :param file_or_iterable: file-like object or iterable yielding lines as strings :param loads: custom json decoder callable """ _file_or_iterable: Union[ typing.IO[str], typing.IO[bytes], Iterable[Union[str, bytes]] ] _line_iter: Iterator[Tuple[int, Union[bytes, str]]] = attr.ib(init=False) _loads: LoadsCallable = attr.ib(default=default_loads, kw_only=True) def __attrs_post_init__(self) -> None: if isinstance(self._file_or_iterable, io.IOBase): self._fp = cast( Union[typing.IO[str], typing.IO[bytes]], self._file_or_iterable, ) self._line_iter = enumerate(self._file_or_iterable, 1) # No type specified, None not allowed @overload def read( self, *, type: Literal[None] = ..., allow_none: Literal[False] = ..., skip_empty: bool = ..., ) -> JSONValue: ... 
# pragma: no cover # No type specified, None allowed @overload def read( self, *, type: Literal[None] = ..., allow_none: Literal[True], skip_empty: bool = ..., ) -> Optional[JSONValue]: ... # pragma: no cover # Type specified, None not allowed @overload def read( self, *, type: Type[TJSONValue], allow_none: Literal[False] = ..., skip_empty: bool = ..., ) -> TJSONValue: ... # pragma: no cover # Type specified, None allowed @overload def read( self, *, type: Type[TJSONValue], allow_none: Literal[True], skip_empty: bool = ..., ) -> Optional[TJSONValue]: ... # pragma: no cover # Generic definition @overload def read( self, *, type: Optional[Type[Any]] = ..., allow_none: bool = ..., skip_empty: bool = ..., ) -> Optional[JSONValue]: ... # pragma: no cover def read( self, *, type: Optional[Type[Any]] = None, allow_none: bool = False, skip_empty: bool = False, ) -> Optional[JSONValue]: """ Read and decode a line. The optional `type` argument specifies the expected data type. Supported types are ``dict``, ``list``, ``str``, ``int``, ``float``, and ``bool``. When specified, non-conforming lines result in :py:exc:`InvalidLineError`. By default, input lines containing ``null`` (in JSON) are considered invalid, and will cause :py:exc:`InvalidLineError`. The `allow_none` argument can be used to change this behaviour, in which case ``None`` will be returned instead. If `skip_empty` is set to ``True``, empty lines and lines containing only whitespace are silently skipped. 
""" if self._closed: raise RuntimeError("reader is closed") if type is not None and type not in VALID_TYPES: raise ValueError("invalid type specified") try: lineno, line = next(self._line_iter) while skip_empty and not line.rstrip(): lineno, line = next(self._line_iter) except StopIteration: raise EOFError from None if isinstance(line, bytes): try: line = line.decode("utf-8") except UnicodeDecodeError as orig_exc: exc = InvalidLineError( f"line is not valid utf-8: {orig_exc}", line, lineno ) raise exc from orig_exc if line.startswith(SKIPPABLE_SINGLE_INITIAL_CHARS): line = line[1:] try: value: JSONValue = self._loads(line) except ValueError as orig_exc: exc = InvalidLineError( f"line contains invalid json: {orig_exc}", line, lineno ) raise exc from orig_exc if value is None: if allow_none: return None raise InvalidLineError("line contains null value", line, lineno) if type is not None: valid = isinstance(value, type) if type is int and isinstance(value, bool): # isinstance() is not sufficient, since bool is an int subclass valid = False if not valid: raise InvalidLineError( "line does not match requested type", line, lineno ) return value # No type specified, None not allowed @overload def iter( self, *, type: Literal[None] = ..., allow_none: Literal[False] = ..., skip_empty: bool = ..., skip_invalid: bool = ..., ) -> Iterator[JSONValue]: ... # pragma: no cover # No type specified, None allowed @overload def iter( self, *, type: Literal[None] = ..., allow_none: Literal[True], skip_empty: bool = ..., skip_invalid: bool = ..., ) -> Iterator[JSONValue]: ... # pragma: no cover # Type specified, None not allowed @overload def iter( self, *, type: Type[TJSONValue], allow_none: Literal[False] = ..., skip_empty: bool = ..., skip_invalid: bool = ..., ) -> Iterator[TJSONValue]: ... 
# pragma: no cover # Type specified, None allowed @overload def iter( self, *, type: Type[TJSONValue], allow_none: Literal[True], skip_empty: bool = ..., skip_invalid: bool = ..., ) -> Iterator[Optional[TJSONValue]]: ... # pragma: no cover # Generic definition @overload def iter( self, *, type: Optional[Type[TJSONValue]] = ..., allow_none: bool = ..., skip_empty: bool = ..., skip_invalid: bool = ..., ) -> Iterator[Optional[TJSONValue]]: ... # pragma: no cover def iter( self, type: Optional[Type[Any]] = None, allow_none: bool = False, skip_empty: bool = False, skip_invalid: bool = False, ) -> Iterator[Optional[JSONValue]]: """ Iterate over all lines. This is the iterator equivalent to repeatedly calling :py:meth:`~Reader.read()`. If no arguments are specified, this is the same as directly iterating over this :py:class:`Reader` instance. When `skip_invalid` is set to ``True``, invalid lines will be silently ignored. See :py:meth:`~Reader.read()` for a description of the other arguments. """ try: while True: try: yield self.read( type=type, allow_none=allow_none, skip_empty=skip_empty ) except InvalidLineError: if not skip_invalid: raise except EOFError: pass def __iter__(self) -> Iterator[Any]: """ See :py:meth:`~Reader.iter()`. """ return self.iter() def _repr_for_wrapped(self) -> str: if self._fp is not None: return repr_for_fp(self._fp) class_name = type(self._file_or_iterable).__name__ return f"<{class_name} at 0x{id(self._file_or_iterable):x}>" @attr.s(auto_attribs=True, repr=False) class Writer(ReaderWriterBase): """ Writer for the jsonlines format. Instances can be used as a context manager. The `fp` argument must be a file-like object with a ``.write()`` method accepting either text (unicode) or bytes. The `compact` argument can be used to produce smaller output. The `sort_keys` argument can be used to sort keys in json objects, and will produce deterministic output. For more control, provide a custom encoder callable using the `dumps` argument. 
The callable must produce either (unicode) string or bytes output. If specified, the `compact` and `sort_keys` arguments will be ignored. When the `flush` argument is set to ``True``, the writer will call ``fp.flush()`` after each written line. :param fp: writable file-like object :param compact: whether to use a compact output format :param sort_keys: whether to sort object keys :param dumps: custom encoder callable :param flush: whether to flush the file-like object after writing each line """ _fp: Union[typing.IO[str], typing.IO[bytes]] = attr.ib(default=None) _fp_is_binary: bool = attr.ib(default=False, init=False) _compact: bool = attr.ib(default=False, kw_only=True) _sort_keys: bool = attr.ib(default=False, kw_only=True) _flush: bool = attr.ib(default=False, kw_only=True) _dumps: DumpsCallable = attr.ib(default=default_dumps, kw_only=True) _dumps_result_conversion: DumpsResultConversion = attr.ib( default=DumpsResultConversion.LeaveAsIs, init=False ) def __attrs_post_init__(self) -> None: if isinstance(self._fp, io.TextIOBase): self._fp_is_binary = False elif isinstance(self._fp, io.IOBase): self._fp_is_binary = True else: try: self._fp.write("") # type: ignore[call-overload] except TypeError: self._fp_is_binary = True else: self._fp_is_binary = False if self._dumps is default_dumps: self._dumps = json.JSONEncoder( ensure_ascii=False, separators=(",", ":") if self._compact else (", ", ": "), sort_keys=self._sort_keys, ).encode # Detect if str-to-bytes conversion (or vice versa) is needed for the # combination of this file-like object and the used dumps() callable. # This avoids checking this for each .write(). Note that this # deliberately does not support ‘dynamic’ return types that depend on # input and dump options, like simplejson on Python 2 in some cases. 
sample_dumps_result = self._dumps({}) if isinstance(sample_dumps_result, str) and self._fp_is_binary: self._dumps_result_conversion = DumpsResultConversion.EncodeToBytes elif isinstance(sample_dumps_result, bytes) and not self._fp_is_binary: self._dumps_result_conversion = DumpsResultConversion.DecodeToString def write(self, obj: Any) -> int: """ Encode and write a single object. :param obj: the object to encode and write :return: number of characters or bytes written """ if self._closed: raise RuntimeError("writer is closed") line = self._dumps(obj) # This handles either str or bytes, but the type checker does not know # that this code always passes the right type of arguments. if self._dumps_result_conversion == DumpsResultConversion.EncodeToBytes: line = line.encode() # type: ignore[union-attr] elif self._dumps_result_conversion == DumpsResultConversion.DecodeToString: line = line.decode() # type: ignore[union-attr] fp = self._fp fp.write(line) # type: ignore[arg-type] fp.write(b"\n" if self._fp_is_binary else "\n") # type: ignore[call-overload] if self._flush: fp.flush() return len(line) + 1 # including newline def write_all(self, iterable: Iterable[Any]) -> int: """ Encode and write multiple objects. :param iterable: an iterable of objects :return: number of characters or bytes written """ return sum(self.write(obj) for obj in iterable) def _repr_for_wrapped(self) -> str: return repr_for_fp(self._fp) @overload def open( file: Openable, mode: Literal["r"] = ..., *, loads: Optional[LoadsCallable] = ..., ) -> Reader: ... # pragma: no cover @overload def open( file: Openable, mode: Literal["w", "a", "x"], *, dumps: Optional[DumpsCallable] = ..., compact: Optional[bool] = ..., sort_keys: Optional[bool] = ..., flush: Optional[bool] = ..., ) -> Writer: ... 
# pragma: no cover @overload def open( file: Openable, mode: str = ..., *, loads: Optional[LoadsCallable] = ..., dumps: Optional[DumpsCallable] = ..., compact: Optional[bool] = ..., sort_keys: Optional[bool] = ..., flush: Optional[bool] = ..., ) -> Union[Reader, Writer]: ... # pragma: no cover def open( file: Openable, mode: str = "r", *, loads: Optional[LoadsCallable] = None, dumps: Optional[DumpsCallable] = None, compact: Optional[bool] = None, sort_keys: Optional[bool] = None, flush: Optional[bool] = None, ) -> Union[Reader, Writer]: """ Open a jsonlines file for reading or writing. This is a convenience function to open a file and wrap it in either a :py:class:`Reader` or :py:class:`Writer` instance, depending on the specified `mode`. Additional keyword arguments will be passed on to the reader and writer; see their documentation for available options. The resulting reader or writer must be closed after use by the caller, which will also close the opened file. This can be done by calling ``.close()``, but the easiest way to ensure proper resource finalisation is to use a ``with`` block (context manager), e.g. :: with jsonlines.open('out.jsonl', mode='w') as writer: writer.write(...) :param file: name or ‘path-like object’ of the file to open :param mode: whether to open the file for reading (``r``), writing (``w``), appending (``a``), or exclusive creation (``x``). 
""" if mode not in {"r", "w", "a", "x"}: raise ValueError("'mode' must be either 'r', 'w', 'a', or 'x'") cls = Reader if mode == "r" else Writer encoding = "utf-8-sig" if mode == "r" else "utf-8" fp = builtins.open(file, mode=mode + "t", encoding=encoding) kwargs = dict( loads=loads, dumps=dumps, compact=compact, sort_keys=sort_keys, flush=flush, ) kwargs = {key: value for key, value in kwargs.items() if value is not None} instance: Union[Reader, Writer] = cls(fp, **kwargs) instance._should_close_fp = True return instance def repr_for_fp(fp: typing.IO[Any]) -> str: """ Helper to make a useful repr() for a file-like object. """ name = getattr(fp, "name", None) if name is not None: return repr(name) else: return repr(fp) jsonlines-4.0.0/jsonlines/py.typed000066400000000000000000000000001447435465300172270ustar00rootroot00000000000000jsonlines-4.0.0/mypy.ini000066400000000000000000000006541447435465300152420ustar00rootroot00000000000000[mypy] check_untyped_defs = True disallow_any_generics = True disallow_incomplete_defs = True disallow_subclassing_any = True disallow_untyped_calls = True disallow_untyped_decorators = True disallow_untyped_defs = True no_implicit_optional = True no_implicit_reexport = True show_error_codes = True strict_equality = True warn_redundant_casts = True warn_return_any = True warn_unused_configs = True warn_unused_ignores = True jsonlines-4.0.0/requirements-dev.txt000066400000000000000000000001071447435465300175740ustar00rootroot00000000000000black flake8 mypy orjson pytest>=3 pytest-cov sphinx types-ujson ujson jsonlines-4.0.0/setup.cfg000066400000000000000000000017561447435465300153700ustar00rootroot00000000000000[metadata] name = jsonlines version = 4.0.0 author = wouter bolsterlee author_email = wouter@bolsterl.ee license = BSD license_file = LICENSE.rst description = Library with helpers for the jsonlines file format long_description = file: README.rst url = https://github.com/wbolster/jsonlines classifiers = Development Status :: 5 - 
Production/Stable Intended Audience :: Developers Intended Audience :: System Administrators License :: OSI Approved :: BSD License Programming Language :: Python Programming Language :: Python :: 3 Programming Language :: Python :: 3 :: Only Topic :: Internet :: Log Analysis Topic :: Software Development :: Libraries :: Python Modules Topic :: System :: Logging Topic :: Utilities [options] packages = jsonlines python_requires = >=3.8 install_requires = attrs>=19.2.0 [options.package_data] jsonlines = py.typed [build_sphinx] source-dir = doc/ build-dir = doc/build/ [flake8] max-line-length = 88 extend-ignore = E203 jsonlines-4.0.0/setup.py000066400000000000000000000000461447435465300152500ustar00rootroot00000000000000from setuptools import setup setup() jsonlines-4.0.0/tests/000077500000000000000000000000001447435465300147005ustar00rootroot00000000000000jsonlines-4.0.0/tests/test_jsonlines.py000066400000000000000000000213641447435465300203230ustar00rootroot00000000000000""" Tests for the jsonlines library. """ import codecs import collections import io import json import tempfile import jsonlines import pytest SAMPLE_BYTES = b'{"a": 1}\n{"b": 2}\n' SAMPLE_TEXT = SAMPLE_BYTES.decode("utf-8") def is_json_decode_error(exc: object) -> bool: if type(exc).__module__ == "ujson": # The ujson package has its own ujson.JSONDecodeError; because of the # line above this function also works if it's not installed. import ujson return isinstance(exc, ujson.JSONDecodeError) else: # Otherwise, this should be a stdlib json.JSONDecodeError, which also # works for orjson since orjson.JSONDecodeError inherits from it. 
return isinstance(exc, json.JSONDecodeError) def test_reader() -> None: fp = io.BytesIO(SAMPLE_BYTES) with jsonlines.Reader(fp) as reader: it = iter(reader) assert next(it) == {"a": 1} assert next(it) == {"b": 2} with pytest.raises(StopIteration): next(it) with pytest.raises(EOFError): reader.read() def test_reading_from_iterable() -> None: with jsonlines.Reader(["1", b"{}"]) as reader: assert list(reader) == [1, {}] assert "wrapping <list at " in repr(reader) def test_reader_rfc7464_text_sequences() -> None: fp = io.BytesIO(b'\x1e"a"\x0a\x1e"b"\x0a') with jsonlines.Reader(fp) as reader: assert list(reader) == ["a", "b"] def test_reader_utf8_bom_bytes() -> None: """ UTF-8 BOM is ignored, even if it occurs in the middle of a stream. """ chunks = [ codecs.BOM_UTF8, b"1\n", codecs.BOM_UTF8, b"2\n", ] fp = io.BytesIO(b"".join(chunks)) with jsonlines.Reader(fp) as reader: assert list(reader) == [1, 2] def test_reader_utf8_bom_text() -> None: """ Text version of ``test_reader_utf8_bom_bytes()``. """ chunks = [ "1\n", codecs.BOM_UTF8.decode(), "2\n", ] fp = io.StringIO("".join(chunks)) with jsonlines.Reader(fp) as reader: assert list(reader) == [1, 2] def test_reader_utf8_bom_bom_bom() -> None: """ Too many UTF-8 BOM BOM BOM chars cause BOOM 💥 BOOM. 
""" reader = jsonlines.Reader([codecs.BOM_UTF8.decode() * 3 + "1\n"]) with pytest.raises(jsonlines.InvalidLineError) as excinfo: reader.read() exc = excinfo.value assert "invalid json" in str(exc) assert is_json_decode_error(exc.__cause__) def test_writer_text() -> None: fp = io.StringIO() with jsonlines.Writer(fp) as writer: writer.write({"a": 1}) writer.write({"b": 2}) assert fp.getvalue() == SAMPLE_TEXT def test_writer_binary() -> None: fp = io.BytesIO() with jsonlines.Writer(fp) as writer: writer.write_all( [ {"a": 1}, {"b": 2}, ] ) assert fp.getvalue() == SAMPLE_BYTES def test_closing() -> None: reader = jsonlines.Reader([]) reader.close() with pytest.raises(RuntimeError): reader.read() writer = jsonlines.Writer(io.BytesIO()) writer.close() writer.close() # no-op with pytest.raises(RuntimeError): writer.write(123) def test_invalid_lines() -> None: data = "[1, 2" with jsonlines.Reader(io.StringIO(data)) as reader: with pytest.raises(jsonlines.InvalidLineError) as excinfo: reader.read() exc = excinfo.value assert "invalid json" in str(exc) assert exc.line == data assert is_json_decode_error(exc.__cause__) def test_skip_invalid() -> None: fp = io.StringIO("12\ninvalid\n34") reader = jsonlines.Reader(fp) it = reader.iter(skip_invalid=True) assert next(it) == 12 assert next(it) == 34 def test_empty_strings_in_iterable() -> None: input = ["123", "", "456"] it = iter(jsonlines.Reader(input)) assert next(it) == 123 with pytest.raises(jsonlines.InvalidLineError): next(it) with pytest.raises(StopIteration): next(it) it = jsonlines.Reader(input).iter(skip_empty=True) assert list(it) == [123, 456] def test_invalid_utf8() -> None: with jsonlines.Reader([b"\xff\xff"]) as reader: with pytest.raises(jsonlines.InvalidLineError) as excinfo: reader.read() assert "line is not valid utf-8" in str(excinfo.value) def test_empty_lines() -> None: data_with_empty_line = b"1\n\n2\n" with jsonlines.Reader(io.BytesIO(data_with_empty_line)) as reader: assert reader.read() with 
pytest.raises(jsonlines.InvalidLineError): reader.read() assert reader.read() == 2 with pytest.raises(EOFError): reader.read() with jsonlines.Reader(io.BytesIO(data_with_empty_line)) as reader: assert list(reader.iter(skip_empty=True)) == [1, 2] def test_typed_reads() -> None: with jsonlines.Reader(io.StringIO('12\ntrue\n"foo"\n')) as reader: assert reader.read(type=int) == 12 with pytest.raises(jsonlines.InvalidLineError) as excinfo: reader.read(type=int) exc = excinfo.value assert "does not match requested type" in str(exc) assert exc.line == "true" with pytest.raises(jsonlines.InvalidLineError) as excinfo: reader.read(type=float) exc = excinfo.value assert "does not match requested type" in str(exc) assert exc.line == '"foo"' def test_typed_read_invalid_type() -> None: reader = jsonlines.Reader([]) with pytest.raises(ValueError) as excinfo: reader.read(type="nope") # type: ignore[call-overload] exc = excinfo.value assert str(exc) == "invalid type specified" def test_typed_iteration() -> None: fp = io.StringIO("1\n2\n") with jsonlines.Reader(fp) as reader: actual = list(reader.iter(type=int)) assert actual == [1, 2] fp = io.StringIO("1\n2\n") with jsonlines.Reader(fp) as reader: it = reader.iter(type=str) with pytest.raises(jsonlines.InvalidLineError) as excinfo: next(it) exc = excinfo.value assert "does not match requested type" in str(exc) def test_writer_flags() -> None: fp = io.BytesIO() with jsonlines.Writer(fp, compact=True, sort_keys=True) as writer: writer.write( collections.OrderedDict( [ ("b", 2), ("a", 1), ] ) ) assert fp.getvalue() == b'{"a":1,"b":2}\n' def test_custom_dumps() -> None: fp = io.BytesIO() writer = jsonlines.Writer(fp, dumps=lambda obj: "oh hai") with writer: nbytes = writer.write({}) assert nbytes == len(b"oh hai\n") assert fp.getvalue() == b"oh hai\n" def test_custom_dumps_bytes() -> None: """ A custom dump function that returns bytes (e.g. ‘orjson’) should work. 
""" fp = io.BytesIO() writer = jsonlines.Writer(fp, dumps=lambda obj: b"some bytes") with writer: writer.write(123) assert fp.getvalue() == b"some bytes\n" def test_custom_loads() -> None: fp = io.BytesIO(b"{}\n") with jsonlines.Reader(fp, loads=lambda s: "uh what") as reader: assert reader.read() == "uh what" def test_open_reading() -> None: with tempfile.NamedTemporaryFile("wb") as fp: fp.write(b"123\n") fp.flush() with jsonlines.open(fp.name) as reader: assert list(reader) == [123] def test_open_reading_with_utf8_bom() -> None: """ The ``.open()`` helper ignores a UTF-8 BOM. """ with tempfile.NamedTemporaryFile("wb") as fp: fp.write(codecs.BOM_UTF8) fp.write(b"123\n") fp.flush() with jsonlines.open(fp.name) as reader: assert list(reader) == [123] def test_open_writing() -> None: with tempfile.NamedTemporaryFile("w+b") as fp: with jsonlines.open(fp.name, mode="w") as writer: writer.write(123) assert fp.read() == b"123\n" assert fp.name in repr(writer) def test_open_and_append_writing() -> None: with tempfile.NamedTemporaryFile("w+b") as fp: with jsonlines.open(fp.name, mode="w") as writer: nbytes = writer.write(123) assert nbytes == len(str(123)) + 1 with jsonlines.open(fp.name, mode="a") as writer: nbytes = writer.write(456) assert nbytes == len(str(456)) + 1 assert fp.read() == b"123\n456\n" def test_open_invalid_mode() -> None: with pytest.raises(ValueError) as excinfo: jsonlines.open("foo", mode="foo") assert "mode" in str(excinfo.value) def test_single_char_stripping() -> None: """ Sanity check that a helper constant actually contains single-char strings. """ assert all(len(s) == 1 for s in jsonlines.jsonlines.SKIPPABLE_SINGLE_INITIAL_CHARS) jsonlines-4.0.0/tests/test_typing.py000066400000000000000000000047131447435465300176300ustar00rootroot00000000000000""" This file should not give any type checking errors.
""" import io import json import random import numbers from typing import TYPE_CHECKING, Any, Dict, Iterable, List, Optional if not TYPE_CHECKING: def reveal_type(obj: Any) -> None: pass import jsonlines def something_with_reader() -> None: reader: jsonlines.Reader reader = jsonlines.Reader(io.StringIO()) reader = jsonlines.Reader(io.BytesIO()) reader = jsonlines.Reader(['"text"']) reader = jsonlines.Reader([b'"bytes"']) r1 = reader.read() r2 = reader.read(allow_none=True) r3: numbers.Number = reader.read(type=random.choice([int, float])) # For debugging: # reveal_type(r1) # reveal_type(r2) # reveal_type(r3) some_int: int = reader.read(type=int) maybe_int: Optional[int] = reader.read(type=int, allow_none=True) some_float: float = reader.read(type=float) maybe_float: Optional[float] = reader.read(type=float, allow_none=True) some_bool: bool = reader.read(type=bool) maybe_bool: Optional[bool] = reader.read(type=bool, allow_none=True) some_dict: Dict[str, Any] = reader.read(type=dict) optional_dict: Optional[Dict[str, Any]] = reader.read(type=dict, allow_none=True) some_list: List[Any] = reader.read(type=list) maybe_list: Optional[List[Any]] = reader.read(type=list, allow_none=True) iter_int: Iterable[int] = reader.iter(type=int) iter_str: Iterable[str] = reader.iter(type=str) iter_dict: Iterable[Dict[str, Any]] = reader.iter(type=dict) iter_optional_str: Iterable[Optional[str]] = reader.iter(type=str, allow_none=True) locals() # Silence flake8 F841 def something_with_writer() -> None: writer: jsonlines.Writer writer = jsonlines.Writer(io.StringIO()) writer = jsonlines.Writer(io.BytesIO()) locals() # Silence flake8 F841 def something_with_open() -> None: name = "/nonexistent" reader: jsonlines.Reader reader = jsonlines.open(name) reader = jsonlines.open(name, "r") reader = jsonlines.open(name, mode="r") reader = jsonlines.open( name, mode="r", loads=json.loads, ) writer: jsonlines.Writer writer = jsonlines.open(name, "w") writer = jsonlines.open(name, mode="w") writer 
= jsonlines.open(name, "a") writer = jsonlines.open( name, mode="w", dumps=json.dumps, compact=True, sort_keys=True, flush=True, ) locals() # Silence flake8 F841 jsonlines-4.0.0/tox.ini000066400000000000000000000004551447435465300150550ustar00rootroot00000000000000[tox] envlist = py311,py310,py39,py38,linters [testenv] deps = -rrequirements-dev.txt commands = pytest {posargs} tests/ [testenv:linters] basepython = python3.11 deps = -rrequirements-dev.txt commands = flake8 jsonlines/ tests/ black --check jsonlines/ tests/ mypy --strict jsonlines/ tests/