pax_global_header 0000666 0000000 0000000 00000000064 14137100362 0014507 g ustar 00root root 0000000 0000000 52 comment=d87e341344d725fc8fc8cf02c865c95fcaf36234
python-rdata-0.5/ 0000775 0000000 0000000 00000000000 14137100362 0013765 5 ustar 00root root 0000000 0000000 python-rdata-0.5/.github/ 0000775 0000000 0000000 00000000000 14137100362 0015325 5 ustar 00root root 0000000 0000000 python-rdata-0.5/.github/workflows/ 0000775 0000000 0000000 00000000000 14137100362 0017362 5 ustar 00root root 0000000 0000000 python-rdata-0.5/.github/workflows/main.yml 0000664 0000000 0000000 00000001536 14137100362 0021036 0 ustar 00root root 0000000 0000000 name: Tests
on:
  push:
  pull_request:

jobs:
  build:
    runs-on: ${{ matrix.os }}
    name: Python ${{ matrix.python-version }} on ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: ['3.7', '3.8', '3.9']

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }} on ${{ matrix.os }}
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        pip3 install codecov pytest-cov || pip3 install --user codecov pytest-cov;
    - name: Run tests
      run: |
        pip3 install .
        coverage run --source=rdata/ --omit=rdata/tests/ setup.py test;
    - name: Upload coverage to Codecov
      uses: codecov/codecov-action@v1
python-rdata-0.5/.gitignore 0000664 0000000 0000000 00000002263 14137100362 0015760 0 ustar 00root root 0000000 0000000 # Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
python-rdata-0.5/LICENSE 0000664 0000000 0000000 00000002066 14137100362 0014776 0 ustar 00root root 0000000 0000000 MIT License
Copyright (c) 2018 Carlos Ramos Carreño
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
python-rdata-0.5/MANIFEST.in 0000664 0000000 0000000 00000000130 14137100362 0015515 0 ustar 00root root 0000000 0000000 include MANIFEST.in
include VERSION
include LICENSE
include rdata/py.typed
include *.txt python-rdata-0.5/README.rst 0000664 0000000 0000000 00000010412 14137100362 0015452 0 ustar 00root root 0000000 0000000 rdata
=====
|build-status| |docs| |coverage| |landscape| |pypi|
Read R datasets from Python.
..
Github does not support include in README for dubious security reasons, so
we copy-paste instead. Also Github does not understand Sphinx directives.
.. include:: docs/simpleusage.rst
Installation
============
rdata is on PyPI and can be installed using :code:`pip`:
.. code::
pip install rdata
It is also available for :code:`conda` using the :code:`conda-forge` channel:
.. code::
conda install -c conda-forge rdata
Documentation
=============
The documentation of rdata is in
`ReadTheDocs `_.
Simple usage
============
Read an R dataset
-----------------
The common way of reading an R dataset is the following one:
>>> import rdata
>>> parsed = rdata.parser.parse_file(rdata.TESTDATA_PATH / "test_vector.rda")
>>> converted = rdata.conversion.convert(parsed)
>>> converted
{'test_vector': array([1., 2., 3.])}
This consists of two steps:
#. First, the file is parsed using the function
`parse_file`. This provides a literal description of the
file contents as a hierarchy of Python objects representing the basic R
objects. This step is unambiguous and always the same.
#. Then, each object must be converted to an appropriate Python object. In this
step there are several choices on which Python type is the most appropriate
as the conversion for a given R object. Thus, we provide a default
`convert` routine, which tries to select Python
objects that preserve most information of the original R object. For custom
R classes, it is also possible to specify conversion routines to Python
objects.
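The two steps above can be wrapped in a small helper. This is only a sketch and not part of rdata: ``load_rda`` and its ``class_map`` parameter are hypothetical names, and the helper relies solely on the ``parse_file`` and ``convert`` calls shown in this README. The import is deferred so that the helper can be defined even where rdata is not installed.

```python
import pathlib
from typing import Any, Callable, Mapping, Optional


def load_rda(
    path,
    class_map: Optional[Mapping[str, Callable[..., Any]]] = None,
):
    """Parse an .rda file and convert it (hypothetical helper)."""
    import rdata  # deferred import; assumes rdata is installed

    # Step 1: literal description of the file contents.
    parsed = rdata.parser.parse_file(pathlib.Path(path))

    # Step 2: conversion. A custom class map, if given, is passed
    # positionally, mirroring the usage shown in this README.
    if class_map is None:
        return rdata.conversion.convert(parsed)
    return rdata.conversion.convert(parsed, class_map)
```

With rdata installed, ``load_rda(rdata.TESTDATA_PATH / "test_vector.rda")`` should return the same dictionary as the session above.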
Convert custom R classes
------------------------
The basic `convert` routine only constructs a
`SimpleConverter` object and calls its
`convert` method. All arguments of
`convert` are directly passed to the
`SimpleConverter` initialization method.
It is possible, although not trivial, to make a custom
`Converter` object to change the way in which the
basic R objects are transformed to Python objects. However, a more common
situation is that one does not want to change how basic R objects are
converted, but instead wants to provide conversions for specific R classes.
This can be done by passing a dictionary to the
`SimpleConverter` initialization method, containing
as keys the names of R classes and as values callables that convert an
R object of that class to a Python object. By default, the dictionary used
is `DEFAULT_CLASS_MAP`, which can convert
commonly used R classes such as `data.frame` and `factor`.
As an example, here is how we would implement a conversion routine for the
factor class to `bytes` objects, instead of the default conversion to
Pandas `Categorical` objects:
>>> import rdata
>>> def factor_constructor(obj, attrs):
... values = [bytes(attrs['levels'][i - 1], 'utf8')
... if i >= 0 else None for i in obj]
...
... return values
>>> new_dict = {
... **rdata.conversion.DEFAULT_CLASS_MAP,
... "factor": factor_constructor
... }
>>> parsed = rdata.parser.parse_file(rdata.TESTDATA_PATH
... / "test_dataframe.rda")
>>> converted = rdata.conversion.convert(parsed, new_dict)
>>> converted
{'test_dataframe': class value
0 b'a' 1
1 b'b' 2
2 b'b' 3}
.. |build-status| image:: https://github.com/vnmabus/rdata/actions/workflows/main.yml/badge.svg?branch=master
:alt: build status
:scale: 100%
:target: https://github.com/vnmabus/rdata/actions/workflows/main.yml
.. |docs| image:: https://readthedocs.org/projects/rdata/badge/?version=latest
:alt: Documentation Status
:scale: 100%
:target: https://rdata.readthedocs.io/en/latest/?badge=latest
.. |coverage| image:: http://codecov.io/github/vnmabus/rdata/coverage.svg?branch=develop
:alt: Coverage Status
:scale: 100%
:target: https://codecov.io/gh/vnmabus/rdata/branch/develop
.. |landscape| image:: https://landscape.io/github/vnmabus/rdata/develop/landscape.svg?style=flat
:target: https://landscape.io/github/vnmabus/rdata/develop
:alt: Code Health
.. |pypi| image:: https://badge.fury.io/py/rdata.svg
:alt: Pypi version
:scale: 100%
:target: https://pypi.python.org/pypi/rdata/ python-rdata-0.5/VERSION 0000664 0000000 0000000 00000000003 14137100362 0015026 0 ustar 00root root 0000000 0000000 0.5 python-rdata-0.5/conftest.py 0000664 0000000 0000000 00000000036 14137100362 0016163 0 ustar 00root root 0000000 0000000 collect_ignore = ['setup.py']
python-rdata-0.5/docs/ 0000775 0000000 0000000 00000000000 14137100362 0014715 5 ustar 00root root 0000000 0000000 python-rdata-0.5/docs/.gitignore 0000664 0000000 0000000 00000000026 14137100362 0016703 0 ustar 00root root 0000000 0000000 /functions/
/modules/
python-rdata-0.5/docs/Makefile 0000664 0000000 0000000 00000001132 14137100362 0016352 0 ustar 00root root 0000000 0000000 # Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SPHINXPROJ = rdata
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) python-rdata-0.5/docs/_templates/ 0000775 0000000 0000000 00000000000 14137100362 0017052 5 ustar 00root root 0000000 0000000 python-rdata-0.5/docs/_templates/autosummary/ 0000775 0000000 0000000 00000000000 14137100362 0021440 5 ustar 00root root 0000000 0000000 python-rdata-0.5/docs/_templates/autosummary/base.rst 0000664 0000000 0000000 00000000150 14137100362 0023100 0 ustar 00root root 0000000 0000000 {{ objname | escape | underline}}
.. currentmodule:: {{ module }}
.. auto{{ objtype }}:: {{ objname }} python-rdata-0.5/docs/_templates/autosummary/class.rst 0000664 0000000 0000000 00000001024 14137100362 0023274 0 ustar 00root root 0000000 0000000 {{ objname | escape | underline}}
.. currentmodule:: {{ module }}
.. autoclass:: {{ objname }}
{% block methods %}
{% if methods %}
.. rubric:: Methods
.. autosummary::
{% for item in methods %}
~{{ name }}.{{ item }}
{%- endfor %}
{% endif %}
.. automethod:: __init__
{% endblock %}
{% block attributes %}
{% if attributes %}
.. rubric:: Attributes
.. autosummary::
{% for item in attributes %}
~{{ name }}.{{ item }}
{%- endfor %}
{% endif %}
{% endblock %} python-rdata-0.5/docs/_templates/autosummary/module.rst 0000664 0000000 0000000 00000002155 14137100362 0023462 0 ustar 00root root 0000000 0000000 {{ objname | escape | underline}}
.. automodule:: {{ fullname }}
{% block attributes %}
{% if attributes %}
.. rubric:: {{ _('Module Attributes') }}
.. autosummary::
:toctree:
{% for item in attributes %}
{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
{% block functions %}
{% if functions %}
.. rubric:: {{ _('Functions') }}
.. autosummary::
:toctree:
{% for item in functions %}
{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
{% block classes %}
{% if classes %}
.. rubric:: {{ _('Classes') }}
.. autosummary::
:toctree:
{% for item in classes %}
{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
{% block exceptions %}
{% if exceptions %}
.. rubric:: {{ _('Exceptions') }}
.. autosummary::
:toctree:
{% for item in exceptions %}
{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
{% block modules %}
{% if modules %}
.. rubric:: Modules
.. autosummary::
:toctree:
:recursive:
{% for item in modules %}
{{ item }}
{%- endfor %}
{% endif %}
{% endblock %} python-rdata-0.5/docs/apilist.rst 0000664 0000000 0000000 00000002046 14137100362 0017116 0 ustar 00root root 0000000 0000000 API List
========
List of functions and structures
--------------------------------
A complete list of all functions and structures provided by rdata.
Parse :code:`.rda` format
^^^^^^^^^^^^^^^^^^^^^^^^^
Functions for parsing data in the :code:`.rda` format. These functions return a structure representing
the contents of the file, without transforming it to more appropriate Python objects. Thus, if a different
way of converting R objects to Python objects is needed, it can be done from this structure.
.. autosummary::
:toctree: modules
rdata.parser.parse_file
rdata.parser.parse_data
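To get a feel for the untransformed structure, it can help to print the type of each node. The sketch below is hypothetical: it assumes only the ``info.type`` and ``value`` attributes that the conversion code reads from parsed objects, and it uses ``SimpleNamespace`` stand-ins (with plain strings as types) instead of real parser output.

```python
from types import SimpleNamespace


def summarize(node, depth=0):
    """Return indented lines naming the type of each node in a tree."""
    lines = ["  " * depth + str(node.info.type)]
    children = node.value if isinstance(node.value, (list, tuple)) else []
    for child in children:
        # Only recurse into children that look like parsed objects;
        # plain numbers or bytes are leaf data and are skipped.
        if hasattr(child, "info"):
            lines.extend(summarize(child, depth + 1))
    return lines


def make_node(type_name, value=None):
    """Build a stand-in node (not a real rdata object)."""
    return SimpleNamespace(info=SimpleNamespace(type=type_name), value=value)


tree = make_node("VEC", [make_node("REAL", [1.0, 2.0]), make_node("INT", [3])])
print("\n".join(summarize(tree)))
```

The same function could be pointed at real parser output, provided the objects expose the attributes assumed here.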
Conversion of the R objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^
These objects and functions convert the parsed R objects to appropriate Python objects. The Python object
corresponding to an R object is chosen to preserve most original properties, but it could change in the
future, if a more fitting Python object is found.
.. autosummary::
:toctree: modules
rdata.conversion.Converter
rdata.conversion.SimpleConverter
rdata.conversion.convert
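One property that the conversion preserves is the layout of R arrays: ``convert_array`` reshapes the flat data with ``order='F'`` because R stores matrices column by column. The following sketch shows what that means for a two-dimensional case in plain Python, without NumPy; ``reshape_column_major`` is a hypothetical name used only for illustration.

```python
from typing import List, Sequence


def reshape_column_major(
    values: Sequence[float],
    n_rows: int,
    n_cols: int,
) -> List[List[float]]:
    """Arrange a flat, column-major sequence as a list of rows."""
    if len(values) != n_rows * n_cols:
        raise ValueError("The number of values does not match the shape")
    # Element (r, c) sits at offset r + c * n_rows in column-major storage.
    return [
        [values[r + c * n_rows] for c in range(n_cols)]
        for r in range(n_rows)
    ]


# The flat data of the R matrix ``matrix(1:6, nrow=2)``.
matrix = reshape_column_major([1, 2, 3, 4, 5, 6], 2, 3)
print(matrix)
```

Reading the same data row by row would instead yield ``[[1, 2, 3], [4, 5, 6]]``, which is not the matrix that R displays.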
python-rdata-0.5/docs/conf.py 0000664 0000000 0000000 00000014712 14137100362 0016221 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# rdata documentation build configuration file, created by
# sphinx-quickstart on Tue Aug 7 12:49:32 2018.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, '/home/carlos/git/rdata/rdata')
import sys
import pkg_resources
try:
release = pkg_resources.get_distribution('rdata').version
except pkg_resources.DistributionNotFound:
print('To build the documentation, the distribution information of rdata\n'
'has to be available. Either install the package into your\n'
'development environment or run "setup.py develop" to setup the\n'
'metadata. A virtualenv is recommended!\n')
sys.exit(1)
del pkg_resources
version = '.'.join(release.split('.')[:2])
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.todo',
'sphinx.ext.viewcode',
'sphinx.ext.napoleon',
'sphinx.ext.mathjax',
'sphinx.ext.intersphinx']
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = 'rdata'
copyright = '2018, Carlos Ramos Carreño'
author = 'Carlos Ramos Carreño'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
# version = ''
# The full version, including alpha/beta/rc tags.
# release = ''
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = 'en'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# These patterns also affect html_static_path and html_extra_path
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = True
add_module_names = False
autosummary_generate = True
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# This is required for the alabaster theme
# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
html_sidebars = {
'**': [
'about.html',
'navigation.html',
'relations.html', # needs 'show_related': True theme option to display
'searchbox.html',
'donate.html',
]
}
# -- Options for HTMLHelp output ------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = 'rdatadoc'
# -- Options for LaTeX output ---------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'rdata.tex', 'rdata Documentation',
'Carlos Ramos Carreño', 'manual'),
]
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'rdata', 'rdata Documentation',
[author], 1)
]
# -- Options for Texinfo output -------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'rdata', 'rdata Documentation',
author, 'rdata', 'Read R datasets from Python.',
'Miscellaneous'),
]
# -- Options for Epub output ----------------------------------------------
# Bibliographic Dublin Core info.
epub_title = project
epub_author = author
epub_publisher = author
epub_copyright = copyright
# The unique identifier of the text. This can be a ISBN number
# or the project homepage.
#
# epub_identifier = ''
# A unique identification for the text.
#
# epub_uid = ''
# A list of files that should not be packed into the epub file.
epub_exclude_files = ['search.html']
intersphinx_mapping = {'python': ('https://docs.python.org/3', None),
'pandas': ('http://pandas.pydata.org/pandas-docs/dev', None)}
python-rdata-0.5/docs/index.rst 0000664 0000000 0000000 00000002666 14137100362 0016570 0 ustar 00root root 0000000 0000000 rdata version |version|
=======================
|build-status| |docs| |coverage| |landscape| |pypi|
Open :code:`.rda` R data files containing datasets and convert them to the appropriate Python objects.
.. toctree::
:maxdepth: 4
:caption: Contents:
installation
simpleusage
apilist
internalapi
rdata is developed `on Github `_. Please
report `issues `_ there as well.
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
.. |build-status| image:: https://api.travis-ci.org/vnmabus/rdata.svg?branch=master
:alt: build status
:scale: 100%
:target: https://travis-ci.org/vnmabus/rdata
.. |docs| image:: https://readthedocs.org/projects/rdata/badge/?version=latest
:alt: Documentation Status
:scale: 100%
:target: https://rdata.readthedocs.io/en/latest/?badge=latest
.. |coverage| image:: http://codecov.io/github/vnmabus/rdata/coverage.svg?branch=develop
:alt: Coverage Status
:scale: 100%
:target: https://codecov.io/gh/vnmabus/rdata/branch/develop
.. |landscape| image:: https://landscape.io/github/vnmabus/rdata/develop/landscape.svg?style=flat
:target: https://landscape.io/github/vnmabus/rdata/develop
:alt: Code Health
.. |pypi| image:: https://badge.fury.io/py/rdata.svg
:alt: Pypi version
:scale: 100%
:target: https://pypi.python.org/pypi/rdata/
python-rdata-0.5/docs/installation.rst 0000664 0000000 0000000 00000000366 14137100362 0020155 0 ustar 00root root 0000000 0000000 Installation
============
rdata is on PyPI and can be installed using :code:`pip`:
.. code::
pip install rdata
It is also available for :code:`conda` using the :code:`conda-forge` channel:
.. code::
conda install -c conda-forge rdata
python-rdata-0.5/docs/internalapi.rst 0000664 0000000 0000000 00000000277 14137100362 0017763 0 ustar 00root root 0000000 0000000 Internal documentation
======================
List of modules
---------------
.. autosummary::
:toctree: modules
:recursive:
rdata.parser._parser
rdata.conversion._conversion python-rdata-0.5/docs/make.bat 0000664 0000000 0000000 00000001451 14137100362 0016323 0 ustar 00root root 0000000 0000000 @ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
set SPHINXPROJ=rdata
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
:end
popd
python-rdata-0.5/docs/simpleusage.rst 0000664 0000000 0000000 00000005744 14137100362 0017777 0 ustar 00root root 0000000 0000000 Simple usage
============
Read an R dataset
-----------------
The common way of reading an R dataset is the following one:
>>> import rdata
>>> parsed = rdata.parser.parse_file(rdata.TESTDATA_PATH / "test_vector.rda")
>>> converted = rdata.conversion.convert(parsed)
>>> converted
{'test_vector': array([1., 2., 3.])}
This consists of two steps:
#. First, the file is parsed using the function
:func:`~rdata.parser.parse_file`. This provides a literal description of the
file contents as a hierarchy of Python objects representing the basic R
objects. This step is unambiguous and always the same.
#. Then, each object must be converted to an appropriate Python object. In this
step there are several choices on which Python type is the most appropriate
as the conversion for a given R object. Thus, we provide a default
:func:`~rdata.conversion.convert` routine, which tries to select Python
objects that preserve most information of the original R object. For custom
R classes, it is also possible to specify conversion routines to Python
objects.
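One of the choices made in the second step can be seen in ``convert_vector``: an R vector with a ``names`` attribute becomes a dictionary, while an unnamed one becomes a list. The sketch below reproduces only that decision on plain Python values; ``vector_to_python`` is a hypothetical name and not part of rdata.

```python
from typing import Any, Dict, List, Mapping, Optional, Union


def vector_to_python(
    values: List[Any],
    attrs: Optional[Mapping[str, Any]] = None,
) -> Union[List[Any], Dict[str, Any]]:
    """Return a dict for named vectors and a list otherwise."""
    names = (attrs or {}).get("names")
    if names:
        # Pair each name with the value in the same position.
        return dict(zip(names, values))
    return list(values)


print(vector_to_python([1, 2], {"names": ["a", "b"]}))
print(vector_to_python([1, 2]))
```

A different converter could just as reasonably keep names and values in two parallel lists; this is the kind of decision that the conversion step has to make.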
Convert custom R classes
------------------------
The basic :func:`~rdata.conversion.convert` routine only constructs a
:class:`~rdata.conversion.SimpleConverter` object and calls its
:func:`~rdata.conversion.SimpleConverter.convert` method. All arguments of
:func:`~rdata.conversion.convert` are directly passed to the
:class:`~rdata.conversion.SimpleConverter` initialization method.
It is possible, although not trivial, to make a custom
:class:`~rdata.conversion.Converter` object to change the way in which the
basic R objects are transformed to Python objects. However, a more common
situation is that one does not want to change how basic R objects are
converted, but instead wants to provide conversions for specific R classes.
This can be done by passing a dictionary to the
:class:`~rdata.conversion.SimpleConverter` initialization method, containing
as keys the names of R classes and as values callables that convert an
R object of that class to a Python object. By default, the dictionary used
is :data:`~rdata.conversion._conversion.DEFAULT_CLASS_MAP`, which can convert
commonly used R classes such as `data.frame` and `factor`.
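The lookup that such a dictionary enables can be pictured with a toy dispatcher. This is a sketch of the idea and not the actual ``SimpleConverter`` logic: ``TOY_DEFAULT_CLASS_MAP``, ``toy_convert`` and the ``money`` class are hypothetical, and the R ``class`` attribute is assumed to arrive as a list of class names.

```python
from types import MappingProxyType

# Hypothetical read-only default map, standing in for a real one.
TOY_DEFAULT_CLASS_MAP = MappingProxyType({
    "factor": lambda obj, attrs: [attrs["levels"][i - 1] for i in obj],
})


def toy_convert(obj, attrs, class_map=TOY_DEFAULT_CLASS_MAP):
    """Apply the constructor of the first R class found in the map."""
    for r_class in attrs.get("class", []):
        constructor = class_map.get(r_class)
        if constructor is not None:
            return constructor(obj, attrs)
    # No class matched: return the object unchanged.
    return obj


# Extend the defaults with a constructor for a made-up "money" class.
custom_map = {
    **TOY_DEFAULT_CLASS_MAP,
    "money": lambda obj, attrs: [f"{v:.2f} {attrs['currency']}" for v in obj],
}

print(toy_convert([1, 2, 2], {"class": ["factor"], "levels": ["a", "b"]}))
print(toy_convert([1.5], {"class": ["money"], "currency": "EUR"}, custom_map))
```

Unpacking the defaults into a new dictionary keeps the existing conversions and leaves the default map itself untouched.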
As an example, here is how we would implement a conversion routine for the
factor class to :class:`bytes` objects, instead of the default conversion to
Pandas :class:`~pandas.Categorical` objects:
>>> import rdata
>>> def factor_constructor(obj, attrs):
... values = [bytes(attrs['levels'][i - 1], 'utf8')
... if i >= 0 else None for i in obj]
...
... return values
>>> new_dict = {
... **rdata.conversion.DEFAULT_CLASS_MAP,
... "factor": factor_constructor
... }
>>> parsed = rdata.parser.parse_file(rdata.TESTDATA_PATH
... / "test_dataframe.rda")
>>> converted = rdata.conversion.convert(parsed, new_dict)
>>> converted
{'test_dataframe': class value
0 b'a' 1
1 b'b' 2
2 b'b' 3}
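The same ``(obj, attrs)`` signature can serve other R classes. As a hypothetical illustration, the constructor below handles R ``Date`` values, which R stores as a number of days since 1970-01-01. Whether rdata already converts this class is not stated here, so treat ``date_constructor`` as a sketch to adapt, not as library behaviour.

```python
import datetime
import math
from typing import Any, Iterable, List, Mapping, Optional

R_EPOCH = datetime.date(1970, 1, 1)


def date_constructor(
    obj: Iterable[float],
    attrs: Mapping[str, Any],
) -> List[Optional[datetime.date]]:
    """Convert R Date values (days since the epoch) to Python dates."""
    dates: List[Optional[datetime.date]] = []
    for days in obj:
        if math.isnan(days):
            # Missing dates are assumed to arrive as NaN.
            dates.append(None)
        else:
            dates.append(R_EPOCH + datetime.timedelta(days=float(days)))
    return dates


print(date_constructor([0.0, 18993.0, float("nan")], {}))
```

It could then be registered next to the defaults in the same way as ``factor_constructor`` above, under the key ``"Date"``.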
python-rdata-0.5/pyproject.toml 0000664 0000000 0000000 00000000140 14137100362 0016674 0 ustar 00root root 0000000 0000000 [build-system]
# Minimum requirements for the build system to execute.
requires = ["setuptools"] python-rdata-0.5/rdata/ 0000775 0000000 0000000 00000000000 14137100362 0015060 5 ustar 00root root 0000000 0000000 python-rdata-0.5/rdata/__init__.py 0000664 0000000 0000000 00000000414 14137100362 0017170 0 ustar 00root root 0000000 0000000 import os as _os
import pathlib as _pathlib
from . import conversion, parser
def _get_test_data_path() -> _pathlib.Path:
return _pathlib.Path(_os.path.dirname(__file__)) / "tests" / "data"
TESTDATA_PATH = _get_test_data_path()
"""
Path of the test data.
"""
python-rdata-0.5/rdata/conversion/ 0000775 0000000 0000000 00000000000 14137100362 0017245 5 ustar 00root root 0000000 0000000 python-rdata-0.5/rdata/conversion/__init__.py 0000664 0000000 0000000 00000000665 14137100362 0021365 0 ustar 00root root 0000000 0000000 from ._conversion import (RExpression, RLanguage,
convert_list, convert_attrs, convert_vector,
convert_char, convert_symbol, convert_array,
Converter, SimpleConverter,
dataframe_constructor,
factor_constructor,
ts_constructor,
DEFAULT_CLASS_MAP, convert)
python-rdata-0.5/rdata/conversion/_conversion.py 0000664 0000000 0000000 00000047406 14137100362 0022156 0 ustar 00root root 0000000 0000000 import abc
import warnings
from fractions import Fraction
from types import MappingProxyType, SimpleNamespace
from typing import (
Any,
Callable,
ChainMap,
Hashable,
List,
Mapping,
MutableMapping,
NamedTuple,
Optional,
Union,
cast,
)
import numpy as np
import pandas
import xarray
from .. import parser
from ..parser import RObject
class RLanguage(NamedTuple):
"""
R language construct.
"""
elements: List[Any]
class RExpression(NamedTuple):
"""
R expression.
"""
elements: List[RLanguage]
def convert_list(
r_list: parser.RObject,
conversion_function: Callable[
[Union[parser.RData, parser.RObject]
], Any]=lambda x: x
) -> Union[Mapping[Union[str, bytes], Any], List[Any]]:
"""
Expand a tagged R pairlist to a Python dictionary.
Parameters
----------
r_list: RObject
Pairlist R object, with tags.
conversion_function: Callable
Conversion function to apply to the elements of the list. By default
is the identity function.
Returns
-------
dictionary: dict
A dictionary with the tags of the pairlist as keys and their
corresponding values as values.
See Also
--------
convert_vector
"""
if r_list.info.type is parser.RObjectType.NILVALUE:
return {}
elif r_list.info.type not in [parser.RObjectType.LIST,
parser.RObjectType.LANG]:
raise TypeError("Must receive a LIST, LANG or NILVALUE object")
if r_list.tag is None:
tag = None
else:
tag = conversion_function(r_list.tag)
cdr = conversion_function(r_list.value[1])
if tag is not None:
if cdr is None:
cdr = {}
return {tag: conversion_function(r_list.value[0]), **cdr}
else:
if cdr is None:
cdr = []
return [conversion_function(r_list.value[0]), *cdr]
def convert_env(
r_env: parser.RObject,
conversion_function: Callable[
[Union[parser.RData, parser.RObject]
], Any]=lambda x: x
) -> ChainMap[Union[str, bytes], Any]:
if r_env.info.type is not parser.RObjectType.ENV:
raise TypeError("Must receive an ENV object")
frame = conversion_function(r_env.value.frame)
enclosure = conversion_function(r_env.value.enclosure)
hash_table = conversion_function(r_env.value.hash_table)
dictionary = {}
for d in hash_table:
if d is not None:
dictionary.update(d)
return ChainMap(dictionary, enclosure)
def convert_attrs(
r_obj: parser.RObject,
conversion_function: Callable[
[Union[parser.RData, parser.RObject]
], Any]=lambda x: x
) -> Mapping[Union[str, bytes], Any]:
"""
Return the attributes of an object as a Python dictionary.
Parameters
----------
r_obj: RObject
R object.
conversion_function: Callable
Conversion function to apply to the elements of the attribute list. By
default is the identity function.
Returns
-------
dictionary: dict
A dictionary with the names of the attributes as keys and their
corresponding values as values.
See Also
--------
convert_list
"""
if r_obj.attributes:
attrs = cast(
Mapping[Union[str, bytes], Any],
conversion_function(r_obj.attributes),
)
else:
attrs = {}
return attrs
def convert_vector(
r_vec: parser.RObject,
conversion_function: Callable[
[Union[parser.RData, parser.RObject]], Any]=lambda x: x,
attrs: Optional[Mapping[Union[str, bytes], Any]] = None,
) -> Union[List[Any], Mapping[Union[str, bytes], Any]]:
"""
Convert an R vector to a Python list or dictionary.
If the vector has a ``names`` attribute, the result is a dictionary with
the names as keys. Otherwise, the result is a Python list.
Parameters
----------
r_vec: RObject
R vector.
conversion_function: Callable
Conversion function to apply to the elements of the vector. By default
is the identity function.
Returns
-------
vector: dict or list
A dictionary with the ``names`` of the vector as keys and their
corresponding values as values. If the vector does not have a
``names`` attribute, then a normal Python list is returned.
See Also
--------
convert_list
"""
if attrs is None:
attrs = {}
if r_vec.info.type not in [parser.RObjectType.VEC,
parser.RObjectType.EXPR]:
raise TypeError("Must receive a VEC or EXPR object")
value: Union[List[Any], Mapping[Union[str, bytes], Any]] = [
conversion_function(o) for o in r_vec.value
]
# If it has the name attribute, use a dict instead
field_names = attrs.get('names')
if field_names:
value = dict(zip(field_names, value))
return value
def safe_decode(byte_str: bytes, encoding: str) -> Union[str, bytes]:
"""
Decode a (possibly malformed) string.
"""
try:
return byte_str.decode(encoding)
except UnicodeDecodeError as e:
warnings.warn(
f"Exception while decoding {byte_str!r}: {e}",
)
return byte_str
def convert_char(
r_char: parser.RObject,
default_encoding: Optional[str] = None,
force_default_encoding: bool = False,
) -> Union[str, bytes, None]:
"""
Decode an R character array to a Python string or bytes.
The bits that signal the encoding are in the general pointer. The
string can be encoded in UTF8, LATIN1 or ASCII, or can be a sequence
of bytes.
Parameters
----------
r_char: RObject
R character array.
Returns
-------
string: str or bytes
Decoded string.
See Also
--------
convert_symbol
"""
if r_char.info.type is not parser.RObjectType.CHAR:
raise TypeError("Must receive a CHAR object")
if r_char.value is None:
return None
assert isinstance(r_char.value, bytes)
if not force_default_encoding:
if r_char.info.gp & parser.CharFlags.UTF8:
return safe_decode(r_char.value, "utf_8")
elif r_char.info.gp & parser.CharFlags.LATIN1:
return safe_decode(r_char.value, "latin_1")
elif r_char.info.gp & parser.CharFlags.ASCII:
return safe_decode(r_char.value, "ascii")
elif r_char.info.gp & parser.CharFlags.BYTES:
return r_char.value
if default_encoding:
return safe_decode(r_char.value, default_encoding)
else:
# Assume ASCII if no encoding is marked
warnings.warn("Unknown encoding. Assumed ASCII.")
return safe_decode(r_char.value, "ascii")
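# Illustrative sketch (not part of the rdata API): how the ``gp`` flag bits
# checked in ``convert_char`` select a codec. ``_SketchCharFlags`` copies the
# bit values of ``parser.CharFlags``; the helper name and the sample ``gp``
# values are assumptions made for this example only.
import enum


class _SketchCharFlags(enum.IntFlag):
    BYTES = 1 << 1
    LATIN1 = 1 << 2
    UTF8 = 1 << 3
    CACHED = 1 << 5
    ASCII = 1 << 6


def _sketch_codec_for_gp(gp: int):
    """Return the codec name for ``gp``, or ``None`` if no codec applies."""
    # Same precedence as ``convert_char``: UTF8, then LATIN1, then ASCII.
    if gp & _SketchCharFlags.UTF8:
        return "utf_8"
    if gp & _SketchCharFlags.LATIN1:
        return "latin_1"
    if gp & _SketchCharFlags.ASCII:
        return "ascii"
    # BYTES-marked and unmarked strings have no flag-selected codec.
    return None


# Unrelated bits such as CACHED do not affect the choice:
# _sketch_codec_for_gp(_SketchCharFlags.CACHED | _SketchCharFlags.UTF8)
# gives 'utf_8', and _sketch_codec_for_gp(_SketchCharFlags.BYTES) gives None.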
def convert_symbol(r_symbol: parser.RObject,
conversion_function: Callable[
[Union[parser.RData, parser.RObject]],
Any]=lambda x: x
) -> Union[str, bytes]:
"""
Decode an R symbol to a Python string or bytes.
Parameters
----------
r_symbol: RObject
R symbol.
conversion_function: Callable
Conversion function to apply to the char element of the symbol.
Defaults to the identity function.
Returns
-------
string: str or bytes
Decoded string.
See Also
--------
convert_char
"""
if r_symbol.info.type is parser.RObjectType.SYM:
symbol = conversion_function(r_symbol.value)
assert isinstance(symbol, (str, bytes))
return symbol
else:
raise TypeError("Must receive a SYM object")
def convert_array(
r_array: RObject,
conversion_function: Callable[
[Union[parser.RData, parser.RObject]
], Any]=lambda x: x,
attrs: Optional[Mapping[Union[str, bytes], Any]] = None,
) -> Union[np.ndarray, xarray.DataArray]:
"""
Convert an R array to a NumPy ndarray or an xarray DataArray.
If the array has a ``dimnames`` attribute, the output will be an
xarray DataArray, preserving the dimension names.
Parameters
----------
r_array: RObject
R array.
conversion_function: Callable
Conversion function to apply to the attributes of the array.
Defaults to the identity function.
Returns
-------
array: ndarray or DataArray
Array.
See Also
--------
convert_vector
"""
if attrs is None:
attrs = {}
if r_array.info.type not in {parser.RObjectType.LGL,
parser.RObjectType.INT,
parser.RObjectType.REAL,
parser.RObjectType.CPLX}:
raise TypeError("Must receive an array object")
value = r_array.value
shape = attrs.get('dim')
if shape is not None:
# R matrix order is like FORTRAN
value = np.reshape(value, shape, order='F')
dimnames = attrs.get('dimnames')
if dimnames:
dimension_names = ["dim_" + str(i) for i, _ in enumerate(dimnames)]
coords: Mapping[Hashable, Any] = {
dimension_names[i]: d
for i, d in enumerate(dimnames) if d is not None}
value = xarray.DataArray(value, dims=dimension_names, coords=coords)
return value
def dataframe_constructor(
obj: Any,
attrs: Mapping[Union[str, bytes], Any],
) -> pandas.DataFrame:
return pandas.DataFrame(obj, columns=obj)
def _factor_constructor_internal(
obj: Any,
attrs: Mapping[Union[str, bytes], Any],
ordered: bool,
) -> pandas.Categorical:
values = [attrs['levels'][i - 1] if i >= 0 else None for i in obj]
return pandas.Categorical(values, attrs['levels'], ordered=ordered)
def factor_constructor(
obj: Any,
attrs: Mapping[Union[str, bytes], Any],
) -> pandas.Categorical:
return _factor_constructor_internal(obj, attrs, ordered=False)
def ordered_constructor(
obj: Any,
attrs: Mapping[Union[str, bytes], Any],
) -> pandas.Categorical:
return _factor_constructor_internal(obj, attrs, ordered=True)
def ts_constructor(
obj: Any,
attrs: Mapping[Union[str, bytes], Any],
) -> pandas.Series:
start, end, frequency = attrs['tsp']
frequency = int(frequency)
real_start = Fraction(int(round(start * frequency)), frequency)
real_end = Fraction(int(round(end * frequency)), frequency)
index = np.arange(real_start, real_end + Fraction(1, frequency),
Fraction(1, frequency))
if frequency == 1:
index = index.astype(int)
return pandas.Series(obj, index=index)
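# Illustrative sketch (not part of the rdata API): how an R ``tsp`` attribute
# (start, end, frequency) expands into time points, using exact fractions as
# ``ts_constructor`` does. It builds a plain list instead of a NumPy array,
# and the helper name and sample values are assumptions for this example.
from fractions import Fraction


def _sketch_tsp_index(start: float, end: float, frequency: float):
    """Return the list of time points described by a ``tsp`` triple."""
    freq = int(frequency)
    # Snap the float endpoints onto the 1/freq grid to avoid rounding drift.
    first = Fraction(int(round(start * freq)), freq)
    last = Fraction(int(round(end * freq)), freq)
    count = int((last - first) * freq) + 1
    return [first + Fraction(k, freq) for k in range(count)]


# A quarterly series covering the year 2000 has four points:
# [float(t) for t in _sketch_tsp_index(2000.0, 2000.75, 4.0)]
# evaluates to [2000.0, 2000.25, 2000.5, 2000.75].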
Constructor = Callable[[Any, Mapping], Any]
default_class_map_dict: Mapping[Union[str, bytes], Constructor] = {
"data.frame": dataframe_constructor,
"factor": factor_constructor,
"ordered": ordered_constructor,
"ts": ts_constructor,
}
DEFAULT_CLASS_MAP = MappingProxyType(default_class_map_dict)
"""
Default mapping of constructor functions.
It has support for converting several commonly used R classes:
- Converts R \"data.frame\" objects into Pandas :class:`~pandas.DataFrame`
objects.
- Converts R \"factor\" objects into unordered Pandas
:class:`~pandas.Categorical` objects.
- Converts R \"ordered\" objects into ordered Pandas
:class:`~pandas.Categorical` objects.
- Converts R \"ts\" objects into Pandas :class:`~pandas.Series` objects.
"""
class Converter(abc.ABC):
"""
Interface of a class converting R objects into Python objects.
"""
@abc.abstractmethod
def convert(self, data: Union[parser.RData, parser.RObject]) -> Any:
"""
Convert an R object to a Python one.
"""
pass
class SimpleConverter(Converter):
"""
Class converting R objects to Python objects.
Parameters
----------
constructor_dict:
Dictionary mapping names of R classes to constructor functions with
the following prototype:
.. code-block :: python
def constructor(obj, attrs):
This dictionary can be used to support custom R classes. By default,
the dictionary used is
:data:`~rdata.conversion._conversion.DEFAULT_CLASS_MAP`
which has support for several common classes.
default_encoding:
Default encoding used for strings with unknown encoding. If `None`,
the one stored in the file will be used, or ASCII as a fallback.
force_default_encoding:
Use the default encoding even if the strings specify another encoding.
"""
def __init__(
self,
constructor_dict: Mapping[
Union[str, bytes],
Constructor,
] = DEFAULT_CLASS_MAP,
default_encoding: Optional[str] = None,
force_default_encoding: bool = False,
global_environment: Optional[Mapping[Union[str, bytes], Any]] = None,
) -> None:
self.constructor_dict = constructor_dict
self.default_encoding = default_encoding
self.force_default_encoding = force_default_encoding
self.global_environment = ChainMap(
{} if global_environment is None
else global_environment
)
self.empty_environment: Mapping[Union[str, bytes], Any] = ChainMap({})
self._reset()
def _reset(self) -> None:
self.references: MutableMapping[int, Any] = {}
self.default_encoding_used = self.default_encoding
def convert(self, data: Union[parser.RData, parser.RObject]) -> Any:
self._reset()
return self._convert_next(data)
def _convert_next(self, data: Union[parser.RData, parser.RObject]) -> Any:
"""
Convert an R object to a Python one.
"""
obj: RObject
if isinstance(data, parser.RData):
obj = data.object
if self.default_encoding is None:
self.default_encoding_used = data.extra.encoding
else:
obj = data
attrs = convert_attrs(obj, self._convert_next)
reference_id = id(obj)
# Return the value if previously referenced
value: Any = self.references.get(id(obj))
if value is not None:
pass
if obj.info.type == parser.RObjectType.SYM:
# Return the internal string
value = convert_symbol(obj, self._convert_next)
elif obj.info.type == parser.RObjectType.LIST:
# Expand the list and process the elements
value = convert_list(obj, self._convert_next)
elif obj.info.type == parser.RObjectType.ENV:
# Return a ChainMap of the environments
value = convert_env(obj, self._convert_next)
elif obj.info.type == parser.RObjectType.LANG:
# Expand the list and process the elements, returning a
# special object
rlanguage_list = convert_list(obj, self._convert_next)
assert isinstance(rlanguage_list, list)
value = RLanguage(rlanguage_list)
elif obj.info.type == parser.RObjectType.CHAR:
# Return the internal string
value = convert_char(
obj,
default_encoding=self.default_encoding_used,
force_default_encoding=self.force_default_encoding,
)
elif obj.info.type in {parser.RObjectType.LGL,
parser.RObjectType.INT,
parser.RObjectType.REAL,
parser.RObjectType.CPLX}:
# Return the internal array
value = convert_array(obj, self._convert_next, attrs=attrs)
elif obj.info.type == parser.RObjectType.STR:
# Convert the internal strings
value = [self._convert_next(o) for o in obj.value]
elif obj.info.type == parser.RObjectType.VEC:
# Convert the internal objects
value = convert_vector(obj, self._convert_next, attrs=attrs)
elif obj.info.type == parser.RObjectType.EXPR:
rexpression_list = convert_vector(
obj, self._convert_next, attrs=attrs)
assert isinstance(rexpression_list, list)
# Convert the internal objects returning a special object
value = RExpression(rexpression_list)
elif obj.info.type == parser.RObjectType.S4:
value = SimpleNamespace(**attrs)
elif obj.info.type == parser.RObjectType.EMPTYENV:
value = self.empty_environment
elif obj.info.type == parser.RObjectType.GLOBALENV:
value = self.global_environment
elif obj.info.type == parser.RObjectType.REF:
# Return the referenced value
value = self.references.get(id(obj.referenced_object))
# value = self.references[id(obj.referenced_object)]
if value is None:
reference_id = id(obj.referenced_object)
assert obj.referenced_object is not None
value = self._convert_next(obj.referenced_object)
elif obj.info.type == parser.RObjectType.NILVALUE:
value = None
else:
raise NotImplementedError(f"Type {obj.info.type} not implemented")
if obj.info.object:
classname = attrs["class"]
for i, c in enumerate(classname):
constructor = self.constructor_dict.get(c, None)
if constructor:
new_value = constructor(value, attrs)
else:
new_value = NotImplemented
if new_value is NotImplemented:
missing_msg = (f"Missing constructor for R class "
f"\"{c}\". ")
if len(classname) > (i + 1):
solution_msg = (f"The constructor for class "
f"\"{classname[i+1]}\" will be "
f"used instead."
)
else:
solution_msg = ("The underlying R object is "
"returned instead.")
warnings.warn(missing_msg + solution_msg,
stacklevel=1)
else:
value = new_value
break
self.references[reference_id] = value
return value
def convert(
data: Union[parser.RData, parser.RObject],
*args: Any,
**kwargs: Any,
) -> Any:
"""
Uses the default converter (:class:`SimpleConverter`) to convert the data.
Examples:
Parse one of the included examples, containing a vector
>>> import rdata
>>>
>>> parsed = rdata.parser.parse_file(
... rdata.TESTDATA_PATH / "test_vector.rda")
>>> converted = rdata.conversion.convert(parsed)
>>> converted
{'test_vector': array([1., 2., 3.])}
Parse another example, containing a dataframe
>>> import rdata
>>>
>>> parsed = rdata.parser.parse_file(
... rdata.TESTDATA_PATH / "test_dataframe.rda")
>>> converted = rdata.conversion.convert(parsed)
>>> converted
{'test_dataframe': class value
0 a 1
1 b 2
2 b 3}
"""
return SimpleConverter(*args, **kwargs).convert(data)
python-rdata-0.5/rdata/parser/ 0000775 0000000 0000000 00000000000 14137100362 0016354 5 ustar 00root root 0000000 0000000 python-rdata-0.5/rdata/parser/__init__.py 0000664 0000000 0000000 00000000232 14137100362 0020462 0 ustar 00root root 0000000 0000000 from ._parser import (
DEFAULT_ALTREP_MAP,
CharFlags,
RData,
RObject,
RObjectInfo,
RObjectType,
parse_data,
parse_file,
)
python-rdata-0.5/rdata/parser/_parser.py 0000664 0000000 0000000 00000073700 14137100362 0020370 0 ustar 00root root 0000000 0000000 from __future__ import annotations
import abc
import bz2
import enum
import gzip
import lzma
import os
import pathlib
import warnings
import xdrlib
from dataclasses import dataclass
from types import MappingProxyType
from typing import (
Any,
BinaryIO,
Callable,
List,
Mapping,
Optional,
Set,
TextIO,
Tuple,
Union,
)
import numpy as np
class FileTypes(enum.Enum):
"""
Type of file containing an R file.
"""
bzip2 = "bz2"
gzip = "gzip"
xz = "xz"
rdata_binary_v2 = "rdata version 2 (binary)"
rdata_binary_v3 = "rdata version 3 (binary)"
magic_dict = {
FileTypes.bzip2: b"\x42\x5a\x68",
FileTypes.gzip: b"\x1f\x8b",
FileTypes.xz: b"\xFD7zXZ\x00",
FileTypes.rdata_binary_v2: b"RDX2\n",
FileTypes.rdata_binary_v3: b"RDX3\n"
}
def file_type(data: memoryview) -> Optional[FileTypes]:
"""
Returns the type of the file.
"""
for filetype, magic in magic_dict.items():
if data[:len(magic)] == magic:
return filetype
return None
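# Illustrative sketch (not part of the rdata API): magic-number detection as
# done by ``file_type`` above, exercised on data built in memory. The names
# ``_SKETCH_MAGIC``, ``_sketch_detect`` and ``_sketch_roundtrip_labels`` and
# the sample payload are assumptions made for this example; the prefixes are
# the ones listed in ``magic_dict``.
import bz2
import gzip
import lzma

_SKETCH_MAGIC = {
    "bz2": b"\x42\x5a\x68",
    "gzip": b"\x1f\x8b",
    "xz": b"\xfd7zXZ\x00",
    "rdata v2": b"RDX2\n",
    "rdata v3": b"RDX3\n",
}


def _sketch_detect(data: bytes):
    """Return the label of the first matching prefix, or ``None``."""
    for label, magic in _SKETCH_MAGIC.items():
        if data.startswith(magic):
            return label
    return None


def _sketch_roundtrip_labels():
    """Compress a fake RDX2 header three ways and label each result."""
    payload = b"RDX2\nX\n"
    return [
        _sketch_detect(bz2.compress(payload)),
        _sketch_detect(gzip.compress(payload)),
        _sketch_detect(lzma.compress(payload)),  # default container is xz
        _sketch_detect(payload),
    ]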
class RdataFormats(enum.Enum):
"""
Format of an R file.
"""
XDR = "XDR"
ASCII = "ASCII"
binary = "binary"
format_dict = {
RdataFormats.XDR: b"X\n",
RdataFormats.ASCII: b"A\n",
RdataFormats.binary: b"B\n",
}
def rdata_format(data: memoryview) -> Optional[RdataFormats]:
"""
Returns the format of the data.
"""
for format_type, magic in format_dict.items():
if data[:len(magic)] == magic:
return format_type
return None
class RObjectType(enum.Enum):
"""
Type of an R object.
"""
NIL = 0 # NULL
SYM = 1 # symbols
LIST = 2 # pairlists
CLO = 3 # closures
ENV = 4 # environments
PROM = 5 # promises
LANG = 6 # language objects
SPECIAL = 7 # special functions
BUILTIN = 8 # builtin functions
CHAR = 9 # internal character strings
LGL = 10 # logical vectors
INT = 13 # integer vectors
REAL = 14 # numeric vectors
CPLX = 15 # complex vectors
STR = 16 # character vectors
DOT = 17 # dot-dot-dot object
ANY = 18 # make “any” args work
VEC = 19 # list (generic vector)
EXPR = 20 # expression vector
BCODE = 21 # byte code
EXTPTR = 22 # external pointer
WEAKREF = 23 # weak reference
RAW = 24 # raw vector
S4 = 25 # S4 classes not of simple type
ALTREP = 238 # Alternative representations
EMPTYENV = 242 # Empty environment
GLOBALENV = 253 # Global environment
NILVALUE = 254 # NIL value
REF = 255 # Reference
class CharFlags(enum.IntFlag):
HAS_HASH = 1
BYTES = 1 << 1
LATIN1 = 1 << 2
UTF8 = 1 << 3
CACHED = 1 << 5
ASCII = 1 << 6
@dataclass
class RVersions():
"""
R versions.
"""
format: int
serialized: int
minimum: int
@dataclass
class RExtraInfo():
"""
Extra information.
Contains the default encoding (only in version 3).
"""
encoding: Optional[str] = None
@dataclass
class RObjectInfo():
"""
Internal attributes of an R object.
"""
type: RObjectType
object: bool
attributes: bool
tag: bool
gp: int
reference: int
@dataclass
class RObject():
"""
Representation of an R object.
"""
info: RObjectInfo
value: Any
attributes: Optional[RObject]
tag: Optional[RObject] = None
referenced_object: Optional[RObject] = None
def _str_internal(
self,
indent: int = 0,
used_references: Optional[Set[int]] = None
) -> str:
if used_references is None:
used_references = set()
string = ""
string += f"{' ' * indent}{self.info.type}\n"
if self.tag:
tag_string = self.tag._str_internal(indent + 4,
used_references.copy())
string += f"{' ' * (indent + 2)}tag:\n{tag_string}\n"
if self.info.reference:
assert self.referenced_object
reference_string = (f"{' ' * (indent + 4)}..."
if self.info.reference in used_references
else self.referenced_object._str_internal(
indent + 4, used_references.copy()))
string += (f"{' ' * (indent + 2)}reference: "
f"{self.info.reference}\n{reference_string}\n")
string += f"{' ' * (indent + 2)}value:\n"
if isinstance(self.value, RObject):
string += self.value._str_internal(indent + 4,
used_references.copy())
elif isinstance(self.value, (tuple, list)):
for elem in self.value:
string += elem._str_internal(indent + 4,
used_references.copy())
elif isinstance(self.value, np.ndarray):
string += " " * (indent + 4)
if len(self.value) > 4:
string += (f"[{self.value[0]}, {self.value[1]} ... "
f"{self.value[-2]}, {self.value[-1]}]\n")
else:
string += f"{self.value}\n"
else:
string += f"{' ' * (indent + 4)}{self.value}\n"
if self.attributes:
attr_string = self.attributes._str_internal(
indent + 4,
used_references.copy())
string += f"{' ' * (indent + 2)}attributes:\n{attr_string}\n"
return string
def __str__(self) -> str:
return self._str_internal()
@dataclass
class RData():
"""
Data contained in an R file.
"""
versions: RVersions
extra: RExtraInfo
object: RObject
@dataclass
class EnvironmentValue():
"""
Value of an environment.
"""
locked: bool
enclosure: RObject
frame: RObject
hash_table: RObject
AltRepConstructor = Callable[
[RObject],
Tuple[RObjectInfo, Any],
]
AltRepConstructorMap = Mapping[bytes, AltRepConstructor]
def format_float_with_scipen(number: float, scipen: int) -> bytes:
fixed = np.format_float_positional(number, trim="-")
scientific = np.format_float_scientific(number, trim="-")
assert(isinstance(fixed, str))
assert(isinstance(scientific, str))
return (
scientific if len(fixed) - len(scientific) > scipen
else fixed
).encode()
def deferred_string_constructor(
state: RObject,
) -> Tuple[RObjectInfo, Any]:
new_info = RObjectInfo(
type=RObjectType.STR,
object=False,
attributes=False,
tag=False,
gp=0,
reference=0,
)
object_to_format = state.value[0].value
scipen = state.value[1].value
value = [
RObject(
info=RObjectInfo(
type=RObjectType.CHAR,
object=False,
attributes=False,
tag=False,
gp=CharFlags.ASCII,
reference=0,
),
value=format_float_with_scipen(num, scipen),
attributes=None,
tag=None,
referenced_object=None,
)
for num in object_to_format
]
return new_info, value
def compact_seq_constructor(
state: RObject,
*,
is_int: bool = False
) -> Tuple[RObjectInfo, Any]:
new_info = RObjectInfo(
type=RObjectType.INT if is_int else RObjectType.REAL,
object=False,
attributes=False,
tag=False,
gp=0,
reference=0,
)
start = state.value[1]
stop = state.value[0]
step = state.value[2]
if is_int:
start = int(start)
stop = int(stop)
step = int(step)
value = np.arange(start, stop, step)
return new_info, value
def compact_intseq_constructor(
state: RObject,
) -> Tuple[RObjectInfo, Any]:
return compact_seq_constructor(state, is_int=True)
def compact_realseq_constructor(
state: RObject,
) -> Tuple[RObjectInfo, Any]:
return compact_seq_constructor(state, is_int=False)
def wrap_constructor(
state: RObject,
) -> Tuple[RObjectInfo, Any]:
new_info = RObjectInfo(
type=state.value[0].info.type,
object=False,
attributes=False,
tag=False,
gp=0,
reference=0,
)
value = state.value[0].value
return new_info, value
default_altrep_map_dict: Mapping[bytes, AltRepConstructor] = {
b"deferred_string": deferred_string_constructor,
b"compact_intseq": compact_intseq_constructor,
b"compact_realseq": compact_realseq_constructor,
b"wrap_real": wrap_constructor,
b"wrap_string": wrap_constructor,
b"wrap_logical": wrap_constructor,
b"wrap_integer": wrap_constructor,
b"wrap_complex": wrap_constructor,
b"wrap_raw": wrap_constructor,
}
DEFAULT_ALTREP_MAP = MappingProxyType(default_altrep_map_dict)
class Parser(abc.ABC):
"""
Parser interface for an R file.
"""
def __init__(
self,
*,
expand_altrep: bool = True,
altrep_constructor_dict: AltRepConstructorMap = DEFAULT_ALTREP_MAP,
):
self.expand_altrep = expand_altrep
self.altrep_constructor_dict = altrep_constructor_dict
def parse_bool(self) -> bool:
"""
Parse a boolean.
"""
return bool(self.parse_int())
@abc.abstractmethod
def parse_int(self) -> int:
"""
Parse an integer.
"""
pass
@abc.abstractmethod
def parse_double(self) -> float:
"""
Parse a double.
"""
pass
def parse_complex(self) -> complex:
"""
Parse a complex number.
"""
return complex(self.parse_double(), self.parse_double())
@abc.abstractmethod
def parse_string(self, length: int) -> bytes:
"""
Parse a string.
"""
pass
def parse_all(self) -> RData:
"""
Parse the whole file.
"""
versions = self.parse_versions()
extra_info = self.parse_extra_info(versions)
obj = self.parse_R_object()
return RData(versions, extra_info, obj)
def parse_versions(self) -> RVersions:
"""
Parse the versions header.
"""
format_version = self.parse_int()
r_version = self.parse_int()
minimum_r_version = self.parse_int()
if format_version not in [2, 3]:
raise NotImplementedError(
f"Format version {format_version} unsupported",
)
return RVersions(format_version, r_version, minimum_r_version)
def parse_extra_info(self, versions: RVersions) -> RExtraInfo:
"""
Parse the extra information (the default encoding, format version 3 only).
"""
encoding = None
if versions.format >= 3:
encoding_len = self.parse_int()
encoding = self.parse_string(encoding_len).decode("ASCII")
extra_info = RExtraInfo(encoding)
return extra_info
def expand_altrep_to_object(
self,
info: RObject,
state: RObject,
) -> Tuple[RObjectInfo, Any]:
"""Expand alternative representation to normal object."""
assert info.info.type == RObjectType.LIST
class_sym = info.value[0]
while class_sym.info.type == RObjectType.REF:
class_sym = class_sym.referenced_object
assert class_sym.info.type == RObjectType.SYM
assert class_sym.value.info.type == RObjectType.CHAR
altrep_name = class_sym.value.value
assert isinstance(altrep_name, bytes)
constructor = self.altrep_constructor_dict[altrep_name]
return constructor(state)
def parse_R_object(
self,
reference_list: Optional[List[RObject]] = None
) -> RObject:
"""
Parse an R object.
"""
if reference_list is None:
# References are 1-based; lookups subtract 1, so no dummy entry is stored
reference_list = []
info_int = self.parse_int()
info = parse_r_object_info(info_int)
tag = None
attributes = None
referenced_object = None
tag_read = False
attributes_read = False
add_reference = False
result = None
value: Any
if info.type == RObjectType.NIL:
value = None
elif info.type == RObjectType.SYM:
# Read Char
value = self.parse_R_object(reference_list)
# Symbols can be referenced
add_reference = True
elif info.type in [RObjectType.LIST, RObjectType.LANG]:
tag = None
if info.attributes:
attributes = self.parse_R_object(reference_list)
attributes_read = True
elif info.tag:
tag = self.parse_R_object(reference_list)
tag_read = True
# Read CAR and CDR
car = self.parse_R_object(reference_list)
cdr = self.parse_R_object(reference_list)
value = (car, cdr)
elif info.type == RObjectType.ENV:
result = RObject(
info=info,
tag=tag,
attributes=attributes,
value=None,
referenced_object=referenced_object,
)
reference_list.append(result)
locked = self.parse_bool()
enclosure = self.parse_R_object(reference_list)
frame = self.parse_R_object(reference_list)
hash_table = self.parse_R_object(reference_list)
attributes = self.parse_R_object(reference_list)
value = EnvironmentValue(
locked=locked,
enclosure=enclosure,
frame=frame,
hash_table=hash_table,
)
elif info.type == RObjectType.CHAR:
length = self.parse_int()
if length > 0:
value = self.parse_string(length=length)
elif length == 0:
value = b""
elif length == -1:
value = None
else:
raise NotImplementedError(
f"Length of CHAR cannot be {length}")
elif info.type == RObjectType.LGL:
length = self.parse_int()
value = np.empty(length, dtype=np.bool_)
for i in range(length):
value[i] = self.parse_bool()
elif info.type == RObjectType.INT:
length = self.parse_int()
value = np.empty(length, dtype=np.int64)
for i in range(length):
value[i] = self.parse_int()
elif info.type == RObjectType.REAL:
length = self.parse_int()
value = np.empty(length, dtype=np.double)
for i in range(length):
value[i] = self.parse_double()
elif info.type == RObjectType.CPLX:
length = self.parse_int()
value = np.empty(length, dtype=np.complex128)
for i in range(length):
value[i] = self.parse_complex()
elif info.type in [RObjectType.STR,
RObjectType.VEC, RObjectType.EXPR]:
length = self.parse_int()
value = [None] * length
for i in range(length):
value[i] = self.parse_R_object(reference_list)
elif info.type == RObjectType.S4:
value = None
elif info.type == RObjectType.ALTREP:
altrep_info = self.parse_R_object(reference_list)
altrep_state = self.parse_R_object(reference_list)
altrep_attr = self.parse_R_object(reference_list)
if self.expand_altrep:
info, value = self.expand_altrep_to_object(
info=altrep_info,
state=altrep_state,
)
attributes = altrep_attr
else:
value = (altrep_info, altrep_state, altrep_attr)
elif info.type == RObjectType.EMPTYENV:
value = None
elif info.type == RObjectType.GLOBALENV:
value = None
elif info.type == RObjectType.NILVALUE:
value = None
elif info.type == RObjectType.REF:
value = None
# Index is 1-based
referenced_object = reference_list[info.reference - 1]
else:
raise NotImplementedError(f"Type {info.type} not implemented")
if info.tag and not tag_read:
warnings.warn(f"Tag not implemented for type {info.type} "
"and ignored")
if info.attributes and not attributes_read:
attributes = self.parse_R_object(reference_list)
if result is None:
result = RObject(
info=info,
tag=tag,
attributes=attributes,
value=value,
referenced_object=referenced_object,
)
else:
result.info = info
result.attributes = attributes
result.value = value
result.referenced_object = referenced_object
if add_reference:
reference_list.append(result)
return result
class ParserXDR(Parser):
"""
Parser used when the integers and doubles are in XDR format.
"""
def __init__(
self,
data: memoryview,
position: int = 0,
*,
expand_altrep: bool = True,
altrep_constructor_dict: AltRepConstructorMap = DEFAULT_ALTREP_MAP,
) -> None:
super().__init__(
expand_altrep=expand_altrep,
altrep_constructor_dict=altrep_constructor_dict,
)
self.data = data
self.position = position
self.xdr_parser = xdrlib.Unpacker(data)
def parse_int(self) -> int:
self.xdr_parser.set_position(self.position)
result = self.xdr_parser.unpack_int()
self.position = self.xdr_parser.get_position()
return result
def parse_double(self) -> float:
self.xdr_parser.set_position(self.position)
result = self.xdr_parser.unpack_double()
self.position = self.xdr_parser.get_position()
return result
def parse_string(self, length: int) -> bytes:
result = self.data[self.position:(self.position + length)]
self.position += length
return bytes(result)
def parse_file(
file_or_path: Union[BinaryIO, TextIO, 'os.PathLike[Any]', str],
*,
expand_altrep: bool = True,
altrep_constructor_dict: AltRepConstructorMap = DEFAULT_ALTREP_MAP,
) -> RData:
"""
Parse an R file (.rda or .rdata).
Parameters:
file_or_path (file-like, str, bytes or path-like): File
in the R serialization format.
expand_altrep (bool): Whether to translate ALTREPs to normal objects.
altrep_constructor_dict: Dictionary mapping each ALTREP to
its constructor.
Returns:
RData: Data contained in the file (versions and object).
See Also:
:func:`parse_data`: Similar function that receives the data directly.
Examples:
Parse one of the included examples, containing a vector
>>> import rdata
>>>
>>> parsed = rdata.parser.parse_file(
... rdata.TESTDATA_PATH / "test_vector.rda")
>>> parsed
RData(versions=RVersions(format=2,
serialized=196610,
minimum=131840),
extra=RExtraInfo(encoding=None),
object=RObject(info=RObjectInfo(type=<RObjectType.LIST: 2>,
object=False,
attributes=False,
tag=True,
gp=0,
reference=0),
value=(RObject(info=RObjectInfo(type=<RObjectType.REAL: 14>,
object=False,
attributes=False,
tag=False,
gp=0,
reference=0),
value=array([1., 2., 3.]),
attributes=None,
tag=None,
referenced_object=None),
RObject(info=RObjectInfo(type=<RObjectType.NILVALUE: 254>,
object=False,
attributes=False,
tag=False,
gp=0,
reference=0),
value=None,
attributes=None,
tag=None,
referenced_object=None)),
attributes=None,
tag=RObject(info=RObjectInfo(type=<RObjectType.SYM: 1>,
object=False,
attributes=False,
tag=False,
gp=0,
reference=0),
value=RObject(info=RObjectInfo(type=<RObjectType.CHAR: 9>,
object=False,
attributes=False,
tag=False,
gp=64,
reference=0),
value=b'test_vector',
attributes=None,
tag=None,
referenced_object=None),
attributes=None,
tag=None,
referenced_object=None),
referenced_object=None))
"""
if isinstance(file_or_path, (os.PathLike, str)):
path = pathlib.Path(file_or_path)
data = path.read_bytes()
else:
# file is a pre-opened file
buffer: Optional[BinaryIO] = getattr(file_or_path, 'buffer', None)
if buffer is None:
assert isinstance(file_or_path, BinaryIO)
binary_file: BinaryIO = file_or_path
else:
binary_file = buffer
data = binary_file.read()
return parse_data(
data,
expand_altrep=expand_altrep,
altrep_constructor_dict=altrep_constructor_dict,
)
def parse_data(
data: bytes,
*,
expand_altrep: bool = True,
altrep_constructor_dict: AltRepConstructorMap = DEFAULT_ALTREP_MAP,
) -> RData:
"""
Parse the data of an R file, received as a sequence of bytes.
Parameters:
data (bytes): Data extracted from an R file.
expand_altrep (bool): Whether to translate ALTREPs to normal objects.
altrep_constructor_dict: Dictionary mapping each ALTREP to
its constructor.
Returns:
RData: Data contained in the file (versions and object).
See Also:
:func:`parse_file`: Similar function that parses a file directly.
Examples:
Parse one of the included examples, containing a vector
>>> import rdata
>>>
>>> with open(rdata.TESTDATA_PATH / "test_vector.rda", "rb") as f:
... parsed = rdata.parser.parse_data(f.read())
>>>
>>> parsed
RData(versions=RVersions(format=2,
serialized=196610,
minimum=131840),
extra=RExtraInfo(encoding=None),
object=RObject(info=RObjectInfo(type=<RObjectType.LIST: 2>,
object=False,
attributes=False,
tag=True,
gp=0,
reference=0),
value=(RObject(info=RObjectInfo(type=<RObjectType.REAL: 14>,
object=False,
attributes=False,
tag=False,
gp=0,
reference=0),
value=array([1., 2., 3.]),
attributes=None,
tag=None,
referenced_object=None),
RObject(info=RObjectInfo(type=<RObjectType.NILVALUE: 254>,
object=False,
attributes=False,
tag=False,
gp=0,
reference=0),
value=None,
attributes=None,
tag=None,
referenced_object=None)),
attributes=None,
tag=RObject(info=RObjectInfo(type=<RObjectType.SYM: 1>,
object=False,
attributes=False,
tag=False,
gp=0,
reference=0),
value=RObject(info=RObjectInfo(type=<RObjectType.CHAR: 9>,
object=False,
attributes=False,
tag=False,
gp=64,
reference=0),
value=b'test_vector',
attributes=None,
tag=None,
referenced_object=None),
attributes=None,
tag=None,
referenced_object=None),
referenced_object=None))
"""
view = memoryview(data)
filetype = file_type(view)
parse_function = (
parse_rdata_binary
if filetype in {
FileTypes.rdata_binary_v2,
FileTypes.rdata_binary_v3,
} else parse_data
)
if filetype is FileTypes.bzip2:
new_data = bz2.decompress(data)
elif filetype is FileTypes.gzip:
new_data = gzip.decompress(data)
elif filetype is FileTypes.xz:
new_data = lzma.decompress(data)
elif filetype in {FileTypes.rdata_binary_v2, FileTypes.rdata_binary_v3}:
view = view[len(magic_dict[filetype]):]
new_data = view
else:
raise NotImplementedError("Unknown file type")
return parse_function(
new_data, # type: ignore
expand_altrep=expand_altrep,
altrep_constructor_dict=altrep_constructor_dict,
)
def parse_rdata_binary(
data: memoryview,
expand_altrep: bool = True,
altrep_constructor_dict: AltRepConstructorMap = DEFAULT_ALTREP_MAP,
) -> RData:
"""
Select the appropriate parser and parse all the info.
"""
format_type = rdata_format(data)
if format_type:
data = data[len(format_dict[format_type]):]
if format_type is RdataFormats.XDR:
parser = ParserXDR(
data,
expand_altrep=expand_altrep,
altrep_constructor_dict=altrep_constructor_dict,
)
return parser.parse_all()
else:
raise NotImplementedError("Unknown file format")
def bits(data: int, start: int, stop: int) -> int:
"""
Read bits [start, stop) of an integer.
"""
count = stop - start
mask = ((1 << count) - 1) << start
bitvalue = data & mask
return bitvalue >> start
def is_special_r_object_type(r_object_type: RObjectType) -> bool:
"""
Check if an R type has a different serialization than the usual one.
"""
return (r_object_type is RObjectType.NILVALUE
or r_object_type is RObjectType.REF)
def parse_r_object_info(info_int: int) -> RObjectInfo:
"""
Parse the internal information of an object.
"""
type_exp = RObjectType(bits(info_int, 0, 8))
reference = 0
if is_special_r_object_type(type_exp):
object_flag = False
attributes = False
tag = False
gp = 0
else:
object_flag = bool(bits(info_int, 8, 9))
attributes = bool(bits(info_int, 9, 10))
tag = bool(bits(info_int, 10, 11))
gp = bits(info_int, 12, 28)
if type_exp == RObjectType.REF:
reference = bits(info_int, 8, 32)
return RObjectInfo(
type=type_exp,
object=object_flag,
attributes=attributes,
tag=tag,
gp=gp,
reference=reference
)
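# Illustrative sketch (not part of the rdata API): the flag-word layout that
# ``parse_r_object_info`` decodes, shown on a hand-built integer. The helper
# name ``_sketch_unpack_flags`` and the sample words are assumptions made for
# this example; the bit positions are the ones used above for non-REF types.
def _sketch_unpack_flags(info_int: int):
    """Split a (non-REF) flag word into its fields."""
    def field(start: int, stop: int) -> int:
        # Keep bits [start, stop) and shift them down to bit 0.
        return (info_int >> start) & ((1 << (stop - start)) - 1)

    return {
        "type": field(0, 8),
        "object": bool(field(8, 9)),
        "attributes": bool(field(9, 10)),
        "tag": bool(field(10, 11)),
        "gp": field(12, 28),
    }


# A CHAR object (type 9) whose gp field carries the ASCII flag (1 << 6):
# _sketch_unpack_flags(9 | (64 << 12)) gives type 9 and gp 64, with the
# object, attributes and tag flags all False.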
python-rdata-0.5/rdata/py.typed 0000664 0000000 0000000 00000000100 14137100362 0016546 0 ustar 00root root 0000000 0000000 # Marker file for PEP 561. The rdata package uses inline types. python-rdata-0.5/rdata/tests/ 0000775 0000000 0000000 00000000000 14137100362 0016222 5 ustar 00root root 0000000 0000000 python-rdata-0.5/rdata/tests/__init__.py 0000664 0000000 0000000 00000000000 14137100362 0020321 0 ustar 00root root 0000000 0000000 python-rdata-0.5/rdata/tests/data/ 0000775 0000000 0000000 00000000000 14137100362 0017133 5 ustar 00root root 0000000 0000000 python-rdata-0.5/rdata/tests/data/test_altrep_compact_intseq.rda 0000664 0000000 0000000 00000000176 14137100362 0025246 0 ustar 00root root 0000000 0000000 r0b```f`a`e`f2XCCt-XF'*I-.O))J-O-HL.+)N-ʾbd|*eYSb`q~d` b'c python-rdata-0.5/rdata/tests/data/test_altrep_compact_realseq.rda 0000664 0000000 0000000 00000000200 14137100362 0025363 0 ustar 00root root 0000000 0000000 r0b```f`a`e`f2XCCt-XF'.I-.O))J-O-HL./JM)N-JbdJYSb`q> 8 3ȩ python-rdata-0.5/rdata/tests/data/test_altrep_deferred_string.rda 0000664 0000000 0000000 00000000256 14137100362 0025402 0 ustar 00root root 0000000 0000000 r0b```f`a`e`f2XCCt-XF'.I-.O))J-OIMK-*JM/.)KJbdJYSb`q bN`; Y` ʷHq{)Geu8#P; 5 python-rdata-0.5/rdata/tests/data/test_altrep_wrap_logical.rda 0000664 0000000 0000000 00000000175 14137100362 0024677 0 ustar 00root root 0000000 0000000 r0b```f`a`e`f2XCCt-XF'(I-.O))J-//J,OLNʽbd