schema-salad-2.6.20171201034858/README.rst

|Build Status| |Build status|
.. |Build Status| image:: https://img.shields.io/travis/common-workflow-language/schema_salad/master.svg?label=unix%20build
:target: https://travis-ci.org/common-workflow-language/schema_salad
.. |Build status| image:: https://img.shields.io/appveyor/ci/mr-c/schema-salad/master.svg?label=windows%20build
:target: https://ci.appveyor.com/project/mr-c/schema-salad/branch/master
Schema Salad
------------
Salad is a schema language for describing JSON or YAML structured linked data
documents. Salad is based originally on JSON-LD_ and the Apache Avro_ data
serialization system.
A Salad schema describes rules for preprocessing, structural validation, and
link checking of documents. Salad supports rich data modeling with features
such as inheritance, template specialization, object identifiers, object
references, documentation generation, and transformation to RDF_. Salad
provides a bridge between document and record oriented data modeling and the
Semantic Web.
Usage
-----
::
$ pip install schema_salad
$ schema-salad-tool
usage: schema-salad-tool [-h] [--rdf-serializer RDF_SERIALIZER]
[--print-jsonld-context | --print-doc | --print-rdfs | --print-avro | --print-rdf | --print-pre | --print-index | --print-metadata | --version]
[--strict | --non-strict]
[--verbose | --quiet | --debug]
schema [document]
$ python
>>> import schema_salad
To install from source::
git clone https://github.com/common-workflow-language/schema_salad
cd schema_salad
python setup.py install
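
The same processing steps are also available from Python.  The following is a
minimal, illustrative sketch that mirrors the sequence used internally by
``schema-salad-tool`` (``myschema.yml`` and ``mydoc.yml`` are placeholder
paths)::

    import os
    from schema_salad import schema, jsonld_context
    from schema_salad.ref_resolver import Loader, file_uri

    # Load the schema and resolve references against the Salad metaschema
    metaschema_names, metaschema_doc, metaschema_loader = schema.get_metaschema()
    schema_uri = file_uri(os.path.abspath("myschema.yml"))
    schema_raw_doc = metaschema_loader.fetch(schema_uri)
    schema_doc, schema_metadata = metaschema_loader.resolve_all(schema_raw_doc, schema_uri)

    # Validate the schema itself, then derive a JSON-LD context from it
    schema.validate_doc(metaschema_names, schema_doc, metaschema_loader, True)
    schema_ctx, rdfs = jsonld_context.salad_to_jsonld_context(schema_doc, {})

    # Use the derived context to resolve references in and index a document
    document_loader = Loader(schema_ctx)
    document, doc_metadata = document_loader.resolve_ref(
        file_uri(os.path.abspath("mydoc.yml")))
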
Documentation
-------------
See the specification_ and the metaschema_ (the Salad schema that describes
itself). For an example application of Schema Salad see the Common Workflow
Language_.
Rationale
---------
The JSON data model is a popular way to represent structured data. It is
attractive because of its relative simplicity and because it maps naturally
onto the standard types of many programming languages. However, this
simplicity comes at a cost: basic JSON lacks expressive features useful for
working with complex data structures and document formats, such as schemas,
object references, and namespaces.
JSON-LD is a W3C standard providing a way to describe how to interpret a JSON
document as Linked Data by means of a "context". JSON-LD provides a powerful
solution for representing object references and namespaces in JSON based on
standard web URIs, but is not itself a schema language. Without a schema
providing a well defined structure, it is difficult to process an arbitrary
JSON-LD document as idiomatic JSON because there are many ways to express the
same data that are logically equivalent but structurally distinct.
Several schema languages exist for describing and validating JSON data, such
as JSON Schema and the Apache Avro data serialization system; however, none of
them understand linked data. As a result, to fully take advantage of JSON-LD
to build the next generation of linked data applications, one must maintain a
separate JSON schema, JSON-LD context, RDF schema, and human documentation,
despite significant overlap of content and an obvious need for these documents
to stay synchronized.
Schema Salad is designed to address this gap. It provides a schema language
and processing rules for describing structured JSON content permitting URI
resolution and strict document validation. The schema language supports linked
data through annotations that describe the linked data interpretation of the
content, enables generation of JSON-LD context and RDF schema, and production
of RDF triples by applying the JSON-LD context. The schema language also
provides for robust support of inline documentation.
.. _JSON-LD: http://json-ld.org
.. _Avro: http://avro.apache.org
.. _metaschema: https://github.com/common-workflow-language/schema_salad/blob/master/schema_salad/metaschema/metaschema.yml
.. _specification: http://www.commonwl.org/v1.0/SchemaSalad.html
.. _Language: https://github.com/common-workflow-language/common-workflow-language/blob/master/v1.0/CommandLineTool.yml
.. _RDF: https://www.w3.org/RDF/
schema-salad-2.6.20171201034858/gittaggers.py

from setuptools.command.egg_info import egg_info
import subprocess
import time
class EggInfoFromGit(egg_info):
"""Tag the build with git commit timestamp.
If a build tag has already been set (e.g., "egg_info -b", building
from source package), leave it alone.
"""
def git_timestamp_tag(self):
gitinfo = subprocess.check_output(
['git', 'log', '--first-parent', '--max-count=1',
'--format=format:%ct', '.']).strip()
return time.strftime('.%Y%m%d%H%M%S', time.gmtime(int(gitinfo)))
def tags(self):
if self.tag_build is None:
try:
self.tag_build = self.git_timestamp_tag()
except (subprocess.CalledProcessError, OSError):
pass
return egg_info.tags(self)
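
# Usage note (illustrative; not part of the original module): this command
# class is typically wired into setup.py through the standard setuptools
# ``cmdclass`` hook, so "python setup.py egg_info" picks up the git
# timestamp tag, e.g.:
#
#     from setuptools import setup
#     from gittaggers import EggInfoFromGit
#
#     setup(name="schema-salad",
#           ...,
#           cmdclass={"egg_info": EggInfoFromGit})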
schema-salad-2.6.20171201034858/MANIFEST.in

include gittaggers.py Makefile
include schema_salad/tests/*
include schema_salad/tests/test_schema/*.md
include schema_salad/tests/test_schema/*.yml
include schema_salad/tests/test_schema/*.cwl
include schema_salad/metaschema/*
include schema_salad/tests/docimp/*
global-exclude *~
global-exclude *.pyc
schema-salad-2.6.20171201034858/schema_salad/java_codegen.py

import json
import sys
import six
from six.moves import urllib, cStringIO
import collections
import logging
from pkg_resources import resource_stream
from .utils import aslist, flatten
from . import schema
from .codegen_base import TypeDef, CodeGenBase, shortname
from typing import Text
import os
class JavaCodeGen(CodeGenBase):
def __init__(self, base):
# type: (Text) -> None
super(JavaCodeGen, self).__init__()
sp = urllib.parse.urlsplit(base)
self.package = ".".join(list(reversed(sp.netloc.split("."))) + sp.path.strip("/").split("/"))
self.outdir = self.package.replace(".", "/")
def prologue(self):
if not os.path.exists(self.outdir):
os.makedirs(self.outdir)
def safe_name(self, n):
avn = schema.avro_name(n)
if avn in ("class", "extends", "abstract"):
# reserved words
avn = avn+"_"
return avn
def interface_name(self, n):
return self.safe_name(n)
def begin_class(self, classname, extends, doc, abstract):
cls = self.interface_name(classname)
self.current_class = cls
self.current_class_is_abstract = abstract
self.current_loader = cStringIO()
self.current_fields = cStringIO()
with open(os.path.join(self.outdir, "%s.java" % cls), "w") as f:
if extends:
ext = "extends " + ", ".join(self.interface_name(e) for e in extends)
else:
ext = ""
f.write("""package {package};
public interface {cls} {ext} {{
""".
format(package=self.package,
cls=cls,
ext=ext))
if self.current_class_is_abstract:
return
with open(os.path.join(self.outdir, "%sImpl.java" % cls), "w") as f:
f.write("""package {package};
public class {cls}Impl implements {cls} {{
""".
format(package=self.package,
cls=cls,
ext=ext))
self.current_loader.write("""
void Load() {
""")
def end_class(self, classname):
with open(os.path.join(self.outdir, "%s.java" % self.current_class), "a") as f:
f.write("""
}
""")
if self.current_class_is_abstract:
return
self.current_loader.write("""
}
""")
with open(os.path.join(self.outdir, "%sImpl.java" % self.current_class), "a") as f:
f.write(self.current_fields.getvalue())
f.write(self.current_loader.getvalue())
f.write("""
}
""")
prims = {
u"http://www.w3.org/2001/XMLSchema#string": TypeDef("String", "Support.StringLoader()"),
u"http://www.w3.org/2001/XMLSchema#int": TypeDef("Integer", "Support.IntLoader()"),
u"http://www.w3.org/2001/XMLSchema#long": TypeDef("Long", "Support.LongLoader()"),
u"http://www.w3.org/2001/XMLSchema#float": TypeDef("Float", "Support.FloatLoader()"),
u"http://www.w3.org/2001/XMLSchema#double": TypeDef("Double", "Support.DoubleLoader()"),
u"http://www.w3.org/2001/XMLSchema#boolean": TypeDef("Boolean", "Support.BoolLoader()"),
u"https://w3id.org/cwl/salad#null": TypeDef("null_type", "Support.NullLoader()"),
u"https://w3id.org/cwl/salad#Any": TypeDef("Any_type", "Support.AnyLoader()")
}
def type_loader(self, t):
if isinstance(t, list) and len(t) == 2:
if t[0] == "https://w3id.org/cwl/salad#null":
t = t[1]
        if isinstance(t, six.string_types):
if t in self.prims:
return self.prims[t]
return TypeDef("Object", "")
def declare_field(self, name, typedef, doc, optional):
fieldname = self.safe_name(name)
with open(os.path.join(self.outdir, "%s.java" % self.current_class), "a") as f:
f.write("""
{type} get{capfieldname}();
""".
format(fieldname=fieldname,
capfieldname=fieldname[0].upper() + fieldname[1:],
type=typedef.name))
if self.current_class_is_abstract:
return
self.current_fields.write("""
private {type} {fieldname};
public {type} get{capfieldname}() {{
return this.{fieldname};
}}
""".
format(fieldname=fieldname,
capfieldname=fieldname[0].upper() + fieldname[1:],
type=typedef.name))
self.current_loader.write("""
this.{fieldname} = null; // TODO: loaders
""".
format(fieldname=fieldname))
def declare_id_field(self, name, typedef, doc):
pass
def uri_loader(self, inner, scoped_id, vocab_term, refScope):
return inner
def idmap_loader(self, field, inner, mapSubject, mapPredicate):
return inner
def typedsl_loader(self, inner, refScope):
return inner
def epilogue(self, rootLoader):
pass
schema-salad-2.6.20171201034858/schema_salad/ref_resolver.py

from __future__ import absolute_import
import sys
import os
import json
import hashlib
import logging
import collections
from io import open
import six
from six.moves import range
from six.moves import urllib
from six import StringIO
import re
import copy
from . import validate
from .utils import aslist, flatten
from .sourceline import SourceLine, add_lc_filename, relname
import requests
from cachecontrol.wrapper import CacheControl
from cachecontrol.caches import FileCache
import ruamel.yaml as yaml
from ruamel.yaml.comments import CommentedSeq, CommentedMap
import rdflib
from rdflib import Graph
from rdflib.namespace import RDF, RDFS, OWL
from rdflib.plugins.parsers.notation3 import BadSyntax
import xml.sax
from typing import (cast, Any, AnyStr, Callable, Dict, List, Iterable,
Optional, Set, Text, Tuple, TypeVar, Union)
_logger = logging.getLogger("salad")
ContextType = Dict[six.text_type, Union[Dict, six.text_type, Iterable[six.text_type]]]
DocumentType = TypeVar('DocumentType', CommentedSeq, CommentedMap)
DocumentOrStrType = TypeVar(
'DocumentOrStrType', CommentedSeq, CommentedMap, six.text_type)
_re_drive = re.compile(r"/([a-zA-Z]):")
def file_uri(path, split_frag=False): # type: (str, bool) -> str
if path.startswith("file://"):
return path
if split_frag:
pathsp = path.split("#", 2)
frag = "#" + urllib.parse.quote(str(pathsp[1])) if len(pathsp) == 2 else ""
urlpath = urllib.request.pathname2url(str(pathsp[0]))
else:
urlpath = urllib.request.pathname2url(path)
frag = ""
if urlpath.startswith("//"):
return "file:%s%s" % (urlpath, frag)
else:
return "file://%s%s" % (urlpath, frag)
def uri_file_path(url): # type: (str) -> str
split = urllib.parse.urlsplit(url)
if split.scheme == "file":
return urllib.request.url2pathname(
str(split.path)) + ("#" + urllib.parse.unquote(str(split.fragment))
if bool(split.fragment) else "")
else:
raise ValueError("Not a file URI")
class NormDict(CommentedMap):
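    """A CommentedMap that canonicalizes keys with a normalize function
    (six.text_type by default) before every get, set, delete, and membership
    check, so logically equivalent keys map to the same entry."""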
def __init__(self, normalize=six.text_type): # type: (Callable) -> None
super(NormDict, self).__init__()
self.normalize = normalize
def __getitem__(self, key): # type: (Any) -> Any
return super(NormDict, self).__getitem__(self.normalize(key))
def __setitem__(self, key, value): # type: (Any, Any) -> Any
return super(NormDict, self).__setitem__(self.normalize(key), value)
def __delitem__(self, key): # type: (Any) -> Any
return super(NormDict, self).__delitem__(self.normalize(key))
def __contains__(self, key): # type: (Any) -> Any
return super(NormDict, self).__contains__(self.normalize(key))
def merge_properties(a, b): # type: (List[Any], List[Any]) -> Dict[Any, Any]
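    # Merge two property mappings: keys found in only one of `a` or `b` are
    # copied through unchanged, while keys present in both have their values
    # combined into a single list (via aslist).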
c = {}
for i in a:
if i not in b:
c[i] = a[i]
for i in b:
if i not in a:
c[i] = b[i]
for i in a:
if i in b:
c[i] = aslist(a[i]) + aslist(b[i])
return c
def SubLoader(loader): # type: (Loader) -> Loader
return Loader(loader.ctx, schemagraph=loader.graph,
foreign_properties=loader.foreign_properties, idx=loader.idx,
cache=loader.cache, fetcher_constructor=loader.fetcher_constructor,
skip_schemas=loader.skip_schemas)
class Fetcher(object):
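    """Abstract interface for retrieving documents and joining URLs; override
    in subclasses (e.g. DefaultFetcher) to customize how references are
    fetched."""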
def fetch_text(self, url): # type: (Text) -> Text
raise NotImplementedError()
def check_exists(self, url): # type: (Text) -> bool
raise NotImplementedError()
def urljoin(self, base_url, url): # type: (Text, Text) -> Text
raise NotImplementedError()
class DefaultFetcher(Fetcher):
def __init__(self,
cache, # type: Dict[Text, Union[Text, bool]]
session # type: Optional[requests.sessions.Session]
): # type: (...) -> None
self.cache = cache
self.session = session
def fetch_text(self, url):
# type: (Text) -> Text
if url in self.cache and self.cache[url] is not True:
# treat "True" as a placeholder that indicates something exists but
            # not necessarily what its contents are.
return cast(Text, self.cache[url])
split = urllib.parse.urlsplit(url)
scheme, path = split.scheme, split.path
if scheme in [u'http', u'https'] and self.session is not None:
try:
resp = self.session.get(url)
resp.raise_for_status()
except Exception as e:
raise RuntimeError(url, e)
return resp.text
elif scheme == 'file':
try:
# On Windows, url.path will be /drive:/path ; on Unix systems,
# /path. As we want drive:/path instead of /drive:/path on Windows,
# remove the leading /.
                if os.path.isabs(path[1:]):  # check whether the path is valid after removing the leading "/"
path = path[1:]
with open(urllib.request.url2pathname(str(path)), encoding='utf-8') as fp:
return fp.read()
except (OSError, IOError) as e:
if e.filename == path:
raise RuntimeError(six.text_type(e))
else:
raise RuntimeError('Error reading %s: %s' % (url, e))
else:
raise ValueError('Unsupported scheme in url: %s' % url)
def check_exists(self, url): # type: (Text) -> bool
if url in self.cache:
return True
split = urllib.parse.urlsplit(url)
scheme, path = split.scheme, split.path
if scheme in [u'http', u'https'] and self.session is not None:
try:
resp = self.session.head(url)
resp.raise_for_status()
except Exception as e:
return False
self.cache[url] = True
return True
elif scheme == 'file':
return os.path.exists(urllib.request.url2pathname(str(path)))
else:
raise ValueError('Unsupported scheme in url: %s' % url)
def urljoin(self, base_url, url): # type: (Text, Text) -> Text
basesplit = urllib.parse.urlsplit(base_url)
split = urllib.parse.urlsplit(url)
if (basesplit.scheme and basesplit.scheme != "file" and split.scheme == "file"):
raise ValueError("Not resolving potential remote exploit %s from base %s" % (url, base_url))
if sys.platform == 'win32':
if (base_url == url):
return url
basesplit = urllib.parse.urlsplit(base_url)
# note that below might split
# "C:" with "C" as URI scheme
split = urllib.parse.urlsplit(url)
has_drive = split.scheme and len(split.scheme) == 1
if basesplit.scheme == "file":
# Special handling of relative file references on Windows
# as urllib seems to not be quite up to the job
# netloc MIGHT appear in equivalents of UNC Strings
# \\server1.example.com\path as
# file:///server1.example.com/path
# https://tools.ietf.org/html/rfc8089#appendix-E.3.2
# (TODO: test this)
netloc = split.netloc or basesplit.netloc
# Check if url is a local path like "C:/Users/fred"
# or actually an absolute URI like http://example.com/fred
if has_drive:
# Assume split.scheme is actually a drive, e.g. "C:"
# so we'll recombine into a path
path_with_drive = urllib.parse.urlunsplit((split.scheme, '', split.path,'', ''))
# Compose new file:/// URI with path_with_drive
# .. carrying over any #fragment (?query just in case..)
return urllib.parse.urlunsplit(("file", netloc,
path_with_drive, split.query, split.fragment))
if (not split.scheme and not netloc and
split.path and split.path.startswith("/")):
# Relative - but does it have a drive?
base_drive = _re_drive.match(basesplit.path)
drive = _re_drive.match(split.path)
if base_drive and not drive:
# Keep drive letter from base_url
# https://tools.ietf.org/html/rfc8089#appendix-E.2.1
# e.g. urljoin("file:///D:/bar/a.txt", "/foo/b.txt") == file:///D:/foo/b.txt
path_with_drive = "/%s:%s" % (base_drive.group(1), split.path)
return urllib.parse.urlunsplit(("file", netloc, path_with_drive,
split.query, split.fragment))
# else: fall-through to resolve as relative URI
elif has_drive:
# Base is http://something but url is C:/something - which urllib would wrongly
# resolve as an absolute path that could later be used to access local files
raise ValueError("Not resolving potential remote exploit %s from base %s" % (url, base_url))
return urllib.parse.urljoin(base_url, url)
class Loader(object):
def __init__(self,
ctx, # type: ContextType
schemagraph=None, # type: rdflib.graph.Graph
foreign_properties=None, # type: Set[Text]
idx=None, # type: Dict[Text, Union[CommentedMap, CommentedSeq, Text, None]]
cache=None, # type: Dict[Text, Any]
session=None, # type: requests.sessions.Session
fetcher_constructor=None, # type: Callable[[Dict[Text, Union[Text, bool]], requests.sessions.Session], Fetcher]
skip_schemas=None # type: bool
):
# type: (...) -> None
normalize = lambda url: urllib.parse.urlsplit(url).geturl()
if idx is not None:
self.idx = idx
else:
self.idx = NormDict(normalize)
self.ctx = {} # type: ContextType
if schemagraph is not None:
self.graph = schemagraph
else:
self.graph = rdflib.graph.Graph()
if foreign_properties is not None:
self.foreign_properties = foreign_properties
else:
self.foreign_properties = set()
if cache is not None:
self.cache = cache
else:
self.cache = {}
if skip_schemas is not None:
self.skip_schemas = skip_schemas
else:
self.skip_schemas = False
if session is None:
if "HOME" in os.environ:
self.session = CacheControl(
requests.Session(),
cache=FileCache(os.path.join(os.environ["HOME"], ".cache", "salad")))
elif "TMP" in os.environ:
self.session = CacheControl(
requests.Session(),
cache=FileCache(os.path.join(os.environ["TMP"], ".cache", "salad")))
else:
self.session = CacheControl(
requests.Session(),
cache=FileCache("/tmp", ".cache", "salad"))
else:
self.session = session
if fetcher_constructor is not None:
self.fetcher_constructor = fetcher_constructor
else:
self.fetcher_constructor = DefaultFetcher
self.fetcher = self.fetcher_constructor(self.cache, self.session)
self.fetch_text = self.fetcher.fetch_text
self.check_exists = self.fetcher.check_exists
self.url_fields = set() # type: Set[Text]
self.scoped_ref_fields = {} # type: Dict[Text, int]
self.vocab_fields = set() # type: Set[Text]
self.identifiers = [] # type: List[Text]
self.identity_links = set() # type: Set[Text]
self.standalone = None # type: Optional[Set[Text]]
self.nolinkcheck = set() # type: Set[Text]
self.vocab = {} # type: Dict[Text, Text]
self.rvocab = {} # type: Dict[Text, Text]
self.idmap = {} # type: Dict[Text, Any]
self.mapPredicate = {} # type: Dict[Text, Text]
self.type_dsl_fields = set() # type: Set[Text]
self.add_context(ctx)
def expand_url(self,
url, # type: Text
base_url, # type: Text
scoped_id=False, # type: bool
vocab_term=False, # type: bool
scoped_ref=None # type: int
):
# type: (...) -> Text
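        # Resolution order: pass "@id"/"@type" and known vocabulary terms
        # through unchanged, expand "prefix:..." forms via the vocabulary,
        # turn scoped identifiers into fragments of base_url, otherwise join
        # against base_url, and finally map the expanded URI back to a
        # vocabulary term when one exists.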
if url in (u"@id", u"@type"):
return url
if vocab_term and url in self.vocab:
return url
if bool(self.vocab) and u":" in url:
prefix = url.split(u":")[0]
if prefix in self.vocab:
url = self.vocab[prefix] + url[len(prefix) + 1:]
split = urllib.parse.urlsplit(url)
if ((bool(split.scheme) and split.scheme in [u'http', u'https', u'file']) or url.startswith(u"$(")
or url.startswith(u"${")):
pass
elif scoped_id and not bool(split.fragment):
splitbase = urllib.parse.urlsplit(base_url)
frg = u""
if bool(splitbase.fragment):
frg = splitbase.fragment + u"/" + split.path
else:
frg = split.path
pt = splitbase.path if splitbase.path != '' else "/"
url = urllib.parse.urlunsplit(
(splitbase.scheme, splitbase.netloc, pt, splitbase.query, frg))
elif scoped_ref is not None and not split.fragment:
pass
else:
url = self.fetcher.urljoin(base_url, url)
if vocab_term and url in self.rvocab:
return self.rvocab[url]
else:
return url
def _add_properties(self, s): # type: (Text) -> None
for _, _, rng in self.graph.triples((s, RDFS.range, None)):
literal = ((six.text_type(rng).startswith(
u"http://www.w3.org/2001/XMLSchema#") and
not six.text_type(rng) == u"http://www.w3.org/2001/XMLSchema#anyURI")
or six.text_type(rng) ==
u"http://www.w3.org/2000/01/rdf-schema#Literal")
if not literal:
self.url_fields.add(six.text_type(s))
self.foreign_properties.add(six.text_type(s))
def add_namespaces(self, ns): # type: (Dict[Text, Text]) -> None
self.vocab.update(ns)
def add_schemas(self, ns, base_url):
# type: (Union[List[Text], Text], Text) -> None
if self.skip_schemas:
return
for sch in aslist(ns):
try:
fetchurl = self.fetcher.urljoin(base_url, sch)
if fetchurl not in self.cache or self.cache[fetchurl] is True:
_logger.debug("Getting external schema %s", fetchurl)
content = self.fetch_text(fetchurl)
self.cache[fetchurl] = rdflib.graph.Graph()
for fmt in ['xml', 'turtle', 'rdfa']:
try:
self.cache[fetchurl].parse(data=content, format=fmt, publicID=str(fetchurl))
self.graph += self.cache[fetchurl]
break
except xml.sax.SAXParseException:
pass
except TypeError:
pass
except BadSyntax:
pass
except Exception as e:
_logger.warn("Could not load extension schema %s: %s", fetchurl, e)
for s, _, _ in self.graph.triples((None, RDF.type, RDF.Property)):
self._add_properties(s)
for s, _, o in self.graph.triples((None, RDFS.subPropertyOf, None)):
self._add_properties(s)
self._add_properties(o)
for s, _, _ in self.graph.triples((None, RDFS.range, None)):
self._add_properties(s)
for s, _, _ in self.graph.triples((None, RDF.type, OWL.ObjectProperty)):
self._add_properties(s)
for s, _, _ in self.graph.triples((None, None, None)):
self.idx[six.text_type(s)] = None
def add_context(self, newcontext, baseuri=""):
# type: (ContextType, Text) -> None
if bool(self.vocab):
raise validate.ValidationException(
"Refreshing context that already has stuff in it")
self.url_fields = set(("$schemas",))
self.scoped_ref_fields = {}
self.vocab_fields = set()
self.identifiers = []
self.identity_links = set()
self.standalone = set()
self.nolinkcheck = set()
self.idmap = {}
self.mapPredicate = {}
self.vocab = {}
self.rvocab = {}
self.type_dsl_fields = set()
self.ctx.update(_copy_dict_without_key(newcontext, u"@context"))
_logger.debug("ctx is %s", self.ctx)
for key, value in self.ctx.items():
if value == u"@id":
self.identifiers.append(key)
self.identity_links.add(key)
elif isinstance(value, dict) and value.get(u"@type") == u"@id":
self.url_fields.add(key)
if u"refScope" in value:
self.scoped_ref_fields[key] = value[u"refScope"]
if value.get(u"identity", False):
self.identity_links.add(key)
elif isinstance(value, dict) and value.get(u"@type") == u"@vocab":
self.url_fields.add(key)
self.vocab_fields.add(key)
if u"refScope" in value:
self.scoped_ref_fields[key] = value[u"refScope"]
if value.get(u"typeDSL"):
self.type_dsl_fields.add(key)
if isinstance(value, dict) and value.get(u"noLinkCheck"):
self.nolinkcheck.add(key)
if isinstance(value, dict) and value.get(u"mapSubject"):
self.idmap[key] = value[u"mapSubject"]
if isinstance(value, dict) and value.get(u"mapPredicate"):
self.mapPredicate[key] = value[u"mapPredicate"]
if isinstance(value, dict) and u"@id" in value:
self.vocab[key] = value[u"@id"]
elif isinstance(value, six.string_types):
self.vocab[key] = value
for k, v in self.vocab.items():
self.rvocab[self.expand_url(v, u"", scoped_id=False)] = k
self.identifiers.sort()
_logger.debug("identifiers is %s", self.identifiers)
_logger.debug("identity_links is %s", self.identity_links)
_logger.debug("url_fields is %s", self.url_fields)
_logger.debug("vocab_fields is %s", self.vocab_fields)
_logger.debug("vocab is %s", self.vocab)
def resolve_ref(self,
ref, # type: Union[CommentedMap, CommentedSeq, Text]
base_url=None, # type: Text
checklinks=True # type: bool
):
# type: (...) -> Tuple[Union[CommentedMap, CommentedSeq, Text, None], Dict[Text, Any]]
lref = ref # type: Union[CommentedMap, CommentedSeq, Text, None]
obj = None # type: Optional[CommentedMap]
resolved_obj = None # type: Optional[Union[CommentedMap, CommentedSeq, Text]]
inc = False
mixin = None # type: Optional[Dict[Text, Any]]
if not base_url:
base_url = file_uri(os.getcwd()) + "/"
sl = SourceLine(obj, None, ValueError)
# If `ref` is a dict, look for special directives.
if isinstance(lref, CommentedMap):
obj = lref
if "$import" in obj:
sl = SourceLine(obj, "$import", RuntimeError)
if len(obj) == 1:
lref = obj[u"$import"]
obj = None
else:
raise sl.makeError(
u"'$import' must be the only field in %s"
% (six.text_type(obj)))
elif "$include" in obj:
sl = SourceLine(obj, "$include", RuntimeError)
if len(obj) == 1:
lref = obj[u"$include"]
inc = True
obj = None
else:
raise sl.makeError(
u"'$include' must be the only field in %s"
% (six.text_type(obj)))
elif "$mixin" in obj:
sl = SourceLine(obj, "$mixin", RuntimeError)
lref = obj[u"$mixin"]
mixin = obj
obj = None
else:
lref = None
for identifier in self.identifiers:
if identifier in obj:
lref = obj[identifier]
break
if not lref:
raise sl.makeError(
u"Object `%s` does not have identifier field in %s"
% (relname(obj), self.identifiers))
if not isinstance(lref, (str, six.text_type)):
raise ValueError(u"Expected CommentedMap or string, got %s: `%s`"
% (type(lref), six.text_type(lref)))
if isinstance(lref, (str, six.text_type)) and os.sep == "\\":
# Convert Windows path separator in ref
lref = lref.replace("\\", "/")
url = self.expand_url(lref, base_url, scoped_id=(obj is not None))
# Has this reference been loaded already?
if url in self.idx and (not mixin):
return self.idx[url], {}
sl.raise_type = RuntimeError
with sl:
# "$include" directive means load raw text
if inc:
return self.fetch_text(url), {}
doc = None
if isinstance(obj, collections.MutableMapping):
for identifier in self.identifiers:
obj[identifier] = url
doc_url = url
else:
# Load structured document
doc_url, frg = urllib.parse.urldefrag(url)
if doc_url in self.idx and (not mixin):
# If the base document is in the index, it was already loaded,
# so if we didn't find the reference earlier then it must not
# exist.
raise validate.ValidationException(
u"Reference `#%s` not found in file `%s`."
% (frg, doc_url))
doc = self.fetch(doc_url, inject_ids=(not mixin))
# Recursively expand urls and resolve directives
if bool(mixin):
doc = copy.deepcopy(doc)
doc.update(mixin)
del doc["$mixin"]
resolved_obj, metadata = self.resolve_all(
doc, base_url, file_base=doc_url, checklinks=checklinks)
else:
resolved_obj, metadata = self.resolve_all(
doc if doc else obj, doc_url, checklinks=checklinks)
# Requested reference should be in the index now, otherwise it's a bad
# reference
if not bool(mixin):
if url in self.idx:
resolved_obj = self.idx[url]
else:
raise RuntimeError(
"Reference `%s` is not in the index. Index contains:\n %s"
% (url, "\n ".join(self.idx)))
if isinstance(resolved_obj, CommentedMap):
if u"$graph" in resolved_obj:
metadata = _copy_dict_without_key(resolved_obj, u"$graph")
return resolved_obj[u"$graph"], metadata
else:
return resolved_obj, metadata
else:
return resolved_obj, metadata
def _resolve_idmap(self,
document, # type: CommentedMap
loader # type: Loader
):
# type: (...) -> None
# Convert fields with mapSubject into lists
# use mapPredicate if the mapped value isn't a dict.
for idmapField in loader.idmap:
if (idmapField in document):
idmapFieldValue = document[idmapField]
if (isinstance(idmapFieldValue, dict)
and "$import" not in idmapFieldValue
and "$include" not in idmapFieldValue):
ls = CommentedSeq()
for k in sorted(idmapFieldValue.keys()):
val = idmapFieldValue[k]
v = None # type: Optional[CommentedMap]
if not isinstance(val, CommentedMap):
if idmapField in loader.mapPredicate:
v = CommentedMap(
((loader.mapPredicate[idmapField], val),))
v.lc.add_kv_line_col(
loader.mapPredicate[idmapField],
document[idmapField].lc.data[k])
v.lc.filename = document.lc.filename
else:
raise validate.ValidationException(
"mapSubject '%s' value '%s' is not a dict"
"and does not have a mapPredicate", k, v)
else:
v = val
v[loader.idmap[idmapField]] = k
v.lc.add_kv_line_col(loader.idmap[idmapField],
document[idmapField].lc.data[k])
v.lc.filename = document.lc.filename
ls.lc.add_kv_line_col(
len(ls), document[idmapField].lc.data[k])
ls.lc.filename = document.lc.filename
ls.append(v)
document[idmapField] = ls
    typeDSLregex = re.compile(u"^([^[?]+)(\\[\\])?(\\?)?$")
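    # Type DSL shorthand handled below: "Foo[]" expands to
    # {"type": "array", "items": "Foo"}, and a trailing "?" makes the type
    # optional by unioning it with "null", e.g. "Foo[]?" becomes
    # ["null", {"type": "array", "items": "Foo"}].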
def _type_dsl(self,
t, # type: Union[Text, Dict, List]
lc,
filename):
# type: (...) -> Union[Text, Dict[Text, Text], List[Union[Text, Dict[Text, Text]]]]
if not isinstance(t, (str, six.text_type)):
return t
m = Loader.typeDSLregex.match(t)
if not m:
return t
first = m.group(1)
second = third = None
if bool(m.group(2)):
second = CommentedMap((("type", "array"),
("items", first)))
second.lc.add_kv_line_col("type", lc)
second.lc.add_kv_line_col("items", lc)
second.lc.filename = filename
if bool(m.group(3)):
third = CommentedSeq([u"null", second or first])
third.lc.add_kv_line_col(0, lc)
third.lc.add_kv_line_col(1, lc)
third.lc.filename = filename
return third or second or first
def _resolve_type_dsl(self,
document, # type: CommentedMap
loader # type: Loader
):
# type: (...) -> None
for d in loader.type_dsl_fields:
if d in document:
datum2 = datum = document[d]
if isinstance(datum, (str, six.text_type)):
datum2 = self._type_dsl(datum, document.lc.data[
d], document.lc.filename)
elif isinstance(datum, CommentedSeq):
datum2 = CommentedSeq()
for n, t in enumerate(datum):
datum2.lc.add_kv_line_col(
len(datum2), datum.lc.data[n])
datum2.append(self._type_dsl(
t, datum.lc.data[n], document.lc.filename))
if isinstance(datum2, CommentedSeq):
datum3 = CommentedSeq()
seen = [] # type: List[Text]
for i, item in enumerate(datum2):
if isinstance(item, CommentedSeq):
for j, v in enumerate(item):
if v not in seen:
datum3.lc.add_kv_line_col(
len(datum3), item.lc.data[j])
datum3.append(v)
seen.append(v)
else:
if item not in seen:
datum3.lc.add_kv_line_col(
len(datum3), datum2.lc.data[i])
datum3.append(item)
seen.append(item)
document[d] = datum3
else:
document[d] = datum2
def _resolve_identifier(self, document, loader, base_url):
# type: (CommentedMap, Loader, Text) -> Text
# Expand identifier field (usually 'id') to resolve scope
for identifer in loader.identifiers:
if identifer in document:
if isinstance(document[identifer], six.string_types):
document[identifer] = loader.expand_url(
document[identifer], base_url, scoped_id=True)
if (document[identifer] not in loader.idx
or isinstance(
loader.idx[document[identifer]], six.string_types)):
loader.idx[document[identifer]] = document
base_url = document[identifer]
else:
raise validate.ValidationException(
"identifier field '%s' must be a string"
% (document[identifer]))
return base_url
def _resolve_identity(self, document, loader, base_url):
# type: (Dict[Text, List[Text]], Loader, Text) -> None
# Resolve scope for identity fields (fields where the value is the
# identity of a standalone node, such as enum symbols)
for identifer in loader.identity_links:
if identifer in document and isinstance(document[identifer], list):
for n, v in enumerate(document[identifer]):
if isinstance(document[identifer][n], six.string_types):
document[identifer][n] = loader.expand_url(
document[identifer][n], base_url, scoped_id=True)
if document[identifer][n] not in loader.idx:
loader.idx[document[identifer][
n]] = document[identifer][n]
def _normalize_fields(self, document, loader):
# type: (Dict[Text, Text], Loader) -> None
        # Normalize fields which are prefixed or full URIs to vocabulary terms
for d in list(document.keys()):
d2 = loader.expand_url(d, u"", scoped_id=False, vocab_term=True)
if d != d2:
document[d2] = document[d]
del document[d]
def _resolve_uris(self,
document, # type: Dict[Text, Union[Text, List[Text]]]
loader, # type: Loader
base_url # type: Text
):
# type: (...) -> None
# Resolve remaining URLs based on document base
for d in loader.url_fields:
if d in document:
datum = document[d]
if isinstance(datum, (str, six.text_type)):
document[d] = loader.expand_url(
datum, base_url, scoped_id=False,
vocab_term=(d in loader.vocab_fields),
scoped_ref=self.scoped_ref_fields.get(d))
elif isinstance(datum, list):
for i, url in enumerate(datum):
if isinstance(url, (str, six.text_type)):
datum[i] = loader.expand_url(
url, base_url, scoped_id=False,
vocab_term=(d in loader.vocab_fields),
scoped_ref=self.scoped_ref_fields.get(d))
def resolve_all(self,
document, # type: Union[CommentedMap, CommentedSeq]
base_url, # type: Text
file_base=None, # type: Text
checklinks=True # type: bool
):
# type: (...) -> Tuple[Union[CommentedMap, CommentedSeq, Text, None], Dict[Text, Any]]
loader = self
metadata = CommentedMap() # type: CommentedMap
if file_base is None:
file_base = base_url
if isinstance(document, CommentedMap):
# Handle $import and $include
if (u'$import' in document or u'$include' in document):
return self.resolve_ref(
document, base_url=file_base, checklinks=checklinks)
elif u'$mixin' in document:
return self.resolve_ref(
document, base_url=base_url, checklinks=checklinks)
elif isinstance(document, CommentedSeq):
pass
elif isinstance(document, (list, dict)):
raise Exception("Expected CommentedMap or CommentedSeq, got %s: `%s`" % (type(document), document))
else:
return (document, metadata)
newctx = None # type: Optional[Loader]
if isinstance(document, CommentedMap):
# Handle $base, $profile, $namespaces, $schemas and $graph
if u"$base" in document:
base_url = document[u"$base"]
if u"$profile" in document:
if newctx is None:
newctx = SubLoader(self)
prof = self.fetch(document[u"$profile"])
newctx.add_namespaces(document.get(u"$namespaces", {}))
newctx.add_schemas(document.get(
u"$schemas", []), document[u"$profile"])
if u"$namespaces" in document:
if newctx is None:
newctx = SubLoader(self)
newctx.add_namespaces(document[u"$namespaces"])
if u"$schemas" in document:
if newctx is None:
newctx = SubLoader(self)
newctx.add_schemas(document[u"$schemas"], file_base)
if newctx is not None:
loader = newctx
if u"$graph" in document:
metadata = _copy_dict_without_key(document, u"$graph")
document = document[u"$graph"]
resolved_metadata = loader.resolve_all(
metadata, base_url, file_base=file_base,
checklinks=False)[0]
if isinstance(resolved_metadata, dict):
metadata = resolved_metadata
else:
raise validate.ValidationException(
"Validation error, metadata must be dict: %s"
% (resolved_metadata))
if isinstance(document, CommentedMap):
self._normalize_fields(document, loader)
self._resolve_idmap(document, loader)
self._resolve_type_dsl(document, loader)
base_url = self._resolve_identifier(document, loader, base_url)
self._resolve_identity(document, loader, base_url)
self._resolve_uris(document, loader, base_url)
try:
for key, val in document.items():
document[key], _ = loader.resolve_all(
val, base_url, file_base=file_base, checklinks=False)
except validate.ValidationException as v:
_logger.warn("loader is %s", id(loader), exc_info=True)
raise validate.ValidationException("(%s) (%s) Validation error in field %s:\n%s" % (
id(loader), file_base, key, validate.indent(six.text_type(v))))
elif isinstance(document, CommentedSeq):
i = 0
try:
while i < len(document):
val = document[i]
if isinstance(val, CommentedMap) and (u"$import" in val or u"$mixin" in val):
l, _ = loader.resolve_ref(
val, base_url=file_base, checklinks=False)
if isinstance(l, CommentedSeq):
lc = document.lc.data[i]
del document[i]
llen = len(l)
for j in range(len(document) + llen, i + llen, -1):
document.lc.data[
j - 1] = document.lc.data[j - llen]
for item in l:
document.insert(i, item)
document.lc.data[i] = lc
i += 1
else:
document[i] = l
i += 1
else:
document[i], _ = loader.resolve_all(
val, base_url, file_base=file_base, checklinks=False)
i += 1
except validate.ValidationException as v:
_logger.warn("failed", exc_info=True)
raise validate.ValidationException("(%s) (%s) Validation error in position %i:\n%s" % (
id(loader), file_base, i, validate.indent(six.text_type(v))))
for identifer in loader.identity_links:
if identifer in metadata:
if isinstance(metadata[identifer], (str, six.text_type)):
metadata[identifer] = loader.expand_url(
metadata[identifer], base_url, scoped_id=True)
loader.idx[metadata[identifer]] = document
if checklinks:
            all_doc_ids = {}  # type: Dict[Text, Text]
self.validate_links(document, u"", all_doc_ids)
return document, metadata
def fetch(self, url, inject_ids=True): # type: (Text, bool) -> Any
if url in self.idx:
return self.idx[url]
try:
text = self.fetch_text(url)
if isinstance(text, bytes):
textIO = StringIO(text.decode('utf-8'))
else:
textIO = StringIO(text)
textIO.name = url # type: ignore
result = yaml.round_trip_load(textIO)
add_lc_filename(result, url)
except yaml.parser.ParserError as e:
raise validate.ValidationException("Syntax error %s" % (e))
if (isinstance(result, CommentedMap) and inject_ids
and bool(self.identifiers)):
for identifier in self.identifiers:
if identifier not in result:
result[identifier] = url
self.idx[self.expand_url(result[identifier], url)] = result
else:
self.idx[url] = result
return result
FieldType = TypeVar('FieldType', six.text_type, CommentedSeq, CommentedMap)
def validate_scoped(self, field, link, docid):
# type: (Text, Text, Text) -> Text
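        # Resolve a scoped reference: starting from the fragment path of
        # `docid`, drop `refScope` levels, then try appending `link` at each
        # remaining level while walking outward until a known identifier in
        # the index matches.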
split = urllib.parse.urlsplit(docid)
sp = split.fragment.split(u"/")
n = self.scoped_ref_fields[field]
while n > 0 and len(sp) > 0:
sp.pop()
n -= 1
tried = []
while True:
sp.append(link)
url = urllib.parse.urlunsplit((
split.scheme, split.netloc, split.path, split.query,
u"/".join(sp)))
tried.append(url)
if url in self.idx:
return url
sp.pop()
if len(sp) == 0:
break
sp.pop()
raise validate.ValidationException(
"Field `%s` references unknown identifier `%s`, tried %s" % (field, link, ", ".join(tried)))
def validate_link(self, field, link, docid, all_doc_ids):
# type: (Text, FieldType, Text, Dict[Text, Text]) -> FieldType
if field in self.nolinkcheck:
return link
if isinstance(link, (str, six.text_type)):
if field in self.vocab_fields:
if (link not in self.vocab and link not in self.idx
and link not in self.rvocab):
if field in self.scoped_ref_fields:
return self.validate_scoped(field, link, docid)
elif not self.check_exists(link):
raise validate.ValidationException(
"Field `%s` contains undefined reference to `%s`" % (field, link))
elif link not in self.idx and link not in self.rvocab:
if field in self.scoped_ref_fields:
return self.validate_scoped(field, link, docid)
elif not self.check_exists(link):
raise validate.ValidationException(
"Field `%s` contains undefined reference to `%s`"
% (field, link))
elif isinstance(link, CommentedSeq):
errors = []
for n, i in enumerate(link):
try:
link[n] = self.validate_link(field, i, docid, all_doc_ids)
except validate.ValidationException as v:
errors.append(v)
if bool(errors):
raise validate.ValidationException(
"\n".join([six.text_type(e) for e in errors]))
elif isinstance(link, CommentedMap):
self.validate_links(link, docid, all_doc_ids)
else:
raise validate.ValidationException(
"`%s` field is %s, expected string, list, or a dict."
% (field, type(link).__name__))
return link
def getid(self, d): # type: (Any) -> Optional[Text]
if isinstance(d, dict):
for i in self.identifiers:
if i in d:
idd = d[i]
if isinstance(idd, (str, six.text_type)):
return idd
return None
def validate_links(self, document, base_url, all_doc_ids):
# type: (Union[CommentedMap, CommentedSeq, Text, None], Text, Dict[Text, Text]) -> None
docid = self.getid(document)
if not docid:
docid = base_url
errors = [] # type: List[Exception]
iterator = None # type: Any
if isinstance(document, list):
iterator = enumerate(document)
elif isinstance(document, dict):
try:
for d in self.url_fields:
sl = SourceLine(document, d, validate.ValidationException)
if d in document and d not in self.identity_links:
document[d] = self.validate_link(d, document[d], docid, all_doc_ids)
for identifier in self.identifiers: # validate that each id is defined uniquely
if identifier in document:
sl = SourceLine(document, identifier, validate.ValidationException)
if document[identifier] in all_doc_ids and sl.makeLead() != all_doc_ids[document[identifier]]:
raise validate.ValidationException(
"%s object %s `%s` previously defined" % (all_doc_ids[document[identifier]], identifier, relname(document[identifier]), ))
else:
all_doc_ids[document[identifier]] = sl.makeLead()
break
except validate.ValidationException as v:
if d == "$schemas":
_logger.warn( validate.indent(six.text_type(v)))
else:
errors.append(sl.makeError(six.text_type(v)))
if hasattr(document, "iteritems"):
iterator = six.iteritems(document)
else:
iterator = list(document.items())
else:
return
for key, val in iterator:
sl = SourceLine(document, key, validate.ValidationException)
try:
self.validate_links(val, docid, all_doc_ids)
except validate.ValidationException as v:
if key in self.nolinkcheck or (isinstance(key, six.string_types) and ":" in key):
_logger.warn( validate.indent(six.text_type(v)))
else:
docid2 = self.getid(val)
if docid2 is not None:
errors.append(sl.makeError("checking object `%s`\n%s"
% (relname(docid2), validate.indent(six.text_type(v)))))
else:
if isinstance(key, six.string_types):
errors.append(sl.makeError("checking field `%s`\n%s" % (
key, validate.indent(six.text_type(v)))))
else:
errors.append(sl.makeError("checking item\n%s" % (
validate.indent(six.text_type(v)))))
if bool(errors):
if len(errors) > 1:
raise validate.ValidationException(
u"\n".join([six.text_type(e) for e in errors]))
else:
raise errors[0]
return
D = TypeVar('D', CommentedMap, ContextType)
def _copy_dict_without_key(from_dict, filtered_key):
# type: (D, Any) -> D
new_dict = copy.copy(from_dict)
if filtered_key in new_dict:
del new_dict[filtered_key]
if isinstance(from_dict, CommentedMap):
new_dict.lc.data = copy.copy(from_dict.lc.data)
new_dict.lc.filename = from_dict.lc.filename
return new_dict
schema-salad-2.6.20171201034858/schema_salad/sourceline.py

from __future__ import absolute_import
import ruamel.yaml
from ruamel.yaml.comments import CommentedBase, CommentedMap, CommentedSeq
import re
import os
import traceback
from typing import (Any, AnyStr, Callable, cast, Dict, List, Iterable, Tuple,
TypeVar, Union, Text)
import six
lineno_re = re.compile(u"^(.*?:[0-9]+:[0-9]+: )(( *)(.*))")
def _add_lc_filename(r, source): # type: (ruamel.yaml.comments.CommentedBase, AnyStr) -> None
if isinstance(r, ruamel.yaml.comments.CommentedBase):
r.lc.filename = source
if isinstance(r, list):
for d in r:
_add_lc_filename(d, source)
elif isinstance(r, dict):
for d in six.itervalues(r):
_add_lc_filename(d, source)
def relname(source): # type: (Text) -> Text
if source.startswith("file://"):
source = source[7:]
source = os.path.relpath(source)
return source
def add_lc_filename(r, source): # type: (ruamel.yaml.comments.CommentedBase, Text) -> None
_add_lc_filename(r, relname(source))
def reflow(text, maxline, shift=""): # type: (Text, int, Text) -> Text
if maxline < 20:
maxline = 20
if len(text) > maxline:
sp = text.rfind(' ', 0, maxline)
if sp < 1:
sp = text.find(' ', sp+1)
if sp == -1:
sp = len(text)
if sp < len(text):
return "%s\n%s%s" % (text[0:sp], shift, reflow(text[sp+1:], maxline, shift))
return text
def indent(v, nolead=False, shift=u" ", bullet=u" "): # type: (Text, bool, Text, Text) -> Text
if nolead:
return v.splitlines()[0] + u"\n".join([shift + l for l in v.splitlines()[1:]])
else:
def lineno(i, l): # type: (int, Text) -> Text
r = lineno_re.match(l)
if bool(r):
return r.group(1) + (bullet if i == 0 else shift) + r.group(2)
else:
return (bullet if i == 0 else shift) + l
return u"\n".join([lineno(i, l) for i, l in enumerate(v.splitlines())])
def bullets(textlist, bul): # type: (List[Text], Text) -> Text
if len(textlist) == 1:
return textlist[0]
else:
return "\n".join(indent(t, bullet=bul) for t in textlist)
def strip_dup_lineno(text, maxline=None): # type: (Text, int) -> Text
if maxline is None:
maxline = int(os.environ.get("COLUMNS", "100"))
pre = None
msg = []
for l in text.splitlines():
g = lineno_re.match(l)
if not g:
msg.append(l)
continue
shift = len(g.group(1)) + len(g.group(3))
g2 = reflow(g.group(2), maxline-shift, " " * shift)
if g.group(1) != pre:
pre = g.group(1)
msg.append(pre + g2)
else:
g2 = reflow(g.group(2), maxline-len(g.group(1)), " " * (len(g.group(1))+len(g.group(3))))
msg.append(" " * len(g.group(1)) + g2)
return "\n".join(msg)
def cmap(d, lc=None, fn=None): # type: (Union[int, float, str, Text, Dict, List], List[int], Text) -> Union[int, float, str, Text, CommentedMap, CommentedSeq]
if lc is None:
lc = [0, 0, 0, 0]
if fn is None:
fn = "test"
if isinstance(d, CommentedMap):
fn = d.lc.filename if hasattr(d.lc, "filename") else fn
for k,v in six.iteritems(d):
if k in d.lc.data:
d[k] = cmap(v, lc=d.lc.data[k], fn=fn)
else:
d[k] = cmap(v, lc, fn=fn)
return d
if isinstance(d, CommentedSeq):
fn = d.lc.filename if hasattr(d.lc, "filename") else fn
for k,v in enumerate(d):
if k in d.lc.data:
d[k] = cmap(v, lc=d.lc.data[k], fn=fn)
else:
d[k] = cmap(v, lc, fn=fn)
return d
if isinstance(d, dict):
cm = CommentedMap()
for k in sorted(d.keys()):
v = d[k]
if isinstance(v, CommentedBase):
uselc = [v.lc.line, v.lc.col, v.lc.line, v.lc.col]
vfn = v.lc.filename if hasattr(v.lc, "filename") else fn
else:
uselc = lc
vfn = fn
cm[k] = cmap(v, lc=uselc, fn=vfn)
cm.lc.add_kv_line_col(k, uselc)
cm.lc.filename = fn
return cm
if isinstance(d, list):
cs = CommentedSeq()
for k,v in enumerate(d):
if isinstance(v, CommentedBase):
uselc = [v.lc.line, v.lc.col, v.lc.line, v.lc.col]
vfn = v.lc.filename if hasattr(v.lc, "filename") else fn
else:
uselc = lc
vfn = fn
cs.append(cmap(v, lc=uselc, fn=vfn))
cs.lc.add_kv_line_col(k, uselc)
cs.lc.filename = fn
return cs
else:
return d
class SourceLine(object):
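    """Tracks the source file, line, and column of `item` (or of `key` within
    it).  As a context manager it re-raises any exception as `raise_type`,
    prefixing the message with that location via makeLead()/makeError()."""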
def __init__(self, item, key=None, raise_type=six.text_type, include_traceback=False): # type: (Any, Any, Callable, bool) -> None
self.item = item
self.key = key
self.raise_type = raise_type
self.include_traceback = include_traceback
def __enter__(self): # type: () -> SourceLine
return self
def __exit__(self,
exc_type, # type: Any
exc_value, # type: Any
tb # type: Any
): # -> Any
if not exc_value:
return
if self.include_traceback:
raise self.makeError("\n".join(traceback.format_exception(exc_type, exc_value, tb)))
else:
raise self.makeError(six.text_type(exc_value))
def makeLead(self): # type: () -> Text
if self.key is None or self.item.lc.data is None or self.key not in self.item.lc.data:
return "%s:%i:%i:" % (self.item.lc.filename if hasattr(self.item.lc, "filename") else "",
(self.item.lc.line or 0)+1,
(self.item.lc.col or 0)+1)
else:
return "%s:%i:%i:" % (self.item.lc.filename if hasattr(self.item.lc, "filename") else "",
(self.item.lc.data[self.key][0] or 0)+1,
(self.item.lc.data[self.key][1] or 0)+1)
def makeError(self, msg): # type: (Text) -> Any
if not isinstance(self.item, ruamel.yaml.comments.CommentedBase):
return self.raise_type(msg)
errs = []
lead = self.makeLead()
for m in msg.splitlines():
if bool(lineno_re.match(m)):
errs.append(m)
else:
errs.append("%s %s" % (lead, m))
return self.raise_type("\n".join(errs))
schema-salad-2.6.20171201034858/schema_salad/main.py

from __future__ import print_function
from __future__ import absolute_import
import argparse
import logging
import sys
import traceback
import json
import os
import re
import itertools
import six
from six.moves import urllib
import pkg_resources # part of setuptools
from typing import Any, Dict, List, Union, Pattern, Text, Tuple, cast
from rdflib import Graph, plugin
from rdflib.serializer import Serializer
from . import schema
from . import jsonld_context
from . import makedoc
from . import validate
from . import codegen
from .sourceline import strip_dup_lineno
from .ref_resolver import Loader, file_uri
_logger = logging.getLogger("salad")
from rdflib.plugin import register, Parser
register('json-ld', Parser, 'rdflib_jsonld.parser', 'JsonLDParser')
def printrdf(workflow, # type: str
wf, # type: Union[List[Dict[Text, Any]], Dict[Text, Any]]
ctx, # type: Dict[Text, Any]
sr # type: str
):
# type: (...) -> None
g = jsonld_context.makerdf(workflow, wf, ctx)
print(g.serialize(format=sr))
def regex_chunk(lines, regex):
# type: (List[str], Pattern[str]) -> List[List[str]]
lst = list(itertools.dropwhile(lambda x: not regex.match(x), lines))
arr = []
while lst:
ret = [lst[0]]+list(itertools.takewhile(lambda x: not regex.match(x),
lst[1:]))
arr.append(ret)
lst = list(itertools.dropwhile(lambda x: not regex.match(x),
lst[1:]))
return arr
def chunk_messages(message): # type: (str) -> List[Tuple[int, str]]
file_regex = re.compile(r'^(.+:\d+:\d+:)(\s+)(.+)$')
item_regex = re.compile(r'^\s*\*\s+')
arr = []
for chun in regex_chunk(message.splitlines(), file_regex):
fst = chun[0]
mat = file_regex.match(fst)
place = mat.group(1)
indent = len(mat.group(2))
lst = [mat.group(3)]+chun[1:]
if [x for x in lst if item_regex.match(x)]:
for item in regex_chunk(lst, item_regex):
msg = re.sub(item_regex, '', "\n".join(item))
arr.append((indent, place+' '+re.sub(r'[\n\s]+',
' ',
msg)))
else:
msg = re.sub(item_regex, '', "\n".join(lst))
arr.append((indent, place+' '+re.sub(r'[\n\s]+',
' ',
msg)))
return arr
def to_one_line_messages(message): # type: (str) -> str
ret = []
max_elem = (0, '')
for (indent, msg) in chunk_messages(message):
if indent > max_elem[0]:
max_elem = (indent, msg)
else:
ret.append(max_elem[1])
max_elem = (indent, msg)
ret.append(max_elem[1])
return "\n".join(ret)
def reformat_yaml_exception_message(message): # type: (str) -> str
line_regex = re.compile(r'^\s+in "(.+)", line (\d+), column (\d+)$')
fname_regex = re.compile(r'^file://'+os.getcwd()+'/')
msgs = message.splitlines()
ret = []
if len(msgs) == 3:
msgs = msgs[1:]
nblanks = 0
elif len(msgs) == 4:
c_msg = msgs[0]
c_file, c_line, c_column = line_regex.match(msgs[1]).groups()
c_file = re.sub(fname_regex, '', c_file)
ret.append("%s:%s:%s: %s" % (c_file, c_line, c_column, c_msg))
msgs = msgs[2:]
nblanks = 2
p_msg = msgs[0]
p_file, p_line, p_column = line_regex.match(msgs[1]).groups()
p_file = re.sub(fname_regex, '', p_file)
ret.append("%s:%s:%s:%s %s" % (p_file, p_line, p_column, ' '*nblanks, p_msg))
return "\n".join(ret)
def main(argsl=None): # type: (List[str]) -> int
if argsl is None:
argsl = sys.argv[1:]
parser = argparse.ArgumentParser()
parser.add_argument("--rdf-serializer",
help="Output RDF serialization format used by --print-rdf (one of turtle (default), n3, nt, xml)",
default="turtle")
exgroup = parser.add_mutually_exclusive_group()
exgroup.add_argument("--print-jsonld-context", action="store_true",
help="Print JSON-LD context for schema")
exgroup.add_argument(
"--print-rdfs", action="store_true", help="Print RDF schema")
exgroup.add_argument("--print-avro", action="store_true",
help="Print Avro schema")
exgroup.add_argument("--print-rdf", action="store_true",
help="Print corresponding RDF graph for document")
exgroup.add_argument("--print-pre", action="store_true",
help="Print document after preprocessing")
exgroup.add_argument(
"--print-index", action="store_true", help="Print node index")
exgroup.add_argument("--print-metadata",
action="store_true", help="Print document metadata")
exgroup.add_argument("--codegen", type=str, metavar="language", help="Generate classes in target language, currently supported: python")
exgroup.add_argument("--print-oneline", action="store_true",
help="Print each error message in oneline")
exgroup = parser.add_mutually_exclusive_group()
exgroup.add_argument("--strict", action="store_true", help="Strict validation (unrecognized or out of place fields are error)",
default=True, dest="strict")
exgroup.add_argument("--non-strict", action="store_false", help="Lenient validation (ignore unrecognized fields)",
default=True, dest="strict")
exgroup = parser.add_mutually_exclusive_group()
exgroup.add_argument("--verbose", action="store_true",
help="Default logging")
exgroup.add_argument("--quiet", action="store_true",
help="Only print warnings and errors.")
exgroup.add_argument("--debug", action="store_true",
help="Print even more logging")
parser.add_argument("schema", type=str, nargs="?", default=None)
parser.add_argument("document", type=str, nargs="?", default=None)
parser.add_argument("--version", "-v", action="store_true",
help="Print version", default=None)
args = parser.parse_args(argsl)
if args.version is None and args.schema is None:
print('%s: error: too few arguments' % sys.argv[0])
return 1
if args.quiet:
_logger.setLevel(logging.WARN)
if args.debug:
_logger.setLevel(logging.DEBUG)
pkg = pkg_resources.require("schema_salad")
if pkg:
if args.version:
print("%s Current version: %s" % (sys.argv[0], pkg[0].version))
return 0
else:
_logger.info("%s Current version: %s", sys.argv[0], pkg[0].version)
# Get the metaschema to validate the schema
metaschema_names, metaschema_doc, metaschema_loader = schema.get_metaschema()
# Load schema document and resolve refs
schema_uri = args.schema
if not (urllib.parse.urlparse(schema_uri)[0] and urllib.parse.urlparse(schema_uri)[0] in [u'http', u'https', u'file']):
schema_uri = file_uri(os.path.abspath(schema_uri))
schema_raw_doc = metaschema_loader.fetch(schema_uri)
try:
schema_doc, schema_metadata = metaschema_loader.resolve_all(
schema_raw_doc, schema_uri)
except (validate.ValidationException) as e:
_logger.error("Schema `%s` failed link checking:\n%s",
args.schema, e, exc_info=(True if args.debug else False))
_logger.debug("Index is %s", list(metaschema_loader.idx.keys()))
_logger.debug("Vocabulary is %s", list(metaschema_loader.vocab.keys()))
return 1
except (RuntimeError) as e:
_logger.error("Schema `%s` read error:\n%s",
args.schema, e, exc_info=(True if args.debug else False))
return 1
# Optionally print the schema after ref resolution
if not args.document and args.print_pre:
print(json.dumps(schema_doc, indent=4))
return 0
if not args.document and args.print_index:
print(json.dumps(list(metaschema_loader.idx.keys()), indent=4))
return 0
# Validate the schema document against the metaschema
try:
schema.validate_doc(metaschema_names, schema_doc,
metaschema_loader, args.strict,
source=schema_metadata.get("name"))
except validate.ValidationException as e:
_logger.error("While validating schema `%s`:\n%s" %
(args.schema, str(e)))
return 1
# Get the json-ld context and RDFS representation from the schema
metactx = {} # type: Dict[str, str]
if isinstance(schema_raw_doc, dict):
metactx = schema_raw_doc.get("$namespaces", {})
if "$base" in schema_raw_doc:
metactx["@base"] = schema_raw_doc["$base"]
if schema_doc is not None:
(schema_ctx, rdfs) = jsonld_context.salad_to_jsonld_context(
schema_doc, metactx)
else:
raise Exception("schema_doc is None??")
# Create the loader that will be used to load the target document.
document_loader = Loader(schema_ctx)
if args.codegen:
codegen.codegen(args.codegen, cast(List[Dict[Text, Any]], schema_doc),
schema_metadata, document_loader)
return 0
# Make the Avro validation that will be used to validate the target
# document
if isinstance(schema_doc, list):
(avsc_names, avsc_obj) = schema.make_avro_schema(
schema_doc, document_loader)
else:
_logger.error("Schema `%s` must be a list.", args.schema)
return 1
if isinstance(avsc_names, Exception):
_logger.error("Schema `%s` error:\n%s", args.schema,
avsc_names, exc_info=((type(avsc_names), avsc_names,
None) if args.debug else None))
if args.print_avro:
print(json.dumps(avsc_obj, indent=4))
return 1
# Optionally print Avro-compatible schema from schema
if args.print_avro:
print(json.dumps(avsc_obj, indent=4))
return 0
# Optionally print the json-ld context from the schema
if args.print_jsonld_context:
j = {"@context": schema_ctx}
print(json.dumps(j, indent=4, sort_keys=True))
return 0
# Optionally print the RDFS graph from the schema
if args.print_rdfs:
print(rdfs.serialize(format=args.rdf_serializer))
return 0
if args.print_metadata and not args.document:
print(json.dumps(schema_metadata, indent=4))
return 0
# If no document specified, all done.
if not args.document:
print("Schema `%s` is valid" % args.schema)
return 0
# Load target document and resolve refs
try:
uri = args.document
if not urllib.parse.urlparse(uri)[0]:
doc = "file://" + os.path.abspath(uri)
document, doc_metadata = document_loader.resolve_ref(uri)
except validate.ValidationException as e:
msg = strip_dup_lineno(six.text_type(e))
msg = to_one_line_messages(str(msg)) if args.print_oneline else msg
_logger.error("Document `%s` failed validation:\n%s",
args.document, msg, exc_info=args.debug)
return 1
except RuntimeError as e:
msg = strip_dup_lineno(six.text_type(e))
msg = reformat_yaml_exception_message(str(msg))
msg = to_one_line_messages(msg) if args.print_oneline else msg
_logger.error("Document `%s` failed validation:\n%s",
args.document, msg, exc_info=args.debug)
return 1
# Optionally print the document after ref resolution
if args.print_pre:
print(json.dumps(document, indent=4))
return 0
if args.print_index:
print(json.dumps(list(document_loader.idx.keys()), indent=4))
return 0
# Validate the target document against the schema
try:
schema.validate_doc(avsc_names, document,
document_loader, args.strict)
except validate.ValidationException as e:
msg = to_one_line_messages(str(e)) if args.print_oneline else str(e)
_logger.error("While validating document `%s`:\n%s" %
(args.document, msg))
return 1
# Optionally convert the document to RDF
if args.print_rdf:
if isinstance(document, (dict, list)):
printrdf(args.document, document, schema_ctx, args.rdf_serializer)
return 0
else:
print("Document must be a dictionary or list.")
return 1
if args.print_metadata:
print(json.dumps(doc_metadata, indent=4))
return 0
print("Document `%s` is valid" % args.document)
return 0
if __name__ == "__main__":
sys.exit(main(sys.argv[1:]))
schema-salad-2.6.20171201034858/schema_salad/python_codegen_support.py 0000644 0001751 0001751 00000032334 13203345013 025333 0 ustar peter peter 0000000 0000000 import six
from six.moves import urllib, StringIO
import ruamel.yaml as yaml
import copy
import re
from typing import List, Text, Dict, Union, Any, Sequence
class ValidationException(Exception):
pass
class Savable(object):
pass
class LoadingOptions(object):
def __init__(self, fetcher=None, namespaces=None, fileuri=None, copyfrom=None):
if copyfrom is not None:
self.idx = copyfrom.idx
if fetcher is None:
fetcher = copyfrom.fetcher
if fileuri is None:
fileuri = copyfrom.fileuri
else:
self.idx = {}
if fetcher is None:
import os
import requests
from cachecontrol.wrapper import CacheControl
from cachecontrol.caches import FileCache
from schema_salad.ref_resolver import DefaultFetcher
if "HOME" in os.environ:
session = CacheControl(
requests.Session(),
cache=FileCache(os.path.join(os.environ["HOME"], ".cache", "salad")))
elif "TMP" in os.environ:
session = CacheControl(
requests.Session(),
cache=FileCache(os.path.join(os.environ["TMP"], ".cache", "salad")))
else:
session = CacheControl(
requests.Session(),
cache=FileCache("/tmp", ".cache", "salad"))
self.fetcher = DefaultFetcher({}, session)
else:
self.fetcher = fetcher
self.fileuri = fileuri
self.vocab = _vocab
self.rvocab = _rvocab
if namespaces is not None:
self.vocab = self.vocab.copy()
self.rvocab = self.rvocab.copy()
for k,v in six.iteritems(namespaces):
self.vocab[k] = v
self.rvocab[v] = k
def load_field(val, fieldtype, baseuri, loadingOptions):
if isinstance(val, dict):
if "$import" in val:
return _document_load_by_url(fieldtype, loadingOptions.fetcher.urljoin(loadingOptions.fileuri, val["$import"]), loadingOptions)
elif "$include" in val:
val = loadingOptions.fetcher.fetch_text(loadingOptions.fetcher.urljoin(loadingOptions.fileuri, val["$include"]))
return fieldtype.load(val, baseuri, loadingOptions)
def save(val):
if isinstance(val, Savable):
return val.save()
if isinstance(val, list):
return [save(v) for v in val]
return val
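# expand_url applies the Salad URI expansion rules used by the generated
# loaders: identifier fields (scoped_id), vocabulary terms (vocab_term), and
# scoped references (scoped_ref) are resolved against the base URI as
# described in the Schema Salad specification.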
def expand_url(url, # type: Union[str, Text]
base_url, # type: Union[str, Text]
loadingOptions, # type: LoadingOptions
scoped_id=False, # type: bool
vocab_term=False, # type: bool
scoped_ref=None # type: int
):
# type: (...) -> Text
if not isinstance(url, six.string_types):
return url
url = Text(url)
if url in (u"@id", u"@type"):
return url
if vocab_term and url in loadingOptions.vocab:
return url
if bool(loadingOptions.vocab) and u":" in url:
prefix = url.split(u":")[0]
if prefix in loadingOptions.vocab:
url = loadingOptions.vocab[prefix] + url[len(prefix) + 1:]
split = urllib.parse.urlsplit(url)
if ((bool(split.scheme) and split.scheme in [u'http', u'https', u'file']) or url.startswith(u"$(")
or url.startswith(u"${")):
pass
elif scoped_id and not bool(split.fragment):
splitbase = urllib.parse.urlsplit(base_url)
frg = u""
if bool(splitbase.fragment):
frg = splitbase.fragment + u"/" + split.path
else:
frg = split.path
pt = splitbase.path if splitbase.path != '' else "/"
url = urllib.parse.urlunsplit(
(splitbase.scheme, splitbase.netloc, pt, splitbase.query, frg))
elif scoped_ref is not None and not bool(split.fragment):
splitbase = urllib.parse.urlsplit(base_url)
sp = splitbase.fragment.split(u"/")
n = scoped_ref
while n > 0 and len(sp) > 0:
sp.pop()
n -= 1
sp.append(url)
url = urllib.parse.urlunsplit((
splitbase.scheme, splitbase.netloc, splitbase.path, splitbase.query,
u"/".join(sp)))
else:
url = loadingOptions.fetcher.urljoin(base_url, url)
if vocab_term:
split = urllib.parse.urlsplit(url)
if bool(split.scheme):
if url in loadingOptions.rvocab:
return loadingOptions.rvocab[url]
else:
raise ValidationException("Term '%s' not in vocabulary" % url)
return url
class _Loader(object):
def load(self, doc, baseuri, loadingOptions, docRoot=None):
# type: (Any, Text, LoadingOptions, Union[Text, None]) -> Any
pass
class _AnyLoader(_Loader):
def load(self, doc, baseuri, loadingOptions, docRoot=None):
if doc is not None:
return doc
raise ValidationException("Expected non-null")
class _PrimitiveLoader(_Loader):
def __init__(self, tp):
# type: (Union[type, Sequence[type]]) -> None
self.tp = tp
def load(self, doc, baseuri, loadingOptions, docRoot=None):
if not isinstance(doc, self.tp):
raise ValidationException("Expected a %s but got %s" % (self.tp, type(doc)))
return doc
def __repr__(self):
return str(self.tp)
class _ArrayLoader(_Loader):
def __init__(self, items):
# type: (_Loader) -> None
self.items = items
def load(self, doc, baseuri, loadingOptions, docRoot=None):
if not isinstance(doc, list):
raise ValidationException("Expected a list")
r = []
errors = []
for i in range(0, len(doc)):
try:
lf = load_field(doc[i], _UnionLoader((self, self.items)), baseuri, loadingOptions)
if isinstance(lf, list):
r.extend(lf)
else:
r.append(lf)
except ValidationException as e:
errors.append(SourceLine(doc, i, str).makeError(six.text_type(e)))
if errors:
raise ValidationException("\n".join(errors))
return r
def __repr__(self):
return "array<%s>" % self.items
class _EnumLoader(_Loader):
def __init__(self, symbols):
# type: (Sequence[Text]) -> None
self.symbols = symbols
def load(self, doc, baseuri, loadingOptions, docRoot=None):
if doc in self.symbols:
return doc
else:
raise ValidationException("Expected one of %s" % (self.symbols,))
class _RecordLoader(_Loader):
def __init__(self, classtype):
# type: (type) -> None
self.classtype = classtype
def load(self, doc, baseuri, loadingOptions, docRoot=None):
if not isinstance(doc, dict):
raise ValidationException("Expected a dict")
return self.classtype(doc, baseuri, loadingOptions, docRoot=docRoot)
def __repr__(self):
return str(self.classtype)
class _UnionLoader(_Loader):
def __init__(self, alternates):
# type: (Sequence[_Loader]) -> None
self.alternates = alternates
def load(self, doc, baseuri, loadingOptions, docRoot=None):
errors = []
for t in self.alternates:
try:
return t.load(doc, baseuri, loadingOptions, docRoot=docRoot)
except ValidationException as e:
errors.append("tried %s but\n%s" % (t, indent(str(e))))
raise ValidationException(bullets(errors, "- "))
def __repr__(self):
return " | ".join(str(a) for a in self.alternates)
class _URILoader(_Loader):
def __init__(self, inner, scoped_id, vocab_term, scoped_ref):
# type: (_Loader, bool, bool, Union[int, None]) -> None
self.inner = inner
self.scoped_id = scoped_id
self.vocab_term = vocab_term
self.scoped_ref = scoped_ref
def load(self, doc, baseuri, loadingOptions, docRoot=None):
if isinstance(doc, list):
doc = [expand_url(i, baseuri, loadingOptions,
self.scoped_id, self.vocab_term, self.scoped_ref) for i in doc]
if isinstance(doc, six.string_types):
doc = expand_url(doc, baseuri, loadingOptions,
self.scoped_id, self.vocab_term, self.scoped_ref)
return self.inner.load(doc, baseuri, loadingOptions)
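# _TypeDSLLoader expands the type DSL shorthand before delegating to the inner
# loader: "name[]" becomes {"type": "array", "items": name} and a trailing "?"
# becomes the union ["null", ...]; see resolve() below.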
class _TypeDSLLoader(_Loader):
typeDSLregex = re.compile(u"^([^[?]+)(\[\])?(\?)?$")
def __init__(self, inner, refScope):
# type: (_Loader, Union[int, None]) -> None
self.inner = inner
self.refScope = refScope
def resolve(self, doc, baseuri, loadingOptions):
m = self.typeDSLregex.match(doc)
if m:
first = expand_url(m.group(1), baseuri, loadingOptions, False, True, self.refScope)
second = third = None
if bool(m.group(2)):
second = {"type": "array", "items": first}
#second = CommentedMap((("type", "array"),
# ("items", first)))
#second.lc.add_kv_line_col("type", lc)
#second.lc.add_kv_line_col("items", lc)
#second.lc.filename = filename
if bool(m.group(3)):
third = [u"null", second or first]
#third = CommentedSeq([u"null", second or first])
#third.lc.add_kv_line_col(0, lc)
#third.lc.add_kv_line_col(1, lc)
#third.lc.filename = filename
doc = third or second or first
return doc
def load(self, doc, baseuri, loadingOptions, docRoot=None):
if isinstance(doc, list):
r = []
for d in doc:
if isinstance(d, six.string_types):
resolved = self.resolve(d, baseuri, loadingOptions)
if isinstance(resolved, list):
for i in resolved:
if i not in r:
r.append(i)
else:
if resolved not in r:
r.append(resolved)
else:
r.append(d)
doc = r
elif isinstance(doc, six.string_types):
doc = self.resolve(doc, baseuri, loadingOptions)
return self.inner.load(doc, baseuri, loadingOptions)
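# _IdMapLoader applies the mapSubject/mapPredicate transformation: a mapping is
# rewritten as a list of objects, with each key stored under mapSubject and any
# non-object value stored under mapPredicate, before delegating to the inner
# loader.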
class _IdMapLoader(_Loader):
def __init__(self, inner, mapSubject, mapPredicate):
# type: (_Loader, Text, Union[Text, None]) -> None
self.inner = inner
self.mapSubject = mapSubject
self.mapPredicate = mapPredicate
def load(self, doc, baseuri, loadingOptions, docRoot=None):
if isinstance(doc, dict):
r = []
for k in sorted(doc.keys()):
val = doc[k]
if isinstance(val, dict):
v = copy.copy(val)
if hasattr(val, 'lc'):
v.lc.data = val.lc.data
v.lc.filename = val.lc.filename
else:
if self.mapPredicate:
v = {self.mapPredicate: val}
else:
raise ValidationException("No mapPredicate")
v[self.mapSubject] = k
r.append(v)
doc = r
return self.inner.load(doc, baseuri, loadingOptions)
def _document_load(loader, doc, baseuri, loadingOptions):
if isinstance(doc, six.string_types):
return _document_load_by_url(loader, loadingOptions.fetcher.urljoin(baseuri, doc), loadingOptions)
if isinstance(doc, dict):
if "$namespaces" in doc:
loadingOptions = LoadingOptions(copyfrom=loadingOptions, namespaces=doc["$namespaces"])
if "$base" in doc:
baseuri = doc["$base"]
if "$graph" in doc:
return loader.load(doc["$graph"], baseuri, loadingOptions)
else:
return loader.load(doc, baseuri, loadingOptions, docRoot=baseuri)
if isinstance(doc, list):
return loader.load(doc, baseuri, loadingOptions)
raise ValidationException("Expected URI string, dict, or list, got %s" % type(doc))
def _document_load_by_url(loader, url, loadingOptions):
if url in loadingOptions.idx:
return _document_load(loader, loadingOptions.idx[url], url, loadingOptions)
text = loadingOptions.fetcher.fetch_text(url)
if isinstance(text, bytes):
textIO = StringIO(text.decode('utf-8'))
else:
textIO = StringIO(text)
textIO.name = url # type: ignore
result = yaml.round_trip_load(textIO)
add_lc_filename(result, url)
loadingOptions.idx[url] = result
loadingOptions = LoadingOptions(copyfrom=loadingOptions, fileuri=url)
return _document_load(loader, result, url, loadingOptions)
def file_uri(path, split_frag=False): # type: (str, bool) -> str
if path.startswith("file://"):
return path
if split_frag:
pathsp = path.split("#", 2)
frag = "#" + urllib.parse.quote(str(pathsp[1])) if len(pathsp) == 2 else ""
urlpath = urllib.request.pathname2url(str(pathsp[0]))
else:
urlpath = urllib.request.pathname2url(path)
frag = ""
if urlpath.startswith("//"):
return "file:%s%s" % (urlpath, frag)
else:
return "file://%s%s" % (urlpath, frag)
schema-salad-2.6.20171201034858/schema_salad/utils.py 0000644 0001751 0001751 00000002047 13203345013 021670 0 ustar peter peter 0000000 0000000 from __future__ import absolute_import
import os
from typing import Any, Dict, List
def add_dictlist(di, key, val): # type: (Dict, Any, Any) -> None
if key not in di:
di[key] = []
di[key].append(val)
def aslist(l): # type: (Any) -> List
"""Convenience function to wrap single items and lists, and return lists unchanged."""
if isinstance(l, list):
return l
else:
return [l]
# http://rightfootin.blogspot.com/2006/09/more-on-python-flatten.html
def flatten(l, ltypes=(list, tuple)):
# type: (Any, Any) -> Any
if l is None:
return []
if not isinstance(l, ltypes):
return [l]
ltype = type(l)
lst = list(l)
i = 0
while i < len(lst):
while isinstance(lst[i], ltypes):
if not lst[i]:
lst.pop(i)
i -= 1
break
else:
lst[i:i + 1] = lst[i]
i += 1
return ltype(lst)
# Check if we are on windows OS
def onWindows():
# type: () -> (bool)
return os.name == 'nt'
schema-salad-2.6.20171201034858/schema_salad/codegen_base.py 0000644 0001751 0001751 00000004563 13203345013 023133 0 ustar peter peter 0000000 0000000 import collections
from six.moves import urllib
from typing import List, Text, Dict, Union, Any
from . import schema
def shortname(inputid):
# type: (Text) -> Text
d = urllib.parse.urlparse(inputid)
if d.fragment:
return d.fragment.split(u"/")[-1]
else:
return d.path.split(u"/")[-1]
class TypeDef(object):
def __init__(self, name, init):
# type: (Text, Text) -> None
self.name = name
self.init = init
class CodeGenBase(object):
def __init__(self):
# type: () -> None
self.collected_types = collections.OrderedDict() # type: collections.OrderedDict[Text, TypeDef]
self.vocab = {} # type: Dict[Text, Text]
def declare_type(self, t):
# type: (TypeDef) -> TypeDef
if t.name not in self.collected_types:
self.collected_types[t.name] = t
return t
def add_vocab(self, name, uri):
# type: (Text, Text) -> None
self.vocab[name] = uri
def prologue(self):
# type: () -> None
raise NotImplementedError()
def safe_name(self, n):
# type: (Text) -> Text
return schema.avro_name(n)
def begin_class(self, classname, extends, doc, abstract):
# type: (Text, List[Text], Text, bool) -> None
raise NotImplementedError()
def end_class(self, classname):
# type: (Text) -> None
raise NotImplementedError()
def type_loader(self, t):
# type: (Union[List[Any], Dict[Text, Any]]) -> TypeDef
raise NotImplementedError()
def declare_field(self, name, typedef, doc, optional):
# type: (Text, TypeDef, Text, bool) -> None
raise NotImplementedError()
def declare_id_field(self, name, typedef, doc):
# type: (Text, TypeDef, Text) -> None
raise NotImplementedError()
def uri_loader(self, inner, scoped_id, vocab_term, refScope):
# type: (TypeDef, bool, bool, Union[int, None]) -> TypeDef
raise NotImplementedError()
def idmap_loader(self, field, inner, mapSubject, mapPredicate):
# type: (Text, TypeDef, Text, Union[Text, None]) -> TypeDef
raise NotImplementedError()
def typedsl_loader(self, inner, refScope):
# type: (TypeDef, Union[int, None]) -> TypeDef
raise NotImplementedError()
def epilogue(self, rootLoader):
# type: (TypeDef) -> None
raise NotImplementedError()
schema-salad-2.6.20171201034858/schema_salad/metaschema/ 0000755 0001751 0001751 00000000000 13211573301 022264 5 ustar peter peter 0000000 0000000 schema-salad-2.6.20171201034858/schema_salad/metaschema/vocab_res_proc.yml 0000644 0001751 0001751 00000000363 12651763266 026021 0 ustar peter peter 0000000 0000000 {
"form": {
"things": [
{
"voc": "red",
},
{
"voc": "red",
},
{
"voc": "http://example.com/acid#blue",
}
]
}
}
schema-salad-2.6.20171201034858/schema_salad/metaschema/link_res_proc.yml 0000644 0001751 0001751 00000000633 12651763266 025664 0 ustar peter peter 0000000 0000000 {
"$base": "http://example.com/base",
"link": "http://example.com/base/zero",
"form": {
"link": "http://example.com/one",
"things": [
{
"link": "http://example.com/two"
},
{
"link": "http://example.com/base#three"
},
{
"link": "http://example.com/four#five",
},
{
"link": "http://example.com/acid#six",
}
]
}
}
schema-salad-2.6.20171201034858/schema_salad/metaschema/typedsl_res_schema.yml 0000644 0001751 0001751 00000000453 13060036611 026666 0 ustar peter peter 0000000 0000000 {
"$graph": [
{"$import": "metaschema_base.yml"},
{
"name": "TypeDSLExample",
"type": "record",
"documentRoot": true,
"fields": [{
"name": "extype",
"type": "string",
"jsonldPredicate": {
_type: "@vocab",
"typeDSL": true
}
}]
}]
}
schema-salad-2.6.20171201034858/schema_salad/metaschema/metaschema.html 0000644 0001751 0001751 00000147742 13165562750 025316 0 ustar peter peter 0000000 0000000
Salad is a schema language for describing structured linked data documents
in JSON or YAML. A Salad schema provides rules for
preprocessing, structural validation, and link checking for documents
described by a Salad schema. Salad builds on JSON-LD and the Apache Avro
data serialization system, and extends Avro with features for rich data
modeling such as inheritance, template specialization, object identifiers,
and object references. Salad was developed to provide a bridge between the
record oriented data modeling supported by Apache Avro and the Semantic
Web.
Status of This Document
This document is the product of the Common Workflow Language working
group. The
latest version of this document is available in the "schema_salad" repository at
The products of the CWL working group (including this document) are made available
under the terms of the Apache License, version 2.0.
Table of contents
1. Introduction
The JSON data model is an extremely popular way to represent structured
data. It is attractive because of its relative simplicity and is a
natural fit with the standard types of many programming languages.
However, this simplicity means that basic JSON lacks expressive features
useful for working with complex data structures and document formats, such
as schemas, object references, and namespaces.
JSON-LD is a W3C standard providing a way to describe how to interpret a
JSON document as Linked Data by means of a "context". JSON-LD provides a
powerful solution for representing object references and namespaces in JSON
based on standard web URIs, but is not itself a schema language. Without a
schema providing a well defined structure, it is difficult to process an
arbitrary JSON-LD document as idiomatic JSON because there are many ways to
express the same data that are logically equivalent but structurally
distinct.
Several schema languages exist for describing and validating JSON data,
such as the Apache Avro data serialization system, however none understand
linked data. As a result, to fully take advantage of JSON-LD to build the
next generation of linked data applications, one must maintain separate
JSON schema, JSON-LD context, RDF schema, and human documentation, despite
significant overlap of content and obvious need for these documents to stay
synchronized.
Schema Salad is designed to address this gap. It provides a schema
language and processing rules for describing structured JSON content
permitting URI resolution and strict document validation. The schema
language supports linked data through annotations that describe the linked
data interpretation of the content, enables generation of JSON-LD context
and RDF schema, and production of RDF triples by applying the JSON-LD
context. The schema language also provides for robust support of inline
documentation.
1.1 Introduction to v1.0
This is the second version of the Schema Salad specification. It is
developed concurrently with v1.0 of the Common Workflow Language for use in
specifying the Common Workflow Language, however Schema Salad is intended to be
useful to a broader audience. Compared to the draft-1 schema salad
specification, the following changes have been made:
This document describes the syntax, data model, algorithms, and schema
language for working with Salad documents. It is not intended to document
a specific implementation of Salad, however it may serve as a reference for
the behavior of conforming implementations.
1.4 Terminology
The terminology used to describe Salad documents is defined in the Concepts
section of the specification. The terms defined in the following list are
used in building those definitions and in describing the actions of a
Salad implementation:
may: Conforming Salad documents and Salad implementations are permitted but
not required to be interpreted as described.
must: Conforming Salad documents and Salad implementations are required
to be interpreted as described; otherwise they are in error.
error: A violation of the rules of this specification; results are
undefined. Conforming implementations may detect and report an error and may
recover from it.
fatal error: A violation of the rules of this specification; results
are undefined. Conforming implementations must not continue to process the
document and may report an error.
at user option: Conforming software may or must (depending on the modal verb in
the sentence) behave as described; if it does, it must provide users a means to
enable or disable the behavior described.
2. Document model
2.1 Data concepts
An object is a data structure equivalent to the "object" type in JSON,
consisting of an unordered set of name/value pairs (referred to here as
fields) and where the name is a string and the value is a string, number,
boolean, array, or object.
A document is a file containing a serialized object, or an array of
objects.
A document type is a class of files that share a common structure and
semantics.
A document schema is a formal description of the grammar of a document type.
A base URI is a context-dependent URI used to resolve relative references.
An identifier is a URI that designates a single document or single
object within a document.
A vocabulary is the set of symbolic field names and enumerated symbols defined
by a document schema, where each term maps to an absolute URI.
2.2 Syntax
Conforming Salad documents are serialized and loaded using YAML syntax and
UTF-8 text encoding. Salad documents are written using the JSON-compatible
subset of YAML. Features of YAML such as headers and type tags that are
not found in the standard JSON data model must not be used in conforming
Salad documents. It is a fatal error if the document is not valid YAML.
A Salad document must consist only of either a single root object or an
array of objects.
2.3 Document context
2.3.1 Implied context
The implicit context consists of the vocabulary defined by the schema and
the base URI. By default, the base URI must be the URI that was used to
load the document. It may be overridden by an explicit context.
2.3.2 Explicit context
If a document consists of a root object, this object may contain the
fields $base, $namespaces, $schemas, and $graph:
$base: Must be a string. Set the base URI for the document used to
resolve relative references.
$namespaces: Must be an object with strings as values. The keys of
the object are namespace prefixes used in the document; the values of
the object are the prefix expansions.
$schemas: Must be an array of strings. This field may list URI
references to documents in RDF-XML format which will be queried for RDF
schema data. The subjects and predicates described by the RDF schema
may provide additional semantic context for the document, and may be
used for validation of prefixed extension fields found in the document.
Other directives beginning with $ must be ignored.
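As an informal sketch (the prefix, URIs, and field names below are hypothetical, not taken from any schema), a root object carrying an explicit context could be written as the following Python literal:
root = {
    "$base": "http://example.com/base",
    "$namespaces": {"acid": "http://example.com/acid#"},
    "$schemas": ["http://example.com/acid.rdf"],
    "$graph": [{"id": "one"}, {"id": "two"}],
}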
2.4 Document graph
If a document consists of a single root object, this object may contain the
field $graph. This field must be an array of objects. If present, this
field holds the primary content of the document. A document that consists
of an array of objects at the root is an implicit graph.
2.5 Document metadata
If a document consists of a single root object, metadata about the
document, such as authorship, may be declared in the root object.
2.6 Document schema
Document preprocessing, link validation and schema validation require a
document schema. A schema may consist of:
At least one record definition object which defines valid fields that
make up a record type. Record field definitions include the valid types
that may be assigned to each field and annotations to indicate fields
that represent identifiers and links, described below in "Semantic
Annotations".
Any number of enumerated type objects which define a finite set of symbols that are
valid values of the type.
Any number of documentation objects which allow in-line documentation of the schema.
The schema for defining a salad schema (the metaschema) is described in
detail in "Schema validation".
2.6.1 Record field annotations
In a document schema, record field definitions may include the field
jsonldPredicate, which may be either a string or object. Implementations
must preprocess these fields according to the following rules:
If the value of jsonldPredicate is @id, the field is an identifier
field.
If the value of jsonldPredicate is an object, and that object contains
the field _type with the value @id, the field is a
link field.
If the value of jsonldPredicate is an object, and that object contains
the field _type with the value @vocab, the field is a
vocabulary field, which is a subtype of link field.
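For example, a hypothetical record definition (not part of the metaschema) might mark one field as an identifier and another as a link field:
example_record = {
    "name": "ExampleRecord",
    "type": "record",
    "fields": [
        {"name": "id", "type": "string", "jsonldPredicate": "@id"},
        {"name": "link", "type": "string", "jsonldPredicate": {"_type": "@id"}},
    ],
}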
2.7 Document traversal
To perform document preprocessing, link validation and schema
validation, the document must be traversed starting from the fields or
array items of the root object or array and recursively visiting each child
item which contains an object or array.
3. Document preprocessing
After processing the explicit context (if any), document preprocessing
begins. Starting from the document root, object field values or array
items which contain objects or arrays are recursively traversed
depth-first. For each visited object, field names, identifier fields, link
fields, vocabulary fields, and $import and $include directives must be
processed as described in this section. The order of traversal of child
nodes within a parent node is undefined.
3.1 Field name resolution
The document schema declares the vocabulary of known field names. During
preprocessing traversal, field names in the document which are not part of
the schema vocabulary must be resolved to absolute URIs. Under "strict"
validation, it is an error for a document to include fields which are not
part of the vocabulary and not resolvable to absolute URIs. Field names
which are not part of the vocabulary are resolved using the following
rules:
If a field name URI begins with a namespace prefix declared in the
document context (@context) followed by a colon :, the prefix and
colon must be replaced by the namespace declared in @context.
If there is a vocabulary term which maps to the URI of a resolved
field, the field name must be replaced with the vocabulary term.
If a field name URI is an absolute URI consisting of a scheme and path
and is not part of the vocabulary, no processing occurs.
Field name resolution is not relative. It must not be affected by the
base URI.
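As a small illustration (the prefix and URI are hypothetical), a field name using a declared namespace prefix expands by simple prefix substitution, independent of the base URI:
namespaces = {"acid": "http://example.com/acid#"}
field_name = "acid:color"
prefix, _, rest = field_name.partition(":")
resolved = namespaces[prefix] + rest   # "http://example.com/acid#color"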
The schema may designate one or more fields as identifier fields to identify
specific objects. Processing must resolve relative identifiers to absolute
identifiers using the following rules:
If an identifier URI is prefixed with # it is a URI relative
fragment identifier. It is resolved relative to the base URI by setting
or replacing the fragment portion of the base URI.
If an identifier URI does not contain a scheme and is not prefixed with # it
is a parent relative fragment identifier. It is resolved relative to the
base URI by the following rule: if the base URI does not contain a
document fragment, set the fragment portion of the base URI. If the base
URI does contain a document fragment, append a slash / followed by the
identifier field to the fragment portion of the base URI.
If an identifier URI begins with a namespace prefix declared in
$namespaces followed by a colon :, the prefix and colon must be
replaced by the namespace declared in $namespaces.
If an identifier URI is an absolute URI consisting of a scheme and path,
no processing occurs.
When preprocessing visits a node containing an identifier, that identifier
must be used as the base URI to process child nodes.
It is an error for more than one object in a document to have the same
absolute URI.
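The following sketch traces these rules by hand on a hypothetical document (the identifiers and base URI are made up for illustration):
base = "http://example.com/base"
doc = {"id": "one", "things": [{"id": "two"}, {"id": "#three"}]}
# "one" has no scheme and no leading "#": a parent relative fragment, so it
# resolves to http://example.com/base#one and becomes the base URI for its
# child nodes.
# "two" then resolves to http://example.com/base#one/two.
# "#three" is a URI relative fragment and resolves against the base URI to
# http://example.com/base#three.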
The schema may designate one or more fields as link fields which reference
other objects. Processing must resolve links to absolute URIs using the
following rules:
If a reference URI is prefixed with # it is a relative
fragment identifier. It is resolved relative to the base URI by setting
or replacing the fragment portion of the base URI.
If a reference URI does not contain a scheme and is not prefixed with #
it is a path relative reference. If the reference URI contains # in any
position other than the first character, the reference URI must be divided
into a path portion and a fragment portion split on the first instance of
#. The path portion is resolved relative to the base URI by the following
rule: if the path portion of the base URI ends in a slash /, append the
path portion of the reference URI to the path portion of the base URI. If
the path portion of the base URI does not end in a slash, replace the final
path segment with the path portion of the reference URI. Replace the
fragment portion of the base URI with the fragment portion of the reference
URI.
If a reference URI begins with a namespace prefix declared in $namespaces
followed by a colon :, the prefix and colon must be replaced by the
namespace declared in $namespaces.
If a reference URI is an absolute URI consisting of a scheme and path,
no processing occurs.
Link resolution must not affect the base URI used to resolve identifiers
and other links.
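A short worked illustration of these rules (all URIs hypothetical):
base = "http://example.com/dir/doc.yml"
# "#frag"        -> http://example.com/dir/doc.yml#frag  (fragment set/replaced)
# "other.yml"    -> http://example.com/dir/other.yml     (final path segment replaced)
# "other.yml#x"  -> http://example.com/dir/other.yml#x
# "acid:seven"   -> http://example.com/acid#seven        (given $namespaces {"acid": "http://example.com/acid#"})
# "http://a/b"   -> http://a/b                           (absolute URI, unchanged)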
The schema may designate one or more vocabulary fields which use terms
defined in the vocabulary. Processing must resolve vocabulary fields to
either vocabulary terms or absolute URIs by first applying the link
resolution rules defined above, then applying the following additional
rule:
If a reference URI is a vocabulary field, and there is a vocabulary
term which maps to the resolved URI, the reference must be replaced with
the vocabulary term.
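For instance (echoing the vocab_res_proc.yml fixture earlier in this package), an implementation effectively performs a reverse vocabulary lookup after link resolution; the dictionary below is only an illustrative sketch, not an actual data structure of any implementation:
rvocab = {"http://example.com/acid#red": "red"}
resolved = "http://example.com/acid#red"
value = rvocab.get(resolved, resolved)   # "red"
# "http://example.com/acid#blue" has no vocabulary term, so it remains a URI.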
During preprocessing traversal, an implementation must resolve $import
directives. An $import directive is an object consisting of exactly one
field $import specifying a resource by URI string. It is an error if there
are additional fields in the $import object; such additional fields must
be ignored.
The URI string must be resolved to an absolute URI using the link
resolution rules described previously. Implementations must support
loading from file, http and https resources. The URI referenced by
$import must be loaded and recursively preprocessed as a Salad document.
The external imported document does not inherit the context of the
importing document, and the default base URI for processing the imported
document must be the URI used to retrieve the imported document. If the
$import URI includes a document fragment, the fragment must be excluded
from the base URI used to preprocess the imported document.
Once loaded and processed, the $import node is replaced in the document
structure by the object or array yielded from the import operation.
URIs may reference document fragments which refer to a specific object in
the target document. This indicates that the $import node must be
replaced by only the object with the appropriate fragment identifier.
It is a fatal error if an import directive refers to an external resource
or resource fragment which does not exist or is not accessible.
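A minimal before/after sketch (the file name and its contents are hypothetical):
before = {"types": {"$import": "types.yml"}}
# After preprocessing, the $import node is replaced by the parsed contents of
# types.yml, resolved against the base URI of the importing document, e.g.:
after = {"types": [{"name": "ExampleType", "type": "record", "fields": []}]}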
During preprocessing traversal, an implementation must resolve $include
directives. An $include directive is an object consisting of exactly one
field $include specifying a URI string. It is an error if there are
additional fields in the $include object; such additional fields must be
ignored.
The URI string must be resolved to an absolute URI using the link
resolution rules described previously. The URI referenced by $include must
be loaded as text data. Implementations must support loading from
file, http and https resources. Implementations may transcode the
character encoding of the text data to match that of the parent document,
but must not interpret or parse the text document in any other way.
Once loaded, the $include node is replaced in the document structure by a
string containing the text data loaded from the resource.
It is a fatal error if an $include directive refers to an external resource
which does not exist or is not accessible.
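In contrast with $import, the result of $include is always a string (the file name and contents here are hypothetical):
before = {"script": {"$include": "hello.sh"}}
after = {"script": "#!/bin/sh\necho hello\n"}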
During preprocessing traversal, an implementation must resolve $mixin
directives. A $mixin directive is an object consisting of the field
$mixin specifying a resource by URI string. If there are additional fields in
the $mixin object, these fields override fields in the object which is loaded
from the $mixin URI.
The URI string must be resolved to an absolute URI using the link resolution
rules described previously. Implementations must support loading from file,
http and https resources. The URI referenced by $mixin must be loaded
and recursively preprocessed as a Salad document. The external imported
document must inherit the context of the importing document, however the file
URI for processing the imported document must be the URI used to retrieve the
imported document. The $mixin URI must not include a document fragment.
Once loaded and processed, the $mixin node is replaced in the document
structure by the object or array yielded from the import operation.
URIs may reference document fragments which refer to a specific object in
the target document. This indicates that the $mixin node must be
replaced by only the object with the appropriate fragment identifier.
It is a fatal error if a $mixin directive refers to an external resource
or resource fragment which does not exist or is not accessible.
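A rough sketch of the override behaviour (names and values hypothetical): fields given alongside $mixin take precedence over the fields loaded from the mixin URI:
mixed_in = {"colour": "red", "size": 3}     # contents loaded from mixin.yml
before = {"$mixin": "mixin.yml", "colour": "blue"}
after = {"colour": "blue", "size": 3}       # local field overrides the mixin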
The schema may designate certain fields as having a mapSubject. If the
value of the field is a JSON object, it must be transformed into an array of
JSON objects. Each key-value pair from the source JSON object is a list
item, each list item must be a JSON object, and the key is assigned to the
field specified by mapSubject.
Fields which have mapSubject specified may also supply a mapPredicate.
If the value of a map item is not a JSON object, the item is transformed to a
JSON object with the key assigned to the field specified by mapSubject and
the value assigned to the field specified by mapPredicate.
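As an informal sketch (field names hypothetical), with mapSubject class and mapPredicate value the object form is rewritten into the array form:
before = {"steps": {"one": "A", "two": {"value": "B", "extra": 1}}}
after = {"steps": [
    {"class": "one", "value": "A"},
    {"class": "two", "value": "B", "extra": 1},
]}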
Once a document has been preprocessed, an implementation may validate
links. The link validation traversal may visit fields which the schema
designates as link fields and check that each URI references an existing
object in the current document, an imported document, file system, or
network resource. Failure to validate links may be a fatal error. Link
validation behavior for individual fields may be modified by identity and
noLinkCheck in the jsonldPredicate section of the field schema.
Only applies if extends is declared. Apply type specialization using the
base record as a template. For each field inherited from the base
record, replace any instance of the type specializeFrom with
specializeTo.
The context type hint corresponds to the JSON-LD @type directive.
If the value of this field is @id and identity is false or
unspecified, the parent field must be resolved using the link
resolution rules. If identity is true, the parent field must be
resolved using the identifier expansion rules.
If the value of this field is @vocab, the parent field must be
resolved using the vocabulary resolution rules.
If true and _type is @id this indicates that the parent field must
be resolved according to identity resolution rules instead of link
resolution rules. In addition, the field value is considered an
assertion that the linked value exists; absence of an object in the loaded document
with the URI is not an error.
If true, this indicates that link validation traversal must stop at
this field. This field (if it is a URI) or any fields under it (if it
is an object or array) are not subject to link checking.
If the value of the field is a JSON object, it must be transformed
into an array of JSON objects, where each key-value pair from the
source JSON object is a list item, the list items must be JSON objects,
and the key is assigned to the field specified by mapSubject.
Only applies if mapSubject is also provided. If the value of the
field is a JSON object, it is transformed as described in mapSubject,
with the addition that when the value of a map item is not an object,
the item is transformed to a JSON object with the key assigned to the
field specified by mapSubject and the value assigned to the field
specified by mapPredicate.
If the field contains a relative reference, it must be resolved by
searching for valid document references in each successive parent scope
in the document fragment. For example, a reference of foo in the
context #foo/bar/baz will first check for the existence of
#foo/bar/baz/foo, followed by #foo/bar/foo, then #foo/foo and
then finally #foo. The first valid URI in the search order shall be
used as the fully resolved value of the identifier. The value of the
refScope field is the specified number of levels from the containing
identifier scope before starting the search, so if refScope: 2 then
"baz" and "bar" must be stripped to get the base #foo and search
#foo/foo and then #foo. The last scope searched must be the top
level scope before determining if the identifier cannot be resolved.
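Restating that example as the candidate search order an implementation might try (a sketch, not normative):
# the example above: reference "foo" in the scope "#foo/bar/baz" searches
candidates = ["#foo/bar/baz/foo", "#foo/bar/foo", "#foo/foo", "#foo"]
# with refScope: 2, "baz" and "bar" are stripped before the search begins:
candidates_refscope_2 = ["#foo/foo", "#foo"]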