schema-salad-2.6.20171201034858/0000755000175100017510000000000013211573301015551 5ustar peterpeter00000000000000schema-salad-2.6.20171201034858/README.rst0000644000175100017510000001014513130233260017236 0ustar peterpeter00000000000000|Build Status| |Build status|

.. |Build Status| image:: https://img.shields.io/travis/common-workflow-language/schema_salad/master.svg?label=unix%20build
   :target: https://travis-ci.org/common-workflow-language/schema_salad
.. |Build status| image:: https://img.shields.io/appveyor/ci/mr-c/schema-salad/master.svg?label=windows%20build
   :target: https://ci.appveyor.com/project/mr-c/schema-salad/branch/master

Schema Salad
------------

Salad is a schema language for describing JSON or YAML structured linked data
documents. Salad is originally based on JSON-LD_ and the Apache Avro_ data
serialization system. A Salad schema describes rules for preprocessing,
structural validation, and link checking of documents. Salad supports rich
data modeling features such as inheritance, template specialization, object
identifiers, object references, documentation generation, and transformation
to RDF_. Salad provides a bridge between document- and record-oriented data
modeling and the Semantic Web.

Usage
-----

::

    $ pip install schema_salad
    $ schema-salad-tool
    usage: schema-salad-tool [-h] [--rdf-serializer RDF_SERIALIZER]
                             [--print-jsonld-context | --print-doc | --print-rdfs
                              | --print-avro | --print-rdf | --print-pre
                              | --print-index | --print-metadata | --version]
                             [--strict | --non-strict]
                             [--verbose | --quiet | --debug]
                             schema [document]
    $ python
    >>> import schema_salad

To install from source::

    git clone https://github.com/common-workflow-language/schema_salad
    cd schema_salad
    python setup.py install

Documentation
-------------

See the specification_ and the metaschema_ (the Salad schema for itself). For
an example application of Schema Salad, see the Common Workflow Language_.
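Among the preprocessing rules is a compact type DSL: in a schema, ``string[]`` is shorthand for an array of strings, and a trailing ``?`` marks a type as optional (a union with ``null``). A minimal, stdlib-only sketch of this expansion, mirroring the regular expression used by ``Loader._type_dsl`` in ``schema_salad/ref_resolver.py`` (the real implementation also preserves YAML line/column information; ``expand_type_dsl`` is an illustrative name, not part of the package API):

```python
import re

# Mirrors Loader.typeDSLregex in schema_salad/ref_resolver.py:
# a base type, optionally followed by "[]", optionally followed by "?".
TYPE_DSL = re.compile(r"^([^[?]+)(\[\])?(\?)?$")


def expand_type_dsl(t):
    """Expand Salad type DSL shorthand into explicit schema structures."""
    if not isinstance(t, str):
        return t  # lists and dicts pass through unchanged
    m = TYPE_DSL.match(t)
    if m is None:
        return t
    expanded = m.group(1)
    if m.group(2):  # trailing "[]": an array of the base type
        expanded = {"type": "array", "items": expanded}
    if m.group(3):  # trailing "?": a union with "null"
        expanded = ["null", expanded]
    return expanded
```

For example, ``expand_type_dsl("string[]?")`` yields a union of ``"null"`` and an array-of-string record, which is the explicit form the validator works with after preprocessing.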
Rationale
---------

The JSON data model is a popular way to represent structured data. It is
attractive because of its relative simplicity and is a natural fit with the
standard types of many programming languages. However, this simplicity comes
at the cost that basic JSON lacks expressive features useful for working with
complex data structures and document formats, such as schemas, object
references, and namespaces.

JSON-LD is a W3C standard providing a way to describe how to interpret a JSON
document as Linked Data by means of a "context". JSON-LD provides a powerful
solution for representing object references and namespaces in JSON based on
standard web URIs, but is not itself a schema language. Without a schema
providing a well-defined structure, it is difficult to process an arbitrary
JSON-LD document as idiomatic JSON because there are many ways to express the
same data that are logically equivalent but structurally distinct.

Several schema languages exist for describing and validating JSON data, such
as JSON Schema and the Apache Avro data serialization system; however, none
of them understand linked data. As a result, to fully take advantage of
JSON-LD to build the next generation of linked data applications, one must
maintain separate JSON schema, JSON-LD context, RDF schema, and human
documentation, despite significant overlap of content and an obvious need for
these documents to stay synchronized.

Schema Salad is designed to address this gap. It provides a schema language
and processing rules for describing structured JSON content, permitting URI
resolution and strict document validation. The schema language supports
linked data through annotations that describe the linked data interpretation
of the content, enables generation of a JSON-LD context and RDF schema, and
produces RDF triples by applying the JSON-LD context. The schema language
also provides robust support for inline documentation.

.. _JSON-LD: http://json-ld.org
.. _Avro: http://avro.apache.org
..
_metaschema: https://github.com/common-workflow-language/schema_salad/blob/master/schema_salad/metaschema/metaschema.yml .. _specification: http://www.commonwl.org/v1.0/SchemaSalad.html .. _Language: https://github.com/common-workflow-language/common-workflow-language/blob/master/v1.0/CommandLineTool.yml .. _RDF: https://www.w3.org/RDF/ schema-salad-2.6.20171201034858/gittaggers.py0000644000175100017510000000142712706153124020274 0ustar peterpeter00000000000000from setuptools.command.egg_info import egg_info import subprocess import time class EggInfoFromGit(egg_info): """Tag the build with git commit timestamp. If a build tag has already been set (e.g., "egg_info -b", building from source package), leave it alone. """ def git_timestamp_tag(self): gitinfo = subprocess.check_output( ['git', 'log', '--first-parent', '--max-count=1', '--format=format:%ct', '.']).strip() return time.strftime('.%Y%m%d%H%M%S', time.gmtime(int(gitinfo))) def tags(self): if self.tag_build is None: try: self.tag_build = self.git_timestamp_tag() except (subprocess.CalledProcessError, OSError): pass return egg_info.tags(self) schema-salad-2.6.20171201034858/MANIFEST.in0000644000175100017510000000045713203345013017313 0ustar peterpeter00000000000000include gittaggers.py Makefile include schema_salad/tests/* include schema_salad/tests/test_schema/*.md include schema_salad/tests/test_schema/*.yml include schema_salad/tests/test_schema/*.cwl include schema_salad/metaschema/* include schema_salad/tests/docimp/* global-exclude *~ global-exclude *.pyc schema-salad-2.6.20171201034858/schema_salad/0000755000175100017510000000000013211573301020155 5ustar peterpeter00000000000000schema-salad-2.6.20171201034858/schema_salad/java_codegen.py0000644000175100017510000001164113203345013023135 0ustar peterpeter00000000000000import json import sys import six from six.moves import urllib, cStringIO import collections import logging from pkg_resources import resource_stream from .utils import aslist, flatten from 
. import schema from .codegen_base import TypeDef, CodeGenBase, shortname from typing import Text import os class JavaCodeGen(CodeGenBase): def __init__(self, base): # type: (Text) -> None super(JavaCodeGen, self).__init__() sp = urllib.parse.urlsplit(base) self.package = ".".join(list(reversed(sp.netloc.split("."))) + sp.path.strip("/").split("/")) self.outdir = self.package.replace(".", "/") def prologue(self): if not os.path.exists(self.outdir): os.makedirs(self.outdir) def safe_name(self, n): avn = schema.avro_name(n) if avn in ("class", "extends", "abstract"): # reserved words avn = avn+"_" return avn def interface_name(self, n): return self.safe_name(n) def begin_class(self, classname, extends, doc, abstract): cls = self.interface_name(classname) self.current_class = cls self.current_class_is_abstract = abstract self.current_loader = cStringIO() self.current_fields = cStringIO() with open(os.path.join(self.outdir, "%s.java" % cls), "w") as f: if extends: ext = "extends " + ", ".join(self.interface_name(e) for e in extends) else: ext = "" f.write("""package {package}; public interface {cls} {ext} {{ """. format(package=self.package, cls=cls, ext=ext)) if self.current_class_is_abstract: return with open(os.path.join(self.outdir, "%sImpl.java" % cls), "w") as f: f.write("""package {package}; public class {cls}Impl implements {cls} {{ """. 
format(package=self.package, cls=cls, ext=ext) self.current_loader.write(""" void Load() { """) def end_class(self, classname): with open(os.path.join(self.outdir, "%s.java" % self.current_class), "a") as f: f.write(""" } """) if self.current_class_is_abstract: return self.current_loader.write(""" } """) with open(os.path.join(self.outdir, "%sImpl.java" % self.current_class), "a") as f: f.write(self.current_fields.getvalue()) f.write(self.current_loader.getvalue()) f.write(""" } """) prims = { u"http://www.w3.org/2001/XMLSchema#string": TypeDef("String", "Support.StringLoader()"), u"http://www.w3.org/2001/XMLSchema#int": TypeDef("Integer", "Support.IntLoader()"), u"http://www.w3.org/2001/XMLSchema#long": TypeDef("Long", "Support.LongLoader()"), u"http://www.w3.org/2001/XMLSchema#float": TypeDef("Float", "Support.FloatLoader()"), u"http://www.w3.org/2001/XMLSchema#double": TypeDef("Double", "Support.DoubleLoader()"), u"http://www.w3.org/2001/XMLSchema#boolean": TypeDef("Boolean", "Support.BoolLoader()"), u"https://w3id.org/cwl/salad#null": TypeDef("null_type", "Support.NullLoader()"), u"https://w3id.org/cwl/salad#Any": TypeDef("Any_type", "Support.AnyLoader()") } def type_loader(self, t): if isinstance(t, list) and len(t) == 2: if t[0] == "https://w3id.org/cwl/salad#null": t = t[1] if isinstance(t, six.string_types): # six.string_types, not basestring, for Python 3 compatibility if t in self.prims: return self.prims[t] return TypeDef("Object", "") def declare_field(self, name, typedef, doc, optional): fieldname = self.safe_name(name) with open(os.path.join(self.outdir, "%s.java" % self.current_class), "a") as f: f.write(""" {type} get{capfieldname}(); """. format(fieldname=fieldname, capfieldname=fieldname[0].upper() + fieldname[1:], type=typedef.name)) if self.current_class_is_abstract: return self.current_fields.write(""" private {type} {fieldname}; public {type} get{capfieldname}() {{ return this.{fieldname}; }} """. 
format(fieldname=fieldname, capfieldname=fieldname[0].upper() + fieldname[1:], type=typedef.name)) self.current_loader.write(""" this.{fieldname} = null; // TODO: loaders """. format(fieldname=fieldname)) def declare_id_field(self, name, typedef, doc): pass def uri_loader(self, inner, scoped_id, vocab_term, refScope): return inner def idmap_loader(self, field, inner, mapSubject, mapPredicate): return inner def typedsl_loader(self, inner, refScope): return inner def epilogue(self, rootLoader): pass schema-salad-2.6.20171201034858/schema_salad/ref_resolver.py0000644000175100017510000013322413210150510023221 0ustar peterpeter00000000000000from __future__ import absolute_import import sys import os import json import hashlib import logging import collections from io import open import six from six.moves import range from six.moves import urllib from six import StringIO import re import copy from . import validate from .utils import aslist, flatten from .sourceline import SourceLine, add_lc_filename, relname import requests from cachecontrol.wrapper import CacheControl from cachecontrol.caches import FileCache import ruamel.yaml as yaml from ruamel.yaml.comments import CommentedSeq, CommentedMap import rdflib from rdflib import Graph from rdflib.namespace import RDF, RDFS, OWL from rdflib.plugins.parsers.notation3 import BadSyntax import xml.sax from typing import (cast, Any, AnyStr, Callable, Dict, List, Iterable, Optional, Set, Text, Tuple, TypeVar, Union) _logger = logging.getLogger("salad") ContextType = Dict[six.text_type, Union[Dict, six.text_type, Iterable[six.text_type]]] DocumentType = TypeVar('DocumentType', CommentedSeq, CommentedMap) DocumentOrStrType = TypeVar( 'DocumentOrStrType', CommentedSeq, CommentedMap, six.text_type) _re_drive = re.compile(r"/([a-zA-Z]):") def file_uri(path, split_frag=False): # type: (str, bool) -> str if path.startswith("file://"): return path if split_frag: pathsp = path.split("#", 2) frag = "#" + 
urllib.parse.quote(str(pathsp[1])) if len(pathsp) == 2 else "" urlpath = urllib.request.pathname2url(str(pathsp[0])) else: urlpath = urllib.request.pathname2url(path) frag = "" if urlpath.startswith("//"): return "file:%s%s" % (urlpath, frag) else: return "file://%s%s" % (urlpath, frag) def uri_file_path(url): # type: (str) -> str split = urllib.parse.urlsplit(url) if split.scheme == "file": return urllib.request.url2pathname( str(split.path)) + ("#" + urllib.parse.unquote(str(split.fragment)) if bool(split.fragment) else "") else: raise ValueError("Not a file URI") class NormDict(CommentedMap): def __init__(self, normalize=six.text_type): # type: (Callable) -> None super(NormDict, self).__init__() self.normalize = normalize def __getitem__(self, key): # type: (Any) -> Any return super(NormDict, self).__getitem__(self.normalize(key)) def __setitem__(self, key, value): # type: (Any, Any) -> Any return super(NormDict, self).__setitem__(self.normalize(key), value) def __delitem__(self, key): # type: (Any) -> Any return super(NormDict, self).__delitem__(self.normalize(key)) def __contains__(self, key): # type: (Any) -> bool return super(NormDict, self).__contains__(self.normalize(key)) def merge_properties(a, b): # type: (Dict[Any, Any], Dict[Any, Any]) -> Dict[Any, Any] c = {} for i in a: if i not in b: c[i] = a[i] for i in b: if i not in a: c[i] = b[i] for i in a: if i in b: c[i] = aslist(a[i]) + aslist(b[i]) return c def SubLoader(loader): # type: (Loader) -> Loader return Loader(loader.ctx, schemagraph=loader.graph, foreign_properties=loader.foreign_properties, idx=loader.idx, cache=loader.cache, fetcher_constructor=loader.fetcher_constructor, skip_schemas=loader.skip_schemas) class Fetcher(object): def fetch_text(self, url): # type: (Text) -> Text raise NotImplementedError() def check_exists(self, url): # type: (Text) -> bool raise NotImplementedError() def urljoin(self, base_url, url): # type: (Text, Text) -> Text raise NotImplementedError() class DefaultFetcher(Fetcher): 
def __init__(self, cache, # type: Dict[Text, Union[Text, bool]] session # type: Optional[requests.sessions.Session] ): # type: (...) -> None self.cache = cache self.session = session def fetch_text(self, url): # type: (Text) -> Text if url in self.cache and self.cache[url] is not True: # treat "True" as a placeholder that indicates something exists but # not necessarily what its contents are. return cast(Text, self.cache[url]) split = urllib.parse.urlsplit(url) scheme, path = split.scheme, split.path if scheme in [u'http', u'https'] and self.session is not None: try: resp = self.session.get(url) resp.raise_for_status() except Exception as e: raise RuntimeError(url, e) return resp.text elif scheme == 'file': try: # On Windows, url.path will be /drive:/path ; on Unix systems, # /path. As we want drive:/path instead of /drive:/path on Windows, # remove the leading /. if os.path.isabs(path[1:]): # check if the path is still valid after removing the leading / path = path[1:] with open(urllib.request.url2pathname(str(path)), encoding='utf-8') as fp: return fp.read() except (OSError, IOError) as e: if e.filename == path: raise RuntimeError(six.text_type(e)) else: raise RuntimeError('Error reading %s: %s' % (url, e)) else: raise ValueError('Unsupported scheme in url: %s' % url) def check_exists(self, url): # type: (Text) -> bool if url in self.cache: return True split = urllib.parse.urlsplit(url) scheme, path = split.scheme, split.path if scheme in [u'http', u'https'] and self.session is not None: try: resp = self.session.head(url) resp.raise_for_status() except Exception: return False self.cache[url] = True return True elif scheme == 'file': return os.path.exists(urllib.request.url2pathname(str(path))) else: raise ValueError('Unsupported scheme in url: %s' % url) def urljoin(self, base_url, url): # type: (Text, Text) -> Text basesplit = urllib.parse.urlsplit(base_url) split = urllib.parse.urlsplit(url) if (basesplit.scheme and basesplit.scheme != "file" and split.scheme == 
"file"): raise ValueError("Not resolving potential remote exploit %s from base %s" % (url, base_url)) if sys.platform == 'win32': if (base_url == url): return url basesplit = urllib.parse.urlsplit(base_url) # note that below might split # "C:" with "C" as URI scheme split = urllib.parse.urlsplit(url) has_drive = split.scheme and len(split.scheme) == 1 if basesplit.scheme == "file": # Special handling of relative file references on Windows # as urllib seems to not be quite up to the job # netloc MIGHT appear in equivalents of UNC Strings # \\server1.example.com\path as # file:///server1.example.com/path # https://tools.ietf.org/html/rfc8089#appendix-E.3.2 # (TODO: test this) netloc = split.netloc or basesplit.netloc # Check if url is a local path like "C:/Users/fred" # or actually an absolute URI like http://example.com/fred if has_drive: # Assume split.scheme is actually a drive, e.g. "C:" # so we'll recombine into a path path_with_drive = urllib.parse.urlunsplit((split.scheme, '', split.path,'', '')) # Compose new file:/// URI with path_with_drive # .. carrying over any #fragment (?query just in case..) return urllib.parse.urlunsplit(("file", netloc, path_with_drive, split.query, split.fragment)) if (not split.scheme and not netloc and split.path and split.path.startswith("/")): # Relative - but does it have a drive? base_drive = _re_drive.match(basesplit.path) drive = _re_drive.match(split.path) if base_drive and not drive: # Keep drive letter from base_url # https://tools.ietf.org/html/rfc8089#appendix-E.2.1 # e.g. 
urljoin("file:///D:/bar/a.txt", "/foo/b.txt") == file:///D:/foo/b.txt path_with_drive = "/%s:%s" % (base_drive.group(1), split.path) return urllib.parse.urlunsplit(("file", netloc, path_with_drive, split.query, split.fragment)) # else: fall-through to resolve as relative URI elif has_drive: # Base is http://something but url is C:/something - which urllib would wrongly # resolve as an absolute path that could later be used to access local files raise ValueError("Not resolving potential remote exploit %s from base %s" % (url, base_url)) return urllib.parse.urljoin(base_url, url) class Loader(object): def __init__(self, ctx, # type: ContextType schemagraph=None, # type: rdflib.graph.Graph foreign_properties=None, # type: Set[Text] idx=None, # type: Dict[Text, Union[CommentedMap, CommentedSeq, Text, None]] cache=None, # type: Dict[Text, Any] session=None, # type: requests.sessions.Session fetcher_constructor=None, # type: Callable[[Dict[Text, Union[Text, bool]], requests.sessions.Session], Fetcher] skip_schemas=None # type: bool ): # type: (...) 
-> None normalize = lambda url: urllib.parse.urlsplit(url).geturl() if idx is not None: self.idx = idx else: self.idx = NormDict(normalize) self.ctx = {} # type: ContextType if schemagraph is not None: self.graph = schemagraph else: self.graph = rdflib.graph.Graph() if foreign_properties is not None: self.foreign_properties = foreign_properties else: self.foreign_properties = set() if cache is not None: self.cache = cache else: self.cache = {} if skip_schemas is not None: self.skip_schemas = skip_schemas else: self.skip_schemas = False if session is None: if "HOME" in os.environ: self.session = CacheControl( requests.Session(), cache=FileCache(os.path.join(os.environ["HOME"], ".cache", "salad"))) elif "TMP" in os.environ: self.session = CacheControl( requests.Session(), cache=FileCache(os.path.join(os.environ["TMP"], ".cache", "salad"))) else: self.session = CacheControl( requests.Session(), cache=FileCache("/tmp", ".cache", "salad")) else: self.session = session if fetcher_constructor is not None: self.fetcher_constructor = fetcher_constructor else: self.fetcher_constructor = DefaultFetcher self.fetcher = self.fetcher_constructor(self.cache, self.session) self.fetch_text = self.fetcher.fetch_text self.check_exists = self.fetcher.check_exists self.url_fields = set() # type: Set[Text] self.scoped_ref_fields = {} # type: Dict[Text, int] self.vocab_fields = set() # type: Set[Text] self.identifiers = [] # type: List[Text] self.identity_links = set() # type: Set[Text] self.standalone = None # type: Optional[Set[Text]] self.nolinkcheck = set() # type: Set[Text] self.vocab = {} # type: Dict[Text, Text] self.rvocab = {} # type: Dict[Text, Text] self.idmap = {} # type: Dict[Text, Any] self.mapPredicate = {} # type: Dict[Text, Text] self.type_dsl_fields = set() # type: Set[Text] self.add_context(ctx) def expand_url(self, url, # type: Text base_url, # type: Text scoped_id=False, # type: bool vocab_term=False, # type: bool scoped_ref=None # type: int ): # type: (...) 
-> Text if url in (u"@id", u"@type"): return url if vocab_term and url in self.vocab: return url if bool(self.vocab) and u":" in url: prefix = url.split(u":")[0] if prefix in self.vocab: url = self.vocab[prefix] + url[len(prefix) + 1:] split = urllib.parse.urlsplit(url) if ((bool(split.scheme) and split.scheme in [u'http', u'https', u'file']) or url.startswith(u"$(") or url.startswith(u"${")): pass elif scoped_id and not bool(split.fragment): splitbase = urllib.parse.urlsplit(base_url) frg = u"" if bool(splitbase.fragment): frg = splitbase.fragment + u"/" + split.path else: frg = split.path pt = splitbase.path if splitbase.path != '' else "/" url = urllib.parse.urlunsplit( (splitbase.scheme, splitbase.netloc, pt, splitbase.query, frg)) elif scoped_ref is not None and not split.fragment: pass else: url = self.fetcher.urljoin(base_url, url) if vocab_term and url in self.rvocab: return self.rvocab[url] else: return url def _add_properties(self, s): # type: (Text) -> None for _, _, rng in self.graph.triples((s, RDFS.range, None)): literal = ((six.text_type(rng).startswith( u"http://www.w3.org/2001/XMLSchema#") and not six.text_type(rng) == u"http://www.w3.org/2001/XMLSchema#anyURI") or six.text_type(rng) == u"http://www.w3.org/2000/01/rdf-schema#Literal") if not literal: self.url_fields.add(six.text_type(s)) self.foreign_properties.add(six.text_type(s)) def add_namespaces(self, ns): # type: (Dict[Text, Text]) -> None self.vocab.update(ns) def add_schemas(self, ns, base_url): # type: (Union[List[Text], Text], Text) -> None if self.skip_schemas: return for sch in aslist(ns): try: fetchurl = self.fetcher.urljoin(base_url, sch) if fetchurl not in self.cache or self.cache[fetchurl] is True: _logger.debug("Getting external schema %s", fetchurl) content = self.fetch_text(fetchurl) self.cache[fetchurl] = rdflib.graph.Graph() for fmt in ['xml', 'turtle', 'rdfa']: try: self.cache[fetchurl].parse(data=content, format=fmt, publicID=str(fetchurl)) self.graph += self.cache[fetchurl] 
break except xml.sax.SAXParseException: pass except TypeError: pass except BadSyntax: pass except Exception as e: _logger.warn("Could not load extension schema %s: %s", fetchurl, e) for s, _, _ in self.graph.triples((None, RDF.type, RDF.Property)): self._add_properties(s) for s, _, o in self.graph.triples((None, RDFS.subPropertyOf, None)): self._add_properties(s) self._add_properties(o) for s, _, _ in self.graph.triples((None, RDFS.range, None)): self._add_properties(s) for s, _, _ in self.graph.triples((None, RDF.type, OWL.ObjectProperty)): self._add_properties(s) for s, _, _ in self.graph.triples((None, None, None)): self.idx[six.text_type(s)] = None def add_context(self, newcontext, baseuri=""): # type: (ContextType, Text) -> None if bool(self.vocab): raise validate.ValidationException( "Refreshing context that already has stuff in it") self.url_fields = set(("$schemas",)) self.scoped_ref_fields = {} self.vocab_fields = set() self.identifiers = [] self.identity_links = set() self.standalone = set() self.nolinkcheck = set() self.idmap = {} self.mapPredicate = {} self.vocab = {} self.rvocab = {} self.type_dsl_fields = set() self.ctx.update(_copy_dict_without_key(newcontext, u"@context")) _logger.debug("ctx is %s", self.ctx) for key, value in self.ctx.items(): if value == u"@id": self.identifiers.append(key) self.identity_links.add(key) elif isinstance(value, dict) and value.get(u"@type") == u"@id": self.url_fields.add(key) if u"refScope" in value: self.scoped_ref_fields[key] = value[u"refScope"] if value.get(u"identity", False): self.identity_links.add(key) elif isinstance(value, dict) and value.get(u"@type") == u"@vocab": self.url_fields.add(key) self.vocab_fields.add(key) if u"refScope" in value: self.scoped_ref_fields[key] = value[u"refScope"] if value.get(u"typeDSL"): self.type_dsl_fields.add(key) if isinstance(value, dict) and value.get(u"noLinkCheck"): self.nolinkcheck.add(key) if isinstance(value, dict) and value.get(u"mapSubject"): self.idmap[key] = 
value[u"mapSubject"] if isinstance(value, dict) and value.get(u"mapPredicate"): self.mapPredicate[key] = value[u"mapPredicate"] if isinstance(value, dict) and u"@id" in value: self.vocab[key] = value[u"@id"] elif isinstance(value, six.string_types): self.vocab[key] = value for k, v in self.vocab.items(): self.rvocab[self.expand_url(v, u"", scoped_id=False)] = k self.identifiers.sort() _logger.debug("identifiers is %s", self.identifiers) _logger.debug("identity_links is %s", self.identity_links) _logger.debug("url_fields is %s", self.url_fields) _logger.debug("vocab_fields is %s", self.vocab_fields) _logger.debug("vocab is %s", self.vocab) def resolve_ref(self, ref, # type: Union[CommentedMap, CommentedSeq, Text] base_url=None, # type: Text checklinks=True # type: bool ): # type: (...) -> Tuple[Union[CommentedMap, CommentedSeq, Text, None], Dict[Text, Any]] lref = ref # type: Union[CommentedMap, CommentedSeq, Text, None] obj = None # type: Optional[CommentedMap] resolved_obj = None # type: Optional[Union[CommentedMap, CommentedSeq, Text]] inc = False mixin = None # type: Optional[Dict[Text, Any]] if not base_url: base_url = file_uri(os.getcwd()) + "/" sl = SourceLine(obj, None, ValueError) # If `ref` is a dict, look for special directives. 
if isinstance(lref, CommentedMap): obj = lref if "$import" in obj: sl = SourceLine(obj, "$import", RuntimeError) if len(obj) == 1: lref = obj[u"$import"] obj = None else: raise sl.makeError( u"'$import' must be the only field in %s" % (six.text_type(obj))) elif "$include" in obj: sl = SourceLine(obj, "$include", RuntimeError) if len(obj) == 1: lref = obj[u"$include"] inc = True obj = None else: raise sl.makeError( u"'$include' must be the only field in %s" % (six.text_type(obj))) elif "$mixin" in obj: sl = SourceLine(obj, "$mixin", RuntimeError) lref = obj[u"$mixin"] mixin = obj obj = None else: lref = None for identifier in self.identifiers: if identifier in obj: lref = obj[identifier] break if not lref: raise sl.makeError( u"Object `%s` does not have identifier field in %s" % (relname(obj), self.identifiers)) if not isinstance(lref, (str, six.text_type)): raise ValueError(u"Expected CommentedMap or string, got %s: `%s`" % (type(lref), six.text_type(lref))) if isinstance(lref, (str, six.text_type)) and os.sep == "\\": # Convert Windows path separator in ref lref = lref.replace("\\", "/") url = self.expand_url(lref, base_url, scoped_id=(obj is not None)) # Has this reference been loaded already? if url in self.idx and (not mixin): return self.idx[url], {} sl.raise_type = RuntimeError with sl: # "$include" directive means load raw text if inc: return self.fetch_text(url), {} doc = None if isinstance(obj, collections.MutableMapping): for identifier in self.identifiers: obj[identifier] = url doc_url = url else: # Load structured document doc_url, frg = urllib.parse.urldefrag(url) if doc_url in self.idx and (not mixin): # If the base document is in the index, it was already loaded, # so if we didn't find the reference earlier then it must not # exist. raise validate.ValidationException( u"Reference `#%s` not found in file `%s`." 
% (frg, doc_url)) doc = self.fetch(doc_url, inject_ids=(not mixin)) # Recursively expand urls and resolve directives if bool(mixin): doc = copy.deepcopy(doc) doc.update(mixin) del doc["$mixin"] resolved_obj, metadata = self.resolve_all( doc, base_url, file_base=doc_url, checklinks=checklinks) else: resolved_obj, metadata = self.resolve_all( doc if doc else obj, doc_url, checklinks=checklinks) # Requested reference should be in the index now, otherwise it's a bad # reference if not bool(mixin): if url in self.idx: resolved_obj = self.idx[url] else: raise RuntimeError( "Reference `%s` is not in the index. Index contains:\n %s" % (url, "\n ".join(self.idx))) if isinstance(resolved_obj, CommentedMap): if u"$graph" in resolved_obj: metadata = _copy_dict_without_key(resolved_obj, u"$graph") return resolved_obj[u"$graph"], metadata else: return resolved_obj, metadata else: return resolved_obj, metadata def _resolve_idmap(self, document, # type: CommentedMap loader # type: Loader ): # type: (...) -> None # Convert fields with mapSubject into lists # use mapPredicate if the mapped value isn't a dict. 
for idmapField in loader.idmap: if (idmapField in document): idmapFieldValue = document[idmapField] if (isinstance(idmapFieldValue, dict) and "$import" not in idmapFieldValue and "$include" not in idmapFieldValue): ls = CommentedSeq() for k in sorted(idmapFieldValue.keys()): val = idmapFieldValue[k] v = None # type: Optional[CommentedMap] if not isinstance(val, CommentedMap): if idmapField in loader.mapPredicate: v = CommentedMap( ((loader.mapPredicate[idmapField], val),)) v.lc.add_kv_line_col( loader.mapPredicate[idmapField], document[idmapField].lc.data[k]) v.lc.filename = document.lc.filename else: raise validate.ValidationException( "mapSubject '%s' value '%s' is not a dict " "and does not have a mapPredicate" % (k, val)) else: v = val v[loader.idmap[idmapField]] = k v.lc.add_kv_line_col(loader.idmap[idmapField], document[idmapField].lc.data[k]) v.lc.filename = document.lc.filename ls.lc.add_kv_line_col( len(ls), document[idmapField].lc.data[k]) ls.lc.filename = document.lc.filename ls.append(v) document[idmapField] = ls typeDSLregex = re.compile(r"^([^[?]+)(\[\])?(\?)?$") def _type_dsl(self, t, # type: Union[Text, Dict, List] lc, filename): # type: (...) -> Union[Text, Dict[Text, Text], List[Union[Text, Dict[Text, Text]]]] if not isinstance(t, (str, six.text_type)): return t m = Loader.typeDSLregex.match(t) if not m: return t first = m.group(1) second = third = None if bool(m.group(2)): second = CommentedMap((("type", "array"), ("items", first))) second.lc.add_kv_line_col("type", lc) second.lc.add_kv_line_col("items", lc) second.lc.filename = filename if bool(m.group(3)): third = CommentedSeq([u"null", second or first]) third.lc.add_kv_line_col(0, lc) third.lc.add_kv_line_col(1, lc) third.lc.filename = filename return third or second or first def _resolve_type_dsl(self, document, # type: CommentedMap loader # type: Loader ): # type: (...) 
-> None for d in loader.type_dsl_fields: if d in document: datum2 = datum = document[d] if isinstance(datum, (str, six.text_type)): datum2 = self._type_dsl(datum, document.lc.data[ d], document.lc.filename) elif isinstance(datum, CommentedSeq): datum2 = CommentedSeq() for n, t in enumerate(datum): datum2.lc.add_kv_line_col( len(datum2), datum.lc.data[n]) datum2.append(self._type_dsl( t, datum.lc.data[n], document.lc.filename)) if isinstance(datum2, CommentedSeq): datum3 = CommentedSeq() seen = [] # type: List[Text] for i, item in enumerate(datum2): if isinstance(item, CommentedSeq): for j, v in enumerate(item): if v not in seen: datum3.lc.add_kv_line_col( len(datum3), item.lc.data[j]) datum3.append(v) seen.append(v) else: if item not in seen: datum3.lc.add_kv_line_col( len(datum3), datum2.lc.data[i]) datum3.append(item) seen.append(item) document[d] = datum3 else: document[d] = datum2 def _resolve_identifier(self, document, loader, base_url): # type: (CommentedMap, Loader, Text) -> Text # Expand identifier field (usually 'id') to resolve scope for identifer in loader.identifiers: if identifer in document: if isinstance(document[identifer], six.string_types): document[identifer] = loader.expand_url( document[identifer], base_url, scoped_id=True) if (document[identifer] not in loader.idx or isinstance( loader.idx[document[identifer]], six.string_types)): loader.idx[document[identifer]] = document base_url = document[identifer] else: raise validate.ValidationException( "identifier field '%s' must be a string" % (document[identifer])) return base_url def _resolve_identity(self, document, loader, base_url): # type: (Dict[Text, List[Text]], Loader, Text) -> None # Resolve scope for identity fields (fields where the value is the # identity of a standalone node, such as enum symbols) for identifer in loader.identity_links: if identifer in document and isinstance(document[identifer], list): for n, v in enumerate(document[identifer]): if isinstance(document[identifer][n], 
six.string_types): document[identifer][n] = loader.expand_url( document[identifer][n], base_url, scoped_id=True) if document[identifer][n] not in loader.idx: loader.idx[document[identifer][ n]] = document[identifer][n] def _normalize_fields(self, document, loader): # type: (Dict[Text, Text], Loader) -> None # Normalize fields which are prefixed or full URIn to vocabulary terms for d in list(document.keys()): d2 = loader.expand_url(d, u"", scoped_id=False, vocab_term=True) if d != d2: document[d2] = document[d] del document[d] def _resolve_uris(self, document, # type: Dict[Text, Union[Text, List[Text]]] loader, # type: Loader base_url # type: Text ): # type: (...) -> None # Resolve remaining URLs based on document base for d in loader.url_fields: if d in document: datum = document[d] if isinstance(datum, (str, six.text_type)): document[d] = loader.expand_url( datum, base_url, scoped_id=False, vocab_term=(d in loader.vocab_fields), scoped_ref=self.scoped_ref_fields.get(d)) elif isinstance(datum, list): for i, url in enumerate(datum): if isinstance(url, (str, six.text_type)): datum[i] = loader.expand_url( url, base_url, scoped_id=False, vocab_term=(d in loader.vocab_fields), scoped_ref=self.scoped_ref_fields.get(d)) def resolve_all(self, document, # type: Union[CommentedMap, CommentedSeq] base_url, # type: Text file_base=None, # type: Text checklinks=True # type: bool ): # type: (...) 
-> Tuple[Union[CommentedMap, CommentedSeq, Text, None], Dict[Text, Any]] loader = self metadata = CommentedMap() # type: CommentedMap if file_base is None: file_base = base_url if isinstance(document, CommentedMap): # Handle $import and $include if (u'$import' in document or u'$include' in document): return self.resolve_ref( document, base_url=file_base, checklinks=checklinks) elif u'$mixin' in document: return self.resolve_ref( document, base_url=base_url, checklinks=checklinks) elif isinstance(document, CommentedSeq): pass elif isinstance(document, (list, dict)): raise Exception("Expected CommentedMap or CommentedSeq, got %s: `%s`" % (type(document), document)) else: return (document, metadata) newctx = None # type: Optional[Loader] if isinstance(document, CommentedMap): # Handle $base, $profile, $namespaces, $schemas and $graph if u"$base" in document: base_url = document[u"$base"] if u"$profile" in document: if newctx is None: newctx = SubLoader(self) prof = self.fetch(document[u"$profile"]) newctx.add_namespaces(document.get(u"$namespaces", {})) newctx.add_schemas(document.get( u"$schemas", []), document[u"$profile"]) if u"$namespaces" in document: if newctx is None: newctx = SubLoader(self) newctx.add_namespaces(document[u"$namespaces"]) if u"$schemas" in document: if newctx is None: newctx = SubLoader(self) newctx.add_schemas(document[u"$schemas"], file_base) if newctx is not None: loader = newctx if u"$graph" in document: metadata = _copy_dict_without_key(document, u"$graph") document = document[u"$graph"] resolved_metadata = loader.resolve_all( metadata, base_url, file_base=file_base, checklinks=False)[0] if isinstance(resolved_metadata, dict): metadata = resolved_metadata else: raise validate.ValidationException( "Validation error, metadata must be dict: %s" % (resolved_metadata)) if isinstance(document, CommentedMap): self._normalize_fields(document, loader) self._resolve_idmap(document, loader) self._resolve_type_dsl(document, loader) base_url = 
self._resolve_identifier(document, loader, base_url) self._resolve_identity(document, loader, base_url) self._resolve_uris(document, loader, base_url) try: for key, val in document.items(): document[key], _ = loader.resolve_all( val, base_url, file_base=file_base, checklinks=False) except validate.ValidationException as v: _logger.warn("loader is %s", id(loader), exc_info=True) raise validate.ValidationException("(%s) (%s) Validation error in field %s:\n%s" % ( id(loader), file_base, key, validate.indent(six.text_type(v)))) elif isinstance(document, CommentedSeq): i = 0 try: while i < len(document): val = document[i] if isinstance(val, CommentedMap) and (u"$import" in val or u"$mixin" in val): l, _ = loader.resolve_ref( val, base_url=file_base, checklinks=False) if isinstance(l, CommentedSeq): lc = document.lc.data[i] del document[i] llen = len(l) for j in range(len(document) + llen, i + llen, -1): document.lc.data[ j - 1] = document.lc.data[j - llen] for item in l: document.insert(i, item) document.lc.data[i] = lc i += 1 else: document[i] = l i += 1 else: document[i], _ = loader.resolve_all( val, base_url, file_base=file_base, checklinks=False) i += 1 except validate.ValidationException as v: _logger.warn("failed", exc_info=True) raise validate.ValidationException("(%s) (%s) Validation error in position %i:\n%s" % ( id(loader), file_base, i, validate.indent(six.text_type(v)))) for identifer in loader.identity_links: if identifer in metadata: if isinstance(metadata[identifer], (str, six.text_type)): metadata[identifer] = loader.expand_url( metadata[identifer], base_url, scoped_id=True) loader.idx[metadata[identifer]] = document if checklinks: all_doc_ids={} # type: Dict[Text, Text] self.validate_links(document, u"", all_doc_ids) return document, metadata def fetch(self, url, inject_ids=True): # type: (Text, bool) -> Any if url in self.idx: return self.idx[url] try: text = self.fetch_text(url) if isinstance(text, bytes): textIO = StringIO(text.decode('utf-8')) else: 
textIO = StringIO(text)
            textIO.name = url    # type: ignore
            result = yaml.round_trip_load(textIO)
            add_lc_filename(result, url)
        except yaml.parser.ParserError as e:
            raise validate.ValidationException("Syntax error %s" % (e))

        if (isinstance(result, CommentedMap) and inject_ids
                and bool(self.identifiers)):
            for identifier in self.identifiers:
                if identifier not in result:
                    result[identifier] = url
                self.idx[self.expand_url(result[identifier], url)] = result
        else:
            self.idx[url] = result

        return result

    FieldType = TypeVar('FieldType', six.text_type, CommentedSeq, CommentedMap)

    def validate_scoped(self, field, link, docid):
        # type: (Text, Text, Text) -> Text
        split = urllib.parse.urlsplit(docid)
        sp = split.fragment.split(u"/")
        n = self.scoped_ref_fields[field]
        while n > 0 and len(sp) > 0:
            sp.pop()
            n -= 1
        tried = []
        while True:
            sp.append(link)
            url = urllib.parse.urlunsplit((
                split.scheme, split.netloc, split.path, split.query,
                u"/".join(sp)))
            tried.append(url)
            if url in self.idx:
                return url
            sp.pop()
            if len(sp) == 0:
                break
            sp.pop()
        raise validate.ValidationException(
            "Field `%s` references unknown identifier `%s`, tried %s"
            % (field, link, ", ".join(tried)))

    def validate_link(self, field, link, docid, all_doc_ids):
        # type: (Text, FieldType, Text, Dict[Text, Text]) -> FieldType
        if field in self.nolinkcheck:
            return link
        if isinstance(link, (str, six.text_type)):
            if field in self.vocab_fields:
                if (link not in self.vocab and link not in self.idx
                        and link not in self.rvocab):
                    if field in self.scoped_ref_fields:
                        return self.validate_scoped(field, link, docid)
                    elif not self.check_exists(link):
                        raise validate.ValidationException(
                            "Field `%s` contains undefined reference to `%s`"
                            % (field, link))
            elif link not in self.idx and link not in self.rvocab:
                if field in self.scoped_ref_fields:
                    return self.validate_scoped(field, link, docid)
                elif not self.check_exists(link):
                    raise validate.ValidationException(
                        "Field `%s` contains undefined reference to `%s`"
                        % (field, link))
        elif isinstance(link,
CommentedSeq): errors = [] for n, i in enumerate(link): try: link[n] = self.validate_link(field, i, docid, all_doc_ids) except validate.ValidationException as v: errors.append(v) if bool(errors): raise validate.ValidationException( "\n".join([six.text_type(e) for e in errors])) elif isinstance(link, CommentedMap): self.validate_links(link, docid, all_doc_ids) else: raise validate.ValidationException( "`%s` field is %s, expected string, list, or a dict." % (field, type(link).__name__)) return link def getid(self, d): # type: (Any) -> Optional[Text] if isinstance(d, dict): for i in self.identifiers: if i in d: idd = d[i] if isinstance(idd, (str, six.text_type)): return idd return None def validate_links(self, document, base_url, all_doc_ids): # type: (Union[CommentedMap, CommentedSeq, Text, None], Text, Dict[Text, Text]) -> None docid = self.getid(document) if not docid: docid = base_url errors = [] # type: List[Exception] iterator = None # type: Any if isinstance(document, list): iterator = enumerate(document) elif isinstance(document, dict): try: for d in self.url_fields: sl = SourceLine(document, d, validate.ValidationException) if d in document and d not in self.identity_links: document[d] = self.validate_link(d, document[d], docid, all_doc_ids) for identifier in self.identifiers: # validate that each id is defined uniquely if identifier in document: sl = SourceLine(document, identifier, validate.ValidationException) if document[identifier] in all_doc_ids and sl.makeLead() != all_doc_ids[document[identifier]]: raise validate.ValidationException( "%s object %s `%s` previously defined" % (all_doc_ids[document[identifier]], identifier, relname(document[identifier]), )) else: all_doc_ids[document[identifier]] = sl.makeLead() break except validate.ValidationException as v: if d == "$schemas": _logger.warn( validate.indent(six.text_type(v))) else: errors.append(sl.makeError(six.text_type(v))) if hasattr(document, "iteritems"): iterator = six.iteritems(document) else: 
iterator = list(document.items())
        else:
            return

        for key, val in iterator:
            sl = SourceLine(document, key, validate.ValidationException)
            try:
                self.validate_links(val, docid, all_doc_ids)
            except validate.ValidationException as v:
                if key in self.nolinkcheck or (
                        isinstance(key, six.string_types) and ":" in key):
                    _logger.warn(validate.indent(six.text_type(v)))
                else:
                    docid2 = self.getid(val)
                    if docid2 is not None:
                        errors.append(sl.makeError(
                            "checking object `%s`\n%s"
                            % (relname(docid2), validate.indent(six.text_type(v)))))
                    else:
                        if isinstance(key, six.string_types):
                            errors.append(sl.makeError(
                                "checking field `%s`\n%s"
                                % (key, validate.indent(six.text_type(v)))))
                        else:
                            errors.append(sl.makeError(
                                "checking item\n%s"
                                % (validate.indent(six.text_type(v)))))

        if bool(errors):
            if len(errors) > 1:
                raise validate.ValidationException(
                    u"\n".join([six.text_type(e) for e in errors]))
            else:
                raise errors[0]
        return


D = TypeVar('D', CommentedMap, ContextType)


def _copy_dict_without_key(from_dict, filtered_key):
    # type: (D, Any) -> D
    new_dict = copy.copy(from_dict)
    if filtered_key in new_dict:
        del new_dict[filtered_key]
    if isinstance(from_dict, CommentedMap):
        new_dict.lc.data = copy.copy(from_dict.lc.data)
        new_dict.lc.filename = from_dict.lc.filename
    return new_dict


# ---- schema_salad/sourceline.py ----

from __future__ import absolute_import
import ruamel.yaml
from ruamel.yaml.comments import CommentedBase, CommentedMap, CommentedSeq
import re
import os
import traceback
from typing import (Any, AnyStr, Callable, cast, Dict, List, Iterable,
                    Tuple, TypeVar, Union, Text)
import six

lineno_re = re.compile(u"^(.*?:[0-9]+:[0-9]+: )(( *)(.*))")


def _add_lc_filename(r, source):
    # type: (ruamel.yaml.comments.CommentedBase, AnyStr) -> None
    if isinstance(r, ruamel.yaml.comments.CommentedBase):
        r.lc.filename = source
    if isinstance(r, list):
        for d in r:
            _add_lc_filename(d, source)
    elif isinstance(r,
dict):
        for d in six.itervalues(r):
            _add_lc_filename(d, source)


def relname(source):  # type: (Text) -> Text
    if source.startswith("file://"):
        source = source[7:]
        source = os.path.relpath(source)
    return source


def add_lc_filename(r, source):
    # type: (ruamel.yaml.comments.CommentedBase, Text) -> None
    _add_lc_filename(r, relname(source))


def reflow(text, maxline, shift=""):  # type: (Text, int, Text) -> Text
    if maxline < 20:
        maxline = 20
    if len(text) > maxline:
        sp = text.rfind(' ', 0, maxline)
        if sp < 1:
            sp = text.find(' ', sp + 1)
            if sp == -1:
                sp = len(text)
        if sp < len(text):
            return "%s\n%s%s" % (text[0:sp], shift,
                                 reflow(text[sp + 1:], maxline, shift))
    return text


def indent(v, nolead=False, shift=u"  ", bullet=u"  "):
    # type: (Text, bool, Text, Text) -> Text
    if nolead:
        return v.splitlines()[0] + u"\n".join(
            [shift + l for l in v.splitlines()[1:]])
    else:
        def lineno(i, l):  # type: (int, Text) -> Text
            r = lineno_re.match(l)
            if bool(r):
                return r.group(1) + (bullet if i == 0 else shift) + r.group(2)
            else:
                return (bullet if i == 0 else shift) + l
        return u"\n".join([lineno(i, l) for i, l in enumerate(v.splitlines())])


def bullets(textlist, bul):  # type: (List[Text], Text) -> Text
    if len(textlist) == 1:
        return textlist[0]
    else:
        return "\n".join(indent(t, bullet=bul) for t in textlist)


def strip_dup_lineno(text, maxline=None):  # type: (Text, int) -> Text
    if maxline is None:
        maxline = int(os.environ.get("COLUMNS", "100"))
    pre = None
    msg = []
    for l in text.splitlines():
        g = lineno_re.match(l)
        if not g:
            msg.append(l)
            continue
        shift = len(g.group(1)) + len(g.group(3))
        g2 = reflow(g.group(2), maxline - shift, " " * shift)
        if g.group(1) != pre:
            pre = g.group(1)
            msg.append(pre + g2)
        else:
            g2 = reflow(g.group(2), maxline - len(g.group(1)),
                        " " * (len(g.group(1)) + len(g.group(3))))
            msg.append(" " * len(g.group(1)) + g2)
    return "\n".join(msg)


def cmap(d, lc=None, fn=None):
    # type: (Union[int, float, str, Text, Dict, List], List[int], Text) -> Union[int, float, str, Text, CommentedMap,
CommentedSeq] if lc is None: lc = [0, 0, 0, 0] if fn is None: fn = "test" if isinstance(d, CommentedMap): fn = d.lc.filename if hasattr(d.lc, "filename") else fn for k,v in six.iteritems(d): if k in d.lc.data: d[k] = cmap(v, lc=d.lc.data[k], fn=fn) else: d[k] = cmap(v, lc, fn=fn) return d if isinstance(d, CommentedSeq): fn = d.lc.filename if hasattr(d.lc, "filename") else fn for k,v in enumerate(d): if k in d.lc.data: d[k] = cmap(v, lc=d.lc.data[k], fn=fn) else: d[k] = cmap(v, lc, fn=fn) return d if isinstance(d, dict): cm = CommentedMap() for k in sorted(d.keys()): v = d[k] if isinstance(v, CommentedBase): uselc = [v.lc.line, v.lc.col, v.lc.line, v.lc.col] vfn = v.lc.filename if hasattr(v.lc, "filename") else fn else: uselc = lc vfn = fn cm[k] = cmap(v, lc=uselc, fn=vfn) cm.lc.add_kv_line_col(k, uselc) cm.lc.filename = fn return cm if isinstance(d, list): cs = CommentedSeq() for k,v in enumerate(d): if isinstance(v, CommentedBase): uselc = [v.lc.line, v.lc.col, v.lc.line, v.lc.col] vfn = v.lc.filename if hasattr(v.lc, "filename") else fn else: uselc = lc vfn = fn cs.append(cmap(v, lc=uselc, fn=vfn)) cs.lc.add_kv_line_col(k, uselc) cs.lc.filename = fn return cs else: return d class SourceLine(object): def __init__(self, item, key=None, raise_type=six.text_type, include_traceback=False): # type: (Any, Any, Callable, bool) -> None self.item = item self.key = key self.raise_type = raise_type self.include_traceback = include_traceback def __enter__(self): # type: () -> SourceLine return self def __exit__(self, exc_type, # type: Any exc_value, # type: Any tb # type: Any ): # -> Any if not exc_value: return if self.include_traceback: raise self.makeError("\n".join(traceback.format_exception(exc_type, exc_value, tb))) else: raise self.makeError(six.text_type(exc_value)) def makeLead(self): # type: () -> Text if self.key is None or self.item.lc.data is None or self.key not in self.item.lc.data: return "%s:%i:%i:" % (self.item.lc.filename if hasattr(self.item.lc, 
"filename") else "", (self.item.lc.line or 0)+1, (self.item.lc.col or 0)+1) else: return "%s:%i:%i:" % (self.item.lc.filename if hasattr(self.item.lc, "filename") else "", (self.item.lc.data[self.key][0] or 0)+1, (self.item.lc.data[self.key][1] or 0)+1) def makeError(self, msg): # type: (Text) -> Any if not isinstance(self.item, ruamel.yaml.comments.CommentedBase): return self.raise_type(msg) errs = [] lead = self.makeLead() for m in msg.splitlines(): if bool(lineno_re.match(m)): errs.append(m) else: errs.append("%s %s" % (lead, m)) return self.raise_type("\n".join(errs)) schema-salad-2.6.20171201034858/schema_salad/main.py0000644000175100017510000003145613203345013021462 0ustar peterpeter00000000000000from __future__ import print_function from __future__ import absolute_import import argparse import logging import sys import traceback import json import os import re import itertools import six from six.moves import urllib import pkg_resources # part of setuptools from typing import Any, Dict, List, Union, Pattern, Text, Tuple, cast from rdflib import Graph, plugin from rdflib.serializer import Serializer from . import schema from . import jsonld_context from . import makedoc from . import validate from . import codegen from .sourceline import strip_dup_lineno from .ref_resolver import Loader, file_uri _logger = logging.getLogger("salad") from rdflib.plugin import register, Parser register('json-ld', Parser, 'rdflib_jsonld.parser', 'JsonLDParser') def printrdf(workflow, # type: str wf, # type: Union[List[Dict[Text, Any]], Dict[Text, Any]] ctx, # type: Dict[Text, Any] sr # type: str ): # type: (...) 
-> None g = jsonld_context.makerdf(workflow, wf, ctx) print(g.serialize(format=sr)) def regex_chunk(lines, regex): # type: (List[str], Pattern[str]) -> List[List[str]] lst = list(itertools.dropwhile(lambda x: not regex.match(x), lines)) arr = [] while lst: ret = [lst[0]]+list(itertools.takewhile(lambda x: not regex.match(x), lst[1:])) arr.append(ret) lst = list(itertools.dropwhile(lambda x: not regex.match(x), lst[1:])) return arr def chunk_messages(message): # type: (str) -> List[Tuple[int, str]] file_regex = re.compile(r'^(.+:\d+:\d+:)(\s+)(.+)$') item_regex = re.compile(r'^\s*\*\s+') arr = [] for chun in regex_chunk(message.splitlines(), file_regex): fst = chun[0] mat = file_regex.match(fst) place = mat.group(1) indent = len(mat.group(2)) lst = [mat.group(3)]+chun[1:] if [x for x in lst if item_regex.match(x)]: for item in regex_chunk(lst, item_regex): msg = re.sub(item_regex, '', "\n".join(item)) arr.append((indent, place+' '+re.sub(r'[\n\s]+', ' ', msg))) else: msg = re.sub(item_regex, '', "\n".join(lst)) arr.append((indent, place+' '+re.sub(r'[\n\s]+', ' ', msg))) return arr def to_one_line_messages(message): # type: (str) -> str ret = [] max_elem = (0, '') for (indent, msg) in chunk_messages(message): if indent > max_elem[0]: max_elem = (indent, msg) else: ret.append(max_elem[1]) max_elem = (indent, msg) ret.append(max_elem[1]) return "\n".join(ret) def reformat_yaml_exception_message(message): # type: (str) -> str line_regex = re.compile(r'^\s+in "(.+)", line (\d+), column (\d+)$') fname_regex = re.compile(r'^file://'+os.getcwd()+'/') msgs = message.splitlines() ret = [] if len(msgs) == 3: msgs = msgs[1:] nblanks = 0 elif len(msgs) == 4: c_msg = msgs[0] c_file, c_line, c_column = line_regex.match(msgs[1]).groups() c_file = re.sub(fname_regex, '', c_file) ret.append("%s:%s:%s: %s" % (c_file, c_line, c_column, c_msg)) msgs = msgs[2:] nblanks = 2 p_msg = msgs[0] p_file, p_line, p_column = line_regex.match(msgs[1]).groups() p_file = re.sub(fname_regex, '', 
p_file) ret.append("%s:%s:%s:%s %s" % (p_file, p_line, p_column, ' '*nblanks, p_msg)) return "\n".join(ret) def main(argsl=None): # type: (List[str]) -> int if argsl is None: argsl = sys.argv[1:] parser = argparse.ArgumentParser() parser.add_argument("--rdf-serializer", help="Output RDF serialization format used by --print-rdf (one of turtle (default), n3, nt, xml)", default="turtle") exgroup = parser.add_mutually_exclusive_group() exgroup.add_argument("--print-jsonld-context", action="store_true", help="Print JSON-LD context for schema") exgroup.add_argument( "--print-rdfs", action="store_true", help="Print RDF schema") exgroup.add_argument("--print-avro", action="store_true", help="Print Avro schema") exgroup.add_argument("--print-rdf", action="store_true", help="Print corresponding RDF graph for document") exgroup.add_argument("--print-pre", action="store_true", help="Print document after preprocessing") exgroup.add_argument( "--print-index", action="store_true", help="Print node index") exgroup.add_argument("--print-metadata", action="store_true", help="Print document metadata") exgroup.add_argument("--codegen", type=str, metavar="language", help="Generate classes in target language, currently supported: python") exgroup.add_argument("--print-oneline", action="store_true", help="Print each error message in oneline") exgroup = parser.add_mutually_exclusive_group() exgroup.add_argument("--strict", action="store_true", help="Strict validation (unrecognized or out of place fields are error)", default=True, dest="strict") exgroup.add_argument("--non-strict", action="store_false", help="Lenient validation (ignore unrecognized fields)", default=True, dest="strict") exgroup = parser.add_mutually_exclusive_group() exgroup.add_argument("--verbose", action="store_true", help="Default logging") exgroup.add_argument("--quiet", action="store_true", help="Only print warnings and errors.") exgroup.add_argument("--debug", action="store_true", help="Print even more logging") 
parser.add_argument("schema", type=str, nargs="?", default=None) parser.add_argument("document", type=str, nargs="?", default=None) parser.add_argument("--version", "-v", action="store_true", help="Print version", default=None) args = parser.parse_args(argsl) if args.version is None and args.schema is None: print('%s: error: too few arguments' % sys.argv[0]) return 1 if args.quiet: _logger.setLevel(logging.WARN) if args.debug: _logger.setLevel(logging.DEBUG) pkg = pkg_resources.require("schema_salad") if pkg: if args.version: print("%s Current version: %s" % (sys.argv[0], pkg[0].version)) return 0 else: _logger.info("%s Current version: %s", sys.argv[0], pkg[0].version) # Get the metaschema to validate the schema metaschema_names, metaschema_doc, metaschema_loader = schema.get_metaschema() # Load schema document and resolve refs schema_uri = args.schema if not (urllib.parse.urlparse(schema_uri)[0] and urllib.parse.urlparse(schema_uri)[0] in [u'http', u'https', u'file']): schema_uri = file_uri(os.path.abspath(schema_uri)) schema_raw_doc = metaschema_loader.fetch(schema_uri) try: schema_doc, schema_metadata = metaschema_loader.resolve_all( schema_raw_doc, schema_uri) except (validate.ValidationException) as e: _logger.error("Schema `%s` failed link checking:\n%s", args.schema, e, exc_info=(True if args.debug else False)) _logger.debug("Index is %s", list(metaschema_loader.idx.keys())) _logger.debug("Vocabulary is %s", list(metaschema_loader.vocab.keys())) return 1 except (RuntimeError) as e: _logger.error("Schema `%s` read error:\n%s", args.schema, e, exc_info=(True if args.debug else False)) return 1 # Optionally print the schema after ref resolution if not args.document and args.print_pre: print(json.dumps(schema_doc, indent=4)) return 0 if not args.document and args.print_index: print(json.dumps(list(metaschema_loader.idx.keys()), indent=4)) return 0 # Validate the schema document against the metaschema try: schema.validate_doc(metaschema_names, schema_doc, 
metaschema_loader, args.strict, source=schema_metadata.get("name")) except validate.ValidationException as e: _logger.error("While validating schema `%s`:\n%s" % (args.schema, str(e))) return 1 # Get the json-ld context and RDFS representation from the schema metactx = {} # type: Dict[str, str] if isinstance(schema_raw_doc, dict): metactx = schema_raw_doc.get("$namespaces", {}) if "$base" in schema_raw_doc: metactx["@base"] = schema_raw_doc["$base"] if schema_doc is not None: (schema_ctx, rdfs) = jsonld_context.salad_to_jsonld_context( schema_doc, metactx) else: raise Exception("schema_doc is None??") # Create the loader that will be used to load the target document. document_loader = Loader(schema_ctx) if args.codegen: codegen.codegen(args.codegen, cast(List[Dict[Text, Any]], schema_doc), schema_metadata, document_loader) return 0 # Make the Avro validation that will be used to validate the target # document if isinstance(schema_doc, list): (avsc_names, avsc_obj) = schema.make_avro_schema( schema_doc, document_loader) else: _logger.error("Schema `%s` must be a list.", args.schema) return 1 if isinstance(avsc_names, Exception): _logger.error("Schema `%s` error:\n%s", args.schema, avsc_names, exc_info=((type(avsc_names), avsc_names, None) if args.debug else None)) if args.print_avro: print(json.dumps(avsc_obj, indent=4)) return 1 # Optionally print Avro-compatible schema from schema if args.print_avro: print(json.dumps(avsc_obj, indent=4)) return 0 # Optionally print the json-ld context from the schema if args.print_jsonld_context: j = {"@context": schema_ctx} print(json.dumps(j, indent=4, sort_keys=True)) return 0 # Optionally print the RDFS graph from the schema if args.print_rdfs: print(rdfs.serialize(format=args.rdf_serializer)) return 0 if args.print_metadata and not args.document: print(json.dumps(schema_metadata, indent=4)) return 0 # If no document specified, all done. 
    if not args.document:
        print("Schema `%s` is valid" % args.schema)
        return 0

    # Load target document and resolve refs
    try:
        uri = args.document
        if not urllib.parse.urlparse(uri)[0]:
            uri = file_uri(os.path.abspath(uri))
        document, doc_metadata = document_loader.resolve_ref(uri)
    except validate.ValidationException as e:
        msg = strip_dup_lineno(six.text_type(e))
        msg = to_one_line_messages(str(msg)) if args.print_oneline else msg
        _logger.error("Document `%s` failed validation:\n%s",
                      args.document, msg, exc_info=args.debug)
        return 1
    except RuntimeError as e:
        msg = strip_dup_lineno(six.text_type(e))
        msg = reformat_yaml_exception_message(str(msg))
        msg = to_one_line_messages(msg) if args.print_oneline else msg
        _logger.error("Document `%s` failed validation:\n%s",
                      args.document, msg, exc_info=args.debug)
        return 1

    # Optionally print the document after ref resolution
    if args.print_pre:
        print(json.dumps(document, indent=4))
        return 0

    if args.print_index:
        print(json.dumps(list(document_loader.idx.keys()), indent=4))
        return 0

    # Validate the target document against the schema
    try:
        schema.validate_doc(avsc_names, document,
                            document_loader, args.strict)
    except validate.ValidationException as e:
        msg = to_one_line_messages(str(e)) if args.print_oneline else str(e)
        _logger.error("While validating document `%s`:\n%s" %
                      (args.document, msg))
        return 1

    # Optionally convert the document to RDF
    if args.print_rdf:
        if isinstance(document, (dict, list)):
            printrdf(args.document, document, schema_ctx, args.rdf_serializer)
            return 0
        else:
            print("Document must be a dictionary or list.")
            return 1

    if args.print_metadata:
        print(json.dumps(doc_metadata, indent=4))
        return 0

    print("Document `%s` is valid" % args.document)

    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))


# ---- schema_salad/python_codegen_support.py ----

import six
from six.moves import urllib, StringIO
import ruamel.yaml as
yaml import copy import re from typing import List, Text, Dict, Union, Any, Sequence class ValidationException(Exception): pass class Savable(object): pass class LoadingOptions(object): def __init__(self, fetcher=None, namespaces=None, fileuri=None, copyfrom=None): if copyfrom is not None: self.idx = copyfrom.idx if fetcher is None: fetcher = copyfrom.fetcher if fileuri is None: fileuri = copyfrom.fileuri else: self.idx = {} if fetcher is None: import os import requests from cachecontrol.wrapper import CacheControl from cachecontrol.caches import FileCache from schema_salad.ref_resolver import DefaultFetcher if "HOME" in os.environ: session = CacheControl( requests.Session(), cache=FileCache(os.path.join(os.environ["HOME"], ".cache", "salad"))) elif "TMP" in os.environ: session = CacheControl( requests.Session(), cache=FileCache(os.path.join(os.environ["TMP"], ".cache", "salad"))) else: session = CacheControl( requests.Session(), cache=FileCache("/tmp", ".cache", "salad")) self.fetcher = DefaultFetcher({}, session) else: self.fetcher = fetcher self.fileuri = fileuri self.vocab = _vocab self.rvocab = _rvocab if namespaces is not None: self.vocab = self.vocab.copy() self.rvocab = self.rvocab.copy() for k,v in six.iteritems(namespaces): self.vocab[k] = v self.rvocab[v] = k def load_field(val, fieldtype, baseuri, loadingOptions): if isinstance(val, dict): if "$import" in val: return _document_load_by_url(fieldtype, loadingOptions.fetcher.urljoin(loadingOptions.fileuri, val["$import"]), loadingOptions) elif "$include" in val: val = loadingOptions.fetcher.fetch_text(loadingOptions.fetcher.urljoin(loadingOptions.fileuri, val["$include"])) return fieldtype.load(val, baseuri, loadingOptions) def save(val): if isinstance(val, Savable): return val.save() if isinstance(val, list): return [save(v) for v in val] return val def expand_url(url, # type: Union[str, Text] base_url, # type: Union[str, Text] loadingOptions, # type: LoadingOptions scoped_id=False, # type: bool 
vocab_term=False, # type: bool scoped_ref=None # type: int ): # type: (...) -> Text if not isinstance(url, six.string_types): return url url = Text(url) if url in (u"@id", u"@type"): return url if vocab_term and url in loadingOptions.vocab: return url if bool(loadingOptions.vocab) and u":" in url: prefix = url.split(u":")[0] if prefix in loadingOptions.vocab: url = loadingOptions.vocab[prefix] + url[len(prefix) + 1:] split = urllib.parse.urlsplit(url) if ((bool(split.scheme) and split.scheme in [u'http', u'https', u'file']) or url.startswith(u"$(") or url.startswith(u"${")): pass elif scoped_id and not bool(split.fragment): splitbase = urllib.parse.urlsplit(base_url) frg = u"" if bool(splitbase.fragment): frg = splitbase.fragment + u"/" + split.path else: frg = split.path pt = splitbase.path if splitbase.path != '' else "/" url = urllib.parse.urlunsplit( (splitbase.scheme, splitbase.netloc, pt, splitbase.query, frg)) elif scoped_ref is not None and not bool(split.fragment): splitbase = urllib.parse.urlsplit(base_url) sp = splitbase.fragment.split(u"/") n = scoped_ref while n > 0 and len(sp) > 0: sp.pop() n -= 1 sp.append(url) url = urllib.parse.urlunsplit(( splitbase.scheme, splitbase.netloc, splitbase.path, splitbase.query, u"/".join(sp))) else: url = loadingOptions.fetcher.urljoin(base_url, url) if vocab_term: split = urllib.parse.urlsplit(url) if bool(split.scheme): if url in loadingOptions.rvocab: return loadingOptions.rvocab[url] else: raise ValidationException("Term '%s' not in vocabulary" % url) return url class _Loader(object): def load(self, doc, baseuri, loadingOptions, docRoot=None): # type: (Any, Text, LoadingOptions, Union[Text, None]) -> Any pass class _AnyLoader(_Loader): def load(self, doc, baseuri, loadingOptions, docRoot=None): if doc is not None: return doc raise ValidationException("Expected non-null") class _PrimitiveLoader(_Loader): def __init__(self, tp): # type: (Union[type, Sequence[type]]) -> None self.tp = tp def load(self, doc, baseuri, 
loadingOptions, docRoot=None): if not isinstance(doc, self.tp): raise ValidationException("Expected a %s but got %s" % (self.tp, type(doc))) return doc def __repr__(self): return str(self.tp) class _ArrayLoader(_Loader): def __init__(self, items): # type: (_Loader) -> None self.items = items def load(self, doc, baseuri, loadingOptions, docRoot=None): if not isinstance(doc, list): raise ValidationException("Expected a list") r = [] errors = [] for i in range(0, len(doc)): try: lf = load_field(doc[i], _UnionLoader((self, self.items)), baseuri, loadingOptions) if isinstance(lf, list): r.extend(lf) else: r.append(lf) except ValidationException as e: errors.append(SourceLine(doc, i, str).makeError(six.text_type(e))) if errors: raise ValidationException("\n".join(errors)) return r def __repr__(self): return "array<%s>" % self.items class _EnumLoader(_Loader): def __init__(self, symbols): # type: (Sequence[Text]) -> None self.symbols = symbols def load(self, doc, baseuri, loadingOptions, docRoot=None): if doc in self.symbols: return doc else: raise ValidationException("Expected one of %s" % (self.symbols,)) class _RecordLoader(_Loader): def __init__(self, classtype): # type: (type) -> None self.classtype = classtype def load(self, doc, baseuri, loadingOptions, docRoot=None): if not isinstance(doc, dict): raise ValidationException("Expected a dict") return self.classtype(doc, baseuri, loadingOptions, docRoot=docRoot) def __repr__(self): return str(self.classtype) class _UnionLoader(_Loader): def __init__(self, alternates): # type: (Sequence[_Loader]) -> None self.alternates = alternates def load(self, doc, baseuri, loadingOptions, docRoot=None): errors = [] for t in self.alternates: try: return t.load(doc, baseuri, loadingOptions, docRoot=docRoot) except ValidationException as e: errors.append("tried %s but\n%s" % (t, indent(str(e)))) raise ValidationException(bullets(errors, "- ")) def __repr__(self): return " | ".join(str(a) for a in self.alternates) class 
_URILoader(_Loader): def __init__(self, inner, scoped_id, vocab_term, scoped_ref): # type: (_Loader, bool, bool, Union[int, None]) -> None self.inner = inner self.scoped_id = scoped_id self.vocab_term = vocab_term self.scoped_ref = scoped_ref def load(self, doc, baseuri, loadingOptions, docRoot=None): if isinstance(doc, list): doc = [expand_url(i, baseuri, loadingOptions, self.scoped_id, self.vocab_term, self.scoped_ref) for i in doc] if isinstance(doc, six.string_types): doc = expand_url(doc, baseuri, loadingOptions, self.scoped_id, self.vocab_term, self.scoped_ref) return self.inner.load(doc, baseuri, loadingOptions) class _TypeDSLLoader(_Loader): typeDSLregex = re.compile(u"^([^[?]+)(\[\])?(\?)?$") def __init__(self, inner, refScope): # type: (_Loader, Union[int, None]) -> None self.inner = inner self.refScope = refScope def resolve(self, doc, baseuri, loadingOptions): m = self.typeDSLregex.match(doc) if m: first = expand_url(m.group(1), baseuri, loadingOptions, False, True, self.refScope) second = third = None if bool(m.group(2)): second = {"type": "array", "items": first} #second = CommentedMap((("type", "array"), # ("items", first))) #second.lc.add_kv_line_col("type", lc) #second.lc.add_kv_line_col("items", lc) #second.lc.filename = filename if bool(m.group(3)): third = [u"null", second or first] #third = CommentedSeq([u"null", second or first]) #third.lc.add_kv_line_col(0, lc) #third.lc.add_kv_line_col(1, lc) #third.lc.filename = filename doc = third or second or first return doc def load(self, doc, baseuri, loadingOptions, docRoot=None): if isinstance(doc, list): r = [] for d in doc: if isinstance(d, six.string_types): resolved = self.resolve(d, baseuri, loadingOptions) if isinstance(resolved, list): for i in resolved: if i not in r: r.append(i) else: if resolved not in r: r.append(resolved) else: r.append(d) doc = r elif isinstance(doc, six.string_types): doc = self.resolve(doc, baseuri, loadingOptions) return self.inner.load(doc, baseuri, loadingOptions) 
class _IdMapLoader(_Loader):
    def __init__(self, inner, mapSubject, mapPredicate):
        # type: (_Loader, Text, Union[Text, None]) -> None
        self.inner = inner
        self.mapSubject = mapSubject
        self.mapPredicate = mapPredicate

    def load(self, doc, baseuri, loadingOptions, docRoot=None):
        if isinstance(doc, dict):
            r = []
            for k in sorted(doc.keys()):
                val = doc[k]
                if isinstance(val, dict):
                    v = copy.copy(val)
                    if hasattr(val, 'lc'):
                        v.lc.data = val.lc.data
                        v.lc.filename = val.lc.filename
                else:
                    if self.mapPredicate:
                        v = {self.mapPredicate: val}
                    else:
                        raise ValidationException("No mapPredicate")
                v[self.mapSubject] = k
                r.append(v)
            doc = r
        return self.inner.load(doc, baseuri, loadingOptions)


def _document_load(loader, doc, baseuri, loadingOptions):
    if isinstance(doc, six.string_types):
        return _document_load_by_url(
            loader, loadingOptions.fetcher.urljoin(baseuri, doc),
            loadingOptions)

    if isinstance(doc, dict):
        if "$namespaces" in doc:
            loadingOptions = LoadingOptions(copyfrom=loadingOptions,
                                            namespaces=doc["$namespaces"])
        if "$base" in doc:
            baseuri = doc["$base"]
        if "$graph" in doc:
            return loader.load(doc["$graph"], baseuri, loadingOptions)
        else:
            return loader.load(doc, baseuri, loadingOptions, docRoot=baseuri)

    if isinstance(doc, list):
        return loader.load(doc, baseuri, loadingOptions)

    raise ValidationException()


def _document_load_by_url(loader, url, loadingOptions):
    if url in loadingOptions.idx:
        return _document_load(loader, loadingOptions.idx[url], url,
                              loadingOptions)

    text = loadingOptions.fetcher.fetch_text(url)
    if isinstance(text, bytes):
        textIO = StringIO(text.decode('utf-8'))
    else:
        textIO = StringIO(text)
    textIO.name = url  # type: ignore
    result = yaml.round_trip_load(textIO)
    add_lc_filename(result, url)

    loadingOptions.idx[url] = result

    loadingOptions = LoadingOptions(copyfrom=loadingOptions, fileuri=url)

    return _document_load(loader, result, url, loadingOptions)


def file_uri(path, split_frag=False):  # type: (str, bool) -> str
    if path.startswith("file://"):
        return path
    if split_frag:
        pathsp = path.split("#", 2)
        frag = "#" + urllib.parse.quote(str(pathsp[1])) if len(pathsp) == 2 else ""
        urlpath = urllib.request.pathname2url(str(pathsp[0]))
    else:
        urlpath = urllib.request.pathname2url(path)
        frag = ""
    if urlpath.startswith("//"):
        return "file:%s%s" % (urlpath, frag)
    else:
        return "file://%s%s" % (urlpath, frag)

schema-salad-2.6.20171201034858/schema_salad/utils.py:

from __future__ import absolute_import
import os
from typing import Any, Dict, List


def add_dictlist(di, key, val):  # type: (Dict, Any, Any) -> None
    if key not in di:
        di[key] = []
    di[key].append(val)


def aslist(l):  # type: (Any) -> List
    """Convenience function to wrap single items and lists, and return lists unchanged."""
    if isinstance(l, list):
        return l
    else:
        return [l]


# http://rightfootin.blogspot.com/2006/09/more-on-python-flatten.html
def flatten(l, ltypes=(list, tuple)):
    # type: (Any, Any) -> Any
    if l is None:
        return []
    if not isinstance(l, ltypes):
        return [l]

    ltype = type(l)
    lst = list(l)
    i = 0
    while i < len(lst):
        while isinstance(lst[i], ltypes):
            if not lst[i]:
                lst.pop(i)
                i -= 1
                break
            else:
                lst[i:i + 1] = lst[i]
        i += 1
    return ltype(lst)


# Check if we are on windows OS
def onWindows():  # type: () -> (bool)
    return os.name == 'nt'

schema-salad-2.6.20171201034858/schema_salad/codegen_base.py:

import collections
from six.moves import urllib
from typing import List, Text, Dict, Union, Any

from . import schema


def shortname(inputid):  # type: (Text) -> Text
    d = urllib.parse.urlparse(inputid)
    if d.fragment:
        return d.fragment.split(u"/")[-1]
    else:
        return d.path.split(u"/")[-1]


class TypeDef(object):
    def __init__(self, name, init):  # type: (Text, Text) -> None
        self.name = name
        self.init = init


class CodeGenBase(object):
    def __init__(self):  # type: () -> None
        self.collected_types = collections.OrderedDict()  # type: collections.OrderedDict[Text, TypeDef]
        self.vocab = {}  # type: Dict[Text, Text]

    def declare_type(self, t):  # type: (TypeDef) -> TypeDef
        if t not in self.collected_types:
            self.collected_types[t.name] = t
        return t

    def add_vocab(self, name, uri):  # type: (Text, Text) -> None
        self.vocab[name] = uri

    def prologue(self):  # type: () -> None
        raise NotImplementedError()

    def safe_name(self, n):  # type: (Text) -> Text
        return schema.avro_name(n)

    def begin_class(self, classname, extends, doc, abstract):
        # type: (Text, List[Text], Text, bool) -> None
        raise NotImplementedError()

    def end_class(self, classname):  # type: (Text) -> None
        raise NotImplementedError()

    def type_loader(self, t):
        # type: (Union[List[Any], Dict[Text, Any]]) -> TypeDef
        raise NotImplementedError()

    def declare_field(self, name, typedef, doc, optional):
        # type: (Text, TypeDef, Text, bool) -> None
        raise NotImplementedError()

    def declare_id_field(self, name, typedef, doc):
        # type: (Text, TypeDef, Text) -> None
        raise NotImplementedError()

    def uri_loader(self, inner, scoped_id, vocab_term, refScope):
        # type: (TypeDef, bool, bool, Union[int, None]) -> TypeDef
        raise NotImplementedError()

    def idmap_loader(self, field, inner, mapSubject, mapPredicate):
        # type: (Text, TypeDef, Text, Union[Text, None]) -> TypeDef
        raise NotImplementedError()

    def typedsl_loader(self, inner, refScope):
        # type: (TypeDef, Union[int, None]) -> TypeDef
        raise NotImplementedError()

    def epilogue(self, rootLoader):  # type: (TypeDef) -> None
        raise NotImplementedError()
schema-salad-2.6.20171201034858/schema_salad/metaschema/vocab_res_proc.yml:

{
  "form": {
    "things": [
      {
        "voc": "red",
      },
      {
        "voc": "red",
      },
      {
        "voc": "http://example.com/acid#blue",
      }
    ]
  }
}

schema-salad-2.6.20171201034858/schema_salad/metaschema/link_res_proc.yml:

{
  "$base": "http://example.com/base",
  "link": "http://example.com/base/zero",
  "form": {
    "link": "http://example.com/one",
    "things": [
      {
        "link": "http://example.com/two"
      },
      {
        "link": "http://example.com/base#three"
      },
      {
        "link": "http://example.com/four#five",
      },
      {
        "link": "http://example.com/acid#six",
      }
    ]
  }
}

schema-salad-2.6.20171201034858/schema_salad/metaschema/typedsl_res_schema.yml:

{
  "$graph": [
    {"$import": "metaschema_base.yml"},
    {
      "name": "TypeDSLExample",
      "type": "record",
      "documentRoot": true,
      "fields": [{
        "name": "extype",
        "type": "string",
        "jsonldPredicate": {
          _type: "@vocab",
          "typeDSL": true
        }
      }]
    }
  ]
}

schema-salad-2.6.20171201034858/schema_salad/metaschema/metaschema.html:

Semantic Annotations for Linked Avro Data (SALAD)

Author:

Contributors:

Abstract

Salad is a schema language for describing structured linked data documents in JSON or YAML. A Salad schema provides rules for preprocessing, structural validation, and link checking for documents described by a Salad schema. Salad builds on JSON-LD and the Apache Avro data serialization system, and extends Avro with features for rich data modeling such as inheritance, template specialization, object identifiers, and object references. Salad was developed to provide a bridge between the record oriented data modeling supported by Apache Avro and the Semantic Web.

Status of This Document

This document is the product of the Common Workflow Language working group. The latest version of this document is available in the "schema_salad" repository at

https://github.com/common-workflow-language/schema_salad

The products of the CWL working group (including this document) are made available under the terms of the Apache License, version 2.0.

Table of contents

1. Introduction

The JSON data model is an extremely popular way to represent structured data. It is attractive because of its relative simplicity and is a natural fit with the standard types of many programming languages. However, this simplicity means that basic JSON lacks expressive features useful for working with complex data structures and document formats, such as schemas, object references, and namespaces.

JSON-LD is a W3C standard providing a way to describe how to interpret a JSON document as Linked Data by means of a "context". JSON-LD provides a powerful solution for representing object references and namespaces in JSON based on standard web URIs, but is not itself a schema language. Without a schema providing a well defined structure, it is difficult to process an arbitrary JSON-LD document as idiomatic JSON because there are many ways to express the same data that are logically equivalent but structurally distinct.

Several schema languages exist for describing and validating JSON data, such as the Apache Avro data serialization system; however, none of them understands linked data. As a result, to fully take advantage of JSON-LD to build the next generation of linked data applications, one must maintain separate JSON schema, JSON-LD context, RDF schema, and human documentation, despite significant overlap of content and obvious need for these documents to stay synchronized.

Schema Salad is designed to address this gap. It provides a schema language and processing rules for describing structured JSON content permitting URI resolution and strict document validation. The schema language supports linked data through annotations that describe the linked data interpretation of the content, enables generation of JSON-LD context and RDF schema, and production of RDF triples by applying the JSON-LD context. The schema language also provides for robust support of inline documentation.

1.1 Introduction to v1.0

This is the second version of the Schema Salad specification. It is developed concurrently with v1.0 of the Common Workflow Language for use in specifying that standard; however, Schema Salad is intended to be useful to a broader audience. Compared to the draft-1 Schema Salad specification, the following changes have been made:

1.2 References to Other Specifications

Javascript Object Notation (JSON): http://json.org

JSON Linked Data (JSON-LD): http://json-ld.org

YAML: http://yaml.org

Avro: https://avro.apache.org/docs/current/spec.html

Uniform Resource Identifier (URI) Generic Syntax: https://tools.ietf.org/html/rfc3986

Resource Description Framework (RDF): http://www.w3.org/RDF/

UTF-8: https://www.ietf.org/rfc/rfc2279.txt

1.3 Scope

This document describes the syntax, data model, algorithms, and schema language for working with Salad documents. It is not intended to document a specific implementation of Salad, however it may serve as a reference for the behavior of conforming implementations.

1.4 Terminology

The terminology used to describe Salad documents is defined in the Concepts section of the specification. The terms defined in the following list are used in building those definitions and in describing the actions of a Salad implementation:

may: Conforming Salad documents and Salad implementations are permitted but not required to be interpreted as described.

must: Conforming Salad documents and Salad implementations are required to be interpreted as described; otherwise they are in error.

error: A violation of the rules of this specification; results are undefined. Conforming implementations may detect and report an error and may recover from it.

fatal error: A violation of the rules of this specification; results are undefined. Conforming implementations must not continue to process the document and may report an error.

at user option: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described.

2. Document model

2.1 Data concepts

An object is a data structure equivalent to the "object" type in JSON, consisting of an unordered set of name/value pairs (referred to here as fields), where the name is a string and the value is a string, number, boolean, array, or object.

A document is a file containing a serialized object, or an array of objects.

A document type is a class of files that share a common structure and semantics.

A document schema is a formal description of the grammar of a document type.

A base URI is a context-dependent URI used to resolve relative references.

An identifier is a URI that designates a single document or single object within a document.

A vocabulary is the set of symbolic field names and enumerated symbols defined by a document schema, where each term maps to an absolute URI.

2.2 Syntax

Conforming Salad documents are serialized and loaded using YAML syntax and UTF-8 text encoding. Salad documents are written using the JSON-compatible subset of YAML. Features of YAML such as headers and type tags that are not found in the standard JSON data model must not be used in conforming Salad documents. It is a fatal error if the document is not valid YAML.

A Salad document must consist only of either a single root object or an array of objects.

2.3 Document context

2.3.1 Implied context

The implicit context consists of the vocabulary defined by the schema and the base URI. By default, the base URI must be the URI that was used to load the document. It may be overridden by an explicit context.

2.3.2 Explicit context

If a document consists of a root object, this object may contain the fields $base, $namespaces, $schemas, and $graph:

  • $base: Must be a string. Set the base URI for the document used to resolve relative references.

  • $namespaces: Must be an object with strings as values. The keys of the object are namespace prefixes used in the document; the values of the object are the prefix expansions.

  • $schemas: Must be an array of strings. This field may list URI references to documents in RDF-XML format which will be queried for RDF schema data. The subjects and predicates described by the RDF schema may provide additional semantic context for the document, and may be used for validation of prefixed extension fields found in the document.

Other directives beginning with $ must be ignored.
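As a minimal sketch (not the schema-salad implementation; the function name is illustrative), a loader might peel off the explicit-context directives before processing the document body:

```python
def process_context(doc, baseuri, namespaces):
    # Sketch of explicit-context handling: $namespaces extends the prefix
    # table, $base overrides the base URI, and $graph (if present) holds
    # the primary content. Unknown "$"-prefixed directives are ignored.
    namespaces = dict(namespaces, **doc.get("$namespaces", {}))
    baseuri = doc.get("$base", baseuri)
    body = doc.get("$graph", doc)
    return body, baseuri, namespaces

body, base, ns = process_context(
    {"$base": "http://example.com/base",
     "$namespaces": {"acid": "http://example.com/acid#"},
     "$graph": [{"id": "x"}]},
    "file:///doc.yml", {})
```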

2.4 Document graph

If a document consists of a single root object, this object may contain the field $graph. This field must be an array of objects. If present, this field holds the primary content of the document. A document that consists of an array of objects at the root is an implicit graph.

2.5 Document metadata

If a document consists of a single root object, metadata about the document, such as authorship, may be declared in the root object.

2.6 Document schema

Document preprocessing, link validation and schema validation require a document schema. A schema may consist of:

  • At least one record definition object which defines valid fields that make up a record type. Record field definitions include the valid types that may be assigned to each field and annotations to indicate fields that represent identifiers and links, described below in "Semantic Annotations".

  • Any number of enumerated type objects which define a finite set of symbols that are valid values of the type.

  • Any number of documentation objects which allow in-line documentation of the schema.

The schema for defining a salad schema (the metaschema) is described in detail in "Schema validation".

2.6.1 Record field annotations

In a document schema, record field definitions may include the field jsonldPredicate, which may be either a string or an object. Implementations must preprocess fields according to the following rules:

  • If the value of jsonldPredicate is @id, the field is an identifier field.

  • If the value of jsonldPredicate is an object, and that object contains the field _type with the value @id, the field is a link field.

  • If the value of jsonldPredicate is an object, and that object contains the field _type with the value @vocab, the field is a vocabulary field, which is a subtype of link field.

2.7 Document traversal

To perform document preprocessing, link validation, and schema validation, the document must be traversed starting from the fields or array items of the root object or array, recursively visiting each child item that contains an object or array.

3. Document preprocessing

After processing the explicit context (if any), document preprocessing begins. Starting from the document root, object field values or array items which contain objects or arrays are recursively traversed depth-first. For each visited object, field names, identifier fields, link fields, vocabulary fields, and $import and $include directives must be processed as described in this section. The order of traversal of child nodes within a parent node is undefined.

3.1 Field name resolution

The document schema declares the vocabulary of known field names. During preprocessing traversal, field names in the document which are not part of the schema vocabulary must be resolved to absolute URIs. Under "strict" validation, it is an error for a document to include fields which are not part of the vocabulary and not resolvable to absolute URIs. Field names which are not part of the vocabulary are resolved using the following rules:

  • If a field name URI begins with a namespace prefix declared in the document context (@context) followed by a colon :, the prefix and colon must be replaced by the namespace declared in @context.

  • If there is a vocabulary term which maps to the URI of a resolved field, the field name must be replaced with the vocabulary term.

  • If a field name URI is an absolute URI consisting of a scheme and path and is not part of the vocabulary, no processing occurs.

Field name resolution is not relative. It must not be affected by the base URI.
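As a minimal sketch (not the schema-salad implementation; names are illustrative), the field name resolution rules above might be written as:

```python
def resolve_field_name(name, namespaces, vocab_by_uri):
    # Rule 1: expand a declared "prefix:rest" to the namespace expansion.
    if ":" in name:
        prefix, rest = name.split(":", 1)
        if prefix in namespaces:
            name = namespaces[prefix] + rest
    # Rule 2: if the (possibly expanded) URI maps to a vocabulary term,
    # replace it with the term; otherwise leave the name unchanged.
    return vocab_by_uri.get(name, name)

namespaces = {"acid": "http://example.com/acid#"}
vocab = {"http://example.com/base": "base"}  # URI -> term
print(resolve_field_name("acid:four", namespaces, vocab))
# -> http://example.com/acid#four
print(resolve_field_name("http://example.com/base", namespaces, vocab))
# -> base
```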

3.1.1 Field name resolution example

Given the following schema:

{
  "$namespaces": {
    "acid": "http://example.com/acid#"
  },
  "$graph": [{
    "name": "ExampleType",
    "type": "record",
    "fields": [{
      "name": "base",
      "type": "string",
      "jsonldPredicate": "http://example.com/base"
    }]
  }]
}

Process the following example:

    {
      "base": "one",
      "form": {
        "http://example.com/base": "two",
        "http://example.com/three": "three",
      },
      "acid:four": "four"
    }

This becomes:

    {
      "base": "one",
      "form": {
        "base": "two",
        "http://example.com/three": "three",
      },
      "http://example.com/acid#four": "four"
    }

3.2 Identifier resolution

The schema may designate one or more fields as identifier fields to identify specific objects. Processing must resolve relative identifiers to absolute identifiers using the following rules:

  • If an identifier URI is prefixed with # it is a URI relative fragment identifier. It is resolved relative to the base URI by setting or replacing the fragment portion of the base URI.

  • If an identifier URI does not contain a scheme and is not prefixed with # it is a parent relative fragment identifier. It is resolved relative to the base URI by the following rule: if the base URI does not contain a document fragment, set the fragment portion of the base URI. If the base URI does contain a document fragment, append a slash / followed by the identifier field to the fragment portion of the base URI.

  • If an identifier URI begins with a namespace prefix declared in $namespaces followed by a colon :, the prefix and colon must be replaced by the namespace declared in $namespaces.

  • If an identifier URI is an absolute URI consisting of a scheme and path, no processing occurs.

When preprocessing visits a node containing an identifier, that identifier must be used as the base URI to process child nodes.

It is an error for more than one object in a document to have the same absolute URI.
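The resolution rules above can be sketched as follows (illustrative only, not the schema-salad implementation; path-relative references containing # follow the link resolution rules instead):

```python
def resolve_identifier(ident, base, namespaces):
    if ident.startswith("#"):
        # URI-relative fragment identifier: replace the base fragment.
        return base.split("#", 1)[0] + ident
    if ":" in ident:
        prefix, rest = ident.split(":", 1)
        if prefix in namespaces:
            # Declared namespace prefix: expand it.
            return namespaces[prefix] + rest
        return ident  # absolute URI with a scheme: no processing
    # Parent-relative fragment identifier.
    if "#" in base:
        return base + "/" + ident
    return base + "#" + ident

ns = {"acid": "http://example.com/acid#"}
print(resolve_identifier("one", "http://example.com/base", ns))
# -> http://example.com/base#one
print(resolve_identifier("two", "http://example.com/base#one", ns))
# -> http://example.com/base#one/two
```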

3.2.1 Identifier resolution example

Given the following schema:

{
  "$namespaces": {
    "acid": "http://example.com/acid#"
  },
  "$graph": [{
    "name": "ExampleType",
    "type": "record",
    "fields": [{
      "name": "id",
      "type": "string",
      "jsonldPredicate": "@id"
    }]
  }]
}

Process the following example:

    {
      "id": "http://example.com/base",
      "form": {
        "id": "one",
        "things": [
          {
            "id": "two"
          },
          {
            "id": "#three",
          },
          {
            "id": "four#five",
          },
          {
            "id": "acid:six",
          }
        ]
      }
    }

This becomes:

{
  "id": "http://example.com/base",
  "form": {
    "id": "http://example.com/base#one",
    "things": [
      {
        "id": "http://example.com/base#one/two"
      },
      {
        "id": "http://example.com/base#three"
      },
      {
        "id": "http://example.com/four#five",
      },
      {
        "id": "http://example.com/acid#six",
      }
    ]
  }
}

3.3 Link resolution

The schema may designate one or more fields as link fields that reference other objects. Processing must resolve links to absolute URIs using the following rules:

  • If a reference URI is prefixed with # it is a relative fragment identifier. It is resolved relative to the base URI by setting or replacing the fragment portion of the base URI.

  • If a reference URI does not contain a scheme and is not prefixed with # it is a path relative reference. If the reference URI contains # in any position other than the first character, the reference URI must be divided into a path portion and a fragment portion split on the first instance of #. The path portion is resolved relative to the base URI by the following rule: if the path portion of the base URI ends in a slash /, append the path portion of the reference URI to the path portion of the base URI. If the path portion of the base URI does not end in a slash, replace the final path segment with the path portion of the reference URI. Replace the fragment portion of the base URI with the fragment portion of the reference URI.

  • If a reference URI begins with a namespace prefix declared in $namespaces followed by a colon :, the prefix and colon must be replaced by the namespace declared in $namespaces.

  • If a reference URI is an absolute URI consisting of a scheme and path, no processing occurs.

Link resolution must not affect the base URI used to resolve identifiers and other links.
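The link resolution rules above can be sketched as follows (illustrative only, not the schema-salad implementation):

```python
def resolve_link(ref, base, namespaces):
    if ref.startswith("#"):
        # Relative fragment identifier: replace the base fragment.
        return base.split("#", 1)[0] + ref
    path, _, frag = ref.partition("#")
    if ":" in path:
        prefix, rest = ref.split(":", 1)
        if prefix in namespaces:
            # Declared namespace prefix: expand it.
            return namespaces[prefix] + rest
        return ref  # absolute URI with a scheme: no processing
    # Path-relative reference: resolve the path portion against the base.
    basepath = base.split("#", 1)[0]
    if basepath.endswith("/"):
        resolved = basepath + path
    else:
        resolved = basepath.rsplit("/", 1)[0] + "/" + path
    return resolved + ("#" + frag if frag else "")

base = "http://example.com/base"
ns = {"acid": "http://example.com/acid#"}
print(resolve_link("one", base, ns))        # -> http://example.com/one
print(resolve_link("four#five", base, ns))  # -> http://example.com/four#five
print(resolve_link("acid:six", base, ns))   # -> http://example.com/acid#six
```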

3.3.1 Link resolution example

Given the following schema:

{
  "$namespaces": {
    "acid": "http://example.com/acid#"
  },
  "$graph": [{
    "name": "ExampleType",
    "type": "record",
    "fields": [{
      "name": "link",
      "type": "string",
      "jsonldPredicate": {
        "_type": "@id"
      }
    }]
  }]
}

Process the following example:

{
  "$base": "http://example.com/base",
  "link": "http://example.com/base/zero",
  "form": {
    "link": "one",
    "things": [
      {
        "link": "two"
      },
      {
        "link": "#three",
      },
      {
        "link": "four#five",
      },
      {
        "link": "acid:six",
      }
    ]
  }
}

This becomes:

{
  "$base": "http://example.com/base",
  "link": "http://example.com/base/zero",
  "form": {
    "link": "http://example.com/one",
    "things": [
      {
        "link": "http://example.com/two"
      },
      {
        "link": "http://example.com/base#three"
      },
      {
        "link": "http://example.com/four#five",
      },
      {
        "link": "http://example.com/acid#six",
      }
    ]
  }
}

3.4 Vocabulary resolution

The schema may designate one or more vocabulary fields which use terms defined in the vocabulary. Processing must resolve vocabulary fields to either vocabulary terms or absolute URIs by first applying the link resolution rules defined above, then applying the following additional rule:

  • If a reference URI is a vocabulary field, and there is a vocabulary term which maps to the resolved URI, the reference must be replaced with the vocabulary term.
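The additional rule amounts to a reverse lookup in a term table built from the schema (a minimal sketch; the table contents are taken from the example below):

```python
# URI -> term table derived from the schema vocabulary.
vocab_by_uri = {"http://example.com/acid#red": "red"}

def resolve_vocab(value, vocab_by_uri):
    # After link resolution, replace a resolved URI with its vocabulary
    # term when one exists; otherwise leave the value unchanged.
    return vocab_by_uri.get(value, value)

print(resolve_vocab("http://example.com/acid#red", vocab_by_uri))   # -> red
print(resolve_vocab("http://example.com/acid#blue", vocab_by_uri))  # unchanged
```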

3.4.1 Vocabulary resolution example

Given the following schema:

{
  "$namespaces": {
    "acid": "http://example.com/acid#"
  },
  "$graph": [{
    "name": "Colors",
    "type": "enum",
    "symbols": ["acid:red"]
  },
  {
    "name": "ExampleType",
    "type": "record",
    "fields": [{
      "name": "voc",
      "type": "string",
      "jsonldPredicate": {
        "_type": "@vocab"
      }
    }]
  }]
}

Process the following example:

    {
      "form": {
        "things": [
          {
            "voc": "red",
          },
          {
            "voc": "http://example.com/acid#red",
          },
          {
            "voc": "http://example.com/acid#blue",
          }
        ]
      }
    }

This becomes:

    {
      "form": {
        "things": [
          {
            "voc": "red",
          },
          {
            "voc": "red",
          },
          {
            "voc": "http://example.com/acid#blue",
          }
        ]
      }
    }

3.5 Import

During preprocessing traversal, an implementation must resolve $import directives. An $import directive is an object consisting of exactly one field, $import, specifying a resource by URI string. It is an error if there are additional fields in the $import object; such additional fields must be ignored.

The URI string must be resolved to an absolute URI using the link resolution rules described previously. Implementations must support loading from file, http and https resources. The URI referenced by $import must be loaded and recursively preprocessed as a Salad document. The external imported document does not inherit the context of the importing document, and the default base URI for processing the imported document must be the URI used to retrieve the imported document. If the $import URI includes a document fragment, the fragment must be excluded from the base URI used to preprocess the imported document.

Once loaded and processed, the $import node is replaced in the document structure by the object or array yielded from the import operation.

URIs may reference document fragments which refer to a specific object in the target document. This indicates that the $import node must be replaced by only the object with the appropriate fragment identifier.

It is a fatal error if an import directive refers to an external resource or resource fragment which does not exist or is not accessible.
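The $import replacement described above can be sketched as follows (load_and_preprocess is a hypothetical callable that fetches a URI and recursively preprocesses it as a Salad document):

```python
def resolve_import(node, load_and_preprocess):
    # An {"$import": uri} node is replaced in the document structure by
    # the object or array loaded from that URI.
    if isinstance(node, dict) and set(node) == {"$import"}:
        return load_and_preprocess(node["$import"])
    return node

# Stand-in for fetching: a table of already-loaded documents.
docs = {"import.yml": {"hello": "world"}}
print(resolve_import({"$import": "import.yml"}, docs.__getitem__))
# -> {'hello': 'world'}
```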

3.5.1 Import example

import.yml:

{
  "hello": "world"
}

parent.yml:

{
  "form": {
    "bar": {
      "$import": "import.yml"
      }
  }
}

This becomes:

{
  "form": {
    "bar": {
      "hello": "world"
    }
  }
}

3.6 Include

During preprocessing traversal, an implementation must resolve $include directives. An $include directive is an object consisting of exactly one field, $include, specifying a URI string. It is an error if there are additional fields in the $include object; such additional fields must be ignored.

The URI string must be resolved to an absolute URI using the link resolution rules described previously. The URI referenced by $include must be loaded as text data. Implementations must support loading from file, http and https resources. Implementations may transcode the character encoding of the text data to match that of the parent document, but must not interpret or parse the text document in any other way.

Once loaded, the $include node is replaced in the document structure by a string containing the text data loaded from the resource.

It is a fatal error if an $include directive refers to an external resource which does not exist or is not accessible.

3.6.1 Include example

parent.yml:

{
  "form": {
    "bar": {
      "$include": "include.txt"
      }
  }
}

include.txt:

hello world

This becomes:

{
  "form": {
    "bar": "hello world"
  }
}

3.7 Mixin

During preprocessing traversal, an implementation must resolve $mixin directives. A $mixin directive is an object containing the field $mixin, specifying a resource by URI string. If there are additional fields in the $mixin object, these fields override fields in the object which is loaded from the $mixin URI.

The URI string must be resolved to an absolute URI using the link resolution rules described previously. Implementations must support loading from file, http and https resources. The URI referenced by $mixin must be loaded and recursively preprocessed as a Salad document. The external imported document must inherit the context of the importing document, however the file URI for processing the imported document must be the URI used to retrieve the imported document. The $mixin URI must not include a document fragment.

Once loaded and processed, the $mixin node is replaced in the document structure by the object or array yielded from the mixin operation.

URIs may reference document fragments which refer to a specific object in the target document. This indicates that the $mixin node must be replaced by only the object with the appropriate fragment identifier.

It is a fatal error if a $mixin directive refers to an external resource or resource fragment which does not exist or is not accessible.

3.7.1 Mixin example

mixin.yml:

{
  "hello": "world",
  "carrot": "orange"
}

parent.yml:

{
  "form": {
    "bar": {
      "$mixin": "mixin.yml",
      "carrot": "cake"
      }
  }
}

This becomes:

{
  "form": {
    "bar": {
      "hello": "world",
      "carrot": "cake"
    }
  }
}
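The override behavior in the example above can be sketched as follows (load_document is a hypothetical loader that fetches and preprocesses the $mixin URI):

```python
def apply_mixin(directive, load_document):
    # Fields of the directive other than $mixin override fields of the
    # object loaded from the $mixin URI.
    loaded = load_document(directive["$mixin"])
    merged = dict(loaded)
    for k, v in directive.items():
        if k != "$mixin":
            merged[k] = v
    return merged

# Stand-in for fetching: a table of already-loaded documents.
fixtures = {"mixin.yml": {"hello": "world", "carrot": "orange"}}
result = apply_mixin({"$mixin": "mixin.yml", "carrot": "cake"},
                     fixtures.__getitem__)
print(result)  # -> {'hello': 'world', 'carrot': 'cake'}
```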

3.8 Identifier maps

The schema may designate certain fields as having a mapSubject. If the value of such a field is a JSON object, it must be transformed into an array of JSON objects. Each key-value pair from the source object becomes a list item; each list item must be a JSON object, and the key is assigned to the field specified by mapSubject.

Fields which have mapSubject specified may also supply a mapPredicate. If the value of a map item is not a JSON object, the item is transformed to a JSON object with the key assigned to the field specified by mapSubject and the value assigned to the field specified by mapPredicate.
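The identifier-map transformation can be sketched as follows (illustrative only, not the schema-salad implementation):

```python
def idmap_to_list(value, map_subject, map_predicate=None):
    result = []
    for key in sorted(value):
        item = value[key]
        if not isinstance(item, dict):
            # Shortcut form: wrap the value under mapPredicate.
            item = {map_predicate: item}
        item = dict(item)
        # The source key becomes the mapSubject field of the item.
        item[map_subject] = key
        result.append(item)
    return result

print(idmap_to_list({"shaggy": {"value": "scooby"}, "fred": "daphne"},
                    "key", "value"))
```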

3.8.1 Identifier map example

Given the following schema:

{
  "$graph": [{
    "name": "MappedType",
    "type": "record",
    "documentRoot": true,
    "fields": [{
      "name": "mapped",
      "type": {
        "type": "array",
        "items": "ExampleRecord"
      },
      "jsonldPredicate": {
        "mapSubject": "key",
        "mapPredicate": "value"
      }
    }],
  },
  {
    "name": "ExampleRecord",
    "type": "record",
    "fields": [{
      "name": "key",
      "type": "string"
      }, {
      "name": "value",
      "type": "string"
      }
    ]
  }]
}

Process the following example:

{
  "mapped": {
    "shaggy": {
      "value": "scooby"
    },
    "fred": "daphne"
  }
}

This becomes:

{
    "mapped": [
        {
            "value": "daphne",
            "key": "fred"
        },
        {
            "value": "scooby",
            "key": "shaggy"
        }
    ]
}

3.9 Domain Specific Language for types

Fields may be tagged typeDSL: true. If so, the field is expanded using the following micro-DSL for schema salad types:

  • If the type ends with a question mark ? it is expanded to a union with null
  • If the type ends with square brackets [] it is expanded to an array with items of the preceding type symbol
  • The type may end with both []? to indicate it is an optional array.
  • Identifier resolution is applied after type DSL expansion.
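A minimal sketch of this expansion (identifier resolution omitted; the regular expression matches the one used by the generated loaders earlier in this package):

```python
import re

TYPE_DSL = re.compile(r"^([^[?]+)(\[\])?(\?)?$")

def expand_type_dsl(t):
    # "?" -> union with null, "[]" -> array, "[]?" -> optional array.
    m = TYPE_DSL.match(t)
    if not m:
        return t
    base, arr, opt = m.groups()
    expanded = {"type": "array", "items": base} if arr else base
    return ["null", expanded] if opt else expanded

print(expand_type_dsl("string?"))    # -> ['null', 'string']
print(expand_type_dsl("string[]"))   # -> {'type': 'array', 'items': 'string'}
```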

3.9.1 Type DSL example

Given the following schema:

{
  "$graph": [
  {"$import": "metaschema_base.yml"},
  {
    "name": "TypeDSLExample",
    "type": "record",
    "documentRoot": true,
    "fields": [{
      "name": "extype",
      "type": "string",
      "jsonldPredicate": {
        "_type": "@vocab",
        "typeDSL": true
      }
    }]
  }]
}

Process the following example:

[{
  "extype": "string"
}, {
  "extype": "string?"
}, {
  "extype": "string[]"
}, {
  "extype": "string[]?"
}]

This becomes:

[
    {
        "extype": "string"
    }, 
    {
        "extype": [
            "null", 
            "string"
        ]
    }, 
    {
        "extype": {
            "type": "array", 
            "items": "string"
        }
    }, 
    {
        "extype": [
            "null", 
            {
                "type": "array", 
                "items": "string"
            }
        ]
    }
]
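The expansion rules above can be expressed directly in code. The sketch below is a minimal, illustrative expander (not the schema-salad library's implementation); the suffixes are checked longest-first so that "string[]?" expands to an optional array.

```python
def expand_type_dsl(t):
    """Expand a type string written in the Schema Salad type micro-DSL."""
    if t.endswith("[]?"):
        # optional array: union of null and an array of the base type
        return ["null", {"type": "array", "items": t[:-3]}]
    if t.endswith("?"):
        # optional: union of null and the base type
        return ["null", t[:-1]]
    if t.endswith("[]"):
        # array of the base type
        return {"type": "array", "items": t[:-2]}
    return t
```

Applied to the four example values of extype, this reproduces the processed forms shown above.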

4. Link validation

Once a document has been preprocessed, an implementation may validate links. The link validation traversal may visit fields which the schema designates as link fields and check that each URI references an existing object in the current document, an imported document, file system, or network resource. Failure to validate links may be a fatal error. Link validation behavior for individual fields may be modified by identity and noLinkCheck in the jsonldPredicate section of the field schema.
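A link validation traversal along these lines might look like the following sketch. The set of link fields and the collection of known identifiers are assumed to come from the schema and a prior preprocessing pass; the function and parameter names are illustrative only, and identity/noLinkCheck handling is omitted for brevity.

```python
def validate_links(node, link_fields, known_ids, errors=None):
    """Walk a preprocessed document; report link-field URIs that do not
    resolve to a known identifier."""
    if errors is None:
        errors = []
    if isinstance(node, dict):
        for field, value in node.items():
            if field in link_fields and isinstance(value, str):
                if value not in known_ids:
                    errors.append("%s: unresolved link %r" % (field, value))
            else:
                validate_links(value, link_fields, known_ids, errors)
    elif isinstance(node, list):
        for item in node:
            validate_links(item, link_fields, known_ids, errors)
    return errors
```

A caller would collect all identifiers in the document (and any imported documents) into known_ids, then treat a non-empty error list as a validation failure.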

5. Schema

5.1 SaladRecordSchema

Fields

  • name: string (required). The identifier for this type.

  • type: Record_symbol (required). Must be record.

  • fields: array<SaladRecordField> (optional). Defines the fields of the record.

  • doc: string | array<string> (optional). A documentation string for this type, or an array of strings which should be concatenated.

  • docParent: string (optional). Hint to indicate that during documentation generation, documentation for this type should appear in a subsection under docParent.

  • docChild: string | array<string> (optional). Hint to indicate that during documentation generation, documentation for docChild should appear in a subsection under this type.

  • docAfter: string (optional). Hint to indicate that during documentation generation, documentation for this type should appear after the docAfter section at the same level.

  • jsonldPredicate: string | JsonldPredicate (optional). Annotate this type with linked data context.

  • documentRoot: boolean (optional). If true, indicates that the type is valid at the document root. At least one type in a schema must be tagged with documentRoot: true.

  • abstract: boolean (optional). If true, this record is abstract and may be used as a base for other records, but is not valid on its own.

  • extends: string | array<string> (optional). Indicates that this record inherits fields from one or more base records.

  • specialize: array<SpecializeDef> (optional). Only applies if extends is declared. Apply type specialization using the base record as a template. For each field inherited from the base record, replace any instance of the type specializeFrom with specializeTo.

5.1.1 SaladRecordField

A field of a record.

Fields

  • name: string (required). The name of the field.

  • type: PrimitiveType | RecordSchema | EnumSchema | ArraySchema | string | array<PrimitiveType | RecordSchema | EnumSchema | ArraySchema | string> (required). The field type.

  • doc: string (optional). A documentation string for this field.

  • jsonldPredicate: string | JsonldPredicate (optional). Annotate this type with linked data context.

5.1.1.1 PrimitiveType

Salad data types are based on Avro schema declarations. Refer to the Avro schema declaration documentation for detailed information.

Symbols

  • null: no value
  • boolean: a binary value
  • int: 32-bit signed integer
  • long: 64-bit signed integer
  • float: single precision (32-bit) IEEE 754 floating-point number
  • double: double precision (64-bit) IEEE 754 floating-point number
  • string: Unicode character sequence

5.1.1.2 Any

The Any type validates for any non-null value.

Symbols

  • Any

5.1.1.3 RecordSchema

Fields

  • type: Record_symbol (required). Must be record.

  • fields: array<RecordField> (optional). Defines the fields of the record.

5.1.1.4 RecordField

A field of a record.

Fields

  • name: string (required). The name of the field.

  • type: PrimitiveType | RecordSchema | EnumSchema | ArraySchema | string | array<PrimitiveType | RecordSchema | EnumSchema | ArraySchema | string> (required). The field type.

  • doc: string (optional). A documentation string for this field.

5.1.1.4.1 EnumSchema

Define an enumerated type.

Fields

  • symbols: array<string> (required). Defines the set of valid symbols.

  • type: Enum_symbol (required). Must be enum.

5.1.1.4.2 ArraySchema

Fields

  • items: PrimitiveType | RecordSchema | EnumSchema | ArraySchema | string | array<PrimitiveType | RecordSchema | EnumSchema | ArraySchema | string> (required). Defines the type of the array elements.

  • type: Array_symbol (required). Must be array.

5.1.1.5 JsonldPredicate

Attached to a record field to define how the parent record field is handled for URI resolution and JSON-LD context generation.

Fields

  • _id: string (optional). The predicate URI that this field corresponds to. Corresponds to the JSON-LD @id directive.

  • _type: string (optional). The context type hint, corresponds to the JSON-LD @type directive.

      • If the value of this field is @id and identity is false or unspecified, the parent field must be resolved using the link resolution rules. If identity is true, the parent field must be resolved using the identifier expansion rules.

      • If the value of this field is @vocab, the parent field must be resolved using the vocabulary resolution rules.

  • _container: string (optional). Structure hint, corresponds to the JSON-LD @container directive.

  • identity: boolean (optional). If true and _type is @id, this indicates that the parent field must be resolved according to identity resolution rules instead of link resolution rules. In addition, the field value is considered an assertion that the linked value exists; absence of an object in the loaded document with the URI is not an error.

  • noLinkCheck: boolean (optional). If true, this indicates that link validation traversal must stop at this field. This field (if it is a URI) and any fields under it (if it is an object or array) are not subject to link checking.

  • mapSubject: string (optional). If the value of the field is a JSON object, it must be transformed into an array of JSON objects, where each key-value pair from the source JSON object is a list item, the list items must be JSON objects, and the key is assigned to the field specified by mapSubject.

  • mapPredicate: string (optional). Only applies if mapSubject is also provided. If the value of the field is a JSON object, it is transformed as described in mapSubject, with the addition that when the value of a map item is not an object, the item is transformed to a JSON object with the key assigned to the field specified by mapSubject and the value assigned to the field specified by mapPredicate.

  • refScope: int (optional). If the field contains a relative reference, it must be resolved by searching for valid document references in each successive parent scope in the document fragment. For example, a reference of foo in the context #foo/bar/baz will first check for the existence of #foo/bar/baz/foo, followed by #foo/bar/foo, then #foo/foo, and finally #foo. The first valid URI in the search order shall be used as the fully resolved value of the identifier. The value of the refScope field is the specified number of levels to strip from the containing identifier scope before starting the search, so if refScope: 2 then "baz" and "bar" must be stripped to get the base #foo, and the search covers #foo/foo and then #foo. The last scope searched must be the top-level scope before determining that the identifier cannot be resolved.

  • typeDSL: boolean (optional). Field must be expanded based on the Schema Salad type DSL.
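The refScope search order can be made concrete with a short sketch. This is an illustrative helper (not part of the schema-salad API) that, given a containing scope, a relative reference, and a refScope value, produces the candidate URIs in the order an implementation would try them.

```python
def ref_scope_candidates(scope, ref, ref_scope):
    """Candidate URIs for a relative reference, in search order.

    scope is a fragment such as "#foo/bar/baz"; ref_scope levels are
    stripped first, then each successively shorter scope is tried,
    ending with the top-level scope.
    """
    parts = scope.lstrip("#").split("/")
    if ref_scope:
        parts = parts[: len(parts) - ref_scope]
    candidates = []
    while parts:
        candidates.append("#" + "/".join(parts + [ref]))
        parts.pop()
    candidates.append("#" + ref)  # finally, the top-level scope
    return candidates
```

For the example above, a refScope of 0 yields the four-step search and refScope: 2 yields the two-step search described in the text.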

5.1.2 SpecializeDef

Fields

  • specializeFrom: string (required). The data type to be replaced.

  • specializeTo: string (required). The new data type to replace with.

5.2 SaladEnumSchema

Define an enumerated type.

Fields

  • symbols: array<string> (required). Defines the set of valid symbols.

  • type: Enum_symbol (required). Must be enum.

  • doc: string | array<string> (optional). A documentation string for this type, or an array of strings which should be concatenated.

  • docParent: string (optional). Hint to indicate that during documentation generation, documentation for this type should appear in a subsection under docParent.

  • docChild: string | array<string> (optional). Hint to indicate that during documentation generation, documentation for docChild should appear in a subsection under this type.

  • docAfter: string (optional). Hint to indicate that during documentation generation, documentation for this type should appear after the docAfter section at the same level.

  • jsonldPredicate: string | JsonldPredicate (optional). Annotate this type with linked data context.

  • documentRoot: boolean (optional). If true, indicates that the type is valid at the document root. At least one type in a schema must be tagged with documentRoot: true.

  • extends: string | array<string> (optional). Indicates that this enum inherits symbols from a base enum.

5.3 Documentation

A documentation section. This type exists to facilitate self-documenting schemas but has no role in formal validation.

Fields

  • name: string (required). The identifier for this type.

  • type: Documentation_symbol (required). Must be documentation.

  • doc: string | array<string> (optional). A documentation string for this type, or an array of strings which should be concatenated.

  • docParent: string (optional). Hint to indicate that during documentation generation, documentation for this type should appear in a subsection under docParent.

  • docChild: string | array<string> (optional). Hint to indicate that during documentation generation, documentation for docChild should appear in a subsection under this type.

  • docAfter: string (optional). Hint to indicate that during documentation generation, documentation for this type should appear after the docAfter section at the same level.

schema-salad-2.6.20171201034858/schema_salad/metaschema/map_res_src.yml0000644000175100017510000000013113060036611025277 0ustar peterpeter00000000000000{ "mapped": { "shaggy": { "value": "scooby" }, "fred": "daphne" } }schema-salad-2.6.20171201034858/schema_salad/metaschema/link_res_schema.yml0000644000175100017510000000041012651763266026152 0ustar peterpeter00000000000000{ "$namespaces": { "acid": "http://example.com/acid#" }, "$graph": [{ "name": "ExampleType", "type": "record", "fields": [{ "name": "link", "type": "string", "jsonldPredicate": { "_type": "@id" } }] }] } schema-salad-2.6.20171201034858/schema_salad/metaschema/field_name_schema.yml0000644000175100017510000000040112651763266026427 0ustar peterpeter00000000000000{ "$namespaces": { "acid": "http://example.com/acid#" }, "$graph": [{ "name": "ExampleType", "type": "record", "fields": [{ "name": "base", "type": "string", "jsonldPredicate": "http://example.com/base" }] }] } schema-salad-2.6.20171201034858/schema_salad/metaschema/typedsl_res.yml0000644000175100017510000000137513060036611025352 0ustar peterpeter00000000000000- | ## Domain Specific Language for types Fields may be tagged `typeDSL: true`. If so, the field is expanded using the following micro-DSL for schema salad types: * If the type ends with a question mark `?` it is expanded to a union with `null` * If the type ends with square brackets `[]` it is expanded to an array with items of the preceeding type symbol * The type may end with both `[]?` to indicate it is an optional array. * Identifier resolution is applied after type DSL expansion. 
### Type DSL example Given the following schema: ``` - $include: typedsl_res_schema.yml - | ``` Process the following example: ``` - $include: typedsl_res_src.yml - | ``` This becomes: ``` - $include: typedsl_res_proc.yml - | ``` schema-salad-2.6.20171201034858/schema_salad/metaschema/link_res_src.yml0000644000175100017510000000047112651763266025510 0ustar peterpeter00000000000000{ "$base": "http://example.com/base", "link": "http://example.com/base/zero", "form": { "link": "one", "things": [ { "link": "two" }, { "link": "#three", }, { "link": "four#five", }, { "link": "acid:six", } ] } } schema-salad-2.6.20171201034858/schema_salad/metaschema/map_res_proc.yml0000644000175100017510000000026613060036611025464 0ustar peterpeter00000000000000{ "mapped": [ { "value": "daphne", "key": "fred" }, { "value": "scooby", "key": "shaggy" } ] }schema-salad-2.6.20171201034858/schema_salad/metaschema/field_name_proc.yml0000644000175100017510000000025313057626000026120 0ustar peterpeter00000000000000 { "base": "one", "form": { "base": "two", "http://example.com/three": "three", }, "http://example.com/acid#four": "four" } schema-salad-2.6.20171201034858/schema_salad/metaschema/ident_res_schema.yml0000644000175100017510000000035312651763266026326 0ustar peterpeter00000000000000{ "$namespaces": { "acid": "http://example.com/acid#" }, "$graph": [{ "name": "ExampleType", "type": "record", "fields": [{ "name": "id", "type": "string", "jsonldPredicate": "@id" }] }] } schema-salad-2.6.20171201034858/schema_salad/metaschema/link_res.yml0000644000175100017510000000341612651763266024643 0ustar peterpeter00000000000000- | ## Link resolution The schema may designate one or more fields as link fields reference other objects. Processing must resolve links to either absolute URIs using the following rules: * If a reference URI is prefixed with `#` it is a relative fragment identifier. It is resolved relative to the base URI by setting or replacing the fragment portion of the base URI. 
* If a reference URI does not contain a scheme and is not prefixed with `#` it is a path relative reference. If the reference URI contains `#` in any position other than the first character, the reference URI must be divided into a path portion and a fragment portion split on the first instance of `#`. The path portion is resolved relative to the base URI by the following rule: if the path portion of the base URI ends in a slash `/`, append the path portion of the reference URI to the path portion of the base URI. If the path portion of the base URI does not end in a slash, replace the final path segment with the path portion of the reference URI. Replace the fragment portion of the base URI with the fragment portion of the reference URI. * If a reference URI begins with a namespace prefix declared in `$namespaces` followed by a colon `:`, the prefix and colon must be replaced by the namespace declared in `$namespaces`. * If a reference URI is an absolute URI consisting of a scheme and path, no processing occurs. Link resolution must not affect the base URI used to resolve identifiers and other links. 
### Link resolution example Given the following schema: ``` - $include: link_res_schema.yml - | ``` Process the following example: ``` - $include: link_res_src.yml - | ``` This becomes: ``` - $include: link_res_proc.yml - | ``` schema-salad-2.6.20171201034858/schema_salad/metaschema/metaschema.yml0000644000175100017510000002360313203345013025120 0ustar peterpeter00000000000000$base: "https://w3id.org/cwl/salad#" $namespaces: sld: "https://w3id.org/cwl/salad#" dct: "http://purl.org/dc/terms/" rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" xsd: "http://www.w3.org/2001/XMLSchema#" $graph: - name: "Semantic_Annotations_for_Linked_Avro_Data" type: documentation doc: - $include: salad.md - $import: field_name.yml - $import: ident_res.yml - $import: link_res.yml - $import: vocab_res.yml - $include: import_include.md - $import: map_res.yml - $import: typedsl_res.yml - name: "Link_Validation" type: documentation doc: | # Link validation Once a document has been preprocessed, an implementation may validate links. The link validation traversal may visit fields which the schema designates as link fields and check that each URI references an existing object in the current document, an imported document, file system, or network resource. Failure to validate links may be a fatal error. Link validation behavior for individual fields may be modified by `identity` and `noLinkCheck` in the `jsonldPredicate` section of the field schema. - name: "Schema_validation" type: documentation doc: "" # - name: "JSON_LD_Context" # type: documentation # doc: | # # Generating JSON-LD Context # How to generate the json-ld context... - $import: metaschema_base.yml - name: JsonldPredicate type: record doc: | Attached to a record field to define how the parent record field is handled for URI resolution and JSON-LD context generation. fields: - name: _id type: string? 
jsonldPredicate: _id: sld:_id _type: "@id" identity: true doc: | The predicate URI that this field corresponds to. Corresponds to JSON-LD `@id` directive. - name: _type type: string? doc: | The context type hint, corresponds to JSON-LD `@type` directive. * If the value of this field is `@id` and `identity` is false or unspecified, the parent field must be resolved using the link resolution rules. If `identity` is true, the parent field must be resolved using the identifier expansion rules. * If the value of this field is `@vocab`, the parent field must be resolved using the vocabulary resolution rules. - name: _container type: string? doc: | Structure hint, corresponds to JSON-LD `@container` directive. - name: identity type: boolean? doc: | If true and `_type` is `@id` this indicates that the parent field must be resolved according to identity resolution rules instead of link resolution rules. In addition, the field value is considered an assertion that the linked value exists; absence of an object in the loaded document with the URI is not an error. - name: noLinkCheck type: boolean? doc: | If true, this indicates that link validation traversal must stop at this field. This field (it is is a URI) or any fields under it (if it is an object or array) are not subject to link checking. - name: mapSubject type: string? doc: | If the value of the field is a JSON object, it must be transformed into an array of JSON objects, where each key-value pair from the source JSON object is a list item, the list items must be JSON objects, and the key is assigned to the field specified by `mapSubject`. - name: mapPredicate type: string? doc: | Only applies if `mapSubject` is also provided. 
If the value of the field is a JSON object, it is transformed as described in `mapSubject`, with the addition that when the value of a map item is not an object, the item is transformed to a JSON object with the key assigned to the field specified by `mapSubject` and the value assigned to the field specified by `mapPredicate`. - name: refScope type: int? doc: | If the field contains a relative reference, it must be resolved by searching for valid document references in each successive parent scope in the document fragment. For example, a reference of `foo` in the context `#foo/bar/baz` will first check for the existence of `#foo/bar/baz/foo`, followed by `#foo/bar/foo`, then `#foo/foo` and then finally `#foo`. The first valid URI in the search order shall be used as the fully resolved value of the identifier. The value of the refScope field is the specified number of levels from the containing identifer scope before starting the search, so if `refScope: 2` then "baz" and "bar" must be stripped to get the base `#foo` and search `#foo/foo` and the `#foo`. The last scope searched must be the top level scope before determining if the identifier cannot be resolved. - name: typeDSL type: boolean? doc: | Field must be expanded based on the the Schema Salad type DSL. - name: SpecializeDef type: record fields: - name: specializeFrom type: string doc: "The data type to be replaced" jsonldPredicate: _id: "sld:specializeFrom" _type: "@id" refScope: 1 - name: specializeTo type: string doc: "The new data type to replace with" jsonldPredicate: _id: "sld:specializeTo" _type: "@id" refScope: 1 - name: NamedType type: record abstract: true docParent: "#Schema" fields: - name: name type: string jsonldPredicate: "@id" doc: "The identifier for this type" - name: inVocab type: boolean? doc: | By default or if "true", include the short name of this type in the vocabulary (the keys of the JSON-LD context). If false, do not include the short name in the vocabulary. 
- name: DocType type: record abstract: true docParent: "#Schema" fields: - name: doc type: - string? - string[]? doc: "A documentation string for this type, or an array of strings which should be concatenated." jsonldPredicate: "rdfs:comment" - name: docParent type: string? doc: | Hint to indicate that during documentation generation, documentation for this type should appear in a subsection under `docParent`. jsonldPredicate: _id: "sld:docParent" _type: "@id" - name: docChild type: - string? - string[]? doc: | Hint to indicate that during documentation generation, documentation for `docChild` should appear in a subsection under this type. jsonldPredicate: _id: "sld:docChild" _type: "@id" - name: docAfter type: string? doc: | Hint to indicate that during documentation generation, documentation for this type should appear after the `docAfter` section at the same level. jsonldPredicate: _id: "sld:docAfter" _type: "@id" - name: SchemaDefinedType type: record extends: DocType doc: | Abstract base for schema-defined types. abstract: true fields: - name: jsonldPredicate type: - string? - JsonldPredicate? doc: | Annotate this type with linked data context. jsonldPredicate: sld:jsonldPredicate - name: documentRoot type: boolean? doc: | If true, indicates that the type is a valid at the document root. At least one type in a schema must be tagged with `documentRoot: true`. - name: SaladRecordField type: record extends: RecordField doc: "A field of a record." fields: - name: jsonldPredicate type: - string? - JsonldPredicate? doc: | Annotate this type with linked data context. jsonldPredicate: "sld:jsonldPredicate" - name: SaladRecordSchema docParent: "#Schema" type: record extends: [NamedType, RecordSchema, SchemaDefinedType] documentRoot: true specialize: RecordField: SaladRecordField fields: - name: abstract type: boolean? doc: | If true, this record is abstract and may be used as a base for other records, but is not valid on its own. - name: extends type: - string? 
- string[]? jsonldPredicate: _id: "sld:extends" _type: "@id" refScope: 1 doc: | Indicates that this record inherits fields from one or more base records. - name: specialize type: - SpecializeDef[]? doc: | Only applies if `extends` is declared. Apply type specialization using the base record as a template. For each field inherited from the base record, replace any instance of the type `specializeFrom` with `specializeTo`. jsonldPredicate: _id: "sld:specialize" mapSubject: specializeFrom mapPredicate: specializeTo - name: SaladEnumSchema docParent: "#Schema" type: record extends: [NamedType, EnumSchema, SchemaDefinedType] documentRoot: true doc: | Define an enumerated type. fields: - name: extends type: - string? - string[]? jsonldPredicate: _id: "sld:extends" _type: "@id" refScope: 1 doc: | Indicates that this enum inherits symbols from a base enum. - name: Documentation type: record docParent: "#Schema" extends: [NamedType, DocType] documentRoot: true doc: | A documentation section. This type exists to facilitate self-documenting schemas but has no role in formal validation. 
fields: - name: type doc: "Must be `documentation`" type: name: Documentation_symbol type: enum symbols: - "sld:documentation" jsonldPredicate: _id: "sld:type" _type: "@vocab" typeDSL: true refScope: 2 schema-salad-2.6.20171201034858/schema_salad/metaschema/map_res_schema.yml0000644000175100017510000000100613060036611025752 0ustar peterpeter00000000000000{ "$graph": [{ "name": "MappedType", "type": "record", "documentRoot": true, "fields": [{ "name": "mapped", "type": { "type": "array", "items": "ExampleRecord" }, "jsonldPredicate": { "mapSubject": "key", "mapPredicate": "value" } }], }, { "name": "ExampleRecord", "type": "record", "fields": [{ "name": "key", "type": "string" }, { "name": "value", "type": "string" } ] }] } schema-salad-2.6.20171201034858/schema_salad/metaschema/typedsl_res_proc.yml0000644000175100017510000000061213060036611026366 0ustar peterpeter00000000000000[ { "extype": "string" }, { "extype": [ "null", "string" ] }, { "extype": { "type": "array", "items": "string" } }, { "extype": [ "null", { "type": "array", "items": "string" } ] } ] schema-salad-2.6.20171201034858/schema_salad/metaschema/vocab_res_schema.yml0000644000175100017510000000053113060022334026266 0ustar peterpeter00000000000000{ "$namespaces": { "acid": "http://example.com/acid#" }, "$graph": [{ "name": "Colors", "type": "enum", "symbols": ["acid:red"] }, { "name": "ExampleType", "type": "record", "fields": [{ "name": "voc", "type": "string", "jsonldPredicate": { "_type": "@vocab" } }] }] } schema-salad-2.6.20171201034858/schema_salad/metaschema/vocab_res_src.yml0000644000175100017510000000041312651763266025641 0ustar peterpeter00000000000000 { "form": { "things": [ { "voc": "red", }, { "voc": "http://example.com/acid#red", }, { "voc": "http://example.com/acid#blue", } ] } } schema-salad-2.6.20171201034858/schema_salad/metaschema/ident_res.yml0000644000175100017510000000323412651763266025007 0ustar peterpeter00000000000000- | ## Identifier resolution The schema may designate one or 
more fields as identifier fields to identify specific objects. Processing must resolve relative identifiers to absolute identifiers using the following rules: * If an identifier URI is prefixed with `#` it is a URI relative fragment identifier. It is resolved relative to the base URI by setting or replacing the fragment portion of the base URI. * If an identifier URI does not contain a scheme and is not prefixed `#` it is a parent relative fragment identifier. It is resolved relative to the base URI by the following rule: if the base URI does not contain a document fragment, set the fragment portion of the base URI. If the base URI does contain a document fragment, append a slash `/` followed by the identifier field to the fragment portion of the base URI. * If an identifier URI begins with a namespace prefix declared in `$namespaces` followed by a colon `:`, the prefix and colon must be replaced by the namespace declared in `$namespaces`. * If an identifier URI is an absolute URI consisting of a scheme and path, no processing occurs. When preprocessing visits a node containing an identifier, that identifier must be used as the base URI to process child nodes. It is an error for more than one object in a document to have the same absolute URI. ### Identifier resolution example Given the following schema: ``` - $include: ident_res_schema.yml - | ``` Process the following example: ``` - $include: ident_res_src.yml - | ``` This becomes: ``` - $include: ident_res_proc.yml - | ``` schema-salad-2.6.20171201034858/schema_salad/metaschema/field_name.yml0000644000175100017510000000246012651763266025116 0ustar peterpeter00000000000000- | ## Field name resolution The document schema declares the vocabulary of known field names. During preprocessing traversal, field name in the document which are not part of the schema vocabulary must be resolved to absolute URIs. 
Under "strict" validation, it is an error for a document to include fields which are not part of the vocabulary and not resolvable to absolute URIs. Fields names which are not part of the vocabulary are resolved using the following rules: * If an field name URI begins with a namespace prefix declared in the document context (`@context`) followed by a colon `:`, the prefix and colon must be replaced by the namespace declared in `@context`. * If there is a vocabulary term which maps to the URI of a resolved field, the field name must be replace with the vocabulary term. * If a field name URI is an absolute URI consisting of a scheme and path and is not part of the vocabulary, no processing occurs. Field name resolution is not relative. It must not be affected by the base URI. ### Field name resolution example Given the following schema: ``` - $include: field_name_schema.yml - | ``` Process the following example: ``` - $include: field_name_src.yml - | ``` This becomes: ``` - $include: field_name_proc.yml - | ``` schema-salad-2.6.20171201034858/schema_salad/metaschema/salad.md0000644000175100017510000002512513060036611023677 0ustar peterpeter00000000000000# Semantic Annotations for Linked Avro Data (SALAD) Author: * Peter Amstutz , Curoverse Contributors: * The developers of Apache Avro * The developers of JSON-LD * Nebojša Tijanić , Seven Bridges Genomics # Abstract Salad is a schema language for describing structured linked data documents in JSON or YAML documents. A Salad schema provides rules for preprocessing, structural validation, and link checking for documents described by a Salad schema. Salad builds on JSON-LD and the Apache Avro data serialization system, and extends Avro with features for rich data modeling such as inheritance, template specialization, object identifiers, and object references. Salad was developed to provide a bridge between the record oriented data modeling supported by Apache Avro and the Semantic Web. 
# Status of This Document This document is the product of the [Common Workflow Language working group](https://groups.google.com/forum/#!forum/common-workflow-language). The latest version of this document is available in the "schema_salad" repository at https://github.com/common-workflow-language/schema_salad The products of the CWL working group (including this document) are made available under the terms of the Apache License, version 2.0. # Introduction The JSON data model is an extremely popular way to represent structured data. It is attractive because of its relative simplicity and is a natural fit with the standard types of many programming languages. However, this simplicity means that basic JSON lacks expressive features useful for working with complex data structures and document formats, such as schemas, object references, and namespaces. JSON-LD is a W3C standard providing a way to describe how to interpret a JSON document as Linked Data by means of a "context". JSON-LD provides a powerful solution for representing object references and namespaces in JSON based on standard web URIs, but is not itself a schema language. Without a schema providing a well defined structure, it is difficult to process an arbitrary JSON-LD document as idiomatic JSON because there are many ways to express the same data that are logically equivalent but structurally distinct. Several schema languages exist for describing and validating JSON data, such as the Apache Avro data serialization system, however none understand linked data. As a result, to fully take advantage of JSON-LD to build the next generation of linked data applications, one must maintain separate JSON schema, JSON-LD context, RDF schema, and human documentation, despite significant overlap of content and obvious need for these documents to stay synchronized. Schema Salad is designed to address this gap. 
It provides a schema language and processing rules for describing structured JSON content permitting URI resolution and strict document validation. The schema language supports linked data through annotations that describe the linked data interpretation of the content, enables generation of JSON-LD context and RDF schema, and production of RDF triples by applying the JSON-LD context. The schema language also provides for robust support of inline documentation. ## Introduction to v1.0 This is the second version of of the Schema Salad specification. It is developed concurrently with v1.0 of the Common Workflow Language for use in specifying the Common Workflow Language, however Schema Salad is intended to be useful to a broader audience. Compared to the draft-1 schema salad specification, the following changes have been made: * Use of [mapSubject and mapPredicate](#Identifier_maps) to transform maps to lists of records. * Resolution of the [domain Specific Language for types](#Domain_Specific_Language_for_types) * Consolidation of the formal [schema into section 5](#Schema). ## References to Other Specifications **Javascript Object Notation (JSON)**: http://json.org **JSON Linked Data (JSON-LD)**: http://json-ld.org **YAML**: http://yaml.org **Avro**: https://avro.apache.org/docs/current/spec.html **Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986) **Resource Description Framework (RDF)**: http://www.w3.org/RDF/ **UTF-8**: https://www.ietf.org/rfc/rfc2279.txt) ## Scope This document describes the syntax, data model, algorithms, and schema language for working with Salad documents. It is not intended to document a specific implementation of Salad, however it may serve as a reference for the behavior of conforming implementations. ## Terminology The terminology used to describe Salad documents is defined in the Concepts section of the specification. 
The terms defined in the following list are used in building those definitions and in describing the actions of a Salad implementation: **may**: Conforming Salad documents and Salad implementations are permitted but not required to be interpreted as described. **must**: Conforming Salad documents and Salad implementations are required to be interpreted as described; otherwise they are in error. **error**: A violation of the rules of this specification; results are undefined. Conforming implementations may detect and report an error and may recover from it. **fatal error**: A violation of the rules of this specification; results are undefined. Conforming implementations must not continue to process the document and may report an error. **at user option**: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described. # Document model ## Data concepts An **object** is a data structure equivalent to the "object" type in JSON, consisting of an unordered set of name/value pairs (referred to here as **fields**), where the name is a string and the value is a string, number, boolean, array, or object. A **document** is a file containing a serialized object, or an array of objects. A **document type** is a class of files that share a common structure and semantics. A **document schema** is a formal description of the grammar of a document type. A **base URI** is a context-dependent URI used to resolve relative references. An **identifier** is a URI that designates a single document or single object within a document. A **vocabulary** is the set of symbolic field names and enumerated symbols defined by a document schema, where each term maps to an absolute URI. ## Syntax Conforming Salad documents are serialized and loaded using YAML syntax and UTF-8 text encoding. Salad documents are written using the JSON-compatible subset of YAML.
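As a minimal sketch of the root rule above, the following loader accepts only a single root object or an array of objects and treats anything else as a fatal error. It parses with the standard `json` module purely for illustration (the JSON-compatible subset is also valid JSON); a real implementation would use a YAML safe loader, and the `load_salad_root` name is an assumption of this sketch, not part of the specification.

```python
import json

def load_salad_root(text):
    # Parse a document written in the JSON-compatible subset of YAML.
    # (Sketch only: json suffices here because the subset is valid JSON;
    # a conforming implementation would use a YAML safe loader.)
    root = json.loads(text)
    if isinstance(root, dict):
        return root
    if isinstance(root, list) and all(isinstance(item, dict) for item in root):
        return root
    # Any other root violates the document model and is a fatal error.
    raise ValueError("root must be a single object or an array of objects")

print(load_salad_root('{"id": "one"}'))  # → {'id': 'one'}
```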
Features of YAML such as headers and type tags that are not found in the standard JSON data model must not be used in conforming Salad documents. It is a fatal error if the document is not valid YAML. A Salad document must consist only of either a single root object or an array of objects. ## Document context ### Implied context The implicit context consists of the vocabulary defined by the schema and the base URI. By default, the base URI must be the URI that was used to load the document. It may be overridden by an explicit context. ### Explicit context If a document consists of a root object, this object may contain the fields `$base`, `$namespaces`, `$schemas`, and `$graph`: * `$base`: Must be a string. Sets the base URI for the document used to resolve relative references. * `$namespaces`: Must be an object with strings as values. The keys of the object are namespace prefixes used in the document; the values of the object are the prefix expansions. * `$schemas`: Must be an array of strings. This field may list URI references to documents in RDF-XML format which will be queried for RDF schema data. The subjects and predicates described by the RDF schema may provide additional semantic context for the document, and may be used for validation of prefixed extension fields found in the document. Other directives beginning with `$` must be ignored. ## Document graph If a document consists of a single root object, this object may contain the field `$graph`. This field must be an array of objects. If present, this field holds the primary content of the document. A document that consists of an array of objects at the root is an implicit graph. ## Document metadata If a document consists of a single root object, metadata about the document, such as authorship, may be declared in the root object. ## Document schema Document preprocessing, link validation and schema validation require a document schema.
A schema may consist of: * At least one record definition object which defines valid fields that make up a record type. Record field definitions include the valid types that may be assigned to each field and annotations to indicate fields that represent identifiers and links, described below in "Semantic Annotations". * Any number of enumerated type objects which define a finite set of symbols that are the valid values of the type. * Any number of documentation objects which allow in-line documentation of the schema. The schema for defining a salad schema (the metaschema) is described in detail in "Schema validation". ### Record field annotations In a document schema, record field definitions may include the field `jsonldPredicate`, which may be either a string or object. Implementations must preprocess fields according to the following rules: * If the value of `jsonldPredicate` is `@id`, the field is an identifier field. * If the value of `jsonldPredicate` is an object, and that object contains the field `_type` with the value `@id`, the field is a link field. * If the value of `jsonldPredicate` is an object, and that object contains the field `_type` with the value `@vocab`, the field is a vocabulary field, which is a subtype of link field. ## Document traversal To perform document preprocessing, link validation and schema validation, the document must be traversed starting from the fields or array items of the root object or array and recursively visiting each child item which contains objects or arrays. # Document preprocessing After processing the explicit context (if any), document preprocessing begins. Starting from the document root, object field values or array items which contain objects or arrays are recursively traversed depth-first.
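The depth-first traversal just described can be sketched as follows. This is an illustration only: `visit` is a hypothetical callback standing in for the per-object preprocessing steps (field names, identifiers, links, `$import`/`$include`), not an API defined by this specification.

```python
def preprocess(node, visit):
    # Depth-first traversal: apply `visit` to every object reachable from
    # the root, recursing into object field values and array items.
    if isinstance(node, dict):
        visit(node)
        for value in node.values():
            preprocess(value, visit)
    elif isinstance(node, list):
        for item in node:
            preprocess(item, visit)

seen = []
preprocess({"form": {"things": [{"id": "two"}]}}, seen.append)
print(len(seen))  # → 3  (root, the "form" object, and the list item)
```

Note that, per the text above, the order in which a conforming implementation visits the children of a node is undefined; this sketch happens to use insertion order.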
For each visited object, field names, identifier fields, link fields, vocabulary fields, and `$import` and `$include` directives must be processed as described in this section. The order of traversal of child nodes within a parent node is undefined. schema-salad-2.6.20171201034858/schema_salad/metaschema/ident_res_src.yml0000644000175100017510000000052212651763266025653 0ustar peterpeter00000000000000 { "id": "http://example.com/base", "form": { "id": "one", "things": [ { "id": "two" }, { "id": "#three", }, { "id": "four#five", }, { "id": "acid:six", } ] } } schema-salad-2.6.20171201034858/schema_salad/metaschema/metaschema_base.yml0000644000175100017510000000716013060036611026114 0ustar peterpeter00000000000000$base: "https://w3id.org/cwl/salad#" $namespaces: sld: "https://w3id.org/cwl/salad#" dct: "http://purl.org/dc/terms/" rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" xsd: "http://www.w3.org/2001/XMLSchema#" $graph: - name: "Schema" type: documentation doc: | # Schema - name: PrimitiveType type: enum symbols: - "sld:null" - "xsd:boolean" - "xsd:int" - "xsd:long" - "xsd:float" - "xsd:double" - "xsd:string" doc: - | Salad data types are based on Avro schema declarations. Refer to the [Avro schema declaration documentation](https://avro.apache.org/docs/current/spec.html#schemas) for detailed information. - "null: no value" - "boolean: a binary value" - "int: 32-bit signed integer" - "long: 64-bit signed integer" - "float: single precision (32-bit) IEEE 754 floating-point number" - "double: double precision (64-bit) IEEE 754 floating-point number" - "string: Unicode character sequence" - name: Any type: enum symbols: ["#Any"] docAfter: "#PrimitiveType" doc: | The **Any** type validates for any non-null value. - name: RecordField type: record doc: A field of a record. fields: - name: name type: string jsonldPredicate: "@id" doc: | The name of the field - name: doc type: string? 
doc: | A documentation string for this field jsonldPredicate: "rdfs:comment" - name: type type: - PrimitiveType - RecordSchema - EnumSchema - ArraySchema - string - type: array items: - PrimitiveType - RecordSchema - EnumSchema - ArraySchema - string jsonldPredicate: _id: sld:type _type: "@vocab" typeDSL: true refScope: 2 doc: | The field type - name: RecordSchema type: record fields: type: doc: "Must be `record`" type: name: Record_symbol type: enum symbols: - "sld:record" jsonldPredicate: _id: "sld:type" _type: "@vocab" typeDSL: true refScope: 2 fields: type: RecordField[]? jsonldPredicate: _id: sld:fields mapSubject: name mapPredicate: type doc: "Defines the fields of the record." - name: EnumSchema type: record doc: | Define an enumerated type. fields: type: doc: "Must be `enum`" type: name: Enum_symbol type: enum symbols: - "sld:enum" jsonldPredicate: _id: "sld:type" _type: "@vocab" typeDSL: true refScope: 2 symbols: type: string[] jsonldPredicate: _id: "sld:symbols" _type: "@id" identity: true doc: "Defines the set of valid symbols." - name: ArraySchema type: record fields: type: doc: "Must be `array`" type: name: Array_symbol type: enum symbols: - "sld:array" jsonldPredicate: _id: "sld:type" _type: "@vocab" typeDSL: true refScope: 2 items: type: - PrimitiveType - RecordSchema - EnumSchema - ArraySchema - string - type: array items: - PrimitiveType - RecordSchema - EnumSchema - ArraySchema - string jsonldPredicate: _id: "sld:items" _type: "@vocab" refScope: 2 doc: "Defines the type of the array elements." schema-salad-2.6.20171201034858/schema_salad/metaschema/typedsl_res_src.yml0000644000175100017510000000015713060036611026216 0ustar peterpeter00000000000000[{ "extype": "string" }, { "extype": "string?" }, { "extype": "string[]" }, { "extype": "string[]?" 
}] schema-salad-2.6.20171201034858/schema_salad/metaschema/map_res.yml0000644000175100017510000000164513060036611024443 0ustar  peterpeter00000000000000- | ## Identifier maps The schema may designate certain fields as having a `mapSubject`. If the value of the field is a JSON object, it must be transformed into an array of JSON objects. Each key-value pair from the source JSON object is a list item, each list item must be a JSON object, and the key is assigned to the field specified by `mapSubject`. Fields which have `mapSubject` specified may also supply a `mapPredicate`. If the value of a map item is not a JSON object, the item is transformed to a JSON object with the key assigned to the field specified by `mapSubject` and the value assigned to the field specified by `mapPredicate`. ### Identifier map example Given the following schema: ``` - $include: map_res_schema.yml - | ``` Process the following example: ``` - $include: map_res_src.yml - | ``` This becomes: ``` - $include: map_res_proc.yml - | ``` schema-salad-2.6.20171201034858/schema_salad/metaschema/metaschema2.yml0000644000175100017510000002271713165562750025223 0ustar  peterpeter00000000000000$base: "https://w3id.org/cwl/salad#" $namespaces: sld: "https://w3id.org/cwl/salad#" dct: "http://purl.org/dc/terms/" rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" xsd: "http://www.w3.org/2001/XMLSchema#" $graph: - name: "Semantic_Annotations_for_Linked_Avro_Data" type: documentation doc: - $include: salad.md - $import: field_name.yml - $import: ident_res.yml - $import: link_res.yml - $import: vocab_res.yml - $include: import_include.md - name: "Link_Validation" type: documentation doc: | # Link validation Once a document has been preprocessed, an implementation may validate links.
The link validation traversal may visit fields which the schema designates as link fields and check that each URI references an existing object in the current document, an imported document, file system, or network resource. Failure to validate links may be a fatal error. Link validation behavior for individual fields may be modified by `identity` and `noLinkCheck` in the `jsonldPredicate` section of the field schema. - name: "Schema_validation" type: documentation doc: "" # - name: "JSON_LD_Context" # type: documentation # doc: | # # Generating JSON-LD Context # How to generate the json-ld context... - $import: metaschema_base.yml - name: JsonldPredicate type: record doc: | Attached to a record field to define how the parent record field is handled for URI resolution and JSON-LD context generation. fields: - name: _id type: string? jsonldPredicate: _id: sld:_id _type: "@id" identity: true doc: | The predicate URI that this field corresponds to. Corresponds to JSON-LD `@id` directive. - name: _type type: string? doc: | The context type hint, corresponds to JSON-LD `@type` directive. * If the value of this field is `@id` and `identity` is false or unspecified, the parent field must be resolved using the link resolution rules. If `identity` is true, the parent field must be resolved using the identifier expansion rules. * If the value of this field is `@vocab`, the parent field must be resolved using the vocabulary resolution rules. - name: _container type: string? doc: | Structure hint, corresponds to JSON-LD `@container` directive. - name: identity type: boolean? doc: | If true and `_type` is `@id` this indicates that the parent field must be resolved according to identity resolution rules instead of link resolution rules. In addition, the field value is considered an assertion that the linked value exists; absence of an object in the loaded document with the URI is not an error. - name: noLinkCheck type: boolean? 
doc: | If true, this indicates that link validation traversal must stop at this field. This field (if it is a URI) or any fields under it (if it is an object or array) are not subject to link checking. - name: mapSubject type: string? doc: | If the value of the field is a JSON object, it must be transformed into an array of JSON objects, where each key-value pair from the source JSON object is a list item, the list items must be JSON objects, and the key is assigned to the field specified by `mapSubject`. - name: mapPredicate type: string? doc: | Only applies if `mapSubject` is also provided. If the value of the field is a JSON object, it is transformed as described in `mapSubject`, with the addition that when the value of a map item is not an object, the item is transformed to a JSON object with the key assigned to the field specified by `mapSubject` and the value assigned to the field specified by `mapPredicate`. - name: refScope type: int? doc: | If the field contains a relative reference, it must be resolved by searching for valid document references in each successive parent scope in the document fragment. For example, a reference of `foo` in the context `#foo/bar/baz` will first check for the existence of `#foo/bar/baz/foo`, followed by `#foo/bar/foo`, then `#foo/foo` and then finally `#foo`. The first valid URI in the search order shall be used as the fully resolved value of the identifier. The value of the refScope field is the specified number of levels from the containing identifier scope before starting the search, so if `refScope: 2` then "baz" and "bar" must be stripped to get the base `#foo` and search `#foo/foo` and then `#foo`. The last scope searched must be the top level scope before determining if the identifier cannot be resolved. - name: typeDSL type: boolean? doc: | Field must be expanded based on the Schema Salad type DSL.
- name: SpecializeDef type: record fields: - name: specializeFrom type: string doc: "The data type to be replaced" jsonldPredicate: _id: "sld:specializeFrom" _type: "@id" refScope: 1 - name: specializeTo type: string doc: "The new data type to replace with" jsonldPredicate: _id: "sld:specializeTo" _type: "@id" refScope: 1 - name: NamedType type: record abstract: true fields: - name: name type: string jsonldPredicate: "@id" doc: "The identifier for this type" - name: DocType type: record abstract: true fields: - name: doc type: - string? - string[]? doc: "A documentation string for this type, or an array of strings which should be concatenated." jsonldPredicate: "rdfs:comment" - name: docParent type: string? doc: | Hint to indicate that during documentation generation, documentation for this type should appear in a subsection under `docParent`. jsonldPredicate: _id: "sld:docParent" _type: "@id" - name: docChild type: - string? - string[]? doc: | Hint to indicate that during documentation generation, documentation for `docChild` should appear in a subsection under this type. jsonldPredicate: _id: "sld:docChild" _type: "@id" - name: docAfter type: string? doc: | Hint to indicate that during documentation generation, documentation for this type should appear after the `docAfter` section at the same level. jsonldPredicate: _id: "sld:docAfter" _type: "@id" - name: SchemaDefinedType type: record extends: DocType doc: | Abstract base for schema-defined types. abstract: true fields: - name: jsonldPredicate type: - string? - JsonldPredicate? doc: | Annotate this type with linked data context. jsonldPredicate: sld:jsonldPredicate - name: documentRoot type: boolean? doc: | If true, indicates that the type is valid at the document root. At least one type in a schema must be tagged with `documentRoot: true`. - name: SaladRecordField type: record extends: RecordField doc: "A field of a record." fields: - name: jsonldPredicate type: - string? - JsonldPredicate?
doc: | Annotate this type with linked data context. jsonldPredicate: "sld:jsonldPredicate" - name: SaladRecordSchema type: record extends: [NamedType, RecordSchema, SchemaDefinedType] documentRoot: true specialize: RecordField: SaladRecordField fields: - name: abstract type: boolean? doc: | If true, this record is abstract and may be used as a base for other records, but is not valid on its own. - name: extends type: - string? - string[]? jsonldPredicate: _id: "sld:extends" _type: "@id" refScope: 1 doc: | Indicates that this record inherits fields from one or more base records. - name: specialize type: - SpecializeDef[]? doc: | Only applies if `extends` is declared. Apply type specialization using the base record as a template. For each field inherited from the base record, replace any instance of the type `specializeFrom` with `specializeTo`. jsonldPredicate: _id: "sld:specialize" mapSubject: specializeFrom mapPredicate: specializeTo - name: SaladEnumSchema type: record extends: [EnumSchema, SchemaDefinedType] documentRoot: true doc: | Define an enumerated type. fields: - name: extends type: - string? - string[]? jsonldPredicate: _id: "sld:extends" _type: "@id" refScope: 1 doc: | Indicates that this enum inherits symbols from a base enum. - name: Documentation type: record extends: [NamedType, DocType] documentRoot: true doc: | A documentation section. This type exists to facilitate self-documenting schemas but has no role in formal validation. 
fields: type: doc: "Must be `documentation`" type: name: Documentation_symbol type: enum symbols: - "sld:documentation" jsonldPredicate: _id: "sld:type" _type: "@vocab" typeDSL: true refScope: 2 schema-salad-2.6.20171201034858/schema_salad/metaschema/field_name_src.yml0000644000175100017510000000025312651763266025763 0ustar  peterpeter00000000000000 { "base": "one", "form": { "http://example.com/base": "two", "http://example.com/three": "three", }, "acid:four": "four" } schema-salad-2.6.20171201034858/schema_salad/metaschema/ident_res_proc.yml0000644000175100017510000000056212651763266026031 0ustar  peterpeter00000000000000{ "id": "http://example.com/base", "form": { "id": "http://example.com/base#one", "things": [ { "id": "http://example.com/base#one/two" }, { "id": "http://example.com/base#three" }, { "id": "http://example.com/four#five", }, { "id": "http://example.com/acid#six", } ] } } schema-salad-2.6.20171201034858/schema_salad/metaschema/import_include.md0000644000175100017510000001076312737501265025642 0ustar  peterpeter00000000000000## Import During preprocessing traversal, an implementation must resolve `$import` directives. An `$import` directive is an object consisting of exactly one field `$import` specifying a resource by URI string. It is an error if there are additional fields in the `$import` object; such additional fields must be ignored. The URI string must be resolved to an absolute URI using the link resolution rules described previously. Implementations must support loading from `file`, `http` and `https` resources. The URI referenced by `$import` must be loaded and recursively preprocessed as a Salad document. The external imported document does not inherit the context of the importing document, and the default base URI for processing the imported document must be the URI used to retrieve the imported document.
If the `$import` URI includes a document fragment, the fragment must be excluded from the base URI used to preprocess the imported document. Once loaded and processed, the `$import` node is replaced in the document structure by the object or array yielded from the import operation. URIs may reference document fragments which refer to a specific object in the target document. This indicates that the `$import` node must be replaced by only the object with the appropriate fragment identifier. It is a fatal error if an import directive refers to an external resource or resource fragment which does not exist or is not accessible. ### Import example import.yml: ``` { "hello": "world" } ``` parent.yml: ``` { "form": { "bar": { "$import": "import.yml" } } } ``` This becomes: ``` { "form": { "bar": { "hello": "world" } } } ``` ## Include During preprocessing traversal, an implementation must resolve `$include` directives. An `$include` directive is an object consisting of exactly one field `$include` specifying a URI string. It is an error if there are additional fields in the `$include` object; such additional fields must be ignored. The URI string must be resolved to an absolute URI using the link resolution rules described previously. The URI referenced by `$include` must be loaded as text data. Implementations must support loading from `file`, `http` and `https` resources. Implementations may transcode the character encoding of the text data to match that of the parent document, but must not interpret or parse the text document in any other way. Once loaded, the `$include` node is replaced in the document structure by a string containing the text data loaded from the resource. It is a fatal error if an `$include` directive refers to an external resource which does not exist or is not accessible.
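The `$include` rule above can be sketched as a small recursive rewrite. This is a sketch under stated assumptions: `load_text` is a caller-supplied loader standing in for fetching the resource as text (a conforming implementation must support `file`, `http` and `https`), and only the exactly-one-field form of the directive is handled.

```python
def resolve_includes(node, load_text):
    # Replace {"$include": uri} directives with the text loaded from the
    # URI; all other objects and arrays are rewritten recursively.
    if isinstance(node, dict):
        if set(node) == {"$include"}:
            return load_text(node["$include"])
        return {k: resolve_includes(v, load_text) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_includes(item, load_text) for item in node]
    return node

doc = {"form": {"bar": {"$include": "include.txt"}}}
print(resolve_includes(doc, lambda uri: "hello world"))
# → {'form': {'bar': 'hello world'}}
```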
### Include example parent.yml: ``` { "form": { "bar": { "$include": "include.txt" } } } ``` include.txt: ``` hello world ``` This becomes: ``` { "form": { "bar": "hello world" } } ``` ## Mixin During preprocessing traversal, an implementation must resolve `$mixin` directives. A `$mixin` directive is an object consisting of the field `$mixin` specifying a resource by URI string. If there are additional fields in the `$mixin` object, these fields override fields in the object which is loaded from the `$mixin` URI. The URI string must be resolved to an absolute URI using the link resolution rules described previously. Implementations must support loading from `file`, `http` and `https` resources. The URI referenced by `$mixin` must be loaded and recursively preprocessed as a Salad document. The external imported document must inherit the context of the importing document, however the base URI for processing the imported document must be the URI used to retrieve the imported document. The `$mixin` URI must not include a document fragment. Once loaded and processed, the `$mixin` node is replaced in the document structure by the object or array yielded from the import operation. URIs may reference document fragments which refer to a specific object in the target document. This indicates that the `$mixin` node must be replaced by only the object with the appropriate fragment identifier. It is a fatal error if an import directive refers to an external resource or resource fragment which does not exist or is not accessible.
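The override behavior of `$mixin` described above can be sketched as follows. Assumptions of this sketch: `load_doc` is a hypothetical loader that returns the already-preprocessed object referenced by the URI, and only a single node (not a full document traversal) is shown.

```python
def apply_mixin(node, load_doc):
    # Resolve a $mixin directive: load the referenced object, then let any
    # sibling fields of $mixin override the loaded object's fields.
    if isinstance(node, dict) and "$mixin" in node:
        merged = dict(load_doc(node["$mixin"]))
        merged.update({k: v for k, v in node.items() if k != "$mixin"})
        return merged
    return node

node = {"$mixin": "mixin.yml", "carrot": "cake"}
print(apply_mixin(node, lambda uri: {"hello": "world", "carrot": "orange"}))
# → {'hello': 'world', 'carrot': 'cake'}
```

This reproduces the Mixin example below: the loaded `carrot: orange` is overridden by the sibling field `carrot: cake`.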
### Mixin example mixin.yml: ``` { "hello": "world", "carrot": "orange" } ``` parent.yml: ``` { "form": { "bar": { "$mixin": "mixin.yml", "carrot": "cake" } } } ``` This becomes: ``` { "form": { "bar": { "hello": "world", "carrot": "cake" } } } ``` schema-salad-2.6.20171201034858/schema_salad/metaschema/vocab_res.yml0000644000175100017510000000142312651763266024774 0ustar  peterpeter00000000000000- | ## Vocabulary resolution The schema may designate one or more vocabulary fields which use terms defined in the vocabulary. Processing must resolve vocabulary fields to either vocabulary terms or absolute URIs by first applying the link resolution rules defined above, then applying the following additional rule: * If a reference URI is a vocabulary field, and there is a vocabulary term which maps to the resolved URI, the reference must be replaced with the vocabulary term. ### Vocabulary resolution example Given the following schema: ``` - $include: vocab_res_schema.yml - | ``` Process the following example: ``` - $include: vocab_res_src.yml - | ``` This becomes: ``` - $include: vocab_res_proc.yml - | ``` schema-salad-2.6.20171201034858/schema_salad/makedoc.py0000644000175100017510000004504413203345013022137 0ustar  peterpeter00000000000000from __future__ import absolute_import import mistune import argparse import json import os import copy import re import sys import logging from io import open from .
import schema from .utils import add_dictlist, aslist import six from six.moves import range from six.moves import urllib from six import StringIO from typing import cast, Any, Dict, IO, List, Optional, Set, Text, Union _logger = logging.getLogger("salad") def has_types(items): # type: (Any) -> List[Text] r = [] # type: List if isinstance(items, dict): if items["type"] == "https://w3id.org/cwl/salad#record": return [items["name"]] for n in ("type", "items", "values"): if n in items: r.extend(has_types(items[n])) return r if isinstance(items, list): for i in items: r.extend(has_types(i)) return r if isinstance(items, six.string_types): return [items] return [] def linkto(item): # type: (Text) -> Text _, frg = urllib.parse.urldefrag(item) return "[%s](#%s)" % (frg, to_id(frg)) class MyRenderer(mistune.Renderer): def __init__(self): # type: () -> None super(mistune.Renderer, self).__init__() self.options = {} def header(self, text, level, raw=None): # type: (Text, int, Any) -> Text return """%s""" % (level, to_id(text), text, level) def table(self, header, body): # type: (Text, Text) -> Text return ( '\n%s\n' '\n%s\n
\n' ) % (header, body) def to_id(text): # type: (Text) -> Text textid = text if text[0] in ("0", "1", "2", "3", "4", "5", "6", "7", "8", "9"): try: textid = text[text.index(" ") + 1:] except ValueError: pass textid = textid.replace(" ", "_") return textid class ToC(object): def __init__(self): # type: () -> None self.first_toc_entry = True self.numbering = [0] self.toc = "" self.start_numbering = True def add_entry(self, thisdepth, title): # type: (int, str) -> str depth = len(self.numbering) if thisdepth < depth: self.toc += "" for n in range(0, depth - thisdepth): self.numbering.pop() self.toc += "" self.numbering[-1] += 1 elif thisdepth == depth: if not self.first_toc_entry: self.toc += "" else: self.first_toc_entry = False self.numbering[-1] += 1 elif thisdepth > depth: self.numbering.append(1) if self.start_numbering: num = "%i.%s" % (self.numbering[0], ".".join( [str(n) for n in self.numbering[1:]])) else: num = "" self.toc += """
  • %s %s
      \n""" % (to_id(title), num, title) return num def contents(self, idn): # type: (str) -> str c = """

      Table of contents

    " c += """""" return c basicTypes = ("https://w3id.org/cwl/salad#null", "http://www.w3.org/2001/XMLSchema#boolean", "http://www.w3.org/2001/XMLSchema#int", "http://www.w3.org/2001/XMLSchema#long", "http://www.w3.org/2001/XMLSchema#float", "http://www.w3.org/2001/XMLSchema#double", "http://www.w3.org/2001/XMLSchema#string", "https://w3id.org/cwl/salad#record", "https://w3id.org/cwl/salad#enum", "https://w3id.org/cwl/salad#array") def number_headings(toc, maindoc): # type: (ToC, str) -> str mdlines = [] skip = False for line in maindoc.splitlines(): if line.strip() == "# Introduction": toc.start_numbering = True toc.numbering = [0] if "```" in line: skip = not skip if not skip: m = re.match(r'^(#+) (.*)', line) if m is not None: num = toc.add_entry(len(m.group(1)), m.group(2)) line = "%s %s %s" % (m.group(1), num, m.group(2)) line = re.sub(r'^(https?://\S+)', r'[\1](\1)', line) mdlines.append(line) maindoc = '\n'.join(mdlines) return maindoc def fix_doc(doc): # type: (Union[List[str], str]) -> str if isinstance(doc, list): docstr = "".join(doc) else: docstr = doc return "\n".join( [re.sub(r"<([^>@]+@[^>]+)>", r"[\1](mailto:\1)", d) for d in docstr.splitlines()]) class RenderType(object): def __init__(self, toc, j, renderlist, redirects, primitiveType): # type: (ToC, List[Dict], str, Dict, str) -> None self.typedoc = StringIO() self.toc = toc self.subs = {} # type: Dict[str, str] self.docParent = {} # type: Dict[str, List] self.docAfter = {} # type: Dict[str, List] self.rendered = set() # type: Set[str] self.redirects = redirects self.title = None # type: Optional[str] self.primitiveType = primitiveType for t in j: if "extends" in t: for e in aslist(t["extends"]): add_dictlist(self.subs, e, t["name"]) # if "docParent" not in t and "docAfter" not in t: # add_dictlist(self.docParent, e, t["name"]) if t.get("docParent"): add_dictlist(self.docParent, t["docParent"], t["name"]) if t.get("docChild"): for c in aslist(t["docChild"]): add_dictlist(self.docParent, 
t["name"], c) if t.get("docAfter"): add_dictlist(self.docAfter, t["docAfter"], t["name"]) metaschema_loader = schema.get_metaschema()[2] alltypes = schema.extend_and_specialize(j, metaschema_loader) self.typemap = {} # type: Dict self.uses = {} # type: Dict self.record_refs = {} # type: Dict for t in alltypes: self.typemap[t["name"]] = t try: if t["type"] == "record": self.record_refs[t["name"]] = [] for f in t.get("fields", []): p = has_types(f) for tp in p: if tp not in self.uses: self.uses[tp] = [] if (t["name"], f["name"]) not in self.uses[tp]: _, frg1 = urllib.parse.urldefrag(t["name"]) _, frg2 = urllib.parse.urldefrag(f["name"]) self.uses[tp].append((frg1, frg2)) if tp not in basicTypes and tp not in self.record_refs[t["name"]]: self.record_refs[t["name"]].append(tp) except KeyError as e: _logger.error("Did not find 'type' in %s", t) raise for f in alltypes: if (f["name"] in renderlist or ((not renderlist) and ("extends" not in f) and ("docParent" not in f) and ("docAfter" not in f))): self.render_type(f, 1) def typefmt(self, tp, # type: Any redirects, # type: Dict[str, str] nbsp=False, # type: bool jsonldPredicate=None # type: Optional[Dict[str, str]] ): # type: (...) 
-> Text if isinstance(tp, list): if nbsp and len(tp) <= 3: return " | ".join([self.typefmt(n, redirects, jsonldPredicate=jsonldPredicate) for n in tp]) else: return " | ".join([self.typefmt(n, redirects) for n in tp]) if isinstance(tp, dict): if tp["type"] == "https://w3id.org/cwl/salad#array": ar = "array<%s>" % (self.typefmt( tp["items"], redirects, nbsp=True)) if jsonldPredicate is not None and "mapSubject" in jsonldPredicate: if "mapPredicate" in jsonldPredicate: ar += " | map<%s.%s, %s.%s>" % (self.typefmt(tp["items"], redirects), jsonldPredicate[ "mapSubject"], self.typefmt( tp["items"], redirects), jsonldPredicate["mapPredicate"]) ar += " | map<%s.%s, %s>" % (self.typefmt(tp["items"], redirects), jsonldPredicate[ "mapSubject"], self.typefmt(tp["items"], redirects)) return ar if tp["type"] in ("https://w3id.org/cwl/salad#record", "https://w3id.org/cwl/salad#enum"): frg = cast(Text, schema.avro_name(tp["name"])) if tp["name"] in redirects: return """%s""" % (redirects[tp["name"]], frg) elif tp["name"] in self.typemap: return """%s""" % (to_id(frg), frg) else: return frg if isinstance(tp["type"], dict): return self.typefmt(tp["type"], redirects) else: if str(tp) in redirects: return """%s""" % (redirects[tp], redirects[tp]) elif str(tp) in basicTypes: return """%s""" % (self.primitiveType, schema.avro_name(str(tp))) else: _, frg = urllib.parse.urldefrag(tp) if frg is not '': tp = frg return """%s""" % (to_id(tp), tp) raise Exception("We should not be here!") def render_type(self, f, depth): # type: (Dict[str, Any], int) -> None if f["name"] in self.rendered or f["name"] in self.redirects: return self.rendered.add(f["name"]) if f.get("abstract"): return if "doc" not in f: f["doc"] = "" f["type"] = copy.deepcopy(f) f["doc"] = "" f = f["type"] if "doc" not in f: f["doc"] = "" def extendsfrom(item, ex): # type: (Dict[str, Any], List[Dict[str, Any]]) -> None if "extends" in item: for e in aslist(item["extends"]): ex.insert(0, self.typemap[e]) 
                    extendsfrom(self.typemap[e], ex)

        ex = [f]
        extendsfrom(f, ex)

        enumDesc = {}
        if f["type"] == "enum" and isinstance(f["doc"], list):
            for e in ex:
                for i in e["doc"]:
                    idx = i.find(":")
                    if idx > -1:
                        enumDesc[i[:idx]] = i[idx + 1:]
                e["doc"] = [i for i in e["doc"]
                            if i.find(":") == -1 or i.find(" ") < i.find(":")]

        f["doc"] = fix_doc(f["doc"])

        if f["type"] == "record":
            for field in f.get("fields", []):
                if "doc" not in field:
                    field["doc"] = ""

        if f["type"] != "documentation":
            lines = []
            for line in f["doc"].splitlines():
                if len(line) > 0 and line[0] == "#":
                    line = ("#" * depth) + line
                lines.append(line)
            f["doc"] = "\n".join(lines)

            _, frg = urllib.parse.urldefrag(f["name"])
            num = self.toc.add_entry(depth, frg)
            doc = u"%s %s %s\n" % (("#" * depth), num, frg)
        else:
            doc = u""

        if self.title is None and f["doc"]:
            title = f["doc"][0:f["doc"].index("\n")]
            if title.startswith('# '):
                self.title = title[2:]
            else:
                self.title = title

        if f["type"] == "documentation":
            f["doc"] = number_headings(self.toc, f["doc"])

        # if "extends" in f:
        #     doc += "\n\nExtends "
        #     doc += ", ".join([" %s" % linkto(ex) for ex in aslist(f["extends"])])
        # if f["name"] in self.subs:
        #     doc += "\n\nExtended by"
        #     doc += ", ".join([" %s" % linkto(s) for s in self.subs[f["name"]]])
        # if f["name"] in self.uses:
        #     doc += "\n\nReferenced by"
        #     doc += ", ".join([" [%s.%s](#%s)" % (s[0], s[1], to_id(s[0]))
        #                       for s in self.uses[f["name"]]])

        doc = doc + "\n\n" + f["doc"]
        doc = mistune.markdown(doc, renderer=MyRenderer())

        if f["type"] == "record":
            doc += "<h3>Fields</h3>"
            doc += """<table class="table table-striped">"""
            doc += "<tr><th>field</th><th>type</th><th>required</th><th>description</th></tr>"
            required = []
            optional = []
            for i in f.get("fields", []):
                tp = i["type"]
                if isinstance(tp, list) and tp[0] == "https://w3id.org/cwl/salad#null":
                    opt = False
                    tp = tp[1:]
                else:
                    opt = True

                desc = i["doc"]
                # if "inherited_from" in i:
                #     desc = "%s _Inherited from %s_" % (desc, linkto(i["inherited_from"]))

                rfrg = schema.avro_name(i["name"])
                tr = "<td><code>%s</code></td><td>%s</td><td>%s</td>"\
                    "<td>%s</td>" % (
                        rfrg,
                        self.typefmt(tp, self.redirects,
                                     jsonldPredicate=i.get("jsonldPredicate")),
                        opt,
                        mistune.markdown(desc))
                if opt:
                    required.append(tr)
                else:
                    optional.append(tr)

            for i in required + optional:
                doc += "<tr>" + i + "</tr>"

            doc += """</table>"""
        elif f["type"] == "enum":
            doc += "<h3>Symbols</h3>"
            doc += """<table class="table table-striped">"""
            doc += "<tr><th>symbol</th><th>description</th></tr>"
            for e in ex:
                for i in e.get("symbols", []):
                    doc += "<tr>"
                    efrg = schema.avro_name(i)
                    doc += "<td><code>%s</code></td><td>%s</td>" % (
                        efrg, enumDesc.get(efrg, ""))
                    doc += "</tr>"
            doc += """</table>
    """ f["doc"] = doc self.typedoc.write(f["doc"]) subs = self.docParent.get(f["name"], []) + \ self.record_refs.get(f["name"], []) if len(subs) == 1: self.render_type(self.typemap[subs[0]], depth) else: for s in subs: self.render_type(self.typemap[s], depth + 1) for s in self.docAfter.get(f["name"], []): self.render_type(self.typemap[s], depth) def avrold_doc(j, outdoc, renderlist, redirects, brand, brandlink, primtype): # type: (List[Dict[Text, Any]], IO[Any], str, Dict, str, str, str) -> None toc = ToC() toc.start_numbering = False rt = RenderType(toc, j, renderlist, redirects, primtype) content = rt.typedoc.getvalue() # type: Text outdoc.write(""" """) outdoc.write("%s" % (rt.title)) outdoc.write(""" """) outdoc.write(""" """) outdoc.write("""
    """) outdoc.write("""
    """) outdoc.write("""
    """) outdoc.write(content.encode("utf-8")) outdoc.write("""
    """) outdoc.write("""
    """) def main(): # type: () -> None parser = argparse.ArgumentParser() parser.add_argument("schema") parser.add_argument('--only', action='append') parser.add_argument('--redirect', action='append') parser.add_argument('--brand') parser.add_argument('--brandlink') parser.add_argument('--primtype', default="#PrimitiveType") args = parser.parse_args() s = [] # type: List[Dict[Text, Any]] a = args.schema with open(a, encoding='utf-8') as f: if a.endswith("md"): s.append({"name": os.path.splitext(os.path.basename(a))[0], "type": "documentation", "doc": f.read() }) else: uri = "file://" + os.path.abspath(a) metaschema_loader = schema.get_metaschema()[2] j, schema_metadata = metaschema_loader.resolve_ref(uri, "") if isinstance(j, list): s.extend(j) elif isinstance(j, dict): s.append(j) else: raise ValueError("Schema must resolve to a list or a dict") redirect = {} for r in (args.redirect or []): redirect[r.split("=")[0]] = r.split("=")[1] renderlist = args.only if args.only else [] avrold_doc(s, sys.stdout, renderlist, redirect, args.brand, args.brandlink, args.primtype) if __name__ == "__main__": main() schema-salad-2.6.20171201034858/schema_salad/schema.py0000644000175100017510000005074513163007514022006 0ustar peterpeter00000000000000from __future__ import absolute_import import avro import copy from schema_salad.utils import add_dictlist, aslist, flatten import sys import pprint from pkg_resources import resource_stream import ruamel.yaml as yaml import avro.schema from . import validate import json import os import six from six.moves import urllib AvroSchemaFromJSONData = avro.schema.make_avsc_object from avro.schema import Names, SchemaParseException from . import ref_resolver from .ref_resolver import Loader, DocumentType import logging from . 
import jsonld_context from .sourceline import SourceLine, strip_dup_lineno, add_lc_filename, bullets, relname from typing import cast, Any, AnyStr, Dict, List, Set, Tuple, TypeVar, Union, Text from ruamel.yaml.comments import CommentedSeq, CommentedMap _logger = logging.getLogger("salad") salad_files = ('metaschema.yml', 'metaschema_base.yml', 'salad.md', 'field_name.yml', 'import_include.md', 'link_res.yml', 'ident_res.yml', 'vocab_res.yml', 'vocab_res.yml', 'field_name_schema.yml', 'field_name_src.yml', 'field_name_proc.yml', 'ident_res_schema.yml', 'ident_res_src.yml', 'ident_res_proc.yml', 'link_res_schema.yml', 'link_res_src.yml', 'link_res_proc.yml', 'vocab_res_schema.yml', 'vocab_res_src.yml', 'vocab_res_proc.yml', 'map_res.yml', 'map_res_schema.yml', 'map_res_src.yml', 'map_res_proc.yml', 'typedsl_res.yml', 'typedsl_res_schema.yml', 'typedsl_res_src.yml', 'typedsl_res_proc.yml') def get_metaschema(): # type: () -> Tuple[Names, List[Dict[Text, Any]], Loader] loader = ref_resolver.Loader({ "Any": "https://w3id.org/cwl/salad#Any", "ArraySchema": "https://w3id.org/cwl/salad#ArraySchema", "DocType": "https://w3id.org/cwl/salad#DocType", "Documentation": "https://w3id.org/cwl/salad#Documentation", "EnumSchema": "https://w3id.org/cwl/salad#EnumSchema", "JsonldPredicate": "https://w3id.org/cwl/salad#JsonldPredicate", "NamedType": "https://w3id.org/cwl/salad#NamedType", "RecordField": "https://w3id.org/cwl/salad#RecordField", "RecordSchema": "https://w3id.org/cwl/salad#RecordSchema", "SaladEnumSchema": "https://w3id.org/cwl/salad#SaladEnumSchema", "SaladRecordField": "https://w3id.org/cwl/salad#SaladRecordField", "SaladRecordSchema": "https://w3id.org/cwl/salad#SaladRecordSchema", "SchemaDefinedType": "https://w3id.org/cwl/salad#SchemaDefinedType", "SpecializeDef": "https://w3id.org/cwl/salad#SpecializeDef", "_container": "https://w3id.org/cwl/salad#JsonldPredicate/_container", "_id": { "@id": "https://w3id.org/cwl/salad#_id", "@type": "@id", "identity": True }, 
"_type": "https://w3id.org/cwl/salad#JsonldPredicate/_type", "abstract": "https://w3id.org/cwl/salad#SaladRecordSchema/abstract", "array": "https://w3id.org/cwl/salad#array", "boolean": "http://www.w3.org/2001/XMLSchema#boolean", "dct": "http://purl.org/dc/terms/", "doc": "sld:doc", "docAfter": { "@id": "https://w3id.org/cwl/salad#docAfter", "@type": "@id" }, "docChild": { "@id": "https://w3id.org/cwl/salad#docChild", "@type": "@id" }, "docParent": { "@id": "https://w3id.org/cwl/salad#docParent", "@type": "@id" }, "documentRoot": "https://w3id.org/cwl/salad#SchemaDefinedType/documentRoot", "documentation": "https://w3id.org/cwl/salad#documentation", "double": "http://www.w3.org/2001/XMLSchema#double", "enum": "https://w3id.org/cwl/salad#enum", "extends": { "@id": "https://w3id.org/cwl/salad#extends", "@type": "@id", "refScope": 1 }, "fields": { "@id": "https://w3id.org/cwl/salad#fields", "mapPredicate": "type", "mapSubject": "name" }, "float": "http://www.w3.org/2001/XMLSchema#float", "identity": "https://w3id.org/cwl/salad#JsonldPredicate/identity", "int": "http://www.w3.org/2001/XMLSchema#int", "items": { "@id": "https://w3id.org/cwl/salad#items", "@type": "@vocab", "refScope": 2 }, "jsonldPredicate": "sld:jsonldPredicate", "long": "http://www.w3.org/2001/XMLSchema#long", "mapPredicate": "https://w3id.org/cwl/salad#JsonldPredicate/mapPredicate", "mapSubject": "https://w3id.org/cwl/salad#JsonldPredicate/mapSubject", "name": "@id", "noLinkCheck": "https://w3id.org/cwl/salad#JsonldPredicate/noLinkCheck", "null": "https://w3id.org/cwl/salad#null", "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdfs": "http://www.w3.org/2000/01/rdf-schema#", "record": "https://w3id.org/cwl/salad#record", "refScope": "https://w3id.org/cwl/salad#JsonldPredicate/refScope", "sld": "https://w3id.org/cwl/salad#", "specialize": { "@id": "https://w3id.org/cwl/salad#specialize", "mapPredicate": "specializeTo", "mapSubject": "specializeFrom" }, "specializeFrom": { "@id": 
"https://w3id.org/cwl/salad#specializeFrom", "@type": "@id", "refScope": 1 }, "specializeTo": { "@id": "https://w3id.org/cwl/salad#specializeTo", "@type": "@id", "refScope": 1 }, "string": "http://www.w3.org/2001/XMLSchema#string", "symbols": { "@id": "https://w3id.org/cwl/salad#symbols", "@type": "@id", "identity": True }, "type": { "@id": "https://w3id.org/cwl/salad#type", "@type": "@vocab", "refScope": 2, "typeDSL": True }, "typeDSL": "https://w3id.org/cwl/salad#JsonldPredicate/typeDSL", "xsd": "http://www.w3.org/2001/XMLSchema#" }) for f in salad_files: rs = resource_stream(__name__, 'metaschema/' + f) loader.cache["https://w3id.org/cwl/" + f] = rs.read() rs.close() rs = resource_stream(__name__, 'metaschema/metaschema.yml') loader.cache["https://w3id.org/cwl/salad"] = rs.read() rs.close() j = yaml.round_trip_load(loader.cache["https://w3id.org/cwl/salad"]) add_lc_filename(j, "metaschema.yml") j, _ = loader.resolve_all(j, "https://w3id.org/cwl/salad#") # pprint.pprint(j) (sch_names, sch_obj) = make_avro_schema(j, loader) if isinstance(sch_names, Exception): _logger.error("Metaschema error, avro was:\n%s", json.dumps(sch_obj, indent=4)) raise sch_names validate_doc(sch_names, j, loader, strict=True) return (sch_names, j, loader) def load_schema(schema_ref, # type: Union[CommentedMap, CommentedSeq, Text] cache=None # type: Dict ): # type: (...) -> Tuple[Loader, Union[Names, SchemaParseException], Dict[Text, Any], Loader] """Load a schema that can be used to validate documents using load_and_validate. 
return document_loader, avsc_names, schema_metadata, metaschema_loader""" metaschema_names, metaschema_doc, metaschema_loader = get_metaschema() if cache is not None: metaschema_loader.cache.update(cache) schema_doc, schema_metadata = metaschema_loader.resolve_ref(schema_ref, "") if not isinstance(schema_doc, list): raise ValueError("Schema reference must resolve to a list.") validate_doc(metaschema_names, schema_doc, metaschema_loader, True) metactx = schema_metadata.get("@context", {}) metactx.update(schema_metadata.get("$namespaces", {})) (schema_ctx, rdfs) = jsonld_context.salad_to_jsonld_context( schema_doc, metactx) # Create the loader that will be used to load the target document. document_loader = Loader(schema_ctx, cache=cache) # Make the Avro validation that will be used to validate the target # document (avsc_names, avsc_obj) = make_avro_schema(schema_doc, document_loader) return document_loader, avsc_names, schema_metadata, metaschema_loader def load_and_validate(document_loader, # type: Loader avsc_names, # type: Names document, # type: Union[CommentedMap, Text] strict # type: bool ): # type: (...) -> Tuple[Any, Dict[Text, Any]] """Load a document and validate it with the provided schema. 
return data, metadata """ try: if isinstance(document, CommentedMap): source = document["id"] data, metadata = document_loader.resolve_all( document, document["id"], checklinks=False) else: source = document data, metadata = document_loader.resolve_ref( document, checklinks=False) except validate.ValidationException as v: raise validate.ValidationException(strip_dup_lineno(str(v))) validationErrors = u"" try: document_loader.validate_links(data, u"", {}) except validate.ValidationException as v: validationErrors = six.text_type(v) + "\n" try: validate_doc(avsc_names, data, document_loader, strict, source=source) except validate.ValidationException as v: validationErrors += six.text_type(v) if validationErrors != u"": raise validate.ValidationException(validationErrors) return data, metadata def validate_doc(schema_names, # type: Names doc, # type: Union[Dict[Text, Any], List[Dict[Text, Any]], Text, None] loader, # type: Loader strict, # type: bool source=None ): # type: (...) -> None has_root = False for r in schema_names.names.values(): if ((hasattr(r, 'get_prop') and r.get_prop(u"documentRoot")) or ( u"documentRoot" in r.props)): has_root = True break if not has_root: raise validate.ValidationException( "No document roots defined in the schema") if isinstance(doc, list): validate_doc = doc elif isinstance(doc, CommentedMap): validate_doc = CommentedSeq([doc]) validate_doc.lc.add_kv_line_col(0, [doc.lc.line, doc.lc.col]) validate_doc.lc.filename = doc.lc.filename else: raise validate.ValidationException("Document must be dict or list") roots = [] for r in schema_names.names.values(): if ((hasattr(r, "get_prop") and r.get_prop(u"documentRoot")) or ( r.props.get(u"documentRoot"))): roots.append(r) anyerrors = [] for pos, item in enumerate(validate_doc): sl = SourceLine(validate_doc, pos, six.text_type) success = False for r in roots: success = validate.validate_ex( r, item, loader.identifiers, strict, foreign_properties=loader.foreign_properties, raise_ex=False) if 
success: break if not success: errors = [] # type: List[Text] for r in roots: if hasattr(r, "get_prop"): name = r.get_prop(u"name") elif hasattr(r, "name"): name = r.name try: validate.validate_ex( r, item, loader.identifiers, strict, foreign_properties=loader.foreign_properties, raise_ex=True) except validate.ClassValidationException as e: errors = [sl.makeError(u"tried `%s` but\n%s" % ( name, validate.indent(str(e), nolead=False)))] break except validate.ValidationException as e: errors.append(sl.makeError(u"tried `%s` but\n%s" % ( name, validate.indent(str(e), nolead=False)))) objerr = sl.makeError(u"Invalid") for ident in loader.identifiers: if ident in item: objerr = sl.makeError( u"Object `%s` is not valid because" % (relname(item[ident]))) break anyerrors.append(u"%s\n%s" % (objerr, validate.indent(bullets(errors, "- ")))) if len(anyerrors) > 0: raise validate.ValidationException( strip_dup_lineno(bullets(anyerrors, "* "))) def replace_type(items, spec, loader, found): # type: (Any, Dict[Text, Any], Loader, Set[Text]) -> Any """ Go through and replace types in the 'spec' mapping""" if isinstance(items, dict): # recursively check these fields for types to replace if "type" in items and items["type"] in ("record", "enum"): if items.get("name"): if items["name"] in found: return items["name"] else: found.add(items["name"]) items = copy.copy(items) for n in ("type", "items", "fields"): if n in items: items[n] = replace_type(items[n], spec, loader, found) if isinstance(items[n], list): items[n] = flatten(items[n]) return items elif isinstance(items, list): # recursively transform list return [replace_type(i, spec, loader, found) for i in items] elif isinstance(items, (str, six.text_type)): # found a string which is a symbol corresponding to a type. 
replace_with = None if items in loader.vocab: # If it's a vocabulary term, first expand it to its fully qualified # URI items = loader.vocab[items] if items in spec: # Look up in specialization map replace_with = spec[items] if replace_with: return replace_type(replace_with, spec, loader, found) return items def avro_name(url): # type: (AnyStr) -> AnyStr doc_url, frg = urllib.parse.urldefrag(url) if frg != '': if '/' in frg: return frg[frg.rindex('/') + 1:] else: return frg return url Avro = TypeVar('Avro', Dict[Text, Any], List[Any], Text) def make_valid_avro(items, # type: Avro alltypes, # type: Dict[Text, Dict[Text, Any]] found, # type: Set[Text] union=False # type: bool ): # type: (...) -> Union[Avro, Dict, Text] if isinstance(items, dict): items = copy.copy(items) if items.get("name"): if items.get("inVocab", True): items["name"] = avro_name(items["name"]) if "type" in items and items["type"] in ("https://w3id.org/cwl/salad#record", "https://w3id.org/cwl/salad#enum", "record", "enum"): if (hasattr(items, "get") and items.get("abstract")) or ("abstract" in items): return items if not items.get("name"): raise Exception( "Named schemas must have a non-empty name: %s" % items) if items["name"] in found: return cast(Text, items["name"]) else: found.add(items["name"]) for n in ("type", "items", "values", "fields"): if n in items: items[n] = make_valid_avro( items[n], alltypes, found, union=True) if "symbols" in items: items["symbols"] = [avro_name(sym) for sym in items["symbols"]] return items if isinstance(items, list): ret = [] for i in items: ret.append(make_valid_avro(i, alltypes, found, union=union)) # type: ignore return ret if union and isinstance(items, six.string_types): if items in alltypes and avro_name(items) not in found: return cast(Dict, make_valid_avro(alltypes[items], alltypes, found, union=union)) items = avro_name(items) return items def deepcopy_strip(item): # type: (Any) -> Any """Make a deep copy of list and dict objects. 
    Intentionally do not copy attributes.  This is to discard CommentedMap and
    CommentedSeq metadata which is very expensive with regular copy.deepcopy.
    """
    if isinstance(item, dict):
        return {k: deepcopy_strip(v) for k, v in six.iteritems(item)}
    elif isinstance(item, list):
        return [deepcopy_strip(k) for k in item]
    else:
        return item


def extend_and_specialize(items, loader):
    # type: (List[Dict[Text, Any]], Loader) -> List[Dict[Text, Any]]
    """Apply 'extend' and 'specialize' to fully materialize derived record types."""

    items = deepcopy_strip(items)
    types = {t["name"]: t for t in items}  # type: Dict[Text, Any]

    n = []
    for t in items:
        if "extends" in t:
            spec = {}  # type: Dict[Text, Text]
            if "specialize" in t:
                for sp in aslist(t["specialize"]):
                    spec[sp["specializeFrom"]] = sp["specializeTo"]

            exfields = []  # type: List[Text]
            exsym = []  # type: List[Text]
            for ex in aslist(t["extends"]):
                if ex not in types:
                    raise Exception(
                        "Extends %s in %s refers to invalid base type" % (
                            t["extends"], t["name"]))

                basetype = copy.copy(types[ex])

                if t["type"] == "record":
                    if len(spec) > 0:
                        basetype["fields"] = replace_type(
                            basetype.get("fields", []), spec, loader, set())

                    for f in basetype.get("fields", []):
                        if "inherited_from" not in f:
                            f["inherited_from"] = ex

                    exfields.extend(basetype.get("fields", []))
                elif t["type"] == "enum":
                    exsym.extend(basetype.get("symbols", []))

            if t["type"] == "record":
                t = copy.copy(t)
                exfields.extend(t.get("fields", []))
                t["fields"] = exfields

                fieldnames = set()  # type: Set[Text]
                for field in t["fields"]:
                    if field["name"] in fieldnames:
                        raise validate.ValidationException(
                            "Field name %s appears twice in %s" % (
                                field["name"], t["name"]))
                    else:
                        fieldnames.add(field["name"])
            elif t["type"] == "enum":
                t = copy.copy(t)
                exsym.extend(t.get("symbols", []))
                t["symbols"] = exsym

        types[t["name"]] = t
        n.append(t)

    ex_types = {}
    for t in n:
        ex_types[t["name"]] = t

    extended_by = {}  # type: Dict[Text, Text]
    for t in n:
        if "extends" in t:
            for ex in aslist(t["extends"]):
                if ex_types[ex].get("abstract"):
                    add_dictlist(extended_by, ex, ex_types[t["name"]])
                    add_dictlist(extended_by, avro_name(ex), ex_types[ex])

    for t in n:
        if t.get("abstract") and t["name"] not in extended_by:
            raise validate.ValidationException(
                "%s is abstract but missing a concrete subtype" % t["name"])

    for t in n:
        if "fields" in t:
            t["fields"] = replace_type(t["fields"], extended_by, loader, set())

    return n


def make_avro_schema(i,      # type: List[Dict[Text, Any]]
                     loader  # type: Loader
                     ):
    # type: (...) -> Tuple[Union[Names, SchemaParseException], List[Dict[Text, Any]]]
    names = avro.schema.Names()
    j = extend_and_specialize(i, loader)

    name_dict = {}  # type: Dict[Text, Dict[Text, Any]]
    for t in j:
        name_dict[t["name"]] = t
    j2 = make_valid_avro(j, name_dict, set())

    j3 = [t for t in j2
          if isinstance(t, dict) and not t.get("abstract")
          and t.get("type") != "documentation"]

    try:
        AvroSchemaFromJSONData(j3, names)
    except avro.schema.SchemaParseException as e:
        return (e, j3)

    return (names, j3)

schema-salad-2.6.20171201034858/schema_salad/codegen.py

import json
import sys
from six.moves import urllib, cStringIO
import collections
import logging
from pkg_resources import resource_stream
from .utils import aslist, flatten
from . import schema
from .codegen_base import shortname, CodeGenBase
from .python_codegen import PythonCodeGen
from .java_codegen import JavaCodeGen
from .ref_resolver import Loader
from typing import List, Dict, Text, Any, Union
from ruamel.yaml.comments import CommentedSeq, CommentedMap


class GoCodeGen(object):
    pass


def codegen(lang,             # type: str
            i,                # type: List[Dict[Text, Any]]
            schema_metadata,  # type: Dict[Text, Any]
            loader            # type: Loader
            ):  # type: (...)
-> None j = schema.extend_and_specialize(i, loader) cg = None # type: CodeGenBase if lang == "python": cg = PythonCodeGen(sys.stdout) elif lang == "java": cg = JavaCodeGen(schema_metadata.get("$base", schema_metadata.get("id"))) else: raise Exception("Unsupported code generation language '%s'" % lang) cg.prologue() documentRoots = [] for rec in j: if rec["type"] in ("enum", "record"): cg.type_loader(rec) cg.add_vocab(shortname(rec["name"]), rec["name"]) for rec in j: if rec["type"] == "enum": for s in rec["symbols"]: cg.add_vocab(shortname(s), s) if rec["type"] == "record": if rec.get("documentRoot"): documentRoots.append(rec["name"]) cg.begin_class(rec["name"], aslist(rec.get("extends", [])), rec.get("doc"), rec.get("abstract")) cg.add_vocab(shortname(rec["name"]), rec["name"]) for f in rec.get("fields", []): if f.get("jsonldPredicate") == "@id": fieldpred = f["name"] tl = cg.uri_loader(cg.type_loader(f["type"]), True, False, None) cg.declare_id_field(fieldpred, tl, f.get("doc")) break for f in rec.get("fields", []): optional = bool("https://w3id.org/cwl/salad#null" in f["type"]) tl = cg.type_loader(f["type"]) jld = f.get("jsonldPredicate") fieldpred = f["name"] if isinstance(jld, dict): refScope = jld.get("refScope") if jld.get("typeDSL"): tl = cg.typedsl_loader(tl, refScope) elif jld.get("_type") == "@id": tl = cg.uri_loader(tl, jld.get("identity"), False, refScope) elif jld.get("_type") == "@vocab": tl = cg.uri_loader(tl, False, True, refScope) mapSubject = jld.get("mapSubject") if mapSubject: tl = cg.idmap_loader(f["name"], tl, mapSubject, jld.get("mapPredicate")) if "_id" in jld and jld["_id"][0] != "@": fieldpred = jld["_id"] if jld == "@id": continue cg.declare_field(fieldpred, tl, f.get("doc"), optional) cg.end_class(rec["name"]) rootType = list(documentRoots) rootType.append({ "type": "array", "items": documentRoots }) cg.epilogue(cg.type_loader(rootType)) 
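`codegen` above registers vocabulary entries via `shortname(rec["name"])`, the same fragment-shortening rule that appears as `avro_name` in `schema.py` in this archive: keep only the last path component of a URI's fragment. A minimal stand-alone sketch of that rule (not the packaged implementation; it uses Python 3's `urllib.parse` directly instead of `six.moves`, and the sample URIs are illustrative):

```python
from urllib.parse import urldefrag


def avro_name(url):
    # type: (str) -> str
    """Shorten a fully qualified Salad URI to a bare Avro-safe name.

    Mirrors schema_salad.schema.avro_name: split off the fragment,
    then keep only the text after the final '/' inside the fragment.
    If there is no fragment, return the URL unchanged.
    """
    doc_url, frg = urldefrag(url)
    if frg != '':
        if '/' in frg:
            return frg[frg.rindex('/') + 1:]
        return frg
    return url


print(avro_name("https://w3id.org/cwl/salad#SaladRecordSchema/fields"))  # -> fields
print(avro_name("https://w3id.org/cwl/salad#record"))                    # -> record
print(avro_name("https://w3id.org/cwl/salad"))   # no fragment: unchanged
```

This is why two records in different namespaces can collide after conversion to Avro: only the trailing fragment component survives as the Avro name.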
schema-salad-2.6.20171201034858/schema_salad/validate.py

from __future__ import absolute_import
import pprint
import avro.schema
from avro.schema import Schema
import sys
import re
import logging
import six
from six.moves import urllib
from six.moves import range
from typing import Any, List, Set, Union, Text
from .sourceline import SourceLine, lineno_re, bullets, indent

_logger = logging.getLogger("salad")


class ValidationException(Exception):
    pass


class ClassValidationException(ValidationException):
    pass


def validate(expected_schema,          # type: Schema
             datum,                    # type: Any
             identifiers=[],           # type: List[Text]
             strict=False,             # type: bool
             foreign_properties=set()  # type: Set[Text]
             ):
    # type: (...) -> bool
    return validate_ex(
        expected_schema, datum, identifiers, strict=strict,
        foreign_properties=foreign_properties, raise_ex=False)


INT_MIN_VALUE = -(1 << 31)
INT_MAX_VALUE = (1 << 31) - 1
LONG_MIN_VALUE = -(1 << 63)
LONG_MAX_VALUE = (1 << 63) - 1


def friendly(v):  # type: (Any) -> Any
    if isinstance(v, avro.schema.NamedSchema):
        return v.name
    if isinstance(v, avro.schema.ArraySchema):
        return "array of <%s>" % friendly(v.items)
    elif isinstance(v, avro.schema.PrimitiveSchema):
        return v.type
    elif isinstance(v, avro.schema.UnionSchema):
        return " or ".join([friendly(s) for s in v.schemas])
    else:
        return v


def vpformat(datum):  # type: (Any) -> str
    a = pprint.pformat(datum)
    if len(a) > 160:
        a = a[0:160] + "[...]"
    return a


def validate_ex(expected_schema,                  # type: Schema
                datum,                            # type: Any
                identifiers=None,                 # type: List[Text]
                strict=False,                     # type: bool
                foreign_properties=None,          # type: Set[Text]
                raise_ex=True,                    # type: bool
                strict_foreign_properties=False,  # type: bool
                logger=_logger                    # type: logging.Logger
                ):  # type: (...)
-> bool """Determine if a python datum is an instance of a schema.""" if not identifiers: identifiers = [] if not foreign_properties: foreign_properties = set() schema_type = expected_schema.type if schema_type == 'null': if datum is None: return True else: if raise_ex: raise ValidationException(u"the value is not null") else: return False elif schema_type == 'boolean': if isinstance(datum, bool): return True else: if raise_ex: raise ValidationException(u"the value is not boolean") else: return False elif schema_type == 'string': if isinstance(datum, six.string_types): return True elif isinstance(datum, bytes): datum = datum.decode(u"utf-8") return True else: if raise_ex: raise ValidationException(u"the value is not string") else: return False elif schema_type == 'bytes': if isinstance(datum, str): return True else: if raise_ex: raise ValidationException( u"the value `%s` is not bytes" % vpformat(datum)) else: return False elif schema_type == 'int': if (isinstance(datum, six.integer_types) and INT_MIN_VALUE <= datum <= INT_MAX_VALUE): return True else: if raise_ex: raise ValidationException(u"`%s` is not int" % vpformat(datum)) else: return False elif schema_type == 'long': if ((isinstance(datum, six.integer_types)) and LONG_MIN_VALUE <= datum <= LONG_MAX_VALUE): return True else: if raise_ex: raise ValidationException( u"the value `%s` is not long" % vpformat(datum)) else: return False elif schema_type in ['float', 'double']: if (isinstance(datum, six.integer_types) or isinstance(datum, float)): return True else: if raise_ex: raise ValidationException( u"the value `%s` is not float or double" % vpformat(datum)) else: return False elif isinstance(expected_schema, avro.schema.EnumSchema): if expected_schema.name == "Any": if datum is not None: return True else: if raise_ex: raise ValidationException(u"'Any' type must be non-null") else: return False if not isinstance(datum, six.string_types): if raise_ex: raise ValidationException( u"value is a %s but expected a 
string" % (type(datum).__name__)) else: return False if datum in expected_schema.symbols: return True else: if raise_ex: raise ValidationException(u"the value %s is not a valid %s, expected %s%s" % (vpformat(datum), expected_schema.name, "one of " if len( expected_schema.symbols) > 1 else "", "'" + "', '".join(expected_schema.symbols) + "'")) else: return False elif isinstance(expected_schema, avro.schema.ArraySchema): if isinstance(datum, list): for i, d in enumerate(datum): try: sl = SourceLine(datum, i, ValidationException) if not validate_ex(expected_schema.items, d, identifiers, strict=strict, foreign_properties=foreign_properties, raise_ex=raise_ex, strict_foreign_properties=strict_foreign_properties, logger=logger): return False except ValidationException as v: if raise_ex: raise sl.makeError( six.text_type("item is invalid because\n%s" % (indent(str(v))))) else: return False return True else: if raise_ex: raise ValidationException(u"the value %s is not a list, expected list of %s" % ( vpformat(datum), friendly(expected_schema.items))) else: return False elif isinstance(expected_schema, avro.schema.UnionSchema): for s in expected_schema.schemas: if validate_ex(s, datum, identifiers, strict=strict, raise_ex=False, strict_foreign_properties=strict_foreign_properties, logger=logger): return True if not raise_ex: return False errors = [] # type: List[Text] checked = [] for s in expected_schema.schemas: if isinstance(datum, list) and not isinstance(s, avro.schema.ArraySchema): continue elif isinstance(datum, dict) and not isinstance(s, avro.schema.RecordSchema): continue elif (isinstance(datum, (bool, six.integer_types, float, six.string_types)) and # type: ignore isinstance(s, (avro.schema.ArraySchema, avro.schema.RecordSchema))): continue elif datum is not None and s.type == "null": continue checked.append(s) try: validate_ex(s, datum, identifiers, strict=strict, foreign_properties=foreign_properties, raise_ex=True, 
strict_foreign_properties=strict_foreign_properties, logger=logger) except ClassValidationException as e: raise except ValidationException as e: errors.append(six.text_type(e)) if bool(errors): raise ValidationException(bullets(["tried %s but\n%s" % (friendly( checked[i]), indent(errors[i])) for i in range(0, len(errors))], "- ")) else: raise ValidationException("value is a %s, expected %s" % ( type(datum).__name__, friendly(expected_schema))) elif isinstance(expected_schema, avro.schema.RecordSchema): if not isinstance(datum, dict): if raise_ex: raise ValidationException(u"is not a dict") else: return False classmatch = None for f in expected_schema.fields: if f.name in ("class",): d = datum.get(f.name) if not d: if raise_ex: raise ValidationException( u"Missing '%s' field" % (f.name)) else: return False if expected_schema.name != d: if raise_ex: raise ValidationException( u"Expected class '%s' but this is '%s'" % (expected_schema.name, d)) else: return False classmatch = d break errors = [] for f in expected_schema.fields: if f.name in ("class",): continue if f.name in datum: fieldval = datum[f.name] else: try: fieldval = f.default except KeyError: fieldval = None try: sl = SourceLine(datum, f.name, six.text_type) if not validate_ex(f.type, fieldval, identifiers, strict=strict, foreign_properties=foreign_properties, raise_ex=raise_ex, strict_foreign_properties=strict_foreign_properties, logger=logger): return False except ValidationException as v: if f.name not in datum: errors.append(u"missing required field `%s`" % f.name) else: errors.append(sl.makeError(u"the `%s` field is not valid because\n%s" % ( f.name, indent(str(v))))) for d in datum: found = False for f in expected_schema.fields: if d == f.name: found = True if not found: sl = SourceLine(datum, d, six.text_type) if d not in identifiers and d not in foreign_properties and d[0] not in ("@", "$"): if (d not in identifiers and strict) and ( d not in foreign_properties and strict_foreign_properties) and not 
raise_ex: return False split = urllib.parse.urlsplit(d) if split.scheme: err = sl.makeError(u"unrecognized extension field `%s`%s." " Did you include " "a $schemas section?" % ( d, " and strict_foreign_properties is True" if strict_foreign_properties else "")) if strict_foreign_properties: errors.append(err) else: logger.warn(err) else: err = sl.makeError(u"invalid field `%s`, expected one of: %s" % ( d, ", ".join("'%s'" % fn.name for fn in expected_schema.fields))) if strict: errors.append(err) else: logger.warn(err) if bool(errors): if raise_ex: if classmatch: raise ClassValidationException(bullets(errors, "* ")) else: raise ValidationException(bullets(errors, "* ")) else: return False else: return True if raise_ex: raise ValidationException(u"Unrecognized schema_type %s" % schema_type) else: return False schema-salad-2.6.20171201034858/schema_salad/metaschema.py0000644000175100017510000017270013203345013022643 0ustar peterpeter00000000000000# # This file was autogenerated using schema-salad-tool --codegen=python # from __future__ import absolute_import import ruamel.yaml from ruamel.yaml.comments import CommentedBase, CommentedMap, CommentedSeq import re import os import traceback from typing import (Any, AnyStr, Callable, cast, Dict, List, Iterable, Tuple, TypeVar, Union, Text) import six lineno_re = re.compile(u"^(.*?:[0-9]+:[0-9]+: )(( *)(.*))") def _add_lc_filename(r, source): # type: (ruamel.yaml.comments.CommentedBase, AnyStr) -> None if isinstance(r, ruamel.yaml.comments.CommentedBase): r.lc.filename = source if isinstance(r, list): for d in r: _add_lc_filename(d, source) elif isinstance(r, dict): for d in six.itervalues(r): _add_lc_filename(d, source) def relname(source): # type: (Text) -> Text if source.startswith("file://"): source = source[7:] source = os.path.relpath(source) return source def add_lc_filename(r, source): # type: (ruamel.yaml.comments.CommentedBase, Text) -> None _add_lc_filename(r, relname(source)) def reflow(text, maxline, shift=""): 
    # type: (Text, int, Text) -> Text
    if maxline < 20:
        maxline = 20
    if len(text) > maxline:
        sp = text.rfind(' ', 0, maxline)
        if sp < 1:
            sp = text.find(' ', sp+1)
            if sp == -1:
                sp = len(text)
        if sp < len(text):
            return "%s\n%s%s" % (text[0:sp], shift, reflow(text[sp+1:], maxline, shift))
    return text

def indent(v, nolead=False, shift=u" ", bullet=u" "):
    # type: (Text, bool, Text, Text) -> Text
    if nolead:
        return v.splitlines()[0] + u"\n".join([shift + l for l in v.splitlines()[1:]])
    else:
        def lineno(i, l):  # type: (int, Text) -> Text
            r = lineno_re.match(l)
            if bool(r):
                return r.group(1) + (bullet if i == 0 else shift) + r.group(2)
            else:
                return (bullet if i == 0 else shift) + l
        return u"\n".join([lineno(i, l) for i, l in enumerate(v.splitlines())])

def bullets(textlist, bul):  # type: (List[Text], Text) -> Text
    if len(textlist) == 1:
        return textlist[0]
    else:
        return "\n".join(indent(t, bullet=bul) for t in textlist)

def strip_dup_lineno(text, maxline=None):  # type: (Text, int) -> Text
    if maxline is None:
        maxline = int(os.environ.get("COLUMNS", "100"))
    pre = None
    msg = []
    for l in text.splitlines():
        g = lineno_re.match(l)
        if not g:
            msg.append(l)
            continue
        shift = len(g.group(1)) + len(g.group(3))
        g2 = reflow(g.group(2), maxline-shift, " " * shift)
        if g.group(1) != pre:
            pre = g.group(1)
            msg.append(pre + g2)
        else:
            g2 = reflow(g.group(2), maxline-len(g.group(1)),
                        " " * (len(g.group(1))+len(g.group(3))))
            msg.append(" " * len(g.group(1)) + g2)
    return "\n".join(msg)

def cmap(d, lc=None, fn=None):
    # type: (Union[int, float, str, Text, Dict, List], List[int], Text) -> Union[int, float, str, Text, CommentedMap, CommentedSeq]
    if lc is None:
        lc = [0, 0, 0, 0]
    if fn is None:
        fn = "test"

    if isinstance(d, CommentedMap):
        fn = d.lc.filename if hasattr(d.lc, "filename") else fn
        for k,v in six.iteritems(d):
            if k in d.lc.data:
                d[k] = cmap(v, lc=d.lc.data[k], fn=fn)
            else:
                d[k] = cmap(v, lc, fn=fn)
        return d
    if isinstance(d, CommentedSeq):
        fn = d.lc.filename if hasattr(d.lc, "filename") else fn
        for k,v in enumerate(d):
            if k in d.lc.data:
                d[k] = cmap(v, lc=d.lc.data[k], fn=fn)
            else:
                d[k] = cmap(v, lc, fn=fn)
        return d
    if isinstance(d, dict):
        cm = CommentedMap()
        for k in sorted(d.keys()):
            v = d[k]
            if isinstance(v, CommentedBase):
                uselc = [v.lc.line, v.lc.col, v.lc.line, v.lc.col]
                vfn = v.lc.filename if hasattr(v.lc, "filename") else fn
            else:
                uselc = lc
                vfn = fn
            cm[k] = cmap(v, lc=uselc, fn=vfn)
            cm.lc.add_kv_line_col(k, uselc)
            cm.lc.filename = fn
        return cm
    if isinstance(d, list):
        cs = CommentedSeq()
        for k,v in enumerate(d):
            if isinstance(v, CommentedBase):
                uselc = [v.lc.line, v.lc.col, v.lc.line, v.lc.col]
                vfn = v.lc.filename if hasattr(v.lc, "filename") else fn
            else:
                uselc = lc
                vfn = fn
            cs.append(cmap(v, lc=uselc, fn=vfn))
            cs.lc.add_kv_line_col(k, uselc)
            cs.lc.filename = fn
        return cs
    else:
        return d

class SourceLine(object):
    def __init__(self, item, key=None, raise_type=six.text_type, include_traceback=False):
        # type: (Any, Any, Callable, bool) -> None
        self.item = item
        self.key = key
        self.raise_type = raise_type
        self.include_traceback = include_traceback

    def __enter__(self):  # type: () -> SourceLine
        return self

    def __exit__(self,
                 exc_type,   # type: Any
                 exc_value,  # type: Any
                 tb          # type: Any
                 ):  # -> Any
        if not exc_value:
            return
        if self.include_traceback:
            raise self.makeError("\n".join(traceback.format_exception(exc_type, exc_value, tb)))
        else:
            raise self.makeError(six.text_type(exc_value))

    def makeLead(self):  # type: () -> Text
        if self.key is None or self.item.lc.data is None or self.key not in self.item.lc.data:
            return "%s:%i:%i:" % (self.item.lc.filename if hasattr(self.item.lc, "filename") else "",
                                  (self.item.lc.line or 0)+1,
                                  (self.item.lc.col or 0)+1)
        else:
            return "%s:%i:%i:" % (self.item.lc.filename if hasattr(self.item.lc, "filename") else "",
                                  (self.item.lc.data[self.key][0] or 0)+1,
                                  (self.item.lc.data[self.key][1] or 0)+1)

    def makeError(self, msg):  # type: (Text) -> Any
        if not isinstance(self.item, ruamel.yaml.comments.CommentedBase):
            return self.raise_type(msg)
        errs = []
        lead = self.makeLead()
        for m in msg.splitlines():
            if bool(lineno_re.match(m)):
                errs.append(m)
            else:
                errs.append("%s %s" % (lead, m))
        return self.raise_type("\n".join(errs))

import six
from six.moves import urllib, StringIO
import ruamel.yaml as yaml
import copy
import re
from typing import List, Text, Dict, Union, Any, Sequence

class ValidationException(Exception):
    pass

class Savable(object):
    pass

class LoadingOptions(object):
    def __init__(self, fetcher=None, namespaces=None, fileuri=None, copyfrom=None):
        if copyfrom is not None:
            self.idx = copyfrom.idx
            if fetcher is None:
                fetcher = copyfrom.fetcher
            if fileuri is None:
                fileuri = copyfrom.fileuri
        else:
            self.idx = {}

        if fetcher is None:
            import os
            import requests
            from cachecontrol.wrapper import CacheControl
            from cachecontrol.caches import FileCache
            from schema_salad.ref_resolver import DefaultFetcher
            if "HOME" in os.environ:
                session = CacheControl(
                    requests.Session(),
                    cache=FileCache(os.path.join(os.environ["HOME"], ".cache", "salad")))
            elif "TMP" in os.environ:
                session = CacheControl(
                    requests.Session(),
                    cache=FileCache(os.path.join(os.environ["TMP"], ".cache", "salad")))
            else:
                session = CacheControl(
                    requests.Session(),
                    cache=FileCache("/tmp", ".cache", "salad"))
            self.fetcher = DefaultFetcher({}, session)
        else:
            self.fetcher = fetcher

        self.fileuri = fileuri

        self.vocab = _vocab
        self.rvocab = _rvocab

        if namespaces is not None:
            self.vocab = self.vocab.copy()
            self.rvocab = self.rvocab.copy()
            for k,v in six.iteritems(namespaces):
                self.vocab[k] = v
                self.rvocab[v] = k

def load_field(val, fieldtype, baseuri, loadingOptions):
    if isinstance(val, dict):
        if "$import" in val:
            return _document_load_by_url(
                fieldtype,
                loadingOptions.fetcher.urljoin(loadingOptions.fileuri, val["$import"]),
                loadingOptions)
        elif "$include" in val:
            val = loadingOptions.fetcher.fetch_text(
                loadingOptions.fetcher.urljoin(loadingOptions.fileuri, val["$include"]))
    return fieldtype.load(val, baseuri, loadingOptions)

def save(val):
    if isinstance(val, Savable):
        return val.save()
    if isinstance(val, list):
        return [save(v) for v in val]
    return val

def expand_url(url,                 # type: Union[str, Text]
               base_url,            # type: Union[str, Text]
               loadingOptions,      # type: LoadingOptions
               scoped_id=False,     # type: bool
               vocab_term=False,    # type: bool
               scoped_ref=None      # type: int
               ):
    # type: (...) -> Text
    if not isinstance(url, six.string_types):
        return url

    url = Text(url)

    if url in (u"@id", u"@type"):
        return url

    if vocab_term and url in loadingOptions.vocab:
        return url

    if bool(loadingOptions.vocab) and u":" in url:
        prefix = url.split(u":")[0]
        if prefix in loadingOptions.vocab:
            url = loadingOptions.vocab[prefix] + url[len(prefix) + 1:]

    split = urllib.parse.urlsplit(url)

    if ((bool(split.scheme) and split.scheme in [u'http', u'https', u'file'])
            or url.startswith(u"$(")
            or url.startswith(u"${")):
        pass
    elif scoped_id and not bool(split.fragment):
        splitbase = urllib.parse.urlsplit(base_url)
        frg = u""
        if bool(splitbase.fragment):
            frg = splitbase.fragment + u"/" + split.path
        else:
            frg = split.path
        pt = splitbase.path if splitbase.path != '' else "/"
        url = urllib.parse.urlunsplit(
            (splitbase.scheme, splitbase.netloc, pt, splitbase.query, frg))
    elif scoped_ref is not None and not bool(split.fragment):
        splitbase = urllib.parse.urlsplit(base_url)
        sp = splitbase.fragment.split(u"/")
        n = scoped_ref
        while n > 0 and len(sp) > 0:
            sp.pop()
            n -= 1
        sp.append(url)
        url = urllib.parse.urlunsplit((
            splitbase.scheme, splitbase.netloc, splitbase.path, splitbase.query,
            u"/".join(sp)))
    else:
        url = loadingOptions.fetcher.urljoin(base_url, url)

    if vocab_term:
        split = urllib.parse.urlsplit(url)
        if bool(split.scheme):
            if url in loadingOptions.rvocab:
                return loadingOptions.rvocab[url]
        else:
            raise ValidationException("Term '%s' not in vocabulary" % url)

    return url

class _Loader(object):
    def load(self, doc, baseuri, loadingOptions, docRoot=None):
        # type: (Any, Text, LoadingOptions, Union[Text, None]) -> Any
        pass

class _AnyLoader(_Loader):
    def load(self, doc, baseuri, loadingOptions, docRoot=None):
        if doc is not None:
            return doc
        raise ValidationException("Expected non-null")

class _PrimitiveLoader(_Loader):
    def __init__(self, tp):
        # type: (Union[type, Sequence[type]]) -> None
        self.tp = tp

    def load(self, doc, baseuri, loadingOptions, docRoot=None):
        if not isinstance(doc, self.tp):
            raise ValidationException("Expected a %s but got %s" % (self.tp, type(doc)))
        return doc

    def __repr__(self):
        return str(self.tp)

class _ArrayLoader(_Loader):
    def __init__(self, items):
        # type: (_Loader) -> None
        self.items = items

    def load(self, doc, baseuri, loadingOptions, docRoot=None):
        if not isinstance(doc, list):
            raise ValidationException("Expected a list")
        r = []
        errors = []
        for i in range(0, len(doc)):
            try:
                lf = load_field(doc[i], _UnionLoader((self, self.items)), baseuri, loadingOptions)
                if isinstance(lf, list):
                    r.extend(lf)
                else:
                    r.append(lf)
            except ValidationException as e:
                errors.append(SourceLine(doc, i, str).makeError(six.text_type(e)))
        if errors:
            raise ValidationException("\n".join(errors))
        return r

    def __repr__(self):
        return "array<%s>" % self.items

class _EnumLoader(_Loader):
    def __init__(self, symbols):
        # type: (Sequence[Text]) -> None
        self.symbols = symbols

    def load(self, doc, baseuri, loadingOptions, docRoot=None):
        if doc in self.symbols:
            return doc
        else:
            raise ValidationException("Expected one of %s" % (self.symbols,))

class _RecordLoader(_Loader):
    def __init__(self, classtype):
        # type: (type) -> None
        self.classtype = classtype

    def load(self, doc, baseuri, loadingOptions, docRoot=None):
        if not isinstance(doc, dict):
            raise ValidationException("Expected a dict")
        return self.classtype(doc, baseuri, loadingOptions, docRoot=docRoot)

    def __repr__(self):
        return str(self.classtype)

class _UnionLoader(_Loader):
    def __init__(self, alternates):
        # type: (Sequence[_Loader]) -> None
        self.alternates = alternates

    def load(self, doc, baseuri, loadingOptions, docRoot=None):
        errors = []
        for t in self.alternates:
            try:
                return t.load(doc, baseuri, loadingOptions, docRoot=docRoot)
            except ValidationException as e:
                errors.append("tried %s but\n%s" % (t, indent(str(e))))
        raise ValidationException(bullets(errors, "- "))

    def __repr__(self):
        return " | ".join(str(a) for a in self.alternates)

class _URILoader(_Loader):
    def __init__(self, inner, scoped_id, vocab_term, scoped_ref):
        # type: (_Loader, bool, bool, Union[int, None]) -> None
        self.inner = inner
        self.scoped_id = scoped_id
        self.vocab_term = vocab_term
        self.scoped_ref = scoped_ref

    def load(self, doc, baseuri, loadingOptions, docRoot=None):
        if isinstance(doc, list):
            doc = [expand_url(i, baseuri, loadingOptions,
                              self.scoped_id, self.vocab_term, self.scoped_ref) for i in doc]
        if isinstance(doc, six.string_types):
            doc = expand_url(doc, baseuri, loadingOptions,
                             self.scoped_id, self.vocab_term, self.scoped_ref)
        return self.inner.load(doc, baseuri, loadingOptions)

class _TypeDSLLoader(_Loader):
    typeDSLregex = re.compile(u"^([^[?]+)(\[\])?(\?)?$")

    def __init__(self, inner, refScope):
        # type: (_Loader, Union[int, None]) -> None
        self.inner = inner
        self.refScope = refScope

    def resolve(self, doc, baseuri, loadingOptions):
        m = self.typeDSLregex.match(doc)
        if m:
            first = expand_url(m.group(1), baseuri, loadingOptions, False, True, self.refScope)
            second = third = None
            if bool(m.group(2)):
                second = {"type": "array", "items": first}
                #second = CommentedMap((("type", "array"),
                #                       ("items", first)))
                #second.lc.add_kv_line_col("type", lc)
                #second.lc.add_kv_line_col("items", lc)
                #second.lc.filename = filename
            if bool(m.group(3)):
                third = [u"null", second or first]
                #third = CommentedSeq([u"null", second or first])
                #third.lc.add_kv_line_col(0, lc)
                #third.lc.add_kv_line_col(1, lc)
                #third.lc.filename = filename
            doc = third or second or first
        return doc

    def load(self, doc, baseuri, loadingOptions, docRoot=None):
        if isinstance(doc, list):
            r = []
            for d in doc:
                if isinstance(d, six.string_types):
                    resolved = self.resolve(d, baseuri, loadingOptions)
                    if isinstance(resolved, list):
                        for i in resolved:
                            if i not in r:
                                r.append(i)
                    else:
                        if resolved not in r:
                            r.append(resolved)
                else:
                    r.append(d)
            doc = r
        elif isinstance(doc, six.string_types):
            doc = self.resolve(doc, baseuri, loadingOptions)
        return self.inner.load(doc, baseuri, loadingOptions)

class _IdMapLoader(_Loader):
    def __init__(self, inner, mapSubject, mapPredicate):
        # type: (_Loader, Text, Union[Text, None]) -> None
        self.inner = inner
        self.mapSubject = mapSubject
        self.mapPredicate = mapPredicate

    def load(self, doc, baseuri, loadingOptions, docRoot=None):
        if isinstance(doc, dict):
            r = []
            for k in sorted(doc.keys()):
                val = doc[k]
                if isinstance(val, dict):
                    v = copy.copy(val)
                    if hasattr(val, 'lc'):
                        v.lc.data = val.lc.data
                        v.lc.filename = val.lc.filename
                else:
                    if self.mapPredicate:
                        v = {self.mapPredicate: val}
                    else:
                        raise ValidationException("No mapPredicate")
                v[self.mapSubject] = k
                r.append(v)
            doc = r
        return self.inner.load(doc, baseuri, loadingOptions)

def _document_load(loader, doc, baseuri, loadingOptions):
    if isinstance(doc, six.string_types):
        return _document_load_by_url(loader, loadingOptions.fetcher.urljoin(baseuri, doc), loadingOptions)

    if isinstance(doc, dict):
        if "$namespaces" in doc:
            loadingOptions = LoadingOptions(copyfrom=loadingOptions, namespaces=doc["$namespaces"])

        if "$base" in doc:
            baseuri = doc["$base"]

        if "$graph" in doc:
            return loader.load(doc["$graph"], baseuri, loadingOptions)
        else:
            return loader.load(doc, baseuri, loadingOptions, docRoot=baseuri)

    if isinstance(doc, list):
        return loader.load(doc, baseuri, loadingOptions)

    raise ValidationException()

def _document_load_by_url(loader, url, loadingOptions):
    if url in loadingOptions.idx:
        return _document_load(loader, loadingOptions.idx[url], url, loadingOptions)

    text = loadingOptions.fetcher.fetch_text(url)
    if isinstance(text, bytes):
        textIO = StringIO(text.decode('utf-8'))
    else:
        textIO = StringIO(text)
    textIO.name = url    # type: ignore
    result = yaml.round_trip_load(textIO)
    add_lc_filename(result, url)
    loadingOptions.idx[url] = result

    loadingOptions = LoadingOptions(copyfrom=loadingOptions, fileuri=url)

    return _document_load(loader, result, url, loadingOptions)

def file_uri(path, split_frag=False):  # type: (str, bool) -> str
    if path.startswith("file://"):
        return path
    if split_frag:
        pathsp = path.split("#", 2)
        frag = "#" + urllib.parse.quote(str(pathsp[1])) if len(pathsp) == 2 else ""
        urlpath = urllib.request.pathname2url(str(pathsp[0]))
    else:
        urlpath = urllib.request.pathname2url(path)
        frag = ""
    if urlpath.startswith("//"):
        return "file:%s%s" % (urlpath, frag)
    else:
        return "file://%s%s" % (urlpath, frag)

class RecordField(Savable):
    """
    A field of a record.
    """
    def __init__(self, _doc, baseuri, loadingOptions, docRoot=None):
        doc = copy.copy(_doc)
        if hasattr(_doc, 'lc'):
            doc.lc.data = _doc.lc.data
            doc.lc.filename = _doc.lc.filename
        errors = []
        #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()}
        if 'name' in doc:
            try:
                self.name = load_field(doc.get('name'), uri_strtype_True_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'name', str).makeError("the `name` field is not valid because:\n"+str(e)))
        else:
            self.name = None

        if self.name is None:
            if docRoot is not None:
                self.name = docRoot
            else:
                raise ValidationException("Missing name")
        baseuri = self.name
        if 'doc' in doc:
            try:
                self.doc = load_field(doc.get('doc'), union_of_None_type_or_strtype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'doc', str).makeError("the `doc` field is not valid because:\n"+str(e)))
        else:
            self.doc = None

        try:
            self.type = load_field(doc.get('type'), typedsl_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_2, baseuri, loadingOptions)
        except ValidationException as e:
            errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid because:\n"+str(e)))

        if errors:
            raise ValidationException("Trying 'RecordField'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.name is not None:
            r['name'] = save(self.name)
        if self.doc is not None:
            r['doc'] = save(self.doc)
        if self.type is not None:
            r['type'] = save(self.type)
        return r

class RecordSchema(Savable):
    def __init__(self, _doc, baseuri, loadingOptions, docRoot=None):
        doc = copy.copy(_doc)
        if hasattr(_doc, 'lc'):
            doc.lc.data = _doc.lc.data
            doc.lc.filename = _doc.lc.filename
        errors = []
        #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()}
        if 'fields' in doc:
            try:
                self.fields = load_field(doc.get('fields'), idmap_fields_union_of_None_type_or_array_of_RecordFieldLoader, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'fields', str).makeError("the `fields` field is not valid because:\n"+str(e)))
        else:
            self.fields = None

        try:
            self.type = load_field(doc.get('type'), typedsl_Record_symbolLoader_2, baseuri, loadingOptions)
        except ValidationException as e:
            errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid because:\n"+str(e)))

        if errors:
            raise ValidationException("Trying 'RecordSchema'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.fields is not None:
            r['fields'] = save(self.fields)
        if self.type is not None:
            r['type'] = save(self.type)
        return r

class EnumSchema(Savable):
    """
    Define an enumerated type.
""" def __init__(self, _doc, baseuri, loadingOptions, docRoot=None): doc = copy.copy(_doc) if hasattr(_doc, 'lc'): doc.lc.data = _doc.lc.data doc.lc.filename = _doc.lc.filename errors = [] #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()} try: self.symbols = load_field(doc.get('symbols'), uri_array_of_strtype_True_False_None, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'symbols', str).makeError("the `symbols` field is not valid because:\n"+str(e))) try: self.type = load_field(doc.get('type'), typedsl_Enum_symbolLoader_2, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid because:\n"+str(e))) if errors: raise ValidationException("Trying 'EnumSchema'\n"+"\n".join(errors)) def save(self): r = {} if self.symbols is not None: r['symbols'] = save(self.symbols) if self.type is not None: r['type'] = save(self.type) return r class ArraySchema(Savable): def __init__(self, _doc, baseuri, loadingOptions, docRoot=None): doc = copy.copy(_doc) if hasattr(_doc, 'lc'): doc.lc.data = _doc.lc.data doc.lc.filename = _doc.lc.filename errors = [] #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()} try: self.items = load_field(doc.get('items'), uri_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_False_True_2, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'items', str).makeError("the `items` field is not valid because:\n"+str(e))) try: self.type = load_field(doc.get('type'), typedsl_Array_symbolLoader_2, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid 
because:\n"+str(e))) if errors: raise ValidationException("Trying 'ArraySchema'\n"+"\n".join(errors)) def save(self): r = {} if self.items is not None: r['items'] = save(self.items) if self.type is not None: r['type'] = save(self.type) return r class JsonldPredicate(Savable): """ Attached to a record field to define how the parent record field is handled for URI resolution and JSON-LD context generation. """ def __init__(self, _doc, baseuri, loadingOptions, docRoot=None): doc = copy.copy(_doc) if hasattr(_doc, 'lc'): doc.lc.data = _doc.lc.data doc.lc.filename = _doc.lc.filename errors = [] #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()} if '_id' in doc: try: self._id = load_field(doc.get('_id'), uri_union_of_None_type_or_strtype_True_False_None, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, '_id', str).makeError("the `_id` field is not valid because:\n"+str(e))) else: self._id = None if '_type' in doc: try: self._type = load_field(doc.get('_type'), union_of_None_type_or_strtype, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, '_type', str).makeError("the `_type` field is not valid because:\n"+str(e))) else: self._type = None if '_container' in doc: try: self._container = load_field(doc.get('_container'), union_of_None_type_or_strtype, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, '_container', str).makeError("the `_container` field is not valid because:\n"+str(e))) else: self._container = None if 'identity' in doc: try: self.identity = load_field(doc.get('identity'), union_of_None_type_or_booltype, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'identity', str).makeError("the `identity` field is not valid because:\n"+str(e))) else: self.identity = None if 'noLinkCheck' in doc: try: self.noLinkCheck = load_field(doc.get('noLinkCheck'), 
                                              union_of_None_type_or_booltype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'noLinkCheck', str).makeError("the `noLinkCheck` field is not valid because:\n"+str(e)))
        else:
            self.noLinkCheck = None

        if 'mapSubject' in doc:
            try:
                self.mapSubject = load_field(doc.get('mapSubject'), union_of_None_type_or_strtype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'mapSubject', str).makeError("the `mapSubject` field is not valid because:\n"+str(e)))
        else:
            self.mapSubject = None

        if 'mapPredicate' in doc:
            try:
                self.mapPredicate = load_field(doc.get('mapPredicate'), union_of_None_type_or_strtype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'mapPredicate', str).makeError("the `mapPredicate` field is not valid because:\n"+str(e)))
        else:
            self.mapPredicate = None

        if 'refScope' in doc:
            try:
                self.refScope = load_field(doc.get('refScope'), union_of_None_type_or_inttype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'refScope', str).makeError("the `refScope` field is not valid because:\n"+str(e)))
        else:
            self.refScope = None

        if 'typeDSL' in doc:
            try:
                self.typeDSL = load_field(doc.get('typeDSL'), union_of_None_type_or_booltype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'typeDSL', str).makeError("the `typeDSL` field is not valid because:\n"+str(e)))
        else:
            self.typeDSL = None

        if errors:
            raise ValidationException("Trying 'JsonldPredicate'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self._id is not None:
            r['_id'] = save(self._id)
        if self._type is not None:
            r['_type'] = save(self._type)
        if self._container is not None:
            r['_container'] = save(self._container)
        if self.identity is not None:
            r['identity'] = save(self.identity)
        if self.noLinkCheck is not None:
            r['noLinkCheck'] = save(self.noLinkCheck)
        if self.mapSubject is not None:
            r['mapSubject'] = save(self.mapSubject)
        if self.mapPredicate is not None:
            r['mapPredicate'] = save(self.mapPredicate)
        if self.refScope is not None:
            r['refScope'] = save(self.refScope)
        if self.typeDSL is not None:
            r['typeDSL'] = save(self.typeDSL)
        return r

class SpecializeDef(Savable):
    def __init__(self, _doc, baseuri, loadingOptions, docRoot=None):
        doc = copy.copy(_doc)
        if hasattr(_doc, 'lc'):
            doc.lc.data = _doc.lc.data
            doc.lc.filename = _doc.lc.filename
        errors = []
        #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()}
        try:
            self.specializeFrom = load_field(doc.get('specializeFrom'), uri_strtype_None_False_1, baseuri, loadingOptions)
        except ValidationException as e:
            errors.append(SourceLine(doc, 'specializeFrom', str).makeError("the `specializeFrom` field is not valid because:\n"+str(e)))

        try:
            self.specializeTo = load_field(doc.get('specializeTo'), uri_strtype_None_False_1, baseuri, loadingOptions)
        except ValidationException as e:
            errors.append(SourceLine(doc, 'specializeTo', str).makeError("the `specializeTo` field is not valid because:\n"+str(e)))

        if errors:
            raise ValidationException("Trying 'SpecializeDef'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.specializeFrom is not None:
            r['specializeFrom'] = save(self.specializeFrom)
        if self.specializeTo is not None:
            r['specializeTo'] = save(self.specializeTo)
        return r

class NamedType(Savable):
    pass

class DocType(Savable):
    pass

class SchemaDefinedType(DocType):
    """
    Abstract base for schema-defined types.
    """
    pass

class SaladRecordField(RecordField):
    """
    A field of a record.
""" def __init__(self, _doc, baseuri, loadingOptions, docRoot=None): doc = copy.copy(_doc) if hasattr(_doc, 'lc'): doc.lc.data = _doc.lc.data doc.lc.filename = _doc.lc.filename errors = [] #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()} if 'name' in doc: try: self.name = load_field(doc.get('name'), uri_strtype_True_False_None, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'name', str).makeError("the `name` field is not valid because:\n"+str(e))) else: self.name = None if self.name is None: if docRoot is not None: self.name = docRoot else: raise ValidationException("Missing name") baseuri = self.name if 'doc' in doc: try: self.doc = load_field(doc.get('doc'), union_of_None_type_or_strtype, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'doc', str).makeError("the `doc` field is not valid because:\n"+str(e))) else: self.doc = None try: self.type = load_field(doc.get('type'), typedsl_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_2, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid because:\n"+str(e))) if 'jsonldPredicate' in doc: try: self.jsonldPredicate = load_field(doc.get('jsonldPredicate'), union_of_None_type_or_strtype_or_JsonldPredicateLoader, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'jsonldPredicate', str).makeError("the `jsonldPredicate` field is not valid because:\n"+str(e))) else: self.jsonldPredicate = None if errors: raise ValidationException("Trying 'SaladRecordField'\n"+"\n".join(errors)) def save(self): r = {} if self.name is not None: r['name'] = save(self.name) if self.doc is not None: r['doc'] = 
save(self.doc) if self.type is not None: r['type'] = save(self.type) if self.jsonldPredicate is not None: r['jsonldPredicate'] = save(self.jsonldPredicate) return r class SaladRecordSchema(NamedType, RecordSchema, SchemaDefinedType): def __init__(self, _doc, baseuri, loadingOptions, docRoot=None): doc = copy.copy(_doc) if hasattr(_doc, 'lc'): doc.lc.data = _doc.lc.data doc.lc.filename = _doc.lc.filename errors = [] #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()} if 'name' in doc: try: self.name = load_field(doc.get('name'), uri_strtype_True_False_None, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'name', str).makeError("the `name` field is not valid because:\n"+str(e))) else: self.name = None if self.name is None: if docRoot is not None: self.name = docRoot else: raise ValidationException("Missing name") baseuri = self.name if 'inVocab' in doc: try: self.inVocab = load_field(doc.get('inVocab'), union_of_None_type_or_booltype, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'inVocab', str).makeError("the `inVocab` field is not valid because:\n"+str(e))) else: self.inVocab = None if 'fields' in doc: try: self.fields = load_field(doc.get('fields'), idmap_fields_union_of_None_type_or_array_of_SaladRecordFieldLoader, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'fields', str).makeError("the `fields` field is not valid because:\n"+str(e))) else: self.fields = None try: self.type = load_field(doc.get('type'), typedsl_Record_symbolLoader_2, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid because:\n"+str(e))) if 'doc' in doc: try: self.doc = load_field(doc.get('doc'), union_of_None_type_or_strtype_or_array_of_strtype, baseuri, loadingOptions) except ValidationException as e: 
                errors.append(SourceLine(doc, 'doc', str).makeError("the `doc` field is not valid because:\n"+str(e)))
        else:
            self.doc = None

        if 'docParent' in doc:
            try:
                self.docParent = load_field(doc.get('docParent'), uri_union_of_None_type_or_strtype_None_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docParent', str).makeError("the `docParent` field is not valid because:\n"+str(e)))
        else:
            self.docParent = None

        if 'docChild' in doc:
            try:
                self.docChild = load_field(doc.get('docChild'), uri_union_of_None_type_or_strtype_or_array_of_strtype_None_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docChild', str).makeError("the `docChild` field is not valid because:\n"+str(e)))
        else:
            self.docChild = None

        if 'docAfter' in doc:
            try:
                self.docAfter = load_field(doc.get('docAfter'), uri_union_of_None_type_or_strtype_None_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docAfter', str).makeError("the `docAfter` field is not valid because:\n"+str(e)))
        else:
            self.docAfter = None

        if 'jsonldPredicate' in doc:
            try:
                self.jsonldPredicate = load_field(doc.get('jsonldPredicate'), union_of_None_type_or_strtype_or_JsonldPredicateLoader, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'jsonldPredicate', str).makeError("the `jsonldPredicate` field is not valid because:\n"+str(e)))
        else:
            self.jsonldPredicate = None

        if 'documentRoot' in doc:
            try:
                self.documentRoot = load_field(doc.get('documentRoot'), union_of_None_type_or_booltype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'documentRoot', str).makeError("the `documentRoot` field is not valid because:\n"+str(e)))
        else:
            self.documentRoot = None

        if 'abstract' in doc:
            try:
                self.abstract = load_field(doc.get('abstract'), union_of_None_type_or_booltype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'abstract', str).makeError("the `abstract` field is not valid because:\n"+str(e)))
        else:
            self.abstract = None

        if 'extends' in doc:
            try:
                self.extends = load_field(doc.get('extends'), uri_union_of_None_type_or_strtype_or_array_of_strtype_None_False_1, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'extends', str).makeError("the `extends` field is not valid because:\n"+str(e)))
        else:
            self.extends = None

        if 'specialize' in doc:
            try:
                self.specialize = load_field(doc.get('specialize'), idmap_specialize_union_of_None_type_or_array_of_SpecializeDefLoader, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'specialize', str).makeError("the `specialize` field is not valid because:\n"+str(e)))
        else:
            self.specialize = None

        if errors:
            raise ValidationException("Trying 'SaladRecordSchema'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.name is not None:
            r['name'] = save(self.name)
        if self.inVocab is not None:
            r['inVocab'] = save(self.inVocab)
        if self.fields is not None:
            r['fields'] = save(self.fields)
        if self.type is not None:
            r['type'] = save(self.type)
        if self.doc is not None:
            r['doc'] = save(self.doc)
        if self.docParent is not None:
            r['docParent'] = save(self.docParent)
        if self.docChild is not None:
            r['docChild'] = save(self.docChild)
        if self.docAfter is not None:
            r['docAfter'] = save(self.docAfter)
        if self.jsonldPredicate is not None:
            r['jsonldPredicate'] = save(self.jsonldPredicate)
        if self.documentRoot is not None:
            r['documentRoot'] = save(self.documentRoot)
        if self.abstract is not None:
            r['abstract'] = save(self.abstract)
        if self.extends is not None:
            r['extends'] = save(self.extends)
        if self.specialize is not None:
            r['specialize'] = save(self.specialize)
        return r

class SaladEnumSchema(NamedType, EnumSchema, SchemaDefinedType):
    """
    Define an enumerated type.
""" def __init__(self, _doc, baseuri, loadingOptions, docRoot=None): doc = copy.copy(_doc) if hasattr(_doc, 'lc'): doc.lc.data = _doc.lc.data doc.lc.filename = _doc.lc.filename errors = [] #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()} if 'name' in doc: try: self.name = load_field(doc.get('name'), uri_strtype_True_False_None, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'name', str).makeError("the `name` field is not valid because:\n"+str(e))) else: self.name = None if self.name is None: if docRoot is not None: self.name = docRoot else: raise ValidationException("Missing name") baseuri = self.name if 'inVocab' in doc: try: self.inVocab = load_field(doc.get('inVocab'), union_of_None_type_or_booltype, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'inVocab', str).makeError("the `inVocab` field is not valid because:\n"+str(e))) else: self.inVocab = None try: self.symbols = load_field(doc.get('symbols'), uri_array_of_strtype_True_False_None, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'symbols', str).makeError("the `symbols` field is not valid because:\n"+str(e))) try: self.type = load_field(doc.get('type'), typedsl_Enum_symbolLoader_2, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid because:\n"+str(e))) if 'doc' in doc: try: self.doc = load_field(doc.get('doc'), union_of_None_type_or_strtype_or_array_of_strtype, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'doc', str).makeError("the `doc` field is not valid because:\n"+str(e))) else: self.doc = None if 'docParent' in doc: try: self.docParent = load_field(doc.get('docParent'), uri_union_of_None_type_or_strtype_None_False_None, baseuri, loadingOptions) except ValidationException as e: 
errors.append(SourceLine(doc, 'docParent', str).makeError("the `docParent` field is not valid because:\n"+str(e))) else: self.docParent = None if 'docChild' in doc: try: self.docChild = load_field(doc.get('docChild'), uri_union_of_None_type_or_strtype_or_array_of_strtype_None_False_None, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'docChild', str).makeError("the `docChild` field is not valid because:\n"+str(e))) else: self.docChild = None if 'docAfter' in doc: try: self.docAfter = load_field(doc.get('docAfter'), uri_union_of_None_type_or_strtype_None_False_None, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'docAfter', str).makeError("the `docAfter` field is not valid because:\n"+str(e))) else: self.docAfter = None if 'jsonldPredicate' in doc: try: self.jsonldPredicate = load_field(doc.get('jsonldPredicate'), union_of_None_type_or_strtype_or_JsonldPredicateLoader, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'jsonldPredicate', str).makeError("the `jsonldPredicate` field is not valid because:\n"+str(e))) else: self.jsonldPredicate = None if 'documentRoot' in doc: try: self.documentRoot = load_field(doc.get('documentRoot'), union_of_None_type_or_booltype, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'documentRoot', str).makeError("the `documentRoot` field is not valid because:\n"+str(e))) else: self.documentRoot = None if 'extends' in doc: try: self.extends = load_field(doc.get('extends'), uri_union_of_None_type_or_strtype_or_array_of_strtype_None_False_1, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'extends', str).makeError("the `extends` field is not valid because:\n"+str(e))) else: self.extends = None if errors: raise ValidationException("Trying 'SaladEnumSchema'\n"+"\n".join(errors)) def save(self): r = {} if self.name is not None: r['name'] = 
save(self.name) if self.inVocab is not None: r['inVocab'] = save(self.inVocab) if self.symbols is not None: r['symbols'] = save(self.symbols) if self.type is not None: r['type'] = save(self.type) if self.doc is not None: r['doc'] = save(self.doc) if self.docParent is not None: r['docParent'] = save(self.docParent) if self.docChild is not None: r['docChild'] = save(self.docChild) if self.docAfter is not None: r['docAfter'] = save(self.docAfter) if self.jsonldPredicate is not None: r['jsonldPredicate'] = save(self.jsonldPredicate) if self.documentRoot is not None: r['documentRoot'] = save(self.documentRoot) if self.extends is not None: r['extends'] = save(self.extends) return r class Documentation(NamedType, DocType): """ A documentation section. This type exists to facilitate self-documenting schemas but has no role in formal validation. """ def __init__(self, _doc, baseuri, loadingOptions, docRoot=None): doc = copy.copy(_doc) if hasattr(_doc, 'lc'): doc.lc.data = _doc.lc.data doc.lc.filename = _doc.lc.filename errors = [] #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()} if 'name' in doc: try: self.name = load_field(doc.get('name'), uri_strtype_True_False_None, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'name', str).makeError("the `name` field is not valid because:\n"+str(e))) else: self.name = None if self.name is None: if docRoot is not None: self.name = docRoot else: raise ValidationException("Missing name") baseuri = self.name if 'inVocab' in doc: try: self.inVocab = load_field(doc.get('inVocab'), union_of_None_type_or_booltype, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'inVocab', str).makeError("the `inVocab` field is not valid because:\n"+str(e))) else: self.inVocab = None if 'doc' in doc: try: self.doc = load_field(doc.get('doc'), union_of_None_type_or_strtype_or_array_of_strtype, baseuri, loadingOptions) except 
ValidationException as e: errors.append(SourceLine(doc, 'doc', str).makeError("the `doc` field is not valid because:\n"+str(e))) else: self.doc = None if 'docParent' in doc: try: self.docParent = load_field(doc.get('docParent'), uri_union_of_None_type_or_strtype_None_False_None, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'docParent', str).makeError("the `docParent` field is not valid because:\n"+str(e))) else: self.docParent = None if 'docChild' in doc: try: self.docChild = load_field(doc.get('docChild'), uri_union_of_None_type_or_strtype_or_array_of_strtype_None_False_None, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'docChild', str).makeError("the `docChild` field is not valid because:\n"+str(e))) else: self.docChild = None if 'docAfter' in doc: try: self.docAfter = load_field(doc.get('docAfter'), uri_union_of_None_type_or_strtype_None_False_None, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'docAfter', str).makeError("the `docAfter` field is not valid because:\n"+str(e))) else: self.docAfter = None try: self.type = load_field(doc.get('type'), typedsl_Documentation_symbolLoader_2, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid because:\n"+str(e))) if errors: raise ValidationException("Trying 'Documentation'\n"+"\n".join(errors)) def save(self): r = {} if self.name is not None: r['name'] = save(self.name) if self.inVocab is not None: r['inVocab'] = save(self.inVocab) if self.doc is not None: r['doc'] = save(self.doc) if self.docParent is not None: r['docParent'] = save(self.docParent) if self.docChild is not None: r['docChild'] = save(self.docChild) if self.docAfter is not None: r['docAfter'] = save(self.docAfter) if self.type is not None: r['type'] = save(self.type) return r _vocab = { "Any": "https://w3id.org/cwl/salad#Any", "ArraySchema": 
"https://w3id.org/cwl/salad#ArraySchema", "DocType": "https://w3id.org/cwl/salad#DocType", "Documentation": "https://w3id.org/cwl/salad#Documentation", "EnumSchema": "https://w3id.org/cwl/salad#EnumSchema", "JsonldPredicate": "https://w3id.org/cwl/salad#JsonldPredicate", "NamedType": "https://w3id.org/cwl/salad#NamedType", "PrimitiveType": "https://w3id.org/cwl/salad#PrimitiveType", "RecordField": "https://w3id.org/cwl/salad#RecordField", "RecordSchema": "https://w3id.org/cwl/salad#RecordSchema", "SaladEnumSchema": "https://w3id.org/cwl/salad#SaladEnumSchema", "SaladRecordField": "https://w3id.org/cwl/salad#SaladRecordField", "SaladRecordSchema": "https://w3id.org/cwl/salad#SaladRecordSchema", "SchemaDefinedType": "https://w3id.org/cwl/salad#SchemaDefinedType", "SpecializeDef": "https://w3id.org/cwl/salad#SpecializeDef", "array": "https://w3id.org/cwl/salad#array", "boolean": "http://www.w3.org/2001/XMLSchema#boolean", "documentation": "https://w3id.org/cwl/salad#documentation", "double": "http://www.w3.org/2001/XMLSchema#double", "enum": "https://w3id.org/cwl/salad#enum", "float": "http://www.w3.org/2001/XMLSchema#float", "int": "http://www.w3.org/2001/XMLSchema#int", "long": "http://www.w3.org/2001/XMLSchema#long", "null": "https://w3id.org/cwl/salad#null", "record": "https://w3id.org/cwl/salad#record", "string": "http://www.w3.org/2001/XMLSchema#string", } _rvocab = { "https://w3id.org/cwl/salad#Any": "Any", "https://w3id.org/cwl/salad#ArraySchema": "ArraySchema", "https://w3id.org/cwl/salad#DocType": "DocType", "https://w3id.org/cwl/salad#Documentation": "Documentation", "https://w3id.org/cwl/salad#EnumSchema": "EnumSchema", "https://w3id.org/cwl/salad#JsonldPredicate": "JsonldPredicate", "https://w3id.org/cwl/salad#NamedType": "NamedType", "https://w3id.org/cwl/salad#PrimitiveType": "PrimitiveType", "https://w3id.org/cwl/salad#RecordField": "RecordField", "https://w3id.org/cwl/salad#RecordSchema": "RecordSchema", "https://w3id.org/cwl/salad#SaladEnumSchema": 
"SaladEnumSchema", "https://w3id.org/cwl/salad#SaladRecordField": "SaladRecordField", "https://w3id.org/cwl/salad#SaladRecordSchema": "SaladRecordSchema", "https://w3id.org/cwl/salad#SchemaDefinedType": "SchemaDefinedType", "https://w3id.org/cwl/salad#SpecializeDef": "SpecializeDef", "https://w3id.org/cwl/salad#array": "array", "http://www.w3.org/2001/XMLSchema#boolean": "boolean", "https://w3id.org/cwl/salad#documentation": "documentation", "http://www.w3.org/2001/XMLSchema#double": "double", "https://w3id.org/cwl/salad#enum": "enum", "http://www.w3.org/2001/XMLSchema#float": "float", "http://www.w3.org/2001/XMLSchema#int": "int", "http://www.w3.org/2001/XMLSchema#long": "long", "https://w3id.org/cwl/salad#null": "null", "https://w3id.org/cwl/salad#record": "record", "http://www.w3.org/2001/XMLSchema#string": "string", } floattype = _PrimitiveLoader(float) None_type = _PrimitiveLoader(type(None)) inttype = _PrimitiveLoader(int) strtype = _PrimitiveLoader((str, six.text_type)) booltype = _PrimitiveLoader(bool) Any_type = _AnyLoader() PrimitiveTypeLoader = _EnumLoader(("null", "boolean", "int", "long", "float", "double", "string",)) AnyLoader = _EnumLoader(("Any",)) RecordFieldLoader = _RecordLoader(RecordField) RecordSchemaLoader = _RecordLoader(RecordSchema) EnumSchemaLoader = _RecordLoader(EnumSchema) ArraySchemaLoader = _RecordLoader(ArraySchema) JsonldPredicateLoader = _RecordLoader(JsonldPredicate) SpecializeDefLoader = _RecordLoader(SpecializeDef) NamedTypeLoader = _RecordLoader(NamedType) DocTypeLoader = _RecordLoader(DocType) SchemaDefinedTypeLoader = _RecordLoader(SchemaDefinedType) SaladRecordFieldLoader = _RecordLoader(SaladRecordField) SaladRecordSchemaLoader = _RecordLoader(SaladRecordSchema) SaladEnumSchemaLoader = _RecordLoader(SaladEnumSchema) DocumentationLoader = _RecordLoader(Documentation) uri_strtype_True_False_None = _URILoader(strtype, True, False, None) union_of_None_type_or_strtype = _UnionLoader((None_type, strtype,)) 
union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype = _UnionLoader((PrimitiveTypeLoader, RecordSchemaLoader, EnumSchemaLoader, ArraySchemaLoader, strtype,)) array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype = _ArrayLoader(union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype) union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype = _UnionLoader((PrimitiveTypeLoader, RecordSchemaLoader, EnumSchemaLoader, ArraySchemaLoader, strtype, array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype,)) typedsl_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_2 = _TypeDSLLoader(union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype, 2) array_of_RecordFieldLoader = _ArrayLoader(RecordFieldLoader) union_of_None_type_or_array_of_RecordFieldLoader = _UnionLoader((None_type, array_of_RecordFieldLoader,)) idmap_fields_union_of_None_type_or_array_of_RecordFieldLoader = _IdMapLoader(union_of_None_type_or_array_of_RecordFieldLoader, 'name', 'type') Record_symbolLoader = _EnumLoader(("record",)) typedsl_Record_symbolLoader_2 = _TypeDSLLoader(Record_symbolLoader, 2) array_of_strtype = _ArrayLoader(strtype) uri_array_of_strtype_True_False_None = _URILoader(array_of_strtype, True, False, None) Enum_symbolLoader = _EnumLoader(("enum",)) typedsl_Enum_symbolLoader_2 = 
_TypeDSLLoader(Enum_symbolLoader, 2) uri_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_False_True_2 = _URILoader(union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype, False, True, 2) Array_symbolLoader = _EnumLoader(("array",)) typedsl_Array_symbolLoader_2 = _TypeDSLLoader(Array_symbolLoader, 2) uri_union_of_None_type_or_strtype_True_False_None = _URILoader(union_of_None_type_or_strtype, True, False, None) union_of_None_type_or_booltype = _UnionLoader((None_type, booltype,)) union_of_None_type_or_inttype = _UnionLoader((None_type, inttype,)) uri_strtype_None_False_1 = _URILoader(strtype, None, False, 1) union_of_None_type_or_strtype_or_array_of_strtype = _UnionLoader((None_type, strtype, array_of_strtype,)) uri_union_of_None_type_or_strtype_None_False_None = _URILoader(union_of_None_type_or_strtype, None, False, None) uri_union_of_None_type_or_strtype_or_array_of_strtype_None_False_None = _URILoader(union_of_None_type_or_strtype_or_array_of_strtype, None, False, None) union_of_None_type_or_strtype_or_JsonldPredicateLoader = _UnionLoader((None_type, strtype, JsonldPredicateLoader,)) array_of_SaladRecordFieldLoader = _ArrayLoader(SaladRecordFieldLoader) union_of_None_type_or_array_of_SaladRecordFieldLoader = _UnionLoader((None_type, array_of_SaladRecordFieldLoader,)) idmap_fields_union_of_None_type_or_array_of_SaladRecordFieldLoader = _IdMapLoader(union_of_None_type_or_array_of_SaladRecordFieldLoader, 'name', 'type') uri_union_of_None_type_or_strtype_or_array_of_strtype_None_False_1 = _URILoader(union_of_None_type_or_strtype_or_array_of_strtype, None, False, 1) array_of_SpecializeDefLoader = _ArrayLoader(SpecializeDefLoader) 
union_of_None_type_or_array_of_SpecializeDefLoader = _UnionLoader((None_type, array_of_SpecializeDefLoader,)) idmap_specialize_union_of_None_type_or_array_of_SpecializeDefLoader = _IdMapLoader(union_of_None_type_or_array_of_SpecializeDefLoader, 'specializeFrom', 'specializeTo') Documentation_symbolLoader = _EnumLoader(("documentation",)) typedsl_Documentation_symbolLoader_2 = _TypeDSLLoader(Documentation_symbolLoader, 2) union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader = _UnionLoader((SaladRecordSchemaLoader, SaladEnumSchemaLoader, DocumentationLoader,)) array_of_union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader = _ArrayLoader(union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader) union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader_or_array_of_union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader = _UnionLoader((SaladRecordSchemaLoader, SaladEnumSchemaLoader, DocumentationLoader, array_of_union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader,)) def load_document(doc, baseuri=None, loadingOptions=None): if baseuri is None: baseuri = file_uri(os.getcwd()) + "/" if loadingOptions is None: loadingOptions = LoadingOptions() return _document_load(union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader_or_array_of_union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader, doc, baseuri, loadingOptions) schema-salad-2.6.20171201034858/schema_salad/__init__.py0000644000175100017510000000162213162250036022271 0ustar peterpeter00000000000000from __future__ import absolute_import import logging import os import sys import typing import six from .utils import onWindows __author__ = 'peter.amstutz@curoverse.com' _logger = logging.getLogger("salad") _logger.addHandler(logging.StreamHandler()) _logger.setLevel(logging.INFO) if six.PY3: if onWindows: # create '/tmp' 
folder if not present
        # required by autotranslate module
        # TODO: remove when https://github.com/PythonCharmers/python-future/issues/295
        # is fixed
        if not os.path.exists("/tmp"):
            try:
                os.makedirs("/tmp")
            except OSError as exception:
                # pass a single, fully formatted message; logging.error treats
                # extra positional arguments as %-format args
                _logger.error(u"Cannot create '/tmp' folder in root, needed for 'cwltool' Python 3 installation.")
                exit(1)
    from past import autotranslate  # type: ignore
    autotranslate(['avro', 'avro.schema'])
schema-salad-2.6.20171201034858/schema_salad/python_codegen.py0000644000175100017510000002117013203345013023533 0ustar peterpeter00000000000000import json
import sys
import six
from six.moves import urllib, cStringIO
import collections
import logging
from pkg_resources import resource_stream
from .utils import aslist, flatten
from . import schema
from .codegen_base import TypeDef, CodeGenBase, shortname
from typing import List, Text, Dict, Union, IO, Any


class PythonCodeGen(CodeGenBase):
    def __init__(self, out):
        # type: (IO[str]) -> None
        super(PythonCodeGen, self).__init__()
        self.out = out
        self.current_class_is_abstract = False

    def safe_name(self, n):
        # type: (Text) -> Text
        avn = schema.avro_name(n)
        if avn in ("class", "in"):
            # reserved words
            avn = avn + "_"
        return avn

    def prologue(self):
        # type: () -> None
        self.out.write("""#
# This file was autogenerated using schema-salad-tool --codegen=python
#
""")
        rs = resource_stream(__name__, 'sourceline.py')
        self.out.write(rs.read().decode("UTF-8"))
        rs.close()
        self.out.write("\n\n")
        rs = resource_stream(__name__, 'python_codegen_support.py')
        self.out.write(rs.read().decode("UTF-8"))
        rs.close()
        self.out.write("\n\n")
        for p in six.itervalues(self.prims):
            self.declare_type(p)

    def begin_class(self, classname, extends, doc, abstract):
        # type: (Text, List[Text], Text, bool) -> None
        classname = self.safe_name(classname)
        if extends:
            ext = ", ".join(self.safe_name(e) for e in extends)
        else:
            ext = "Savable"
        self.out.write("class %s(%s):\n" % (self.safe_name(classname), ext))
        if doc:
            self.out.write(' """\n')
self.out.write(str(doc)) self.out.write('\n """\n') self.serializer = cStringIO() self.current_class_is_abstract = abstract if self.current_class_is_abstract: self.out.write(" pass\n\n") return self.out.write( """ def __init__(self, _doc, baseuri, loadingOptions, docRoot=None): doc = copy.copy(_doc) if hasattr(_doc, 'lc'): doc.lc.data = _doc.lc.data doc.lc.filename = _doc.lc.filename errors = [] #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()} """) self.serializer.write(""" def save(self): r = {} """) def end_class(self, classname): # type: (Text) -> None if self.current_class_is_abstract: return self.out.write(""" if errors: raise ValidationException(\"Trying '%s'\\n\"+\"\\n\".join(errors)) """ % self.safe_name(classname)) self.serializer.write(" return r\n") self.out.write(self.serializer.getvalue()) self.out.write("\n\n") prims = { u"http://www.w3.org/2001/XMLSchema#string": TypeDef("strtype", "_PrimitiveLoader((str, six.text_type))"), u"http://www.w3.org/2001/XMLSchema#int": TypeDef("inttype", "_PrimitiveLoader(int)"), u"http://www.w3.org/2001/XMLSchema#long": TypeDef("inttype", "_PrimitiveLoader(int)"), u"http://www.w3.org/2001/XMLSchema#float": TypeDef("floattype", "_PrimitiveLoader(float)"), u"http://www.w3.org/2001/XMLSchema#double": TypeDef("floattype", "_PrimitiveLoader(float)"), u"http://www.w3.org/2001/XMLSchema#boolean": TypeDef("booltype", "_PrimitiveLoader(bool)"), u"https://w3id.org/cwl/salad#null": TypeDef("None_type", "_PrimitiveLoader(type(None))"), u"https://w3id.org/cwl/salad#Any": TypeDef("Any_type", "_AnyLoader()") } def type_loader(self, t): # type: (Union[List[Any], Dict[Text, Any], Text]) -> TypeDef if isinstance(t, list): sub = [self.type_loader(i) for i in t] return self.declare_type(TypeDef("union_of_%s" % "_or_".join(s.name for s in sub), "_UnionLoader((%s,))" % (", ".join(s.name for s in sub)))) if isinstance(t, dict): if t["type"] in ("array", "https://w3id.org/cwl/salad#array"): 
i = self.type_loader(t["items"]) return self.declare_type(TypeDef("array_of_%s" % i.name, "_ArrayLoader(%s)" % i.name)) elif t["type"] in ("enum", "https://w3id.org/cwl/salad#enum"): for sym in t["symbols"]: self.add_vocab(shortname(sym), sym) return self.declare_type(TypeDef(self.safe_name(t["name"])+"Loader", '_EnumLoader(("%s",))' % ( '", "'.join(self.safe_name(sym) for sym in t["symbols"])))) elif t["type"] in ("record", "https://w3id.org/cwl/salad#record"): return self.declare_type(TypeDef(self.safe_name(t["name"])+"Loader", "_RecordLoader(%s)" % self.safe_name(t["name"]))) else: raise Exception("wft %s" % t["type"]) if t in self.prims: return self.prims[t] return self.collected_types[self.safe_name(t)+"Loader"] def declare_id_field(self, name, fieldtype, doc): # type: (Text, TypeDef, Text) -> None if self.current_class_is_abstract: return self.declare_field(name, fieldtype, doc, True) self.out.write(""" if self.{safename} is None: if docRoot is not None: self.{safename} = docRoot else: raise ValidationException("Missing {fieldname}") baseuri = self.{safename} """. format(safename=self.safe_name(name), fieldname=shortname(name))) def declare_field(self, name, fieldtype, doc, optional): # type: (Text, TypeDef, Text, bool) -> None if self.current_class_is_abstract: return if optional: self.out.write(" if '{fieldname}' in doc:\n".format(fieldname=shortname(name))) spc = " " else: spc = "" self.out.write("""{spc} try: {spc} self.{safename} = load_field(doc.get('{fieldname}'), {fieldtype}, baseuri, loadingOptions) {spc} except ValidationException as e: {spc} errors.append(SourceLine(doc, '{fieldname}', str).makeError(\"the `{fieldname}` field is not valid because:\\n\"+str(e))) """. 
format(safename=self.safe_name(name), fieldname=shortname(name), fieldtype=fieldtype.name, spc=spc)) if optional: self.out.write(""" else: self.{safename} = None """.format(safename=self.safe_name(name))) self.out.write("\n") self.serializer.write(" if self.%s is not None:\n r['%s'] = save(self.%s)\n" % (self.safe_name(name), shortname(name), self.safe_name(name))) def uri_loader(self, inner, scoped_id, vocab_term, refScope): # type: (TypeDef, bool, bool, Union[int, None]) -> TypeDef return self.declare_type(TypeDef("uri_%s_%s_%s_%s" % (inner.name, scoped_id, vocab_term, refScope), "_URILoader(%s, %s, %s, %s)" % (inner.name, scoped_id, vocab_term, refScope))) def idmap_loader(self, field, inner, mapSubject, mapPredicate): # type: (Text, TypeDef, Text, Union[Text, None]) -> TypeDef return self.declare_type(TypeDef("idmap_%s_%s" % (self.safe_name(field), inner.name), "_IdMapLoader(%s, '%s', '%s')" % (inner.name, mapSubject, mapPredicate))) def typedsl_loader(self, inner, refScope): # type: (TypeDef, Union[int, None]) -> TypeDef return self.declare_type(TypeDef("typedsl_%s_%s" % (inner.name, refScope), "_TypeDSLLoader(%s, %s)" % (inner.name, refScope))) def epilogue(self, rootLoader): # type: (TypeDef) -> None self.out.write("_vocab = {\n") for k in sorted(self.vocab.keys()): self.out.write(" \"%s\": \"%s\",\n" % (k, self.vocab[k])) self.out.write("}\n") self.out.write("_rvocab = {\n") for k in sorted(self.vocab.keys()): self.out.write(" \"%s\": \"%s\",\n" % (self.vocab[k], k)) self.out.write("}\n\n") for k,tv in six.iteritems(self.collected_types): self.out.write("%s = %s\n" % (tv.name, tv.init)) self.out.write("\n\n") self.out.write(""" def load_document(doc, baseuri=None, loadingOptions=None): if baseuri is None: baseuri = file_uri(os.getcwd()) + "/" if loadingOptions is None: loadingOptions = LoadingOptions() return _document_load(%s, doc, baseuri, loadingOptions) """ % rootLoader.name) 
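The code generator above emits a module built almost entirely from small composable loader objects (`_PrimitiveLoader`, `_ArrayLoader`, `_UnionLoader`, `_URILoader`, ...), whose generated names encode their composition. The following is a simplified, self-contained sketch of that composition pattern — not the shipped `python_codegen_support.py` (the real loaders also thread `baseuri` and `loadingOptions` through every `load` call and collect richer error messages):

```python
# Minimal sketch of the composable-loader pattern used by the generated module:
# each loader validates one shape, and unions/arrays are built by composition.

class ValidationException(Exception):
    pass


class _PrimitiveLoader(object):
    def __init__(self, tp):
        self.tp = tp

    def load(self, doc):
        if not isinstance(doc, self.tp):
            raise ValidationException("expected %s, got %r" % (self.tp, doc))
        return doc


class _ArrayLoader(object):
    def __init__(self, items):
        self.items = items

    def load(self, doc):
        if not isinstance(doc, list):
            raise ValidationException("expected a list, got %r" % (doc,))
        return [self.items.load(i) for i in doc]


class _UnionLoader(object):
    def __init__(self, alternates):
        self.alternates = alternates

    def load(self, doc):
        # first alternate that validates wins
        for alt in self.alternates:
            try:
                return alt.load(doc)
            except ValidationException:
                pass
        raise ValidationException("%r matched no union alternate" % (doc,))


# Names below mirror the generated naming scheme seen earlier in this file.
strtype = _PrimitiveLoader(str)
array_of_strtype = _ArrayLoader(strtype)
union_of_strtype_or_array_of_strtype = _UnionLoader((strtype, array_of_strtype))

print(union_of_strtype_or_array_of_strtype.load(["a", "b"]))  # ['a', 'b']
```

This explains why the generated file contains one module-level assignment per distinct type expression: each field's `load_field` call simply delegates to one of these pre-built loader graphs.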
schema-salad-2.6.20171201034858/schema_salad/tests/0000755000175100017510000000000013211573301021317 5ustar peterpeter00000000000000
schema-salad-2.6.20171201034858/schema_salad/tests/docimp/0000755000175100017510000000000013211573301022572 5ustar peterpeter00000000000000
schema-salad-2.6.20171201034858/schema_salad/tests/docimp/d4.yml0000644000175100017510000000005213203345013023617 0ustar peterpeter00000000000000- "hello 4"
- $include: d5.md
- "hello 5"
schema-salad-2.6.20171201034858/schema_salad/tests/docimp/d3.yml0000644000175100017510000000005213203345013023616 0ustar peterpeter00000000000000- "hello 2"
- $include: d5.md
- "hello 3"
schema-salad-2.6.20171201034858/schema_salad/tests/docimp/d1.yml0000644000175100017510000000023313203345013023615 0ustar peterpeter00000000000000$graph:
- name: "Semantic_Annotations_for_Linked_Avro_Data"
  type: documentation
  doc:
    - $include: d2.md
    - $import: d3.yml
    - $import: d4.yml
schema-salad-2.6.20171201034858/schema_salad/tests/docimp/d2.md0000644000175100017510000000000713203345013023414 0ustar peterpeter00000000000000*Hello*
schema-salad-2.6.20171201034858/schema_salad/tests/docimp/dpre.json0000644000175100017510000000050013203345013024410 0ustar peterpeter00000000000000[
    {
        "name": "file:///home/peter/work/salad/schema_salad/tests/docimp/d1.yml#Semantic_Annotations_for_Linked_Avro_Data",
        "type": "documentation",
        "doc": [
            "*Hello*",
            "hello 2",
            "hello 3",
            "hello 4",
            "hello 5"
        ]
    }
]
schema-salad-2.6.20171201034858/schema_salad/tests/docimp/d5.md0000644000175100017510000000002213203345013023414 0ustar peterpeter00000000000000*dee dee dee five*
schema-salad-2.6.20171201034858/schema_salad/tests/test_cg.py0000644000175100017510000001511213203345013023317 0ustar peterpeter00000000000000import schema_salad.metaschema as cg_metaschema
import unittest
import logging
import os
import json
from schema_salad.ref_resolver import file_uri
from .matcher import JsonDiffMatcher
from .util import get_data


class 
TestGeneratedMetaschema(unittest.TestCase): def test_load(self): doc = { "type": "record", "fields": [{ "name": "hello", "doc": "Hello test case", "type": "string" }] } rs = cg_metaschema.RecordSchema(doc, "http://example.com/", cg_metaschema.LoadingOptions()) self.assertEqual("record", rs.type) self.assertEqual("http://example.com/#hello", rs.fields[0].name) self.assertEqual("Hello test case", rs.fields[0].doc) self.assertEqual("string", rs.fields[0].type) self.assertEqual({ "type": "record", "fields": [{ "name": "http://example.com/#hello", "doc": "Hello test case", "type": "string" }] }, rs.save()) def test_err(self): doc = { "doc": "Hello test case", "type": "string" } with self.assertRaises(cg_metaschema.ValidationException): rf = cg_metaschema.RecordField(doc, "", cg_metaschema.LoadingOptions()) def test_include(self): doc = { "name": "hello", "doc": [{"$include": "hello.txt"}], "type": "documentation" } rf = cg_metaschema.Documentation(doc, "http://example.com/", cg_metaschema.LoadingOptions(fileuri=file_uri(get_data("tests/_")))) self.assertEqual("http://example.com/#hello", rf.name) self.assertEqual(["hello world!\n"], rf.doc) self.assertEqual("documentation", rf.type) self.assertEqual({ "name": "http://example.com/#hello", "doc": ["hello world!\n"], "type": "documentation" }, rf.save()) def test_import(self): doc = { "type": "record", "fields": [{ "$import": "hellofield.yml" }] } lead = file_uri(os.path.normpath(get_data("tests"))) rs = cg_metaschema.RecordSchema(doc, "http://example.com/", cg_metaschema.LoadingOptions(fileuri=lead+"/_")) self.assertEqual("record", rs.type) self.assertEqual(lead+"/hellofield.yml#hello", rs.fields[0].name) self.assertEqual("hello world!\n", rs.fields[0].doc) self.assertEqual("string", rs.fields[0].type) self.assertEqual({ "type": "record", "fields": [{ "name": lead+"/hellofield.yml#hello", "doc": "hello world!\n", "type": "string" }] }, rs.save()) maxDiff = None def test_import2(self): rs = 
cg_metaschema.load_document(file_uri(get_data("tests/docimp/d1.yml")), "", cg_metaschema.LoadingOptions()) self.assertEqual([{'doc': [u'*Hello*', 'hello 2', u'*dee dee dee five*', 'hello 3', 'hello 4', u'*dee dee dee five*', 'hello 5'], 'type': 'documentation', 'name': file_uri(get_data("tests/docimp/d1.yml"))+"#Semantic_Annotations_for_Linked_Avro_Data"}], [r.save() for r in rs]) def test_err2(self): doc = { "type": "rucord", "fields": [{ "name": "hello", "doc": "Hello test case", "type": "string" }] } with self.assertRaises(cg_metaschema.ValidationException): rs = cg_metaschema.RecordSchema(doc, "", cg_metaschema.LoadingOptions()) def test_idmap(self): doc = { "type": "record", "fields": { "hello": { "doc": "Hello test case", "type": "string" } } } rs = cg_metaschema.RecordSchema(doc, "http://example.com/", cg_metaschema.LoadingOptions()) self.assertEqual("record", rs.type) self.assertEqual("http://example.com/#hello", rs.fields[0].name) self.assertEqual("Hello test case", rs.fields[0].doc) self.assertEqual("string", rs.fields[0].type) self.assertEqual({ "type": "record", "fields": [{ "name": "http://example.com/#hello", "doc": "Hello test case", "type": "string" }] }, rs.save()) def test_idmap2(self): doc = { "type": "record", "fields": { "hello": "string" } } rs = cg_metaschema.RecordSchema(doc, "http://example.com/", cg_metaschema.LoadingOptions()) self.assertEqual("record", rs.type) self.assertEqual("http://example.com/#hello", rs.fields[0].name) self.assertEqual(None, rs.fields[0].doc) self.assertEqual("string", rs.fields[0].type) self.assertEqual({ "type": "record", "fields": [{ "name": "http://example.com/#hello", "type": "string" }] }, rs.save()) def test_load_pt(self): doc = cg_metaschema.load_document(file_uri(get_data("tests/pt.yml")), "", cg_metaschema.LoadingOptions()) self.assertEqual(['https://w3id.org/cwl/salad#null', 'http://www.w3.org/2001/XMLSchema#boolean', 'http://www.w3.org/2001/XMLSchema#int', 'http://www.w3.org/2001/XMLSchema#long', 
'http://www.w3.org/2001/XMLSchema#float', 'http://www.w3.org/2001/XMLSchema#double', 'http://www.w3.org/2001/XMLSchema#string'], doc.symbols) def test_load_metaschema(self): doc = cg_metaschema.load_document(file_uri(get_data("metaschema/metaschema.yml")), "", cg_metaschema.LoadingOptions()) with open(get_data("tests/metaschema-pre.yml")) as f: pre = json.load(f) saved = [d.save() for d in doc] self.assertEqual(saved, JsonDiffMatcher(pre)) def test_load_cwlschema(self): doc = cg_metaschema.load_document(file_uri(get_data("tests/test_schema/CommonWorkflowLanguage.yml")), "", cg_metaschema.LoadingOptions()) with open(get_data("tests/cwl-pre.yml")) as f: pre = json.load(f) saved = [d.save() for d in doc] self.assertEqual(saved, JsonDiffMatcher(pre)) if __name__ == '__main__': unittest.main() schema-salad-2.6.20171201034858/schema_salad/tests/hellofield.yml0000644000175100017510000000010713203345013024145 0ustar peterpeter00000000000000{ "name": "hello", "doc": {"$include": "hello.txt"}, "type": "string" }schema-salad-2.6.20171201034858/schema_salad/tests/util.py0000644000175100017510000000124413130233260022644 0ustar peterpeter00000000000000from __future__ import absolute_import from pkg_resources import Requirement, resource_filename, ResolutionError # type: ignore from typing import Optional, Text import os def get_data(filename): # type: (Text) -> Optional[Text] filename = os.path.normpath(filename) # normalizing path depending on OS or else it will cause problem when joining path filepath = None try: filepath = resource_filename( Requirement.parse("schema-salad"), filename) except ResolutionError: pass if not filepath or not os.path.isfile(filepath): filepath = os.path.join(os.path.dirname(__file__), os.pardir, filename) return filepath schema-salad-2.6.20171201034858/schema_salad/tests/Process.yml0000644000175100017510000000147312752677740023512 0ustar peterpeter00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: cwl: "https://w3id.org/cwl/cwl#" sld: 
"https://w3id.org/cwl/salad#" $graph: - $import: "../metaschema/metaschema_base.yml" - name: InputBinding type: record abstract: true fields: - name: loadContents type: - "null" - boolean jsonldPredicate: "cwl:loadContents" doc: | Only valid when `type: File` or is an array of `items: File`. Read up to the first 64 KiB of text from the file and place it in the "contents" field of the file object for use by expressions. - name: InputRecordField type: record extends: "sld:RecordField" fields: - name: inputBinding type: [ "null", "#InputBinding" ] jsonldPredicate: "cwl:inputBinding" - name: Blurb type: record extends: InputBinding schema-salad-2.6.20171201034858/schema_salad/tests/test_errors.py0000644000175100017510000000271013203345013024242 0ustar peterpeter00000000000000from __future__ import absolute_import from __future__ import print_function from .util import get_data import unittest from schema_salad.schema import load_schema, load_and_validate from schema_salad.validate import ValidationException from avro.schema import Names import six class TestErrors(unittest.TestCase): def test_errors(self): document_loader, avsc_names, schema_metadata, metaschema_loader = load_schema( get_data(u"tests/test_schema/CommonWorkflowLanguage.yml")) for t in ("test_schema/test1.cwl", "test_schema/test2.cwl", "test_schema/test3.cwl", "test_schema/test4.cwl", "test_schema/test5.cwl", "test_schema/test6.cwl", "test_schema/test7.cwl", "test_schema/test8.cwl", "test_schema/test9.cwl", "test_schema/test10.cwl", "test_schema/test11.cwl", "test_schema/test12.cwl", "test_schema/test13.cwl", "test_schema/test14.cwl", "test_schema/test15.cwl"): with self.assertRaises(ValidationException): try: load_and_validate(document_loader, avsc_names, six.text_type(get_data("tests/"+t)), True) except ValidationException as e: print("\n", e) raise schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/0000755000175100017510000000000013211573301023616 5ustar 
peterpeter00000000000000schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test15.cwl0000755000175100017510000000036713203345013025461 0ustar peterpeter00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool baseCommand: echo inputs: message: type: string inputBinding: position: 1 invalid_field: it_is_invalid_field another_invalid_field: invalid outputs: [] schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test6.cwl0000644000175100017510000000007013025033471025371 0ustar peterpeter00000000000000inputs: foo: string outputs: bar: string steps: [12]schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test10.cwl0000644000175100017510000000020213025033471025441 0ustar peterpeter00000000000000class: Workflow inputs: foo: string outputs: bar: string steps: step1: scatterMethod: [record] in: [] out: [out]schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/invocation.md0000644000175100017510000000000113025033471026302 0ustar peterpeter00000000000000 schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test18.cwl0000644000175100017510000000032713203345013025455 0ustar peterpeter00000000000000class: CommandLineTool cwlVersion: v1.0 baseCommand: echo inputs: - id: input type: string? inputBinding: {} outputs: - id: output type: string? 
outputBinding: {} - id: output1 type: Filea schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test1.cwl0000644000175100017510000000001713025033471025365 0ustar peterpeter00000000000000class: Workflowschema-salad-2.6.20171201034858/schema_salad/tests/test_schema/Process.yml0000644000175100017510000006117413025033471025772 0ustar peterpeter00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: cwl: "https://w3id.org/cwl/cwl#" sld: "https://w3id.org/cwl/salad#" $graph: - name: "Common Workflow Language, v1.0" type: documentation doc: {$include: concepts.md} - $import: "metaschema_base.yml" - name: BaseTypesDoc type: documentation doc: | ## Base types docChild: - "#CWLType" - "#Process" - type: enum name: CWLVersion doc: "Version symbols for published CWL document versions." symbols: - cwl:draft-2 - cwl:draft-3.dev1 - cwl:draft-3.dev2 - cwl:draft-3.dev3 - cwl:draft-3.dev4 - cwl:draft-3.dev5 - cwl:draft-3 - cwl:draft-4.dev1 - cwl:draft-4.dev2 - cwl:draft-4.dev3 - cwl:v1.0.dev4 - cwl:v1.0 - name: CWLType type: enum extends: "sld:PrimitiveType" symbols: - cwl:File - cwl:Directory doc: - "Extends primitive types with the concept of a file and directory as a builtin type." - "File: A File object" - "Directory: A Directory object" - name: File type: record docParent: "#CWLType" doc: | Represents a file (or group of files if `secondaryFiles` is specified) that must be accessible by tools using standard POSIX file system call API such as open(2) and read(2). fields: - name: class type: type: enum name: File_class symbols: - cwl:File jsonldPredicate: _id: "@type" _type: "@vocab" doc: Must be `File` to indicate this object describes a file. - name: location type: string? doc: | An IRI that identifies the file resource. This may be a relative reference, in which case it must be resolved using the base IRI of the document. The location may refer to a local or remote resource; the implementation must use the IRI to retrieve file content. 
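As a sketch of the relative-reference rule above, a `location` can be resolved against the base IRI of the document with the Python standard library (`resolve_location` and the example IRIs are hypothetical illustrations, not part of this schema):

```python
from urllib.parse import urljoin

def resolve_location(location, doc_base):
    # Resolve a possibly-relative File `location` against the document's
    # base IRI. Absolute IRIs pass through unchanged; relative references
    # are resolved against `doc_base` (a hypothetical base IRI).
    return urljoin(doc_base, location)

print(resolve_location("data/reads.bam", "file:///work/job.cwl"))
# file:///work/data/reads.bam
print(resolve_location("https://example.com/reads.bam", "file:///work/job.cwl"))
# https://example.com/reads.bam
```

The same resolution applies whether the document base is a local `file:` IRI or a remote `https:` IRI.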
If an implementation is unable to retrieve the file content stored at a remote resource (due to unsupported protocol, access denied, or other issue) it must signal an error. If the `location` field is not provided, the `contents` field must be provided. The implementation must assign a unique identifier for the `location` field. If the `path` field is provided but the `location` field is not, an implementation may assign the value of the `path` field to `location`, then follow the rules above. jsonldPredicate: _id: "@id" _type: "@id" - name: path type: string? doc: | The local host path where the File is available when a CommandLineTool is executed. This field must be set by the implementation. The final path component must match the value of `basename`. This field must not be used in any other context. The command line tool being executed must be able to access the file at `path` using the POSIX `open(2)` syscall. As a special case, if the `path` field is provided but the `location` field is not, an implementation may assign the value of the `path` field to `location`, and remove the `path` field. If the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02) (`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\`, `"`, `'`, `<space>`, `<tab>`, and `<newline>`) or characters [not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml) for [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452) then implementations may terminate the process with a `permanentFailure`. jsonldPredicate: "_id": "cwl:path" "_type": "@id" - name: basename type: string? doc: | The base name of the file, that is, the name of the file without any leading directory path. The base name must not contain a slash `/`. If not provided, the implementation must set this field based on the `location` field by taking the final path component after parsing `location` as an IRI.
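The default-`basename` rule above (take the final path component after parsing `location` as an IRI) can be sketched as follows; `default_basename` is a hypothetical helper, not part of the schema:

```python
import posixpath
from urllib.parse import urlparse

def default_basename(location):
    # Parse `location` as an IRI and take the final path component,
    # per the `basename` rule above. Query strings are not part of
    # the path and are discarded by urlparse().
    return posixpath.basename(urlparse(location).path)

print(default_basename("file:///work/data/reads.bam"))      # reads.bam
print(default_basename("https://example.com/a/b.txt?raw=1"))  # b.txt
```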
If `basename` is provided, it is not required to match the value from `location`. When this file is made available to a CommandLineTool, it must be named with `basename`, i.e. the final component of the `path` field must match `basename`. jsonldPredicate: "cwl:basename" - name: dirname type: string? doc: | The name of the directory containing the file, that is, the path leading up to the final slash in the path such that `dirname + '/' + basename == path`. The implementation must set this field based on the value of `path` prior to evaluating parameter references or expressions in a CommandLineTool document. This field must not be used in any other context. - name: nameroot type: string? doc: | The basename root such that `nameroot + nameext == basename`, and `nameext` is empty or begins with a period and contains at most one period. For the purposes of path splitting, leading periods on the basename are ignored; a basename of `.cshrc` will have a nameroot of `.cshrc`. The implementation must set this field automatically based on the value of `basename` prior to evaluating parameter references or expressions. - name: nameext type: string? doc: | The basename extension such that `nameroot + nameext == basename`, and `nameext` is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored; a basename of `.cshrc` will have an empty `nameext`. The implementation must set this field automatically based on the value of `basename` prior to evaluating parameter references or expressions. - name: checksum type: string? doc: | Optional hash code for validating file integrity. Currently must be in the form "sha1$ + hexadecimal string" using the SHA-1 algorithm. - name: size type: long?
doc: Optional file size - name: "secondaryFiles" type: - "null" - type: array items: [File, Directory] jsonldPredicate: "cwl:secondaryFiles" doc: | A list of additional files that are associated with the primary file and must be transferred alongside the primary file. Examples include indexes of the primary file, or external references which must be included when loading the primary document. A file object listed in `secondaryFiles` may itself include `secondaryFiles` for which the same rules apply. - name: format type: string? jsonldPredicate: _id: cwl:format _type: "@id" identity: true doc: | The format of the file: this must be an IRI of a concept node that represents the file format, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match. Reasoning about format compatibility must be done by checking that an input file format is the same, `owl:equivalentClass` or `rdfs:subClassOf` the format required by the input parameter. `owl:equivalentClass` is transitive with `rdfs:subClassOf`, e.g. if `<B> owl:equivalentClass <C>` and `<B> owl:subclassOf <A>` then infer `<C> owl:subclassOf <A>`. File format ontologies may be provided in the "$schema" metadata at the root of the document. If no ontologies are specified in `$schema`, the runtime may perform exact file format matches. - name: contents type: string? doc: | File contents literal. Maximum of 64 KiB. If neither `location` nor `path` is provided, `contents` must be non-null. The implementation must assign a unique identifier for the `location` field. When the file is staged as input to CommandLineTool, the value of `contents` must be written to a file. If `loadContents` of `inputBinding` or `outputBinding` is true and `location` is valid, the implementation must read up to the first 64 KiB of text from the file and place it in the "contents" field. - name: Directory type: record docAfter: "#File" doc: | Represents a directory to present to a command line tool.
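The `nameroot`/`nameext` splitting rule described above for `File` (split at the last period, with leading periods ignored when locating the split point) can be sketched as a small helper; `split_basename` is hypothetical, not part of the schema:

```python
def split_basename(basename):
    # Split a basename into (nameroot, nameext) such that
    # nameroot + nameext == basename, where nameext is empty or begins
    # with a period and contains at most one period. Leading periods
    # (dotfiles such as `.cshrc`) are ignored when locating the split.
    stripped = basename.lstrip(".")
    lead = basename[: len(basename) - len(stripped)]
    dot = stripped.rfind(".")
    if dot == -1:
        return basename, ""  # no extension: nameext is empty
    return lead + stripped[:dot], stripped[dot:]

print(split_basename(".cshrc"))          # ('.cshrc', '')
print(split_basename("reads.bam"))       # ('reads', '.bam')
print(split_basename("archive.tar.gz"))  # ('archive.tar', '.gz')
```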
fields: - name: class type: type: enum name: Directory_class symbols: - cwl:Directory jsonldPredicate: _id: "@type" _type: "@vocab" doc: Must be `Directory` to indicate this object describes a Directory. - name: location type: string? doc: | An IRI that identifies the directory resource. This may be a relative reference, in which case it must be resolved using the base IRI of the document. The location may refer to a local or remote resource. If the `listing` field is not set, the implementation must use the location IRI to retrieve the directory listing. If an implementation is unable to retrieve the directory listing stored at a remote resource (due to unsupported protocol, access denied, or other issue) it must signal an error. If the `location` field is not provided, the `listing` field must be provided. The implementation must assign a unique identifier for the `location` field. If the `path` field is provided but the `location` field is not, an implementation may assign the value of the `path` field to `location`, then follow the rules above. jsonldPredicate: _id: "@id" _type: "@id" - name: path type: string? doc: | The local path where the Directory is made available prior to executing a CommandLineTool. This must be set by the implementation. This field must not be used in any other context. The command line tool being executed must be able to access the directory at `path` using the POSIX `opendir(2)` syscall. If the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02) (`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\`, `"`, `'`, `<space>`, `<tab>`, and `<newline>`) or characters [not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml) for [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452) then implementations may terminate the process with a `permanentFailure`.
jsonldPredicate: _id: "cwl:path" _type: "@id" - name: basename type: string? doc: | The base name of the directory, that is, the name of the directory without any leading directory path. The base name must not contain a slash `/`. If not provided, the implementation must set this field based on the `location` field by taking the final path component after parsing `location` as an IRI. If `basename` is provided, it is not required to match the value from `location`. When this directory is made available to a CommandLineTool, it must be named with `basename`, i.e. the final component of the `path` field must match `basename`. jsonldPredicate: "cwl:basename" - name: listing type: - "null" - type: array items: [File, Directory] doc: | List of files or subdirectories contained in this directory. The name of each file or subdirectory is determined by the `basename` field of each `File` or `Directory` object. It is an error if a `File` shares a `basename` with any other entry in `listing`. If two or more `Directory` objects share the same `basename`, this must be treated as equivalent to a single subdirectory with the listings recursively merged. jsonldPredicate: _id: "cwl:listing" - name: SchemaBase type: record abstract: true fields: - name: label type: - "null" - string jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this object." - name: Parameter type: record extends: SchemaBase abstract: true doc: | Define an input or output parameter to a process. fields: - name: secondaryFiles type: - "null" - string - Expression - type: array items: [string, Expression] jsonldPredicate: "cwl:secondaryFiles" doc: | Only valid when `type: File` or is an array of `items: File`. Describes files that must be included alongside the primary file(s). If the value is an expression, the value of `self` in the expression must be the primary input or output File to which this binding applies.
If the value is a string, it specifies that the following pattern should be applied to the primary file: 1. If the string begins with one or more caret `^` characters, for each caret, remove the last file extension from the path (the last period `.` and all following characters). If there are no file extensions, the path is unchanged. 2. Append the remainder of the string to the end of the file path. - name: format type: - "null" - string - type: array items: string - Expression jsonldPredicate: _id: cwl:format _type: "@id" identity: true doc: | Only valid when `type: File` or is an array of `items: File`. For input parameters, this must be one or more IRIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match. For output parameters, this is the file format that will be assigned to the output parameter. - name: streamable type: boolean? doc: | Only valid when `type: File` or is an array of `items: File`. A value of `true` indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: `false`. - name: doc type: - string? - string[]? doc: "A documentation string for this type, or an array of strings which should be concatenated." jsonldPredicate: "rdfs:comment" - type: enum name: Expression doc: | 'Expression' is not a real type. It indicates that a field must allow runtime parameter references. If [InlineJavascriptRequirement](#InlineJavascriptRequirement) is declared and supported by the platform, the field must also allow Javascript expressions. symbols: - cwl:ExpressionPlaceholder - name: InputBinding type: record abstract: true fields: - name: loadContents type: - "null" - boolean jsonldPredicate: "cwl:loadContents" doc: | Only valid when `type: File` or is an array of `items: File`.
Read up to the first 64 KiB of text from the file and place it in the "contents" field of the file object for use by expressions. - name: OutputBinding type: record abstract: true - name: InputSchema extends: SchemaBase type: record abstract: true - name: OutputSchema extends: SchemaBase type: record abstract: true - name: InputRecordField type: record extends: "sld:RecordField" specialize: - specializeFrom: "sld:RecordSchema" specializeTo: InputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: InputEnumSchema - specializeFrom: "sld:ArraySchema" specializeTo: InputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType fields: - name: inputBinding type: InputBinding? jsonldPredicate: "cwl:inputBinding" - name: label type: string? jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this process object." - name: InputRecordSchema type: record extends: ["sld:RecordSchema", InputSchema] specialize: - specializeFrom: "sld:RecordField" specializeTo: InputRecordField - name: InputEnumSchema type: record extends: ["sld:EnumSchema", InputSchema] fields: - name: inputBinding type: InputBinding? jsonldPredicate: "cwl:inputBinding" - name: InputArraySchema type: record extends: ["sld:ArraySchema", InputSchema] specialize: - specializeFrom: "sld:RecordSchema" specializeTo: InputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: InputEnumSchema - specializeFrom: "sld:ArraySchema" specializeTo: InputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType fields: - name: inputBinding type: InputBinding? 
jsonldPredicate: "cwl:inputBinding" - name: OutputRecordField type: record extends: "sld:RecordField" specialize: - specializeFrom: "sld:RecordSchema" specializeTo: OutputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: OutputEnumSchema - specializeFrom: "sld:ArraySchema" specializeTo: OutputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType fields: - name: outputBinding type: OutputBinding? jsonldPredicate: "cwl:outputBinding" - name: OutputRecordSchema type: record extends: ["sld:RecordSchema", "#OutputSchema"] docParent: "#OutputParameter" specialize: - specializeFrom: "sld:RecordField" specializeTo: OutputRecordField - name: OutputEnumSchema type: record extends: ["sld:EnumSchema", OutputSchema] docParent: "#OutputParameter" fields: - name: outputBinding type: OutputBinding? jsonldPredicate: "cwl:outputBinding" - name: OutputArraySchema type: record extends: ["sld:ArraySchema", OutputSchema] docParent: "#OutputParameter" specialize: - specializeFrom: "sld:RecordSchema" specializeTo: OutputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: OutputEnumSchema - specializeFrom: "sld:ArraySchema" specializeTo: OutputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType fields: - name: outputBinding type: OutputBinding? jsonldPredicate: "cwl:outputBinding" - name: InputParameter type: record extends: Parameter fields: - name: id type: string jsonldPredicate: "@id" doc: "The unique identifier for this parameter object." - name: inputBinding type: InputBinding? jsonldPredicate: "cwl:inputBinding" doc: | Describes how to handle the inputs of a process and convert them into a concrete form for execution, such as command line parameters. - name: default type: Any? jsonldPredicate: "cwl:default" doc: | The default value for this parameter if not provided in the input object. 
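The caret pattern for `secondaryFiles` described earlier (each leading `^` removes one file extension before the remainder of the string is appended) can be sketched as a small helper; `secondary_file_path` is a hypothetical illustration, not part of the schema:

```python
def secondary_file_path(primary, pattern):
    # Apply a `secondaryFiles` string pattern per the rules above: each
    # leading caret removes the last file extension (the final `.` and
    # everything after it); the remainder is then appended to the path.
    while pattern.startswith("^"):
        pattern = pattern[1:]
        dot = primary.rfind(".")
        if dot != -1:  # with no extension left, the path is unchanged
            primary = primary[:dot]
    return primary + pattern

print(secondary_file_path("reads.bam", ".bai"))   # reads.bam.bai
print(secondary_file_path("reads.bam", "^.bai"))  # reads.bai
print(secondary_file_path("x.tar.gz", "^^.md5"))  # x.md5
```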
- name: type type: - "null" - CWLType - InputRecordSchema - InputEnumSchema - InputArraySchema - string - type: array items: - CWLType - InputRecordSchema - InputEnumSchema - InputArraySchema - string jsonldPredicate: "_id": "sld:type" "_type": "@vocab" refScope: 2 typeDSL: True doc: | Specify valid types of data that may be assigned to this parameter. - name: OutputParameter type: record extends: Parameter fields: - name: id type: string jsonldPredicate: "@id" doc: "The unique identifier for this parameter object." - name: outputBinding type: OutputBinding? jsonldPredicate: "cwl:outputBinding" doc: | Describes how to handle the outputs of a process. - type: record name: ProcessRequirement abstract: true doc: | A process requirement declares a prerequisite that may or must be fulfilled before executing a process. See [`Process.hints`](#process) and [`Process.requirements`](#process). Process requirements are the primary mechanism for specifying extensions to the CWL core specification. - type: record name: Process abstract: true doc: | The base executable type in CWL is the `Process` object defined by the document. Note that the `Process` object is abstract and cannot be directly executed. fields: - name: id type: string? jsonldPredicate: "@id" doc: "The unique identifier for this process object." - name: inputs type: type: array items: InputParameter jsonldPredicate: _id: "cwl:inputs" mapSubject: id mapPredicate: type doc: | Defines the input parameters of the process. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object. - name: outputs type: type: array items: OutputParameter jsonldPredicate: _id: "cwl:outputs" mapSubject: id mapPredicate: type doc: | Defines the parameters representing the output of the process. 
May be used to generate and/or validate the output object. - name: requirements type: ProcessRequirement[]? jsonldPredicate: _id: "cwl:requirements" mapSubject: class doc: | Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option. - name: hints type: Any[]? doc: | Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this process. It is not an error if an implementation cannot satisfy all hints; however, the implementation may report a warning. jsonldPredicate: _id: cwl:hints noLinkCheck: true mapSubject: class - name: label type: string? jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this process object." - name: doc type: string? jsonldPredicate: "rdfs:comment" doc: "A long, human-readable description of this process object." - name: cwlVersion type: CWLVersion? doc: | CWL document version. Always required at the document root. Not required for a Process embedded inside another Process. jsonldPredicate: "_id": "cwl:cwlVersion" "_type": "@vocab" - name: InlineJavascriptRequirement type: record extends: ProcessRequirement doc: | Indicates that the workflow platform must support inline Javascript expressions. If this requirement is not present, the workflow platform must not perform expression interpolation. fields: - name: class type: string doc: "Always 'InlineJavascriptRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: expressionLib type: string[]? doc: | Additional code fragments that will also be inserted before executing the expression code. Allows for function definitions that may be called from CWL expressions.
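The requirements-versus-hints semantics above (an unrecognized requirement is a fatal error, while an unsupported hint only warrants a warning) can be sketched as follows; `check_process` and the `FancySchedulerHint` class name are hypothetical illustrations:

```python
def check_process(requirements, hints, supported):
    # Per the semantics above: an unrecognized entry in `requirements`
    # is fatal, while an unsupported hint only produces a warning.
    warnings = []
    for req in requirements:
        if req["class"] not in supported:
            raise RuntimeError("unsupported requirement: " + req["class"])
    for hint in hints:
        if hint["class"] not in supported:
            warnings.append("ignoring unsupported hint: " + hint["class"])
    return warnings

supported = {"InlineJavascriptRequirement"}
print(check_process([{"class": "InlineJavascriptRequirement"}],
                    [{"class": "FancySchedulerHint"}], supported))
# ['ignoring unsupported hint: FancySchedulerHint']
```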
- name: SchemaDefRequirement type: record extends: ProcessRequirement doc: | This field consists of an array of type definitions which must be used when interpreting the `inputs` and `outputs` fields. When a `type` field contains an IRI, the implementation must check if the type is defined in `schemaDefs` and use that definition. If the type is not found in `schemaDefs`, it is an error. The entries in `schemaDefs` must be processed in the order listed such that later schema definitions may refer to earlier schema definitions. fields: - name: class type: string doc: "Always 'SchemaDefRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: types type: type: array items: InputSchema doc: The list of type definitions. schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/Workflow.yml0000644000175100017510000004720013203345013026154 0ustar peterpeter00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: sld: "https://w3id.org/cwl/salad#" cwl: "https://w3id.org/cwl/cwl#" $graph: - name: "WorkflowDoc" type: documentation doc: - | # Common Workflow Language (CWL) Workflow Description, v1.0 This version: * https://w3id.org/cwl/v1.0/ Current version: * https://w3id.org/cwl/ - "\n\n" - {$include: contrib.md} - "\n\n" - | # Abstract One way to define a workflow is: an analysis task represented by a directed graph describing a sequence of operations that transform an input data set to output. This specification defines the Common Workflow Language (CWL) Workflow description, a vendor-neutral standard for representing workflows intended to be portable across a variety of computing platforms. - {$include: intro.md} - | ## Introduction to v1.0 This specification represents the first full release from the CWL group. Since draft-3, this draft introduces the following changes and additions: * The `inputs` and `outputs` fields have been renamed `in` and `out`. * Syntax simplifications: denoted by the `map<>` syntax.
Example: `in` contains a list of items, each with an id. Now one can specify a mapping of that identifier to the corresponding `InputParameter`. ``` in: - id: one type: string doc: First input parameter - id: two type: int doc: Second input parameter ``` can be ``` in: one: type: string doc: First input parameter two: type: int doc: Second input parameter ``` * The common field `description` has been renamed to `doc`. ## Purpose The Common Workflow Language Workflow Description expresses workflows for data-intensive science, such as Bioinformatics, Chemistry, Physics, and Astronomy. This specification is intended to define a data and execution model for Workflows that can be implemented on top of a variety of computing platforms, ranging from an individual workstation to cluster, grid, cloud, and high performance computing systems. - {$include: concepts.md} - name: ExpressionToolOutputParameter type: record extends: OutputParameter fields: - name: type type: - "null" - "#CWLType" - "#OutputRecordSchema" - "#OutputEnumSchema" - "#OutputArraySchema" - string - type: array items: - "#CWLType" - "#OutputRecordSchema" - "#OutputEnumSchema" - "#OutputArraySchema" - string jsonldPredicate: "_id": "sld:type" "_type": "@vocab" refScope: 2 typeDSL: True doc: | Specify valid types of data that may be assigned to this parameter. - type: record name: ExpressionTool extends: Process specialize: - specializeFrom: "#OutputParameter" specializeTo: "#ExpressionToolOutputParameter" documentRoot: true doc: | Execute an expression as a Workflow step. fields: - name: "class" jsonldPredicate: "_id": "@type" "_type": "@vocab" type: string - name: expression type: [string, Expression] doc: | The expression to execute. The expression must return a JSON object which matches the output parameters of the ExpressionTool. - name: LinkMergeMethod type: enum docParent: "#WorkflowStepInput" doc: The input link merge method, described in [WorkflowStepInput](#WorkflowStepInput).
symbols: - merge_nested - merge_flattened - name: WorkflowOutputParameter type: record extends: OutputParameter docParent: "#Workflow" doc: | Describe an output parameter of a workflow. The parameter must be connected to one or more parameters defined in the workflow that will provide the value of the output parameter. fields: - name: outputSource doc: | Specifies one or more workflow parameters that supply the value of the output parameter. jsonldPredicate: "_id": "cwl:outputSource" "_type": "@id" refScope: 0 type: - string? - string[]? - name: linkMerge type: ["null", "#LinkMergeMethod"] jsonldPredicate: "cwl:linkMerge" doc: | The method to use to merge multiple sources into a single array. If not specified, the default method is "merge_nested". - name: type type: - "null" - "#CWLType" - "#OutputRecordSchema" - "#OutputEnumSchema" - "#OutputArraySchema" - string - type: array items: - "#CWLType" - "#OutputRecordSchema" - "#OutputEnumSchema" - "#OutputArraySchema" - string jsonldPredicate: "_id": "sld:type" "_type": "@vocab" refScope: 2 typeDSL: True doc: | Specify valid types of data that may be assigned to this parameter. - name: Sink type: record abstract: true fields: - name: source doc: | Specifies one or more workflow parameters that will provide input to the underlying step parameter. jsonldPredicate: "_id": "cwl:source" "_type": "@id" refScope: 2 type: - string? - string[]? - name: linkMerge type: LinkMergeMethod? jsonldPredicate: "cwl:linkMerge" doc: | The method to use to merge multiple inbound links into a single array. If not specified, the default method is "merge_nested". - type: record name: WorkflowStepInput extends: Sink docParent: "#WorkflowStep" doc: | The input of a workflow step connects an upstream parameter (from the workflow inputs, or the outputs of other workflow steps) with the input parameters of the underlying step.
## Input object A WorkflowStepInput object must contain an `id` field in the form `#fieldname` or `#stepname.fieldname`. When the `id` field contains a period `.` the field name consists of the characters following the final period. This defines a field of the workflow step input object with the value of the `source` parameter(s). ## Merging To merge multiple inbound data links, [MultipleInputFeatureRequirement](#MultipleInputFeatureRequirement) must be specified in the workflow or workflow step requirements. If the sink parameter is an array, or named in a [workflow scatter](#WorkflowStep) operation, there may be multiple inbound data links listed in the `source` field. The values from the input links are merged depending on the method specified in the `linkMerge` field. If not specified, the default method is "merge_nested". * **merge_nested** The input must be an array consisting of exactly one entry for each input link. If "merge_nested" is specified with a single link, the value from the link must be wrapped in a single-item list. * **merge_flattened** 1. The source and sink parameters must be compatible types, or the source type must be compatible with single element from the "items" type of the destination array parameter. 2. Source parameters which are arrays are concatenated. Source parameters which are single element types are appended as single elements. fields: - name: id type: string jsonldPredicate: "@id" doc: "A unique identifier for this workflow input parameter." - name: default type: ["null", Any] doc: | The default value for this parameter if there is no `source` field. jsonldPredicate: "cwl:default" - name: valueFrom type: - "null" - "string" - "#Expression" jsonldPredicate: "cwl:valueFrom" doc: | To use valueFrom, [StepInputExpressionRequirement](#StepInputExpressionRequirement) must be specified in the workflow or workflow step requirements. If `valueFrom` is a constant string value, use this as the value for this input parameter. 
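The two `linkMerge` methods described above can be sketched as follows; `merge_links` is a hypothetical helper, and each element of `sources` stands for the value arriving on one inbound link:

```python
def merge_links(sources, link_merge="merge_nested"):
    # Combine values from multiple inbound links per the rules above.
    if link_merge == "merge_nested":
        # One entry per link; a single link is still wrapped in a list.
        return list(sources)
    if link_merge == "merge_flattened":
        merged = []
        for value in sources:
            if isinstance(value, list):
                merged.extend(value)  # array sources are concatenated
            else:
                merged.append(value)  # single elements are appended
        return merged
    raise ValueError("unknown linkMerge method: " + link_merge)

print(merge_links([[1, 2], 3]))                     # [[1, 2], 3]
print(merge_links([[1, 2], 3], "merge_flattened"))  # [1, 2, 3]
```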
If `valueFrom` is a parameter reference or expression, it must be evaluated to yield the actual value to be assigned to the input field. The `self` value in the parameter reference or expression must be the value of the parameter(s) specified in the `source` field, or null if there is no `source` field. The value of `inputs` in the parameter reference or expression must be the input object to the workflow step after assigning the `source` values and then scattering. The order of evaluating `valueFrom` among step input parameters is undefined and the result of evaluating `valueFrom` on a parameter must not be visible to evaluation of `valueFrom` on other parameters. - type: record name: WorkflowStepOutput docParent: "#WorkflowStep" doc: | Associate an output parameter of the underlying process with a workflow parameter. The workflow parameter (given in the `id` field) may be used as a `source` to connect with input parameters of other workflow steps, or with an output parameter of the process. fields: - name: id type: string jsonldPredicate: "@id" doc: | A unique identifier for this workflow output parameter. This is the identifier to use in the `source` field of `WorkflowStepInput` to connect the output value to downstream parameters. - name: ScatterMethod type: enum docParent: "#WorkflowStep" doc: The scatter method, as described in [workflow step scatter](#WorkflowStep). symbols: - dotproduct - nested_crossproduct - flat_crossproduct - name: WorkflowStep type: record docParent: "#Workflow" doc: | A workflow step is an executable element of a workflow. It specifies the underlying process implementation (such as `CommandLineTool` or another `Workflow`) in the `run` field and connects the input and output parameters of the underlying process to workflow parameters. # Scatter/gather To use scatter/gather, [ScatterFeatureRequirement](#ScatterFeatureRequirement) must be specified in the workflow or workflow step requirements. 
A "scatter" operation specifies that the associated workflow step or subworkflow should execute separately over a list of input elements. Each job making up a scatter operation is independent and may be executed concurrently. The `scatter` field specifies one or more input parameters which will be scattered. An input parameter may be listed more than once. The declared type of each input parameter is implicitly wrapped in an array for each time it appears in the `scatter` field. As a result, upstream parameters which are connected to scattered parameters may be arrays. All output parameter types are also implicitly wrapped in arrays. Each job in the scatter results in an entry in the output array. If `scatter` declares more than one input parameter, `scatterMethod` describes how to decompose the input into a discrete set of jobs. * **dotproduct** specifies that each of the input arrays are aligned and one element taken from each array to construct each job. It is an error if all input arrays are not the same length. * **nested_crossproduct** specifies the Cartesian product of the inputs, producing a job for every combination of the scattered inputs. The output must be nested arrays for each level of scattering, in the order that the input arrays are listed in the `scatter` field. * **flat_crossproduct** specifies the Cartesian product of the inputs, producing a job for every combination of the scattered inputs. The output arrays must be flattened to a single level, but otherwise listed in the order that the input arrays are listed in the `scatter` field. # Subworkflows To specify a nested workflow as part of a workflow step, [SubworkflowFeatureRequirement](#SubworkflowFeatureRequirement) must be specified in the workflow or workflow step requirements. fields: - name: id type: string jsonldPredicate: "@id" doc: "The unique identifier for this workflow step." 
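The three `scatterMethod` values described above can be sketched in Python. This is an illustrative sketch of the job-decomposition semantics (not normative CWL, and not part of any implementation), treating each job as a tuple of one element drawn from each scattered input:

```python
from itertools import product

def dotproduct(*arrays):
    # Arrays are aligned element-wise; mismatched lengths are an error.
    if len({len(a) for a in arrays}) != 1:
        raise ValueError("input arrays must be the same length")
    return list(zip(*arrays))

def flat_crossproduct(*arrays):
    # Cartesian product, flattened to a single list of jobs.
    return list(product(*arrays))

def nested_crossproduct(first, *rest):
    # Cartesian product, with one level of output nesting per scattered input.
    if not rest:
        return [(x,) for x in first]
    return [[(x,) + tail for tail in nested_crossproduct(*rest)] for x in first]

print(dotproduct([1, 2], ["a", "b"]))         # [(1, 'a'), (2, 'b')]
print(flat_crossproduct([1, 2], ["a", "b"]))  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
print(nested_crossproduct([1, 2], ["a", "b"]))
# [[(1, 'a'), (1, 'b')], [(2, 'a'), (2, 'b')]]
```

Note that flat and nested cross products enumerate the same set of jobs; they differ only in the shape of the resulting output arrays.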
- name: in type: WorkflowStepInput[] jsonldPredicate: _id: "cwl:in" mapSubject: id mapPredicate: source doc: | Defines the input parameters of the workflow step. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object. - name: out type: - type: array items: [string, WorkflowStepOutput] jsonldPredicate: _id: "cwl:out" _type: "@id" identity: true doc: | Defines the parameters representing the output of the process. May be used to generate and/or validate the output object. - name: requirements type: ProcessRequirement[]? jsonldPredicate: _id: "cwl:requirements" mapSubject: class doc: | Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this workflow step. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option. - name: hints type: Any[]? jsonldPredicate: _id: "cwl:hints" noLinkCheck: true mapSubject: class doc: | Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this workflow step. It is not an error if an implementation cannot satisfy all hints, however the implementation may report a warning. - name: label type: string? jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this process object." - name: doc type: string? jsonldPredicate: "rdfs:comment" doc: "A long, human-readable description of this process object." - name: run type: [string, Process] jsonldPredicate: "_id": "cwl:run" "_type": "@id" doc: | Specifies the process to run. - name: scatter type: - string? - string[]? 
jsonldPredicate: "_id": "cwl:scatter" "_type": "@id" "_container": "@list" refScope: 0 - name: scatterMethod doc: | Required if `scatter` is an array of more than one element. type: ScatterMethod? jsonldPredicate: "_id": "cwl:scatterMethod" "_type": "@vocab" - name: Workflow type: record extends: "#Process" documentRoot: true specialize: - specializeFrom: "#OutputParameter" specializeTo: "#WorkflowOutputParameter" doc: | A workflow describes a set of **steps** and the **dependencies** between those steps. When a step produces output that will be consumed by a second step, the first step is a dependency of the second step. When there is a dependency, the workflow engine must execute the preceding step and wait for it to successfully produce output before executing the dependent step. If two steps are defined in the workflow graph that are not directly or indirectly dependent, these steps are **independent**, and may execute in any order or execute concurrently. A workflow is complete when all steps have been executed. Dependencies between parameters are expressed using the `source` field on [workflow step input parameters](#WorkflowStepInput) and [workflow output parameters](#WorkflowOutputParameter). The `source` field expresses the dependency of one parameter on another such that when a value is associated with the parameter specified by `source`, that value is propagated to the destination parameter. When all data links inbound to a given step are fulfilled, the step is ready to execute. ## Workflow success and failure A completed step must result in one of `success`, `temporaryFailure` or `permanentFailure` states. An implementation may choose to retry a step execution which resulted in `temporaryFailure`. An implementation may choose to either continue running other steps of a workflow, or terminate immediately upon `permanentFailure`. * If any step of a workflow execution results in `permanentFailure`, then the workflow status is `permanentFailure`. 
* If one or more steps result in `temporaryFailure` and all other steps complete `success` or are not executed, then the workflow status is `temporaryFailure`. * If all workflow steps are executed and complete with `success`, then the workflow status is `success`. # Extensions [ScatterFeatureRequirement](#ScatterFeatureRequirement) and [SubworkflowFeatureRequirement](#SubworkflowFeatureRequirement) are available as standard [extensions](#Extensions_and_Metadata) to core workflow semantics. fields: - name: "class" jsonldPredicate: "_id": "@type" "_type": "@vocab" type: string - name: steps doc: | The individual steps that make up the workflow. Each step is executed when all of its input data links are fulfilled. An implementation may choose to execute the steps in a different order than listed and/or execute steps concurrently, provided that dependencies between steps are met. type: - type: array items: "#WorkflowStep" jsonldPredicate: mapSubject: id - type: record name: SubworkflowFeatureRequirement extends: ProcessRequirement doc: | Indicates that the workflow platform must support nested workflows in the `run` field of [WorkflowStep](#WorkflowStep). fields: - name: "class" type: "string" doc: "Always 'SubworkflowFeatureRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: ScatterFeatureRequirement type: record extends: ProcessRequirement doc: | Indicates that the workflow platform must support the `scatter` and `scatterMethod` fields of [WorkflowStep](#WorkflowStep). fields: - name: "class" type: "string" doc: "Always 'ScatterFeatureRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: MultipleInputFeatureRequirement type: record extends: ProcessRequirement doc: | Indicates that the workflow platform must support multiple inbound data links listed in the `source` field of [WorkflowStepInput](#WorkflowStepInput). 
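The workflow status rules above reduce to a simple precedence: any `permanentFailure` dominates, then `temporaryFailure`, otherwise `success`. A minimal illustrative sketch (not normative, and not part of any implementation):

```python
def workflow_status(step_states):
    # step_states: final states of the steps that were executed;
    # steps that were never executed are simply omitted from the list.
    if "permanentFailure" in step_states:
        return "permanentFailure"
    if "temporaryFailure" in step_states:
        return "temporaryFailure"
    return "success"

print(workflow_status(["success", "temporaryFailure"]))  # temporaryFailure
print(workflow_status(["success", "success"]))           # success
```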
fields: - name: "class" type: "string" doc: "Always 'MultipleInputFeatureRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - type: record name: StepInputExpressionRequirement extends: ProcessRequirement doc: | Indicate that the workflow platform must support the `valueFrom` field of [WorkflowStepInput](#WorkflowStepInput). fields: - name: "class" type: "string" doc: "Always 'StepInputExpressionRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/CommonWorkflowLanguage.yml0000644000175100017510000000032113025033471030766 0ustar peterpeter00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: cwl: "https://w3id.org/cwl/cwl#" sld: "https://w3id.org/cwl/salad#" $graph: - $import: Process.yml - $import: CommandLineTool.yml - $import: Workflow.yml schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test5.cwl0000644000175100017510000000011013025033471025363 0ustar peterpeter00000000000000class: Workflow inputs: foo: string outputs: bar: string steps: [12]schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test9.cwl0000644000175100017510000000017413025033471025401 0ustar peterpeter00000000000000class: Workflow inputs: foo: string outputs: bar: string steps: step1: scatterMethod: 12 in: [] out: [out]schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test4.cwl0000644000175100017510000000010213025033471025363 0ustar peterpeter00000000000000class: Workflow inputs: foo: string outputs: bar: 12 steps: []schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test8.cwl0000644000175100017510000000017513025033471025401 0ustar peterpeter00000000000000class: Workflow inputs: foo: string outputs: bar: string steps: step1: scatterMethod: abc in: [] out: [out]schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test7.cwl0000644000175100017510000000017713025033471025402 0ustar peterpeter00000000000000class: Workflow inputs: foo: string outputs: 
bar: string steps: step1: scatter_method: blub in: [] out: [out]schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/concepts.md0000644000175100017510000000000113025033471025747 0ustar peterpeter00000000000000 schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test14.cwl0000644000175100017510000000026613066216423025464 0ustar peterpeter00000000000000cwlVersion: v1.0 class: CommandLineTool baseCommand: echo inputs: example_flag: type: boolean inputBinding: position: 1 prefix: -f outputs: example_flag: int schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/contrib.md0000644000175100017510000000000113025033471025571 0ustar peterpeter00000000000000 schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test2.cwl0000644000175100017510000000002013025033471025360 0ustar peterpeter00000000000000class: xWorkflowschema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test13.cwl0000644000175100017510000000042313066216423025456 0ustar peterpeter00000000000000cwlVersion: v1.0 class: Workflow inputs: example_flag: type: boolean inputBinding: position: 1 prefix: -f outputs: [] steps: example_flag: in: [] out: [] run: id: blah class: CommandLineTool inputs: [] outputs: []schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test3.cwl0000644000175100017510000000010713025033471025367 0ustar peterpeter00000000000000class: Workflow inputs: foo: string outputs: bar: xstring steps: []schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test16.cwl0000644000175100017510000000037013203345013025451 0ustar peterpeter00000000000000cwlVersion: v1.0 class: CommandLineTool baseCommand: echo inputs: message: type: string inputBinding: position: 1 posi outputs: hello_output: type: File outputBinding: glob: hello-out.txt stdout: hello-out.txt schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/intro.md0000644000175100017510000000000113025033471025264 0ustar peterpeter00000000000000 
schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test17.cwl0000644000175100017510000000032413203345013025451 0ustar peterpeter00000000000000class: CommandLineTool cwlVersion: v1.0 baseCommand: cowsay inputs: - id: input type: string? inputBinding: position: 0 outputs: - id: output type: string? outputBinding: {} - aa: moa schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test19.cwl0000644000175100017510000000036313203345013025456 0ustar peterpeter00000000000000: aaa cwlVersion: v1.0 class: CommandLineTool baseCommand: echo inputs: message: type: string inputBinding: position: 1 outputs: hello_output: type: File outputBinding: glob: hello-out.txt stdout: hello-out.txt schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/metaschema_base.yml0000644000175100017510000000702413025033471027447 0ustar peterpeter00000000000000$base: "https://w3id.org/cwl/salad#" $namespaces: sld: "https://w3id.org/cwl/salad#" dct: "http://purl.org/dc/terms/" rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" xsd: "http://www.w3.org/2001/XMLSchema#" $graph: - name: PrimitiveType type: enum symbols: - "sld:null" - "xsd:boolean" - "xsd:int" - "xsd:long" - "xsd:float" - "xsd:double" - "xsd:string" doc: - | Salad data types are based on Avro schema declarations. Refer to the [Avro schema declaration documentation](https://avro.apache.org/docs/current/spec.html#schemas) for detailed information. - "null: no value" - "boolean: a binary value" - "int: 32-bit signed integer" - "long: 64-bit signed integer" - "float: single precision (32-bit) IEEE 754 floating-point number" - "double: double precision (64-bit) IEEE 754 floating-point number" - "string: Unicode character sequence" - name: Any type: enum symbols: ["#Any"] doc: | The **Any** type validates for any non-null value. - name: RecordField type: record doc: A field of a record. 
fields: - name: name type: string jsonldPredicate: "@id" doc: | The name of the field - name: doc type: string? doc: | A documentation string for this field jsonldPredicate: "rdfs:comment" - name: type type: - PrimitiveType - RecordSchema - EnumSchema - ArraySchema - string - type: array items: - PrimitiveType - RecordSchema - EnumSchema - ArraySchema - string jsonldPredicate: _id: sld:type _type: "@vocab" typeDSL: true refScope: 2 doc: | The field type - name: RecordSchema type: record fields: type: doc: "Must be `record`" type: name: Record_symbol type: enum symbols: - "sld:record" jsonldPredicate: _id: "sld:type" _type: "@vocab" typeDSL: true refScope: 2 fields: type: RecordField[]? jsonldPredicate: _id: sld:fields mapSubject: name mapPredicate: type doc: "Defines the fields of the record." - name: EnumSchema type: record doc: | Define an enumerated type. fields: type: doc: "Must be `enum`" type: name: Enum_symbol type: enum symbols: - "sld:enum" jsonldPredicate: _id: "sld:type" _type: "@vocab" typeDSL: true refScope: 2 symbols: type: string[] jsonldPredicate: _id: "sld:symbols" _type: "@id" identity: true doc: "Defines the set of valid symbols." - name: ArraySchema type: record fields: type: doc: "Must be `array`" type: name: Array_symbol type: enum symbols: - "sld:array" jsonldPredicate: _id: "sld:type" _type: "@vocab" typeDSL: true refScope: 2 items: type: - PrimitiveType - RecordSchema - EnumSchema - ArraySchema - string - type: array items: - PrimitiveType - RecordSchema - EnumSchema - ArraySchema - string jsonldPredicate: _id: "sld:items" _type: "@vocab" refScope: 2 doc: "Defines the type of the array elements." 
schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test12.cwl0000644000175100017510000000042113066216423025453 0ustar peterpeter00000000000000cwlVersion: v1.0 class: CommandLineTool baseCommand: echo inputs: - id: example_flag type: boolean inputBinding: position: 1 prefix: -f - id: example_flag type: int inputBinding: position: 3 prefix: --example-string outputs: [] schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/CommandLineTool.yml0000644000175100017510000007353113203345013027374 0ustar peterpeter00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: cwl: "https://w3id.org/cwl/cwl#" sld: "https://w3id.org/cwl/salad#" $graph: - name: CommandLineToolDoc type: documentation doc: - | # Common Workflow Language (CWL) Command Line Tool Description, v1.0 This version: * https://w3id.org/cwl/v1.0/ Current version: * https://w3id.org/cwl/ - "\n\n" - {$include: contrib.md} - "\n\n" - | # Abstract A Command Line Tool is a non-interactive executable program that reads some input, performs a computation, and terminates after producing some output. Command line programs are a flexible unit of code sharing and reuse; unfortunately, the syntax and input/output semantics among command line programs are extremely heterogeneous. A common layer for describing the syntax and semantics of programs can reduce this incidental complexity by providing a consistent way to connect programs together. This specification defines the Common Workflow Language (CWL) Command Line Tool Description, a vendor-neutral standard for describing the syntax and input/output semantics of command line programs. - {$include: intro.md} - | ## Introduction to v1.0 This specification represents the first full release from the CWL group. Since draft-3, version 1.0 introduces the following changes and additions: * The [Directory](#Directory) type. * Syntax simplifications: denoted by the `map<>` syntax. Example: inputs contains a list of items, each with an id. 
Now one can specify a mapping of that identifier to the corresponding `CommandInputParameter`. ``` inputs: - id: one type: string doc: First input parameter - id: two type: int doc: Second input parameter ``` can be ``` inputs: one: type: string doc: First input parameter two: type: int doc: Second input parameter ``` * [InitialWorkDirRequirement](#InitialWorkDirRequirement): list of files and subdirectories to be present in the output directory prior to execution. * Shortcuts for specifying the standard [output](#stdout) and/or [error](#stderr) streams as a (streamable) File output. * [SoftwareRequirement](#SoftwareRequirement) for describing software dependencies of a tool. * The common `description` field has been renamed to `doc`. ## Errata Post v1.0 release changes to the spec. * 13 July 2016: Mark `baseCommand` as optional and update descriptive text. ## Purpose Standalone programs are a flexible and interoperable form of code reuse. Unlike monolithic applications, applications and analysis workflows which are composed of multiple separate programs can be written in multiple languages and execute concurrently on multiple hosts. However, POSIX does not dictate computer-readable grammar or semantics for program input and output, resulting in extremely heterogeneous command line grammar and input/output semantics among programs. This is a particular problem in distributed computing (multi-node compute clusters) and virtualized environments (such as Docker containers) where it is often necessary to provision resources such as input files before executing the program. Often this gap is filled by hard coding program invocation and implicitly assuming requirements will be met, or abstracting program invocation with wrapper scripts or descriptor documents. 
Unfortunately, where these approaches are application or platform specific, it creates a significant barrier to reproducibility and portability, as methods developed for one platform must be manually ported to be used on new platforms. Similarly, it creates redundant work, as wrappers for popular tools must be rewritten for each application or platform in use. The Common Workflow Language Command Line Tool Description is designed to provide a common standard description of grammar and semantics for invoking programs used in data-intensive fields such as Bioinformatics, Chemistry, Physics, Astronomy, and Statistics. This specification defines a precise data and execution model for Command Line Tools that can be implemented on a variety of computing platforms, ranging from a single workstation to cluster, grid, cloud, and high performance computing platforms. - {$include: concepts.md} - {$include: invocation.md} - type: record name: EnvironmentDef doc: | Define an environment variable that will be set in the runtime environment by the workflow platform when executing the command line tool. May be the result of executing an expression, such as getting a parameter from input. fields: - name: envName type: string doc: The environment variable name - name: envValue type: [string, Expression] doc: The environment variable value - type: record name: CommandLineBinding extends: InputBinding doc: | When listed under `inputBinding` in the input schema, the term "value" refers to the corresponding value in the input object. For binding objects listed in `CommandLineTool.arguments`, the term "value" refers to the effective value after evaluating `valueFrom`. The binding behavior when building the command line depends on the data type of the value. If there is a mismatch between the type described by the input schema and the effective value, such as resulting from an expression evaluation, an implementation must use the data type of the effective value. 
- **string**: Add `prefix` and the string to the command line. - **number**: Add `prefix` and decimal representation to the command line. - **boolean**: If true, add `prefix` to the command line. If false, add nothing. - **File**: Add `prefix` and the value of [`File.path`](#File) to the command line. - **array**: If `itemSeparator` is specified, add `prefix` and join the array into a single string with `itemSeparator` separating the items. Otherwise first add `prefix`, then recursively process individual elements. - **object**: Add `prefix` only, and recursively add object fields for which `inputBinding` is specified. - **null**: Add nothing. fields: - name: position type: int? doc: "The sorting key. Default position is 0." - name: prefix type: string? doc: "Command line prefix to add before the value." - name: separate type: boolean? doc: | If true (default), then the prefix and value must be added as separate command line arguments; if false, prefix and value must be concatenated into a single command line argument. - name: itemSeparator type: string? doc: | Join the array elements into a single string with the elements separated by `itemSeparator`. - name: valueFrom type: - "null" - string - Expression jsonldPredicate: "cwl:valueFrom" doc: | If `valueFrom` is a constant string value, use this as the value and apply the binding rules above. If `valueFrom` is an expression, evaluate the expression to yield the actual value to use to build the command line and apply the binding rules above. If the inputBinding is associated with an input parameter, the value of `self` in the expression will be the value of the input parameter. When a binding is part of the `CommandLineTool.arguments` field, the `valueFrom` field is required. - name: shellQuote type: boolean? doc: | If `ShellCommandRequirement` is in the requirements for the current command, this controls whether the value is quoted on the command line (default is true). 
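The per-type binding rules listed above can be sketched in Python. This is an illustrative simplification (not the reference implementation): `binding` is a plain dict with optional `prefix` and `itemSeparator` keys, File values are dicts with a `path` key, and `separate`, `shellQuote`, and nested `inputBinding` handling are omitted:

```python
def bind(value, binding):
    prefix = [binding["prefix"]] if "prefix" in binding else []
    if value is None:                                  # null: add nothing
        return []
    if isinstance(value, bool):                        # boolean: prefix if true
        return prefix if value else []
    if isinstance(value, (str, int, float)):           # string/number
        return prefix + [str(value)]
    if isinstance(value, dict) and "path" in value:    # File: prefix + path
        return prefix + [value["path"]]
    if isinstance(value, list):                        # array
        if "itemSeparator" in binding:
            return prefix + [binding["itemSeparator"].join(str(v) for v in value)]
        args = list(prefix)
        for item in value:
            args.extend(bind(item, {}))
        return args
    return prefix  # object: prefix only in this simplified sketch

print(bind(True, {"prefix": "-f"}))                          # ['-f']
print(bind([1, 2], {"prefix": "-n", "itemSeparator": ","}))  # ['-n', '1,2']
```

The `bool` check must come before the number check, since Python booleans are a subclass of `int`.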
Use `shellQuote: false` to inject metacharacters for operations such as pipes. - type: record name: CommandOutputBinding extends: OutputBinding doc: | Describes how to generate an output parameter based on the files produced by a CommandLineTool. The output parameter is generated by applying these operations in the following order: - glob - loadContents - outputEval fields: - name: glob type: - "null" - string - Expression - type: array items: string doc: | Find files relative to the output directory, using POSIX glob(3) pathname matching. If an array is provided, find files that match any pattern in the array. If an expression is provided, the expression must return a string or an array of strings, which will then be evaluated as one or more glob patterns. Must only match and return files which actually exist. - name: loadContents type: - "null" - boolean jsonldPredicate: "cwl:loadContents" doc: | For each file matched in `glob`, read up to the first 64 KiB of text from the file and place it in the `contents` field of the file object for manipulation by `outputEval`. - name: outputEval type: - "null" - string - Expression doc: | Evaluate an expression to generate the output value. If `glob` was specified, the value of `self` must be an array containing file objects that were matched. If no files were matched, `self` must be a zero length array; if a single file was matched, the value of `self` is an array of a single element. Additionally, if `loadContents` is `true`, the File objects must include up to the first 64 KiB of file contents in the `contents` field. 
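The `glob` → `loadContents` → `outputEval` ordering described above can be sketched in Python. This is an illustrative sketch assuming a POSIX filesystem (not the reference implementation); `outdir` and `glob_pattern` are hypothetical parameter names, and the returned file objects become `self` for an `outputEval` expression:

```python
import glob as globlib

def apply_output_binding(outdir, glob_pattern, load_contents=False):
    files = []
    for path in sorted(globlib.glob(f"{outdir}/{glob_pattern}")):
        f = {"class": "File", "path": path}
        if load_contents:
            # Read up to the first 64 KiB into the `contents` field.
            with open(path, "rb") as fh:
                f["contents"] = fh.read(65536).decode("utf-8", "replace")
        files.append(f)
    return files  # zero-length if nothing matched; always an array
```

Even a single match is returned as a one-element array, matching the `outputEval` contract that `self` is always an array of file objects.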
- name: CommandInputRecordField type: record extends: InputRecordField specialize: - specializeFrom: InputRecordSchema specializeTo: CommandInputRecordSchema - specializeFrom: InputEnumSchema specializeTo: CommandInputEnumSchema - specializeFrom: InputArraySchema specializeTo: CommandInputArraySchema - specializeFrom: InputBinding specializeTo: CommandLineBinding - name: CommandInputRecordSchema type: record extends: InputRecordSchema specialize: - specializeFrom: InputRecordField specializeTo: CommandInputRecordField - name: CommandInputEnumSchema type: record extends: InputEnumSchema specialize: - specializeFrom: InputBinding specializeTo: CommandLineBinding - name: CommandInputArraySchema type: record extends: InputArraySchema specialize: - specializeFrom: InputRecordSchema specializeTo: CommandInputRecordSchema - specializeFrom: InputEnumSchema specializeTo: CommandInputEnumSchema - specializeFrom: InputArraySchema specializeTo: CommandInputArraySchema - specializeFrom: InputBinding specializeTo: CommandLineBinding - name: CommandOutputRecordField type: record extends: OutputRecordField specialize: - specializeFrom: OutputRecordSchema specializeTo: CommandOutputRecordSchema - specializeFrom: OutputEnumSchema specializeTo: CommandOutputEnumSchema - specializeFrom: OutputArraySchema specializeTo: CommandOutputArraySchema - specializeFrom: OutputBinding specializeTo: CommandOutputBinding - name: CommandOutputRecordSchema type: record extends: OutputRecordSchema specialize: - specializeFrom: OutputRecordField specializeTo: CommandOutputRecordField - name: CommandOutputEnumSchema type: record extends: OutputEnumSchema specialize: - specializeFrom: OutputRecordSchema specializeTo: CommandOutputRecordSchema - specializeFrom: OutputEnumSchema specializeTo: CommandOutputEnumSchema - specializeFrom: OutputArraySchema specializeTo: CommandOutputArraySchema - specializeFrom: OutputBinding specializeTo: CommandOutputBinding - name: CommandOutputArraySchema type: record 
extends: OutputArraySchema specialize: - specializeFrom: OutputRecordSchema specializeTo: CommandOutputRecordSchema - specializeFrom: OutputEnumSchema specializeTo: CommandOutputEnumSchema - specializeFrom: OutputArraySchema specializeTo: CommandOutputArraySchema - specializeFrom: OutputBinding specializeTo: CommandOutputBinding - type: record name: CommandInputParameter extends: InputParameter doc: An input parameter for a CommandLineTool. specialize: - specializeFrom: InputRecordSchema specializeTo: CommandInputRecordSchema - specializeFrom: InputEnumSchema specializeTo: CommandInputEnumSchema - specializeFrom: InputArraySchema specializeTo: CommandInputArraySchema - specializeFrom: InputBinding specializeTo: CommandLineBinding - type: record name: CommandOutputParameter extends: OutputParameter doc: An output parameter for a CommandLineTool. specialize: - specializeFrom: OutputBinding specializeTo: CommandOutputBinding fields: - name: type type: - "null" - CWLType - stdout - stderr - CommandOutputRecordSchema - CommandOutputEnumSchema - CommandOutputArraySchema - string - type: array items: - CWLType - CommandOutputRecordSchema - CommandOutputEnumSchema - CommandOutputArraySchema - string jsonldPredicate: "_id": "sld:type" "_type": "@vocab" refScope: 2 typeDSL: True doc: | Specify valid types of data that may be assigned to this parameter. - name: stdout type: enum symbols: [ "cwl:stdout" ] docParent: "#CommandOutputParameter" doc: | Only valid as a `type` for a `CommandLineTool` output with no `outputBinding` set. The following ``` outputs: an_output_name: type: stdout stdout: a_stdout_file ``` is equivalent to ``` outputs: an_output_name: type: File streamable: true outputBinding: glob: a_stdout_file stdout: a_stdout_file ``` If there is no `stdout` name provided, a random filename will be created. 
For example, the following ``` outputs: an_output_name: type: stdout ``` is equivalent to ``` outputs: an_output_name: type: File streamable: true outputBinding: glob: random_stdout_filenameABCDEFG stdout: random_stdout_filenameABCDEFG ``` - name: stderr type: enum symbols: [ "cwl:stderr" ] docParent: "#CommandOutputParameter" doc: | Only valid as a `type` for a `CommandLineTool` output with no `outputBinding` set. The following ``` outputs: an_output_name: type: stderr stderr: a_stderr_file ``` is equivalent to ``` outputs: an_output_name: type: File streamable: true outputBinding: glob: a_stderr_file stderr: a_stderr_file ``` If there is no `stderr` name provided, a random filename will be created. For example, the following ``` outputs: an_output_name: type: stderr ``` is equivalent to ``` outputs: an_output_name: type: File streamable: true outputBinding: glob: random_stderr_filenameABCDEFG stderr: random_stderr_filenameABCDEFG ``` - type: record name: CommandLineTool extends: Process documentRoot: true specialize: - specializeFrom: InputParameter specializeTo: CommandInputParameter - specializeFrom: OutputParameter specializeTo: CommandOutputParameter doc: | This defines the schema of the CWL Command Line Tool Description document. fields: - name: class jsonldPredicate: "_id": "@type" "_type": "@vocab" type: string - name: baseCommand doc: | Specifies the program to execute. If an array, the first element of the array is the command to execute, and subsequent elements are mandatory command line arguments. The elements in `baseCommand` must appear before any command line bindings from `inputBinding` or `arguments`. If `baseCommand` is not provided or is an empty array, the first element of the command line produced after processing `inputBinding` or `arguments` must be used as the program to execute. If the program includes a path separator character it must be an absolute path, otherwise it is an error. 
If the program does not include a path separator, search the `$PATH` variable in the runtime environment of the workflow runner to find the absolute path of the executable. type: - string? - string[]? jsonldPredicate: "_id": "cwl:baseCommand" "_container": "@list" - name: arguments doc: | Command line bindings which are not directly associated with input parameters. type: - "null" - type: array items: [string, Expression, CommandLineBinding] jsonldPredicate: "_id": "cwl:arguments" "_container": "@list" - name: stdin type: ["null", string, Expression] doc: | A path to a file whose contents must be piped into the command's standard input stream. - name: stderr type: ["null", string, Expression] jsonldPredicate: "https://w3id.org/cwl/cwl#stderr" doc: | Capture the command's standard error stream to a file written to the designated output directory. If `stderr` is a string, it specifies the file name to use. If `stderr` is an expression, the expression is evaluated and must return a string with the file name to use to capture stderr. If the return value is not a string, or the resulting path contains illegal characters (such as the path separator `/`) it is an error. - name: stdout type: ["null", string, Expression] jsonldPredicate: "https://w3id.org/cwl/cwl#stdout" doc: | Capture the command's standard output stream to a file written to the designated output directory. If `stdout` is a string, it specifies the file name to use. If `stdout` is an expression, the expression is evaluated and must return a string with the file name to use to capture stdout. If the return value is not a string, or the resulting path contains illegal characters (such as the path separator `/`) it is an error. - name: successCodes type: int[]? doc: | Exit codes that indicate the process completed successfully. - name: temporaryFailCodes type: int[]? 
doc: | Exit codes that indicate the process failed due to a possibly temporary condition, where executing the process with the same runtime environment and inputs may produce different results. - name: permanentFailCodes type: int[]? doc: Exit codes that indicate the process failed due to a permanent logic error, where executing the process with the same runtime environment and same inputs is expected to always fail. - type: record name: DockerRequirement extends: ProcessRequirement doc: | Indicates that a workflow component should be run in a [Docker](http://docker.com) container, and specifies how to fetch or build the image. If a CommandLineTool lists `DockerRequirement` under `hints` (or `requirements`), it may (or must) be run in the specified Docker container. The platform must first acquire or install the correct Docker image as specified by `dockerPull`, `dockerImport`, `dockerLoad` or `dockerFile`. The platform must execute the tool in the container using `docker run` with the appropriate Docker image and tool command line. The workflow platform may provide input files and the designated output directory through the use of volume bind mounts. The platform may rewrite file paths in the input object to correspond to the Docker bind mounted locations. When running a tool contained in Docker, the workflow platform must not assume anything about the contents of the Docker container, such as the presence or absence of specific software, except to assume that the generated command line represents a valid command within the runtime environment of the container. ## Interaction with other requirements If [EnvVarRequirement](#EnvVarRequirement) is specified alongside a DockerRequirement, the environment variables must be provided to Docker using `--env` or `--env-file` and interact with the container's preexisting environment as defined by Docker. 
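As a usage sketch of the above (the image name is illustrative, not part of this schema), a tool typically requests a container under `hints` and names an image for the platform to `docker pull`:

```yaml
hints:
  - class: DockerRequirement
    dockerPull: debian:stretch-slim
```

Listing the requirement under `hints` makes the container optional; moving it to `requirements` makes it mandatory, as described above.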
fields: - name: class type: string doc: "Always 'DockerRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: dockerPull type: string? doc: "Specify a Docker image to retrieve using `docker pull`." - name: dockerLoad type: string? doc: "Specify an HTTP URL from which to download a Docker image using `docker load`." - name: dockerFile type: string? doc: "Supply the contents of a Dockerfile which will be built using `docker build`." - name: dockerImport type: string? doc: "Provide an HTTP URL to download and gunzip a Docker image using `docker import`." - name: dockerImageId type: string? doc: | The image id that will be used for `docker run`. May be a human-readable image name or the image identifier hash. May be skipped if `dockerPull` is specified, in which case the `dockerPull` image id must be used. - name: dockerOutputDirectory type: string? doc: | Set the designated output directory to a specific location inside the Docker container. - type: record name: SoftwareRequirement extends: ProcessRequirement doc: | A list of software packages that should be configured in the environment of the defined process. fields: - name: class type: string doc: "Always 'SoftwareRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: packages type: SoftwarePackage[] doc: "The list of software to be configured." jsonldPredicate: mapSubject: package mapPredicate: specs - name: SoftwarePackage type: record fields: - name: package type: string doc: "The common name of the software to be configured." - name: version type: string[]? doc: "The (optional) version of the software to be configured." - name: specs type: string[]? doc: | Must be one or more IRIs identifying resources for installing or enabling the software. Implementations may provide resolvers which map well-known software spec IRIs to some configuration action. For example, an IRI `https://packages.debian.org/jessie/bowtie` could be resolved with `apt-get install bowtie`. 
An IRI `https://anaconda.org/bioconda/bowtie` could be resolved with `conda install -c bioconda bowtie`. Tools may also provide IRIs to index entries such as [RRID](http://www.identifiers.org/rrid/), for example `http://identifiers.org/rrid/RRID:SCR_005476`. - name: Dirent type: record doc: | Define a file or subdirectory that must be placed in the designated output directory prior to executing the command line tool. May be the result of executing an expression, such as building a configuration file from a template. fields: - name: entryname type: ["null", string, Expression] jsonldPredicate: _id: cwl:entryname doc: | The name of the file or subdirectory to create in the output directory. If `entry` is a File or Directory, this overrides `basename`. Optional. - name: entry type: [string, Expression] jsonldPredicate: _id: cwl:entry doc: | If the value is a string literal or an expression which evaluates to a string, a new file must be created with the string as the file contents. If the value is an expression that evaluates to a `File` object, this indicates the referenced file should be added to the designated output directory prior to executing the tool. If the value is an expression that evaluates to a `Dirent` object, this indicates that the File or Directory in `entry` should be added to the designated output directory with the name in `entryname`. If `writable` is false, the file may be made available using a bind mount or file system link to avoid unnecessary copying of the input file. - name: writable type: boolean? doc: | If true, the file or directory must be writable by the tool. Changes to the file or directory must be isolated and not visible to any other CommandLineTool process. This may be implemented by making a copy of the original file or directory. Default false (files and directories read-only by default). 
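A minimal sketch of a `Dirent` in use (the file name `settings.conf` and the input parameter `threads` are hypothetical): an `InitialWorkDirRequirement` stages a templated configuration file into the designated output directory before the tool runs.

```yaml
requirements:
  - class: InitialWorkDirRequirement
    listing:
      - entryname: settings.conf
        entry: |
          threads=$(inputs.threads)
```

Because `writable` is left at its default of false, the platform may expose the staged file read-only.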
- name: InitialWorkDirRequirement type: record extends: ProcessRequirement doc: Define a list of files and subdirectories that must be created by the workflow platform in the designated output directory prior to executing the command line tool. fields: - name: class type: string doc: InitialWorkDirRequirement jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: listing type: - type: array items: [File, Directory, Dirent, string, Expression] - string - Expression jsonldPredicate: _id: "cwl:listing" doc: | The list of files or subdirectories that must be placed in the designated output directory prior to executing the command line tool. May be an expression. If so, the expression return value must validate as `{type: array, items: [File, Directory]}`. - name: EnvVarRequirement type: record extends: ProcessRequirement doc: | Define a list of environment variables which will be set in the execution environment of the tool. See `EnvironmentDef` for details. fields: - name: class type: string doc: "Always 'EnvVarRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: envDef type: EnvironmentDef[] doc: The list of environment variables. jsonldPredicate: mapSubject: envName mapPredicate: envValue - type: record name: ShellCommandRequirement extends: ProcessRequirement doc: | Modify the behavior of CommandLineTool to generate a single string containing a shell command line. Each item in the argument list must be joined into a string separated by single spaces and quoted to prevent interpretation by the shell, unless `CommandLineBinding` for that argument contains `shellQuote: false`. If `shellQuote: false` is specified, the argument is joined into the command string without quoting, which allows the use of shell metacharacters such as `|` for pipes. 
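For example (the input name `infile` is illustrative), a shell pipeline can be expressed by disabling quoting on a single composite argument:

```yaml
requirements:
  - class: ShellCommandRequirement
arguments:
  - shellQuote: false
    valueFrom: "cat $(inputs.infile.path) | wc -l"
```

With `shellQuote: false` the `|` reaches the shell as a pipe; with the default quoting it would be passed to `cat` as a literal argument.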
fields: - name: class type: string doc: "Always 'ShellCommandRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - type: record name: ResourceRequirement extends: ProcessRequirement doc: | Specify basic hardware resource requirements. "min" is the minimum amount of a resource that must be reserved to schedule a job. If "min" cannot be satisfied, the job should not be run. "max" is the maximum amount of a resource that the job shall be permitted to use. If a node has sufficient resources, multiple jobs may be scheduled on a single node provided each job's "max" resource requirements are met. If a job attempts to exceed its "max" resource allocation, an implementation may deny additional resources, which may result in job failure. If "min" is specified but "max" is not, then "max" == "min". If "max" is specified but "min" is not, then "min" == "max". It is an error if max < min. It is an error if the value of any of these fields is negative. If neither "min" nor "max" is specified for a resource, an implementation may provide a default. 
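A sketch of the min/max semantics described above (the values are illustrative): request at least 2 cores and between 1 GiB and 4 GiB of RAM, expressed in mebibytes per the field docs.

```yaml
requirements:
  - class: ResourceRequirement
    coresMin: 2
    ramMin: 1024
    ramMax: 4096
```

Since `coresMax` is omitted, it is treated as equal to `coresMin` per the rule above.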
fields: - name: class type: string doc: "Always 'ResourceRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: coresMin type: ["null", long, string, Expression] doc: Minimum reserved number of CPU cores - name: coresMax type: ["null", int, string, Expression] doc: Maximum reserved number of CPU cores - name: ramMin type: ["null", long, string, Expression] doc: Minimum reserved RAM in mebibytes (2**20) - name: ramMax type: ["null", long, string, Expression] doc: Maximum reserved RAM in mebibytes (2**20) - name: tmpdirMin type: ["null", long, string, Expression] doc: Minimum reserved filesystem based storage for the designated temporary directory, in mebibytes (2**20) - name: tmpdirMax type: ["null", long, string, Expression] doc: Maximum reserved filesystem based storage for the designated temporary directory, in mebibytes (2**20) - name: outdirMin type: ["null", long, string, Expression] doc: Minimum reserved filesystem based storage for the designated output directory, in mebibytes (2**20) - name: outdirMax type: ["null", long, string, Expression] doc: Maximum reserved filesystem based storage for the designated output directory, in mebibytes (2**20) schema-salad-2.6.20171201034858/schema_salad/tests/test_schema/test11.cwl0000644000175100017510000000017013025033471025446 0ustar peterpeter00000000000000class: Workflow inputs: foo: string outputs: bar: string steps: step1: run: blub.cwl in: [] out: [out]schema-salad-2.6.20171201034858/schema_salad/tests/frag.yml0000644000175100017510000000005113030536526022764 0ustar peterpeter00000000000000- id: foo1 bar: b1 - id: foo2 bar: b2schema-salad-2.6.20171201034858/schema_salad/tests/.coverage0000644000175100017510000000157113165562750023135 0ustar peterpeter00000000000000!coverage.py: This is a private format, don't read it directly!{"lines": {"/home/peter/work/salad/schema_salad/validate.py": [1, 2, 3, 4, 5, 6, 7, 9, 10, 12, 13, 15, 19, 20, 21, 22, 25, 26, 27, 28, 29, 30, 31, 32, 33, 37, 38, 39, 41, 43, 
44, 48, 51, 52, 54, 56, 57, 58, 60, 63, 64, 65, 66, 72, 73, 74, 75, 79, 80, 82, 83, 91, 92, 93, 94, 100, 109, 118, 119, 127, 128, 129, 131, 132, 133, 135, 136, 137, 138, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 155, 157, 158, 160, 161, 162, 163, 164, 166, 167, 169, 170, 171, 172, 173, 174, 176, 177, 178, 179, 181, 182, 183, 184, 186, 187, 188, 189, 191, 193, 194, 195, 196, 198, 200, 201, 202, 203, 204, 205, 206, 208, 209, 210, 211, 212, 214, 215, 216, 217, 219, 220, 222, 223, 227, 228, 229, 230, 231, 232, 233, 234, 236, 237, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 250, 251, 254, 256, 257, 258, 259, 261, 265]}}schema-salad-2.6.20171201034858/schema_salad/tests/test_validate.pyx0000644000175100017510000000415013165562750024727 0ustar peterpeter00000000000000import unittest import json from schema_salad.schema import load_schema from schema_salad.validate import validate_ex from schema_salad.sourceline import cmap class TestValidate(unittest.TestCase): schema = cmap({"name": "_", "$graph":[{ "name": "File", "type": "record", "fields": [{ "name": "class", "type": { "type": "enum", "name": "File_class", "symbols": ["#_/File"] }, "jsonldPredicate": { "_id": "@type", "_type": "@vocab" } }, { "name": "location", "type": "string", "jsonldPredicate": "_:location" }] }, { "name": "Directory", "type": "record", "fields": [{ "name": "class", "type": { "type": "enum", "name": "Directory_class", "symbols": ["#_/Directory"] }, "jsonldPredicate": { "_id": "@type", "_type": "@vocab" } }, { "name": "location", "type": "string", "jsonldPredicate": "_:location" }, { "name": "listing", "type": { "type": "array", "items": ["File", "Directory"] } }], }]}) def test_validate_big(self): document_loader, avsc_names, schema_metadata, metaschema_loader = load_schema(self.schema) with open("biglisting.yml") as f: biglisting = json.load(f) self.assertEquals(True, validate_ex(avsc_names.get_name("Directory", ""), biglisting, strict=True, raise_ex=False)) # def 
test_validate_small(self): # document_loader, avsc_names, schema_metadata, metaschema_loader = load_schema(self.schema) # with open("smalllisting.yml") as f: # smalllisting = json.load(f) # validate_ex(avsc_names.get_name("Directory", ""), smalllisting, # strict=True, raise_ex=True) schema-salad-2.6.20171201034858/schema_salad/tests/df0000644000175100017510000000015613165562750021653 0ustar peterpeter00000000000000........... ---------------------------------------------------------------------- Ran 11 tests in 0.593s OK schema-salad-2.6.20171201034858/schema_salad/tests/test_fetch.py0000644000175100017510000000433113130233260024017 0ustar peterpeter00000000000000from __future__ import absolute_import from __future__ import print_function import unittest import schema_salad.ref_resolver import schema_salad.main import schema_salad.schema from schema_salad.jsonld_context import makerdf import rdflib import ruamel.yaml as yaml import json import os from typing import Text from six.moves import urllib class TestFetcher(unittest.TestCase): def test_fetcher(self): class TestFetcher(schema_salad.ref_resolver.Fetcher): def __init__(self, a, b): pass def fetch_text(self, url): # type: (Text) -> Text if url == "keep:abc+123/foo.txt": return "hello: keepfoo" if url.endswith("foo.txt"): return "hello: foo" else: raise RuntimeError("Not foo.txt") def check_exists(self, url): # type: (Text) -> bool if url.endswith("foo.txt"): return True else: return False def urljoin(self, base, url): urlsp = urllib.parse.urlsplit(url) if urlsp.scheme: return url basesp = urllib.parse.urlsplit(base) if basesp.scheme == "keep": return base + "/" + url return urllib.parse.urljoin(base, url) loader = schema_salad.ref_resolver.Loader({}, fetcher_constructor=TestFetcher) self.assertEqual({"hello": "foo"}, loader.resolve_ref("foo.txt")[0]) self.assertEqual({"hello": "keepfoo"}, loader.resolve_ref("foo.txt", base_url="keep:abc+123")[0]) self.assertTrue(loader.check_exists("foo.txt")) with 
self.assertRaises(RuntimeError): loader.resolve_ref("bar.txt") self.assertFalse(loader.check_exists("bar.txt")) def test_cache(self): loader = schema_salad.ref_resolver.Loader({}) foo = os.path.join(os.getcwd(), "foo.txt") foo = schema_salad.ref_resolver.file_uri(foo) loader.cache.update({foo: "hello: foo"}) print(loader.cache) self.assertEqual({"hello": "foo"}, loader.resolve_ref("foo.txt")[0]) self.assertTrue(loader.check_exists(foo)) schema-salad-2.6.20171201034858/schema_salad/tests/test_ref_resolver.py0000644000175100017510000001346013205361073025435 0ustar peterpeter00000000000000"""Test the ref_resolver module.""" from __future__ import absolute_import import shutil import tempfile import pytest # type: ignore @pytest.fixture def tmp_dir_fixture(request): d = tempfile.mkdtemp() @request.addfinalizer def teardown(): shutil.rmtree(d) return d def test_Loader_initialisation_for_HOME_env_var(tmp_dir_fixture): import os from schema_salad.ref_resolver import Loader from requests import Session # Ensure HOME is set. os.environ["HOME"] = tmp_dir_fixture loader = Loader(ctx={}) assert isinstance(loader.session, Session) def test_Loader_initialisation_for_TMP_env_var(tmp_dir_fixture): import os from schema_salad.ref_resolver import Loader from requests import Session # Ensure HOME is missing. if "HOME" in os.environ: del os.environ["HOME"] # Ensure TMP is present. os.environ["TMP"] = tmp_dir_fixture loader = Loader(ctx={}) assert isinstance(loader.session, Session) def test_Loader_initialisation_with_neither_TMP_HOME_set(tmp_dir_fixture): import os from schema_salad.ref_resolver import Loader from requests import Session # Ensure HOME is missing. 
if "HOME" in os.environ: del os.environ["HOME"] if "TMP" in os.environ: del os.environ["TMP"] loader = Loader(ctx={}) assert isinstance(loader.session, Session) def test_DefaultFetcher_urljoin_win32(tmp_dir_fixture): import os import sys from schema_salad.ref_resolver import DefaultFetcher from requests import Session # Ensure HOME is set. os.environ["HOME"] = tmp_dir_fixture actual_platform = sys.platform try: # For this test always pretend we're on Windows sys.platform = "win32" fetcher = DefaultFetcher({}, None) # Relative path, same folder url = fetcher.urljoin("file:///C:/Users/fred/foo.cwl", "soup.cwl") assert url == "file:///C:/Users/fred/soup.cwl" # Relative path, sub folder url = fetcher.urljoin("file:///C:/Users/fred/foo.cwl", "foo/soup.cwl") assert url == "file:///C:/Users/fred/foo/soup.cwl" # relative climb-up path url = fetcher.urljoin("file:///C:/Users/fred/foo.cwl", "../alice/soup.cwl") assert url == "file:///C:/Users/alice/soup.cwl" # Path with drive: should not be treated as relative to directory # Note: \ would already have been converted to / by resolve_ref() url = fetcher.urljoin("file:///C:/Users/fred/foo.cwl", "c:/bar/soup.cwl") assert url == "file:///c:/bar/soup.cwl" # /C:/ (regular URI absolute path) url = fetcher.urljoin("file:///C:/Users/fred/foo.cwl", "/c:/bar/soup.cwl") assert url == "file:///c:/bar/soup.cwl" # Relative, change drive url = fetcher.urljoin("file:///C:/Users/fred/foo.cwl", "D:/baz/soup.cwl") assert url == "file:///d:/baz/soup.cwl" # Relative from root of base's D: drive url = fetcher.urljoin("file:///d:/baz/soup.cwl", "/foo/soup.cwl") assert url == "file:///d:/foo/soup.cwl" # resolving absolute non-drive URIs still works url = fetcher.urljoin("file:///C:/Users/fred/foo.cwl", "http://example.com/bar/soup.cwl") assert url == "http://example.com/bar/soup.cwl" # and of course relative paths from http:// url = fetcher.urljoin("http://example.com/fred/foo.cwl", "soup.cwl") assert url == "http://example.com/fred/soup.cwl" # Stay 
on http:// and same host url = fetcher.urljoin("http://example.com/fred/foo.cwl", "/bar/soup.cwl") assert url == "http://example.com/bar/soup.cwl" # Security concern - can't resolve file: from http: with pytest.raises(ValueError): url = fetcher.urljoin("http://example.com/fred/foo.cwl", "file:///c:/bar/soup.cwl") # Drive-relative -- should NOT return "absolute" URI c:/bar/soup.cwl" # as that is a potential remote exploit with pytest.raises(ValueError): url = fetcher.urljoin("http://example.com/fred/foo.cwl", "c:/bar/soup.cwl") finally: sys.platform = actual_platform def test_DefaultFetcher_urljoin_linux(tmp_dir_fixture): import os import sys from schema_salad.ref_resolver import DefaultFetcher from requests import Session # Ensure HOME is set. os.environ["HOME"] = tmp_dir_fixture actual_platform = sys.platform try: # Pretend it's Linux (e.g. not win32) sys.platform = "linux2" fetcher = DefaultFetcher({}, None) url = fetcher.urljoin("file:///home/fred/foo.cwl", "soup.cwl") assert url == "file:///home/fred/soup.cwl" url = fetcher.urljoin("file:///home/fred/foo.cwl", "../alice/soup.cwl") assert url == "file:///home/alice/soup.cwl" # relative from root url = fetcher.urljoin("file:///home/fred/foo.cwl", "/baz/soup.cwl") assert url == "file:///baz/soup.cwl" url = fetcher.urljoin("file:///home/fred/foo.cwl", "http://example.com/bar/soup.cwl") assert url == "http://example.com/bar/soup.cwl" url = fetcher.urljoin("http://example.com/fred/foo.cwl", "soup.cwl") assert url == "http://example.com/fred/soup.cwl" # Root-relative -- here relative to http host, not file:/// url = fetcher.urljoin("http://example.com/fred/foo.cwl", "/bar/soup.cwl") assert url == "http://example.com/bar/soup.cwl" # Security concern - can't resolve file: from http: with pytest.raises(ValueError): url = fetcher.urljoin("http://example.com/fred/foo.cwl", "file:///bar/soup.cwl") # But this one is not "dangerous" on Linux fetcher.urljoin("http://example.com/fred/foo.cwl", "c:/bar/soup.cwl") finally: 
sys.platform = actual_platform def test_link_checking(tmp_dir_fixture): pass schema-salad-2.6.20171201034858/schema_salad/tests/#cg_metaschema.py#0000644000175100017510000021433113165562750024601 0ustar peterpeter00000000000000from __future__ import absolute_import import ruamel.yaml from ruamel.yaml.comments import CommentedBase, CommentedMap, CommentedSeq import re import os import traceback from typing import (Any, AnyStr, Callable, cast, Dict, List, Iterable, Tuple, TypeVar, Union, Text) import six lineno_re = re.compile(u"^(.*?:[0-9]+:[0-9]+: )(( *)(.*))") def _add_lc_filename(r, source): # type: (ruamel.yaml.comments.CommentedBase, AnyStr) -> None if isinstance(r, ruamel.yaml.comments.CommentedBase): r.lc.filename = source if isinstance(r, list): for d in r: _add_lc_filename(d, source) elif isinstance(r, dict): for d in six.itervalues(r): _add_lc_filename(d, source) def relname(source): # type: (Text) -> Text if source.startswith("file://"): source = source[7:] source = os.path.relpath(source) return source def add_lc_filename(r, source): # type: (ruamel.yaml.comments.CommentedBase, Text) -> None _add_lc_filename(r, relname(source)) def reflow(text, maxline, shift=""): # type: (Text, int, Text) -> Text if maxline < 20: maxline = 20 if len(text) > maxline: sp = text.rfind(' ', 0, maxline) if sp < 1: sp = text.find(' ', sp+1) if sp == -1: sp = len(text) if sp < len(text): return "%s\n%s%s" % (text[0:sp], shift, reflow(text[sp+1:], maxline, shift)) return text def indent(v, nolead=False, shift=u" ", bullet=u" "): # type: (Text, bool, Text, Text) -> Text if nolead: return v.splitlines()[0] + u"\n".join([shift + l for l in v.splitlines()[1:]]) else: def lineno(i, l): # type: (int, Text) -> Text r = lineno_re.match(l) if bool(r): return r.group(1) + (bullet if i == 0 else shift) + r.group(2) else: return (bullet if i == 0 else shift) + l return u"\n".join([lineno(i, l) for i, l in enumerate(v.splitlines())]) def bullets(textlist, bul): # type: (List[Text], Text) -> 
Text if len(textlist) == 1: return textlist[0] else: return "\n".join(indent(t, bullet=bul) for t in textlist) def strip_dup_lineno(text, maxline=None): # type: (Text, int) -> Text if maxline is None: maxline = int(os.environ.get("COLUMNS", "100")) pre = None msg = [] for l in text.splitlines(): g = lineno_re.match(l) if not g: msg.append(l) continue shift = len(g.group(1)) + len(g.group(3)) g2 = reflow(g.group(2), maxline-shift, " " * shift) if g.group(1) != pre: pre = g.group(1) msg.append(pre + g2) else: g2 = reflow(g.group(2), maxline-len(g.group(1)), " " * (len(g.group(1))+len(g.group(3)))) msg.append(" " * len(g.group(1)) + g2) return "\n".join(msg) def cmap(d, lc=None, fn=None): # type: (Union[int, float, str, Text, Dict, List], List[int], Text) -> Union[int, float, str, Text, CommentedMap, CommentedSeq] if lc is None: lc = [0, 0, 0, 0] if fn is None: fn = "test" if isinstance(d, CommentedMap): fn = d.lc.filename if hasattr(d.lc, "filename") else fn for k,v in six.iteritems(d): if k in d.lc.data: d[k] = cmap(v, lc=d.lc.data[k], fn=fn) else: d[k] = cmap(v, lc, fn=fn) return d if isinstance(d, CommentedSeq): fn = d.lc.filename if hasattr(d.lc, "filename") else fn for k,v in enumerate(d): if k in d.lc.data: d[k] = cmap(v, lc=d.lc.data[k], fn=fn) else: d[k] = cmap(v, lc, fn=fn) return d if isinstance(d, dict): cm = CommentedMap() for k in sorted(d.keys()): v = d[k] if isinstance(v, CommentedBase): uselc = [v.lc.line, v.lc.col, v.lc.line, v.lc.col] vfn = v.lc.filename if hasattr(v.lc, "filename") else fn else: uselc = lc vfn = fn cm[k] = cmap(v, lc=uselc, fn=vfn) cm.lc.add_kv_line_col(k, uselc) cm.lc.filename = fn return cm if isinstance(d, list): cs = CommentedSeq() for k,v in enumerate(d): if isinstance(v, CommentedBase): uselc = [v.lc.line, v.lc.col, v.lc.line, v.lc.col] vfn = v.lc.filename if hasattr(v.lc, "filename") else fn else: uselc = lc vfn = fn cs.append(cmap(v, lc=uselc, fn=vfn)) cs.lc.add_kv_line_col(k, uselc) cs.lc.filename = fn return cs else: 
return d class SourceLine(object): def __init__(self, item, key=None, raise_type=six.text_type, include_traceback=False): # type: (Any, Any, Callable, bool) -> None self.item = item self.key = key self.raise_type = raise_type self.include_traceback = include_traceback def __enter__(self): # type: () -> SourceLine return self def __exit__(self, exc_type, # type: Any exc_value, # type: Any tb # type: Any ): # -> Any if not exc_value: return if self.include_traceback: raise self.makeError("\n".join(traceback.format_exception(exc_type, exc_value, tb))) else: raise self.makeError(six.text_type(exc_value)) def makeLead(self): # type: () -> Text if self.key is None or self.item.lc.data is None or self.key not in self.item.lc.data: return "%s:%i:%i:" % (self.item.lc.filename if hasattr(self.item.lc, "filename") else "", (self.item.lc.line or 0)+1, (self.item.lc.col or 0)+1) else: return "%s:%i:%i:" % (self.item.lc.filename if hasattr(self.item.lc, "filename") else "", (self.item.lc.data[self.key][0] or 0)+1, (self.item.lc.data[self.key][1] or 0)+1) def makeError(self, msg): # type: (Text) -> Any if not isinstance(self.item, ruamel.yaml.comments.CommentedBase): return self.raise_type(msg) errs = [] lead = self.makeLead() for m in msg.splitlines(): if bool(lineno_re.match(m)): errs.append(m) else: errs.append("%s %s" % (lead, m)) return self.raise_type("\n".join(errs)) from types import NoneType from six.moves import urllib import ruamel.yaml as yaml from StringIO import StringIO import copy class ValidationException(Exception): pass class Savable(object): pass class LoadingOptions(object): def __init__(self, fetcher=None, namespaces=None, fileuri=None, copyfrom=None): if copyfrom is not None: self.idx = copyfrom.idx if fetcher is None: fetcher = copyfrom.fetcher if fileuri is None: fileuri = copyfrom.fileuri else: self.idx = {} if fetcher is None: import os import requests from cachecontrol.wrapper import CacheControl from cachecontrol.caches import FileCache from 
schema_salad.ref_resolver import DefaultFetcher if "HOME" in os.environ: session = CacheControl( requests.Session(), cache=FileCache(os.path.join(os.environ["HOME"], ".cache", "salad"))) elif "TMP" in os.environ: session = CacheControl( requests.Session(), cache=FileCache(os.path.join(os.environ["TMP"], ".cache", "salad"))) else: session = CacheControl( requests.Session(), cache=FileCache("/tmp", ".cache", "salad")) self.fetcher = DefaultFetcher({}, session) else: self.fetcher = fetcher self.fileuri = fileuri self.vocab = _vocab self.rvocab = _rvocab if namespaces is not None: self.vocab = self.vocab.copy() self.rvocab = self.rvocab.copy() for k,v in namespaces.iteritems(): self.vocab[k] = v self.rvocab[v] = k def load_field(val, fieldtype, baseuri, loadingOptions): if isinstance(val, dict): if "$import" in val: return _document_load_by_url(fieldtype, loadingOptions.fetcher.urljoin(loadingOptions.fileuri, val["$import"]), loadingOptions) elif "$include" in val: val = loadingOptions.fetcher.fetch_text(loadingOptions.fetcher.urljoin(loadingOptions.fileuri, val["$include"])) return fieldtype.load(val, baseuri, loadingOptions) def save(val): if isinstance(val, Savable): return val.save() if isinstance(val, list): return [save(v) for v in val] return val def expand_url(url, # type: Text base_url, # type: Text loadingOptions, scoped_id=False, # type: bool vocab_term=False, # type: bool scoped_ref=None # type: int ): # type: (...) 
-> Text if url in (u"@id", u"@type"): return url if vocab_term and url in loadingOptions.vocab: return url if bool(loadingOptions.vocab) and u":" in url: prefix = url.split(u":")[0] if prefix in loadingOptions.vocab: url = loadingOptions.vocab[prefix] + url[len(prefix) + 1:] split = urllib.parse.urlsplit(url) if ((bool(split.scheme) and split.scheme in [u'http', u'https', u'file']) or url.startswith(u"$(") or url.startswith(u"${")): pass elif scoped_id and not bool(split.fragment): splitbase = urllib.parse.urlsplit(base_url) frg = u"" if bool(splitbase.fragment): frg = splitbase.fragment + u"/" + split.path else: frg = split.path pt = splitbase.path if splitbase.path != '' else "/" url = urllib.parse.urlunsplit( (splitbase.scheme, splitbase.netloc, pt, splitbase.query, frg)) elif scoped_ref is not None and not split.fragment: pass else: url = loadingOptions.fetcher.urljoin(base_url, url) if vocab_term: if bool(split.scheme): if url in loadingOptions.rvocab: return loadingOptions.rvocab[url] else: raise ValidationException("Term '%s' not in vocabulary" % url) return url class _Loader(object): def load(self, doc, baseuri, loadingOptions, docRoot=None): pass class _PrimitiveLoader(_Loader): def __init__(self, tp): self.tp = tp def load(self, doc, baseuri, loadingOptions, docRoot=None): if not isinstance(doc, self.tp): raise ValidationException("Expected a %s but got %s" % (self.tp, type(doc))) return doc def __repr__(self): return str(self.tp) class _ArrayLoader(_Loader): def __init__(self, items): self.items = items def load(self, doc, baseuri, loadingOptions, docRoot=None): if not isinstance(doc, list): raise ValidationException("Expected a list") r = [] errors = [] for i in xrange(0, len(doc)): try: lf = load_field(doc[i], _UnionLoader((self, self.items)), baseuri, loadingOptions) if isinstance(lf, list): r.extend(lf) else: r.append(lf) except ValidationException as e: errors.append(SourceLine(doc, i, str).makeError(six.text_type(e))) if errors: raise 
ValidationException("\n".join(errors)) return r def __repr__(self): return "array<%s>" % self.items class _EnumLoader(_Loader): def __init__(self, symbols): self.symbols = symbols def load(self, doc, baseuri, loadingOptions, docRoot=None): if doc in self.symbols: return doc else: raise ValidationException("Expected one of %s" % (self.symbols,)) class _RecordLoader(_Loader): def __init__(self, classtype): self.classtype = classtype def load(self, doc, baseuri, loadingOptions, docRoot=None): if not isinstance(doc, dict): raise ValidationException("Expected a dict") return self.classtype(doc, baseuri, loadingOptions, docRoot=docRoot) def __repr__(self): return str(self.classtype) class _UnionLoader(_Loader): def __init__(self, alternates): self.alternates = alternates def load(self, doc, baseuri, loadingOptions, docRoot=None): errors = [] for t in self.alternates: try: return t.load(doc, baseuri, loadingOptions, docRoot=docRoot) except ValidationException as e: errors.append("tried %s but\n%s" % (t, indent(str(e)))) raise ValidationException(bullets(errors, "- ")) def __repr__(self): return " | ".join(str(a) for a in self.alternates) class _URILoader(_Loader): def __init__(self, inner, scoped_id, vocab_term, scoped_ref): self.inner = inner self.scoped_id = scoped_id self.vocab_term = vocab_term self.scoped_ref = scoped_ref def load(self, doc, baseuri, loadingOptions, docRoot=None): if isinstance(doc, list): return [self.load(i, baseuri, loadingOptions) for i in doc] if isinstance(doc, basestring): return expand_url(doc, baseuri, loadingOptions, self.scoped_id, self.vocab_term, self.scoped_ref) return self.inner.load(doc, baseuri, loadingOptions) class _TypeDSLLoader(_Loader): def __init__(self, inner): self.inner = inner def load(self, doc, baseuri, loadingOptions, docRoot=None): return self.inner.load(doc, baseuri, loadingOptions) class _IdMapLoader(_Loader): def __init__(self, inner, mapSubject, mapPredicate): self.inner = inner self.mapSubject = mapSubject 
        self.mapPredicate = mapPredicate

    def load(self, doc, baseuri, loadingOptions, docRoot=None):
        if isinstance(doc, dict):
            r = []
            for k in sorted(doc.keys()):
                val = doc[k]
                if isinstance(val, dict):
                    v = copy.copy(val)
                    if hasattr(val, 'lc'):
                        v.lc.data = val.lc.data
                        v.lc.filename = val.lc.filename
                else:
                    if self.mapPredicate:
                        v = {self.mapPredicate: val}
                    else:
                        raise ValidationException("No mapPredicate")
                v[self.mapSubject] = k
                r.append(v)
            doc = r
        return self.inner.load(doc, baseuri, loadingOptions)


def _document_load(loader, doc, baseuri, loadingOptions):
    if isinstance(doc, basestring):
        return _document_load_by_url(loader, doc, loadingOptions)

    if isinstance(doc, dict):
        if "$namespaces" in doc:
            loadingOptions = LoadingOptions(copyfrom=loadingOptions, namespaces=doc["$namespaces"])

        if "$base" in doc:
            baseuri = doc["$base"]

        if "$graph" in doc:
            return loader.load(doc["$graph"], baseuri, loadingOptions)
        else:
            return loader.load(doc, baseuri, loadingOptions, docRoot=baseuri)

    if isinstance(doc, list):
        return loader.load(doc, baseuri, loadingOptions)

    raise ValidationException()


def _document_load_by_url(loader, url, loadingOptions):
    if url in loadingOptions.idx:
        return _document_load(loader, loadingOptions.idx[url], url, loadingOptions)

    text = loadingOptions.fetcher.fetch_text(url)
    if isinstance(text, bytes):
        textIO = StringIO(text.decode('utf-8'))
    else:
        textIO = StringIO(text)
    textIO.name = url   # type: ignore
    result = yaml.round_trip_load(textIO)
    add_lc_filename(result, url)

    loadingOptions.idx[url] = result

    loadingOptions = LoadingOptions(copyfrom=loadingOptions, fileuri=url)

    return _document_load(loader, result, url, loadingOptions)


class RecordField(Savable):
    """
    A field of a record.
    """
    def __init__(self, _doc, baseuri, loadingOptions, docRoot=None):
        doc = copy.copy(_doc)
        if hasattr(_doc, 'lc'):
            doc.lc.data = _doc.lc.data
            doc.lc.filename = _doc.lc.filename
        errors = []
        #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()}
        if 'name' in doc:
            try:
                self.name = load_field(doc.get('name'), uri_strtype_True_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'name', str).makeError("the `name` field is not valid because:\n"+str(e)))
        else:
            self.name = None

        if self.name is None:
            if docRoot is not None:
                self.name = docRoot
            else:
                raise ValidationException("Missing name")
        baseuri = self.name

        if 'doc' in doc:
            try:
                self.doc = load_field(doc.get('doc'), union_of_None_type_or_strtype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'doc', str).makeError("the `doc` field is not valid because:\n"+str(e)))
        else:
            self.doc = None

        try:
            self.type = load_field(doc.get('type'), typedsl_uri_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_False_True_2, baseuri, loadingOptions)
        except ValidationException as e:
            errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid because:\n"+str(e)))

        if errors:
            raise ValidationException("Trying 'RecordField'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.name is not None:
            r['name'] = save(self.name)
        if self.doc is not None:
            r['doc'] = save(self.doc)
        if self.type is not None:
            r['type'] = save(self.type)
        return r


class RecordSchema(Savable):
    def __init__(self, _doc, baseuri, loadingOptions, docRoot=None):
        doc = copy.copy(_doc)
        if hasattr(_doc, 'lc'):
            doc.lc.data = _doc.lc.data
            doc.lc.filename = _doc.lc.filename
        errors = []
        #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()}
        if 'fields' in doc:
            try:
                self.fields = load_field(doc.get('fields'), idmap_fields_union_of_None_type_or_array_of_RecordFieldLoader, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'fields', str).makeError("the `fields` field is not valid because:\n"+str(e)))
        else:
            self.fields = None

        try:
            self.type = load_field(doc.get('type'), typedsl_uri_Record_symbolLoader_False_True_2, baseuri, loadingOptions)
        except ValidationException as e:
            errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid because:\n"+str(e)))

        if errors:
            raise ValidationException("Trying 'RecordSchema'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.fields is not None:
            r['fields'] = save(self.fields)
        if self.type is not None:
            r['type'] = save(self.type)
        return r


class EnumSchema(Savable):
    """
    Define an enumerated type.
    """
    def __init__(self, _doc, baseuri, loadingOptions, docRoot=None):
        doc = copy.copy(_doc)
        if hasattr(_doc, 'lc'):
            doc.lc.data = _doc.lc.data
            doc.lc.filename = _doc.lc.filename
        errors = []
        #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()}
        try:
            self.symbols = load_field(doc.get('symbols'), uri_array_of_strtype_False_False_None, baseuri, loadingOptions)
        except ValidationException as e:
            errors.append(SourceLine(doc, 'symbols', str).makeError("the `symbols` field is not valid because:\n"+str(e)))

        try:
            self.type = load_field(doc.get('type'), typedsl_uri_Enum_symbolLoader_False_True_2, baseuri, loadingOptions)
        except ValidationException as e:
            errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid because:\n"+str(e)))

        if errors:
            raise ValidationException("Trying 'EnumSchema'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.symbols is not None:
            r['symbols'] = save(self.symbols)
        if self.type is not None:
            r['type'] = save(self.type)
        return r


class ArraySchema(Savable):
    def __init__(self, _doc, baseuri, loadingOptions,
                 docRoot=None):
        doc = copy.copy(_doc)
        if hasattr(_doc, 'lc'):
            doc.lc.data = _doc.lc.data
            doc.lc.filename = _doc.lc.filename
        errors = []
        #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()}
        try:
            self.items = load_field(doc.get('items'), uri_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_False_True_2, baseuri, loadingOptions)
        except ValidationException as e:
            errors.append(SourceLine(doc, 'items', str).makeError("the `items` field is not valid because:\n"+str(e)))

        try:
            self.type = load_field(doc.get('type'), typedsl_uri_Array_symbolLoader_False_True_2, baseuri, loadingOptions)
        except ValidationException as e:
            errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid because:\n"+str(e)))

        if errors:
            raise ValidationException("Trying 'ArraySchema'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.items is not None:
            r['items'] = save(self.items)
        if self.type is not None:
            r['type'] = save(self.type)
        return r


class JsonldPredicate(Savable):
    """
    Attached to a record field to define how the parent record field is handled for
    URI resolution and JSON-LD context generation.
    """
    def __init__(self, _doc, baseuri, loadingOptions, docRoot=None):
        doc = copy.copy(_doc)
        if hasattr(_doc, 'lc'):
            doc.lc.data = _doc.lc.data
            doc.lc.filename = _doc.lc.filename
        errors = []
        #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()}
        if '_id' in doc:
            try:
                self._id = load_field(doc.get('_id'), uri_union_of_None_type_or_strtype_False_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, '_id', str).makeError("the `_id` field is not valid because:\n"+str(e)))
        else:
            self._id = None

        if '_type' in doc:
            try:
                self._type = load_field(doc.get('_type'), union_of_None_type_or_strtype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, '_type', str).makeError("the `_type` field is not valid because:\n"+str(e)))
        else:
            self._type = None

        if '_container' in doc:
            try:
                self._container = load_field(doc.get('_container'), union_of_None_type_or_strtype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, '_container', str).makeError("the `_container` field is not valid because:\n"+str(e)))
        else:
            self._container = None

        if 'identity' in doc:
            try:
                self.identity = load_field(doc.get('identity'), union_of_None_type_or_booltype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'identity', str).makeError("the `identity` field is not valid because:\n"+str(e)))
        else:
            self.identity = None

        if 'noLinkCheck' in doc:
            try:
                self.noLinkCheck = load_field(doc.get('noLinkCheck'), union_of_None_type_or_booltype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'noLinkCheck', str).makeError("the `noLinkCheck` field is not valid because:\n"+str(e)))
        else:
            self.noLinkCheck = None

        if 'mapSubject' in doc:
            try:
                self.mapSubject = load_field(doc.get('mapSubject'), union_of_None_type_or_strtype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'mapSubject', str).makeError("the `mapSubject` field is not valid because:\n"+str(e)))
        else:
            self.mapSubject = None

        if 'mapPredicate' in doc:
            try:
                self.mapPredicate = load_field(doc.get('mapPredicate'), union_of_None_type_or_strtype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'mapPredicate', str).makeError("the `mapPredicate` field is not valid because:\n"+str(e)))
        else:
            self.mapPredicate = None

        if 'refScope' in doc:
            try:
                self.refScope = load_field(doc.get('refScope'), union_of_None_type_or_inttype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'refScope', str).makeError("the `refScope` field is not valid because:\n"+str(e)))
        else:
            self.refScope = None

        if 'typeDSL' in doc:
            try:
                self.typeDSL = load_field(doc.get('typeDSL'), union_of_None_type_or_booltype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'typeDSL', str).makeError("the `typeDSL` field is not valid because:\n"+str(e)))
        else:
            self.typeDSL = None

        if errors:
            raise ValidationException("Trying 'JsonldPredicate'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self._id is not None:
            r['_id'] = save(self._id)
        if self._type is not None:
            r['_type'] = save(self._type)
        if self._container is not None:
            r['_container'] = save(self._container)
        if self.identity is not None:
            r['identity'] = save(self.identity)
        if self.noLinkCheck is not None:
            r['noLinkCheck'] = save(self.noLinkCheck)
        if self.mapSubject is not None:
            r['mapSubject'] = save(self.mapSubject)
        if self.mapPredicate is not None:
            r['mapPredicate'] = save(self.mapPredicate)
        if self.refScope is not None:
            r['refScope'] = save(self.refScope)
        if self.typeDSL is not None:
            r['typeDSL'] = save(self.typeDSL)
        return r


class SpecializeDef(Savable):
    def __init__(self, _doc, baseuri, loadingOptions, docRoot=None):
        doc = copy.copy(_doc)
        if hasattr(_doc, 'lc'):
            doc.lc.data = _doc.lc.data
            doc.lc.filename = _doc.lc.filename
        errors = []
        #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()}
        try:
            self.specializeFrom = load_field(doc.get('specializeFrom'), uri_strtype_False_False_1, baseuri, loadingOptions)
        except ValidationException as e:
            errors.append(SourceLine(doc, 'specializeFrom', str).makeError("the `specializeFrom` field is not valid because:\n"+str(e)))

        try:
            self.specializeTo = load_field(doc.get('specializeTo'), uri_strtype_False_False_1, baseuri, loadingOptions)
        except ValidationException as e:
            errors.append(SourceLine(doc, 'specializeTo', str).makeError("the `specializeTo` field is not valid because:\n"+str(e)))

        if errors:
            raise ValidationException("Trying 'SpecializeDef'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.specializeFrom is not None:
            r['specializeFrom'] = save(self.specializeFrom)
        if self.specializeTo is not None:
            r['specializeTo'] = save(self.specializeTo)
        return r


class NamedType(Savable):
    def __init__(self, _doc, baseuri, loadingOptions, docRoot=None):
        doc = copy.copy(_doc)
        if hasattr(_doc, 'lc'):
            doc.lc.data = _doc.lc.data
            doc.lc.filename = _doc.lc.filename
        errors = []
        #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()}
        if 'name' in doc:
            try:
                self.name = load_field(doc.get('name'), uri_strtype_True_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'name', str).makeError("the `name` field is not valid because:\n"+str(e)))
        else:
            self.name = None

        if self.name is None:
            if docRoot is not None:
                self.name = docRoot
            else:
                raise ValidationException("Missing name")
        baseuri = self.name

        if 'inVocab' in doc:
            try:
                self.inVocab = load_field(doc.get('inVocab'), union_of_None_type_or_booltype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'inVocab', str).makeError("the `inVocab` field is not valid because:\n"+str(e)))
        else:
            self.inVocab = None

        if errors:
            raise ValidationException("Trying 'NamedType'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.name is not None:
            r['name'] = save(self.name)
        if self.inVocab is not None:
            r['inVocab'] = save(self.inVocab)
        return r


class DocType(Savable):
    def __init__(self, _doc, baseuri, loadingOptions, docRoot=None):
        doc = copy.copy(_doc)
        if hasattr(_doc, 'lc'):
            doc.lc.data = _doc.lc.data
            doc.lc.filename = _doc.lc.filename
        errors = []
        #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()}
        if 'doc' in doc:
            try:
                self.doc = load_field(doc.get('doc'), union_of_None_type_or_strtype_or_array_of_strtype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'doc', str).makeError("the `doc` field is not valid because:\n"+str(e)))
        else:
            self.doc = None

        if 'docParent' in doc:
            try:
                self.docParent = load_field(doc.get('docParent'), uri_union_of_None_type_or_strtype_False_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docParent', str).makeError("the `docParent` field is not valid because:\n"+str(e)))
        else:
            self.docParent = None

        if 'docChild' in doc:
            try:
                self.docChild = load_field(doc.get('docChild'), uri_union_of_None_type_or_strtype_or_array_of_strtype_False_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docChild', str).makeError("the `docChild` field is not valid because:\n"+str(e)))
        else:
            self.docChild = None

        if 'docAfter' in doc:
            try:
                self.docAfter = load_field(doc.get('docAfter'), uri_union_of_None_type_or_strtype_False_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docAfter', str).makeError("the `docAfter` field is not valid because:\n"+str(e)))
        else:
            self.docAfter = None

        if errors:
            raise ValidationException("Trying 'DocType'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.doc is not None:
            r['doc'] = save(self.doc)
        if self.docParent is not None:
            r['docParent'] = save(self.docParent)
        if self.docChild is not None:
            r['docChild'] = save(self.docChild)
        if self.docAfter is not None:
            r['docAfter'] = save(self.docAfter)
        return r


class SchemaDefinedType(DocType):
    """
    Abstract base for schema-defined types.
    """
    def __init__(self, _doc, baseuri, loadingOptions, docRoot=None):
        doc = copy.copy(_doc)
        if hasattr(_doc, 'lc'):
            doc.lc.data = _doc.lc.data
            doc.lc.filename = _doc.lc.filename
        errors = []
        #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()}
        if 'doc' in doc:
            try:
                self.doc = load_field(doc.get('doc'), union_of_None_type_or_strtype_or_array_of_strtype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'doc', str).makeError("the `doc` field is not valid because:\n"+str(e)))
        else:
            self.doc = None

        if 'docParent' in doc:
            try:
                self.docParent = load_field(doc.get('docParent'), uri_union_of_None_type_or_strtype_False_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docParent', str).makeError("the `docParent` field is not valid because:\n"+str(e)))
        else:
            self.docParent = None

        if 'docChild' in doc:
            try:
                self.docChild = load_field(doc.get('docChild'), uri_union_of_None_type_or_strtype_or_array_of_strtype_False_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docChild', str).makeError("the `docChild` field is not valid because:\n"+str(e)))
        else:
            self.docChild = None

        if 'docAfter' in doc:
            try:
                self.docAfter = load_field(doc.get('docAfter'), uri_union_of_None_type_or_strtype_False_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docAfter', str).makeError("the `docAfter` field is not valid because:\n"+str(e)))
        else:
            self.docAfter = None

        if 'jsonldPredicate' in doc:
            try:
                self.jsonldPredicate = load_field(doc.get('jsonldPredicate'), union_of_None_type_or_strtype_or_JsonldPredicateLoader, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'jsonldPredicate', str).makeError("the `jsonldPredicate` field is not valid because:\n"+str(e)))
        else:
            self.jsonldPredicate = None

        if 'documentRoot' in doc:
            try:
                self.documentRoot = load_field(doc.get('documentRoot'), union_of_None_type_or_booltype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'documentRoot', str).makeError("the `documentRoot` field is not valid because:\n"+str(e)))
        else:
            self.documentRoot = None

        if errors:
            raise ValidationException("Trying 'SchemaDefinedType'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.doc is not None:
            r['doc'] = save(self.doc)
        if self.docParent is not None:
            r['docParent'] = save(self.docParent)
        if self.docChild is not None:
            r['docChild'] = save(self.docChild)
        if self.docAfter is not None:
            r['docAfter'] = save(self.docAfter)
        if self.jsonldPredicate is not None:
            r['jsonldPredicate'] = save(self.jsonldPredicate)
        if self.documentRoot is not None:
            r['documentRoot'] = save(self.documentRoot)
        return r


class SaladRecordField(RecordField):
    """
    A field of a record.
    """
    def __init__(self, _doc, baseuri, loadingOptions, docRoot=None):
        doc = copy.copy(_doc)
        if hasattr(_doc, 'lc'):
            doc.lc.data = _doc.lc.data
            doc.lc.filename = _doc.lc.filename
        errors = []
        #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()}
        if 'name' in doc:
            try:
                self.name = load_field(doc.get('name'), uri_strtype_True_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'name', str).makeError("the `name` field is not valid because:\n"+str(e)))
        else:
            self.name = None

        if self.name is None:
            if docRoot is not None:
                self.name = docRoot
            else:
                raise ValidationException("Missing name")
        baseuri = self.name

        if 'doc' in doc:
            try:
                self.doc = load_field(doc.get('doc'), union_of_None_type_or_strtype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'doc', str).makeError("the `doc` field is not valid because:\n"+str(e)))
        else:
            self.doc = None

        try:
            self.type = load_field(doc.get('type'), typedsl_uri_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_False_True_2, baseuri, loadingOptions)
        except ValidationException as e:
            errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid because:\n"+str(e)))

        if 'jsonldPredicate' in doc:
            try:
                self.jsonldPredicate = load_field(doc.get('jsonldPredicate'), union_of_None_type_or_strtype_or_JsonldPredicateLoader, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'jsonldPredicate', str).makeError("the `jsonldPredicate` field is not valid because:\n"+str(e)))
        else:
            self.jsonldPredicate = None

        if errors:
            raise ValidationException("Trying 'SaladRecordField'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.name is not None:
            r['name'] = save(self.name)
        if self.doc is not None:
            r['doc'] = save(self.doc)
        if self.type is not None:
            r['type'] = save(self.type)
        if self.jsonldPredicate is not None:
            r['jsonldPredicate'] = save(self.jsonldPredicate)
        return r


class SaladRecordSchema(NamedType, RecordSchema, SchemaDefinedType):
    def __init__(self, _doc, baseuri, loadingOptions, docRoot=None):
        doc = copy.copy(_doc)
        if hasattr(_doc, 'lc'):
            doc.lc.data = _doc.lc.data
            doc.lc.filename = _doc.lc.filename
        errors = []
        #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()}
        if 'name' in doc:
            try:
                self.name = load_field(doc.get('name'), uri_strtype_True_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'name', str).makeError("the `name` field is not valid because:\n"+str(e)))
        else:
            self.name = None

        if self.name is None:
            if docRoot is not None:
                self.name = docRoot
            else:
                raise ValidationException("Missing name")
        baseuri = self.name

        if 'inVocab' in doc:
            try:
                self.inVocab = load_field(doc.get('inVocab'), union_of_None_type_or_booltype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'inVocab', str).makeError("the `inVocab` field is not valid because:\n"+str(e)))
        else:
            self.inVocab = None

        if 'fields' in doc:
            try:
                self.fields = load_field(doc.get('fields'), idmap_fields_union_of_None_type_or_array_of_SaladRecordFieldLoader, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'fields', str).makeError("the `fields` field is not valid because:\n"+str(e)))
        else:
            self.fields = None

        try:
            self.type = load_field(doc.get('type'), typedsl_uri_Record_symbolLoader_False_True_2, baseuri, loadingOptions)
        except ValidationException as e:
            errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid because:\n"+str(e)))

        if 'doc' in doc:
            try:
                self.doc = load_field(doc.get('doc'), union_of_None_type_or_strtype_or_array_of_strtype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'doc', str).makeError("the `doc` field is not valid because:\n"+str(e)))
        else:
            self.doc = None

        if 'docParent' in doc:
            try:
                self.docParent = load_field(doc.get('docParent'), uri_union_of_None_type_or_strtype_False_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docParent', str).makeError("the `docParent` field is not valid because:\n"+str(e)))
        else:
            self.docParent = None

        if 'docChild' in doc:
            try:
                self.docChild = load_field(doc.get('docChild'), uri_union_of_None_type_or_strtype_or_array_of_strtype_False_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docChild', str).makeError("the `docChild` field is not valid because:\n"+str(e)))
        else:
            self.docChild = None

        if 'docAfter' in doc:
            try:
                self.docAfter = load_field(doc.get('docAfter'), uri_union_of_None_type_or_strtype_False_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docAfter', str).makeError("the `docAfter` field is not valid because:\n"+str(e)))
        else:
            self.docAfter = None

        if 'jsonldPredicate' in doc:
            try:
                self.jsonldPredicate = load_field(doc.get('jsonldPredicate'), union_of_None_type_or_strtype_or_JsonldPredicateLoader, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'jsonldPredicate', str).makeError("the `jsonldPredicate` field is not valid because:\n"+str(e)))
        else:
            self.jsonldPredicate = None

        if 'documentRoot' in doc:
            try:
                self.documentRoot = load_field(doc.get('documentRoot'), union_of_None_type_or_booltype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'documentRoot', str).makeError("the `documentRoot` field is not valid because:\n"+str(e)))
        else:
            self.documentRoot = None

        if 'abstract' in doc:
            try:
                self.abstract = load_field(doc.get('abstract'), union_of_None_type_or_booltype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'abstract', str).makeError("the `abstract` field is not valid because:\n"+str(e)))
        else:
            self.abstract = None

        if 'extends' in doc:
            try:
                self.extends = load_field(doc.get('extends'), uri_union_of_None_type_or_strtype_or_array_of_strtype_False_False_1, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'extends', str).makeError("the `extends` field is not valid because:\n"+str(e)))
        else:
            self.extends = None

        if 'specialize' in doc:
            try:
                self.specialize = load_field(doc.get('specialize'), idmap_specialize_union_of_None_type_or_array_of_SpecializeDefLoader, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'specialize', str).makeError("the `specialize` field is not valid because:\n"+str(e)))
        else:
            self.specialize = None

        if errors:
            raise ValidationException("Trying 'SaladRecordSchema'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.name is not None:
            r['name'] = save(self.name)
        if self.inVocab is not None:
            r['inVocab'] = save(self.inVocab)
        if self.fields is not None:
            r['fields'] = save(self.fields)
        if self.type is not None:
            r['type'] = save(self.type)
        if self.doc is not None:
            r['doc'] = save(self.doc)
        if self.docParent is not None:
            r['docParent'] = save(self.docParent)
        if self.docChild is not None:
            r['docChild'] = save(self.docChild)
        if self.docAfter is not None:
            r['docAfter'] = save(self.docAfter)
        if self.jsonldPredicate is not None:
            r['jsonldPredicate'] = save(self.jsonldPredicate)
        if self.documentRoot is not None:
            r['documentRoot'] = save(self.documentRoot)
        if self.abstract is not None:
            r['abstract'] = save(self.abstract)
        if self.extends is not None:
            r['extends'] = save(self.extends)
        if self.specialize is not None:
            r['specialize'] = save(self.specialize)
        return r


class SaladEnumSchema(NamedType, EnumSchema, SchemaDefinedType):
    """
    Define an enumerated type.
""" def __init__(self, _doc, baseuri, loadingOptions, docRoot=None): doc = copy.copy(_doc) if hasattr(_doc, 'lc'): doc.lc.data = _doc.lc.data doc.lc.filename = _doc.lc.filename errors = [] #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()} if 'name' in doc: try: self.name = load_field(doc.get('name'), uri_strtype_True_False_None, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'name', str).makeError("the `name` field is not valid because:\n"+str(e))) else: self.name = None if self.name is None: if docRoot is not None: self.name = docRoot else: raise ValidationException("Missing name") baseuri = self.name if 'inVocab' in doc: try: self.inVocab = load_field(doc.get('inVocab'), union_of_None_type_or_booltype, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'inVocab', str).makeError("the `inVocab` field is not valid because:\n"+str(e))) else: self.inVocab = None try: self.symbols = load_field(doc.get('symbols'), uri_array_of_strtype_False_False_None, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'symbols', str).makeError("the `symbols` field is not valid because:\n"+str(e))) try: self.type = load_field(doc.get('type'), typedsl_uri_Enum_symbolLoader_False_True_2, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid because:\n"+str(e))) if 'doc' in doc: try: self.doc = load_field(doc.get('doc'), union_of_None_type_or_strtype_or_array_of_strtype, baseuri, loadingOptions) except ValidationException as e: errors.append(SourceLine(doc, 'doc', str).makeError("the `doc` field is not valid because:\n"+str(e))) else: self.doc = None if 'docParent' in doc: try: self.docParent = load_field(doc.get('docParent'), uri_union_of_None_type_or_strtype_False_False_None, baseuri, loadingOptions) except ValidationException as e: 
                errors.append(SourceLine(doc, 'docParent', str).makeError("the `docParent` field is not valid because:\n"+str(e)))
        else:
            self.docParent = None

        if 'docChild' in doc:
            try:
                self.docChild = load_field(doc.get('docChild'), uri_union_of_None_type_or_strtype_or_array_of_strtype_False_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docChild', str).makeError("the `docChild` field is not valid because:\n"+str(e)))
        else:
            self.docChild = None

        if 'docAfter' in doc:
            try:
                self.docAfter = load_field(doc.get('docAfter'), uri_union_of_None_type_or_strtype_False_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docAfter', str).makeError("the `docAfter` field is not valid because:\n"+str(e)))
        else:
            self.docAfter = None

        if 'jsonldPredicate' in doc:
            try:
                self.jsonldPredicate = load_field(doc.get('jsonldPredicate'), union_of_None_type_or_strtype_or_JsonldPredicateLoader, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'jsonldPredicate', str).makeError("the `jsonldPredicate` field is not valid because:\n"+str(e)))
        else:
            self.jsonldPredicate = None

        if 'documentRoot' in doc:
            try:
                self.documentRoot = load_field(doc.get('documentRoot'), union_of_None_type_or_booltype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'documentRoot', str).makeError("the `documentRoot` field is not valid because:\n"+str(e)))
        else:
            self.documentRoot = None

        if 'extends' in doc:
            try:
                self.extends = load_field(doc.get('extends'), uri_union_of_None_type_or_strtype_or_array_of_strtype_False_False_1, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'extends', str).makeError("the `extends` field is not valid because:\n"+str(e)))
        else:
            self.extends = None

        if errors:
            raise ValidationException("Trying 'SaladEnumSchema'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.name is not None:
            r['name'] = save(self.name)
        if self.inVocab is not None:
            r['inVocab'] = save(self.inVocab)
        if self.symbols is not None:
            r['symbols'] = save(self.symbols)
        if self.type is not None:
            r['type'] = save(self.type)
        if self.doc is not None:
            r['doc'] = save(self.doc)
        if self.docParent is not None:
            r['docParent'] = save(self.docParent)
        if self.docChild is not None:
            r['docChild'] = save(self.docChild)
        if self.docAfter is not None:
            r['docAfter'] = save(self.docAfter)
        if self.jsonldPredicate is not None:
            r['jsonldPredicate'] = save(self.jsonldPredicate)
        if self.documentRoot is not None:
            r['documentRoot'] = save(self.documentRoot)
        if self.extends is not None:
            r['extends'] = save(self.extends)
        return r


class Documentation(NamedType, DocType):
    """
    A documentation section. This type exists to facilitate self-documenting
    schemas but has no role in formal validation.
    """

    def __init__(self, _doc, baseuri, loadingOptions, docRoot=None):
        doc = copy.copy(_doc)
        if hasattr(_doc, 'lc'):
            doc.lc.data = _doc.lc.data
            doc.lc.filename = _doc.lc.filename
        errors = []
        #doc = {expand_url(d, u"", loadingOptions, scoped_id=False, vocab_term=True): v for d,v in doc.items()}
        if 'name' in doc:
            try:
                self.name = load_field(doc.get('name'), uri_strtype_True_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'name', str).makeError("the `name` field is not valid because:\n"+str(e)))
        else:
            self.name = None

        if self.name is None:
            if docRoot is not None:
                self.name = docRoot
            else:
                raise ValidationException("Missing name")
        baseuri = self.name

        if 'inVocab' in doc:
            try:
                self.inVocab = load_field(doc.get('inVocab'), union_of_None_type_or_booltype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'inVocab', str).makeError("the `inVocab` field is not valid because:\n"+str(e)))
        else:
            self.inVocab = None

        if 'doc' in doc:
            try:
                self.doc = load_field(doc.get('doc'), union_of_None_type_or_strtype_or_array_of_strtype, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'doc', str).makeError("the `doc` field is not valid because:\n"+str(e)))
        else:
            self.doc = None

        if 'docParent' in doc:
            try:
                self.docParent = load_field(doc.get('docParent'), uri_union_of_None_type_or_strtype_False_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docParent', str).makeError("the `docParent` field is not valid because:\n"+str(e)))
        else:
            self.docParent = None

        if 'docChild' in doc:
            try:
                self.docChild = load_field(doc.get('docChild'), uri_union_of_None_type_or_strtype_or_array_of_strtype_False_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docChild', str).makeError("the `docChild` field is not valid because:\n"+str(e)))
        else:
            self.docChild = None

        if 'docAfter' in doc:
            try:
                self.docAfter = load_field(doc.get('docAfter'), uri_union_of_None_type_or_strtype_False_False_None, baseuri, loadingOptions)
            except ValidationException as e:
                errors.append(SourceLine(doc, 'docAfter', str).makeError("the `docAfter` field is not valid because:\n"+str(e)))
        else:
            self.docAfter = None

        try:
            self.type = load_field(doc.get('type'), typedsl_uri_Documentation_symbolLoader_False_True_2, baseuri, loadingOptions)
        except ValidationException as e:
            errors.append(SourceLine(doc, 'type', str).makeError("the `type` field is not valid because:\n"+str(e)))

        if errors:
            raise ValidationException("Trying 'Documentation'\n"+"\n".join(errors))

    def save(self):
        r = {}
        if self.name is not None:
            r['name'] = save(self.name)
        if self.inVocab is not None:
            r['inVocab'] = save(self.inVocab)
        if self.doc is not None:
            r['doc'] = save(self.doc)
        if self.docParent is not None:
            r['docParent'] = save(self.docParent)
        if self.docChild is not None:
            r['docChild'] = save(self.docChild)
        if self.docAfter is not None:
            r['docAfter'] = save(self.docAfter)
        if self.type is not None:
            r['type'] = save(self.type)
        return r


_vocab = {
    "fields":
 "https://w3id.org/cwl/salad#fields",
    "int": "http://www.w3.org/2001/XMLSchema#int",
    "refScope": "https://w3id.org/cwl/salad#JsonldPredicate/refScope",
    "abstract": "https://w3id.org/cwl/salad#SaladRecordSchema/abstract",
    "float": "http://www.w3.org/2001/XMLSchema#float",
    "symbols": "https://w3id.org/cwl/salad#symbols",
    "inVocab": "https://w3id.org/cwl/salad#NamedType/inVocab",
    "jsonldPredicate": "https://w3id.org/cwl/salad#SchemaDefinedType/jsonldPredicate",
    "boolean": "http://www.w3.org/2001/XMLSchema#boolean",
    "mapPredicate": "https://w3id.org/cwl/salad#JsonldPredicate/mapPredicate",
    "NamedType": "https://w3id.org/cwl/salad#NamedType",
    "array": "https://w3id.org/cwl/salad#array",
    "null": "https://w3id.org/cwl/salad#null",
    "SchemaDefinedType": "https://w3id.org/cwl/salad#SchemaDefinedType",
    "mapSubject": "https://w3id.org/cwl/salad#JsonldPredicate/mapSubject",
    "SaladRecordField": "https://w3id.org/cwl/salad#SaladRecordField",
    "SaladEnumSchema": "https://w3id.org/cwl/salad#SaladEnumSchema",
    "SpecializeDef": "https://w3id.org/cwl/salad#SpecializeDef",
    "DocType": "https://w3id.org/cwl/salad#DocType",
    "long": "http://www.w3.org/2001/XMLSchema#long",
    "JsonldPredicate": "https://w3id.org/cwl/salad#JsonldPredicate",
    "docParent": "https://w3id.org/cwl/salad#docParent",
    "extends": "https://w3id.org/cwl/salad#extends",
    "specializeFrom": "https://w3id.org/cwl/salad#specializeFrom",
    "type": "https://w3id.org/cwl/salad#type",
    "ArraySchema": "https://w3id.org/cwl/salad#ArraySchema",
    "_type": "https://w3id.org/cwl/salad#JsonldPredicate/_type",
    "docChild": "https://w3id.org/cwl/salad#docChild",
    "string": "http://www.w3.org/2001/XMLSchema#string",
    "RecordField": "https://w3id.org/cwl/salad#RecordField",
    "enum": "https://w3id.org/cwl/salad#enum",
    "RecordSchema": "https://w3id.org/cwl/salad#RecordSchema",
    "typeDSL": "https://w3id.org/cwl/salad#JsonldPredicate/typeDSL",
    "Documentation": "https://w3id.org/cwl/salad#Documentation",
    "docAfter": "https://w3id.org/cwl/salad#docAfter",
    "_container": "https://w3id.org/cwl/salad#JsonldPredicate/_container",
    "noLinkCheck": "https://w3id.org/cwl/salad#JsonldPredicate/noLinkCheck",
    "identity": "https://w3id.org/cwl/salad#JsonldPredicate/identity",
    "EnumSchema": "https://w3id.org/cwl/salad#EnumSchema",
    "specialize": "https://w3id.org/cwl/salad#specialize",
    "documentRoot": "https://w3id.org/cwl/salad#SchemaDefinedType/documentRoot",
    "double": "http://www.w3.org/2001/XMLSchema#double",
    "documentation": "https://w3id.org/cwl/salad#documentation",
    "SaladRecordSchema": "https://w3id.org/cwl/salad#SaladRecordSchema",
    "record": "https://w3id.org/cwl/salad#record",
    "doc": "https://w3id.org/cwl/salad#DocType/doc",
    "specializeTo": "https://w3id.org/cwl/salad#specializeTo",
    "items": "https://w3id.org/cwl/salad#items",
    "_id": "https://w3id.org/cwl/salad#_id",
    "Any": "https://w3id.org/cwl/salad#Any",
}

_rvocab = {
    "https://w3id.org/cwl/salad#fields": "fields",
    "http://www.w3.org/2001/XMLSchema#int": "int",
    "https://w3id.org/cwl/salad#JsonldPredicate/refScope": "refScope",
    "https://w3id.org/cwl/salad#SaladRecordSchema/abstract": "abstract",
    "http://www.w3.org/2001/XMLSchema#float": "float",
    "https://w3id.org/cwl/salad#symbols": "symbols",
    "https://w3id.org/cwl/salad#NamedType/inVocab": "inVocab",
    "https://w3id.org/cwl/salad#SchemaDefinedType/jsonldPredicate": "jsonldPredicate",
    "http://www.w3.org/2001/XMLSchema#boolean": "boolean",
    "https://w3id.org/cwl/salad#JsonldPredicate/mapPredicate": "mapPredicate",
    "https://w3id.org/cwl/salad#NamedType": "NamedType",
    "https://w3id.org/cwl/salad#array": "array",
    "https://w3id.org/cwl/salad#null": "null",
    "https://w3id.org/cwl/salad#SchemaDefinedType": "SchemaDefinedType",
    "https://w3id.org/cwl/salad#JsonldPredicate/mapSubject": "mapSubject",
    "https://w3id.org/cwl/salad#SaladRecordField": "SaladRecordField",
    "https://w3id.org/cwl/salad#SaladEnumSchema": "SaladEnumSchema",
    "https://w3id.org/cwl/salad#SpecializeDef": "SpecializeDef",
    "https://w3id.org/cwl/salad#DocType": "DocType",
    "http://www.w3.org/2001/XMLSchema#long": "long",
    "https://w3id.org/cwl/salad#JsonldPredicate": "JsonldPredicate",
    "https://w3id.org/cwl/salad#docParent": "docParent",
    "https://w3id.org/cwl/salad#extends": "extends",
    "https://w3id.org/cwl/salad#specializeFrom": "specializeFrom",
    "https://w3id.org/cwl/salad#type": "type",
    "https://w3id.org/cwl/salad#ArraySchema": "ArraySchema",
    "https://w3id.org/cwl/salad#JsonldPredicate/_type": "_type",
    "https://w3id.org/cwl/salad#docChild": "docChild",
    "http://www.w3.org/2001/XMLSchema#string": "string",
    "https://w3id.org/cwl/salad#RecordField": "RecordField",
    "https://w3id.org/cwl/salad#enum": "enum",
    "https://w3id.org/cwl/salad#RecordSchema": "RecordSchema",
    "https://w3id.org/cwl/salad#JsonldPredicate/typeDSL": "typeDSL",
    "https://w3id.org/cwl/salad#Documentation": "Documentation",
    "https://w3id.org/cwl/salad#docAfter": "docAfter",
    "https://w3id.org/cwl/salad#JsonldPredicate/_container": "_container",
    "https://w3id.org/cwl/salad#JsonldPredicate/noLinkCheck": "noLinkCheck",
    "https://w3id.org/cwl/salad#JsonldPredicate/identity": "identity",
    "https://w3id.org/cwl/salad#EnumSchema": "EnumSchema",
    "https://w3id.org/cwl/salad#specialize": "specialize",
    "https://w3id.org/cwl/salad#SchemaDefinedType/documentRoot": "documentRoot",
    "http://www.w3.org/2001/XMLSchema#double": "double",
    "https://w3id.org/cwl/salad#documentation": "documentation",
    "https://w3id.org/cwl/salad#SaladRecordSchema": "SaladRecordSchema",
    "https://w3id.org/cwl/salad#record": "record",
    "https://w3id.org/cwl/salad#DocType/doc": "doc",
    "https://w3id.org/cwl/salad#specializeTo": "specializeTo",
    "https://w3id.org/cwl/salad#items": "items",
    "https://w3id.org/cwl/salad#_id": "_id",
    "https://w3id.org/cwl/salad#Any": "Any",
}

inttype = _PrimitiveLoader(int)
booltype = _PrimitiveLoader(bool)
None_type = _PrimitiveLoader(NoneType)
strtype = _PrimitiveLoader((str, six.text_type))
PrimitiveTypeLoader = _EnumLoader(("null", "boolean", "int", "long", "float", "double",
    "string",))
AnyLoader = _EnumLoader(("Any",))
RecordFieldLoader = _RecordLoader(RecordField)
RecordSchemaLoader = _RecordLoader(RecordSchema)
EnumSchemaLoader = _RecordLoader(EnumSchema)
ArraySchemaLoader = _RecordLoader(ArraySchema)
JsonldPredicateLoader = _RecordLoader(JsonldPredicate)
SpecializeDefLoader = _RecordLoader(SpecializeDef)
NamedTypeLoader = _RecordLoader(NamedType)
DocTypeLoader = _RecordLoader(DocType)
SchemaDefinedTypeLoader = _RecordLoader(SchemaDefinedType)
SaladRecordFieldLoader = _RecordLoader(SaladRecordField)
SaladRecordSchemaLoader = _RecordLoader(SaladRecordSchema)
SaladEnumSchemaLoader = _RecordLoader(SaladEnumSchema)
DocumentationLoader = _RecordLoader(Documentation)
uri_strtype_True_False_None = _URILoader(strtype, True, False, None)
union_of_None_type_or_strtype = _UnionLoader((None_type, strtype))
union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype = _UnionLoader((PrimitiveTypeLoader, RecordSchemaLoader, EnumSchemaLoader, ArraySchemaLoader, strtype))
array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype = _ArrayLoader(union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype)
union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype = _UnionLoader((PrimitiveTypeLoader, RecordSchemaLoader, EnumSchemaLoader, ArraySchemaLoader, strtype, array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype))
uri_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_False_True_2 = _URILoader(union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype, False, True, 2)
typedsl_uri_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_False_True_2 = _TypeDSLLoader(uri_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_or_array_of_union_of_PrimitiveTypeLoader_or_RecordSchemaLoader_or_EnumSchemaLoader_or_ArraySchemaLoader_or_strtype_False_True_2)
array_of_RecordFieldLoader = _ArrayLoader(RecordFieldLoader)
union_of_None_type_or_array_of_RecordFieldLoader = _UnionLoader((None_type, array_of_RecordFieldLoader))
idmap_fields_union_of_None_type_or_array_of_RecordFieldLoader = _IdMapLoader(union_of_None_type_or_array_of_RecordFieldLoader, 'name', 'type')
Record_symbolLoader = _EnumLoader(("record",))
uri_Record_symbolLoader_False_True_2 = _URILoader(Record_symbolLoader, False, True, 2)
typedsl_uri_Record_symbolLoader_False_True_2 = _TypeDSLLoader(uri_Record_symbolLoader_False_True_2)
array_of_strtype = _ArrayLoader(strtype)
uri_array_of_strtype_False_False_None = _URILoader(array_of_strtype, False, False, None)
Enum_symbolLoader = _EnumLoader(("enum",))
uri_Enum_symbolLoader_False_True_2 = _URILoader(Enum_symbolLoader, False, True, 2)
typedsl_uri_Enum_symbolLoader_False_True_2 = _TypeDSLLoader(uri_Enum_symbolLoader_False_True_2)
Array_symbolLoader = _EnumLoader(("array",))
uri_Array_symbolLoader_False_True_2 = _URILoader(Array_symbolLoader, False, True, 2)
typedsl_uri_Array_symbolLoader_False_True_2 = _TypeDSLLoader(uri_Array_symbolLoader_False_True_2)
uri_union_of_None_type_or_strtype_False_False_None = _URILoader(union_of_None_type_or_strtype, False, False, None)
union_of_None_type_or_booltype = _UnionLoader((None_type, booltype))
union_of_None_type_or_inttype = _UnionLoader((None_type, inttype))
uri_strtype_False_False_1 = _URILoader(strtype, False, False, 1)
union_of_None_type_or_strtype_or_array_of_strtype = _UnionLoader((None_type, strtype, array_of_strtype))
uri_union_of_None_type_or_strtype_or_array_of_strtype_False_False_None = _URILoader(union_of_None_type_or_strtype_or_array_of_strtype, False, False, None)
union_of_None_type_or_strtype_or_JsonldPredicateLoader = _UnionLoader((None_type, strtype, JsonldPredicateLoader))
array_of_SaladRecordFieldLoader = _ArrayLoader(SaladRecordFieldLoader)
union_of_None_type_or_array_of_SaladRecordFieldLoader = _UnionLoader((None_type, array_of_SaladRecordFieldLoader))
idmap_fields_union_of_None_type_or_array_of_SaladRecordFieldLoader = _IdMapLoader(union_of_None_type_or_array_of_SaladRecordFieldLoader, 'name', 'type')
uri_union_of_None_type_or_strtype_or_array_of_strtype_False_False_1 = _URILoader(union_of_None_type_or_strtype_or_array_of_strtype, False, False, 1)
array_of_SpecializeDefLoader = _ArrayLoader(SpecializeDefLoader)
union_of_None_type_or_array_of_SpecializeDefLoader = _UnionLoader((None_type, array_of_SpecializeDefLoader))
idmap_specialize_union_of_None_type_or_array_of_SpecializeDefLoader = _IdMapLoader(union_of_None_type_or_array_of_SpecializeDefLoader, 'specializeFrom', 'specializeTo')
Documentation_symbolLoader = _EnumLoader(("documentation",))
uri_Documentation_symbolLoader_False_True_2 = _URILoader(Documentation_symbolLoader, False, True, 2)
typedsl_uri_Documentation_symbolLoader_False_True_2 = _TypeDSLLoader(uri_Documentation_symbolLoader_False_True_2)
union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader = _UnionLoader((SaladRecordSchemaLoader, SaladEnumSchemaLoader, DocumentationLoader))
array_of_union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader = \
    _ArrayLoader(union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader)
union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader_or_array_of_union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader = _UnionLoader((SaladRecordSchemaLoader, SaladEnumSchemaLoader, DocumentationLoader, array_of_union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader))


def load_document(doc, baseuri, loadingOptions):
    return _document_load(union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader_or_array_of_union_of_SaladRecordSchemaLoader_or_SaladEnumSchemaLoader_or_DocumentationLoader, doc, baseuri, loadingOptions)


schema-salad-2.6.20171201034858/schema_salad/tests/matcher.py

# Copyright (C) The Arvados Authors. All rights reserved.
#
# SPDX-License-Identifier: Apache-2.0

import difflib
import json
import re


class JsonDiffMatcher(object):
    """Raise AssertionError with a readable JSON diff when not __eq__().

    Used with assert_called_with() so it's possible for a human to see
    the differences between expected and actual call arguments that
    include non-trivial data structures.
    """

    def __init__(self, expected):
        self.expected = expected

    def __eq__(self, actual):
        expected_json = json.dumps(self.expected, sort_keys=True, indent=2)
        actual_json = json.dumps(actual, sort_keys=True, indent=2)
        if expected_json != actual_json:
            raise AssertionError("".join(difflib.context_diff(
                expected_json.splitlines(1),
                actual_json.splitlines(1),
                fromfile="Expected", tofile="Actual")))
        return True


def StripYAMLComments(yml):
    return re.sub(r'(?ms)^(#.*?\n)*\n*', '', yml)


schema-salad-2.6.20171201034858/schema_salad/tests/EDAM.owl
operations "EDAM operations" EDAM http://edamontology.org/ "EDAM relations and concept properties" application/rdf+xml EDAM_data http://edamontology.org/data_ "EDAM types of data" concept_properties "EDAM concept properties" Jon Ison Matúš Kalaš Jon Ison, Matus Kalas, Hervé Ménager EDAM_format http://edamontology.org/format_ "EDAM data formats" topics "EDAM topics" 1.11 Hervé Ménager EDAM is an ontology of well established, familiar concepts that are prevalent within bioinformatics, including types of data and data identifiers, data formats, operations and topics. EDAM is a simple ontology - essentially a set of terms with synonyms and definitions - organised into an intuitive hierarchy for convenient use by curators, software developers and end-users. EDAM is suitable for large-scale semantic annotations and categorization of diverse bioinformatics resources. EDAM is also suitable for diverse application including for example within workbenches and workflow-management systems, software distributions, and resource registries. Created in Version in which a concept was created. true concept_properties Documentation Specification 'Documentation' trailing modifier (qualifier, 'documentation') of 'xref' links of 'Format' concepts. When 'true', the link is pointing to a page with explanation, description, documentation, or specification of the given data format. true concept_properties Example 'Example' concept property ('example' metadat tag) lists examples of valid values of types of identifiers (accessions). Applicable to some other types of data, too. true concept_properties Obsolete since true concept_properties Version in which a concept was made obsolete. Regular expression 'Regular expression' concept property ('regex' metadata tag) specifies the allowed values of types of identifiers (accessions). Applicable to some other types of data, too. 
concept_properties true has format "http://purl.obolibrary.org/obo/OBI_0000298" Subject A can be any concept or entity outside of an ontology (or an ontology concept in a role of an entity being semantically annotated) that is (or is in a role of) 'Data', or an input, output, input or output argument of an 'Operation'. Object B can either be a concept that is a 'Format', or in unexpected cases an entity outside of an ontology that is a 'Format' or is in the role of a 'Format'. In EDAM, 'has_format' is not explicitly defined between EDAM concepts, only the inverse 'is_format_of'. false OBO_REL:is_a relations http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#has-quality" false false edam 'A has_format B' defines for the subject A, that it has the object B as its data format. false has function http://wsio.org/has_function false OBO_REL:is_a OBO_REL:bearer_of edam Subject A can be any concept or entity outside of an ontology (or an ontology concept in a role of an entity being semantically annotated). Object B can either be a concept that is (or is in a role of) a function, or an entity outside of an ontology that is (or is in a role of) a function specification. In the scope of EDAM, 'has_function' serves only for relating annotated entities outside of EDAM with 'Operation' concepts. false http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#has-quality" true 'A has_function B' defines for the subject A, that it has the object B as its function. "http://purl.obolibrary.org/obo/OBI_0000306" relations false true In very unusual cases. Is defined anywhere? Not in the 'unknown' version of RO. 'OBO_REL:bearer_of' is narrower in the sense that it only relates ontological categories (concepts) that are an 'independent_continuant' (snap:IndependentContinuant) with ontological categories that are a 'specifically_dependent_continuant' (snap:SpecificallyDependentContinuant), and broader in the sense that it relates with any borne objects not just functions of the subject. 
OBO_REL:bearer_of has identifier false false relations OBO_REL:is_a edam 'A has_identifier B' defines for the subject A, that it has the object B as its identifier. Subject A can be any concept or entity outside of an ontology (or an ontology concept in a role of an entity being semantically annotated). Object B can either be a concept that is an 'Identifier', or an entity outside of an ontology that is an 'Identifier' or is in the role of an 'Identifier'. In EDAM, 'has_identifier' is not explicitly defined between EDAM concepts, only the inverse 'is_identifier_of'. false false has input OBO_REL:has_participant "http://purl.obolibrary.org/obo/OBI_0000293" false http://wsio.org/has_input Subject A can either be concept that is or has an 'Operation' function, or an entity outside of an ontology (or an ontology concept in a role of an entity being semantically annotated) that has an 'Operation' function or is an 'Operation'. Object B can be any concept or entity. In EDAM, only 'has_input' is explicitly defined between EDAM concepts ('Operation' 'has_input' 'Data'). The inverse, 'is_input_of', is not explicitly defined. relations OBO_REL:is_a false 'A has_input B' defines for the subject A, that it has the object B as a necessary or actual input or input argument. false true edam OBO_REL:has_participant 'OBO_REL:has_participant' is narrower in the sense that it only relates ontological categories (concepts) that are a 'process' (span:Process) with ontological categories that are a 'continuant' (snap:Continuant), and broader in the sense that it relates with any participating objects not just inputs or input arguments of the subject. true In very unusual cases. has output http://wsio.org/has_output Subject A can either be concept that is or has an 'Operation' function, or an entity outside of an ontology (or an ontology concept in a role of an entity being semantically annotated) that has an 'Operation' function or is an 'Operation'. 
Object B can be any concept or entity. In EDAM, only 'has_output' is explicitly defined between EDAM concepts ('Operation' 'has_output' 'Data'). The inverse, 'is_output_of', is not explicitly defined. edam "http://purl.obolibrary.org/obo/OBI_0000299" OBO_REL:is_a relations OBO_REL:has_participant true 'A has_output B' defines for the subject A, that it has the object B as a necessary or actual output or output argument. false false false 'OBO_REL:has_participant' is narrower in the sense that it only relates ontological categories (concepts) that are a 'process' (span:Process) with ontological categories that are a 'continuant' (snap:Continuant), and broader in the sense that it relates with any participating objects not just outputs or output arguments of the subject. It is also not clear whether an output (result) actually participates in the process that generates it. OBO_REL:has_participant In very unusual cases. true has topic relations true Subject A can be any concept or entity outside of an ontology (or an ontology concept in a role of an entity being semantically annotated). Object B can either be a concept that is a 'Topic', or in unexpected cases an entity outside of an ontology that is a 'Topic' or is in the role of a 'Topic'. In EDAM, only 'has_topic' is explicitly defined between EDAM concepts ('Operation' or 'Data' 'has_topic' 'Topic'). The inverse, 'is_topic_of', is not explicitly defined. false 'A has_topic B' defines for the subject A, that it has the object B as its topic (A is in the scope of a topic B). edam OBO_REL:is_a http://annotation-ontology.googlecode.com/svn/trunk/annotation-core.owl#hasTopic false "http://purl.obolibrary.org/obo/IAO_0000136" false http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#has-quality "http://purl.obolibrary.org/obo/OBI_0000298" In very unusual cases. true is format of false OBO_REL:is_a false false false 'A is_format_of B' defines for the subject A, that it is a data format of the object B. 
edam relations Subject A can either be a concept that is a 'Format', or in unexpected cases an entity outside of an ontology (or an ontology concept in a role of an entity being semantically annotated) that is a 'Format' or is in the role of a 'Format'. Object B can be any concept or entity outside of an ontology that is (or is in a role of) 'Data', or an input, output, input or output argument of an 'Operation'. In EDAM, only 'is_format_of' is explicitly defined between EDAM concepts ('Format' 'is_format_of' 'Data'). The inverse, 'has_format', is not explicitly defined. OBO_REL:quality_of http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#inherent-in OBO_REL:quality_of Is defined anywhere? Not in the 'unknown' version of RO. 'OBO_REL:quality_of' might be seen narrower in the sense that it only relates subjects that are a 'quality' (snap:Quality) with objects that are an 'independent_continuant' (snap:IndependentContinuant), and is broader in the sense that it relates any qualities of the object. is function of Subject A can either be concept that is (or is in a role of) a function, or an entity outside of an ontology (or an ontology concept in a role of an entity being semantically annotated) that is (or is in a role of) a function specification. Object B can be any concept or entity. Within EDAM itself, 'is_function_of' is not used. OBO_REL:inheres_in true OBO_REL:is_a false 'A is_function_of B' defines for the subject A, that it is a function of the object B. OBO_REL:function_of edam http://wsio.org/is_function_of relations http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#inherent-in false false OBO_REL:inheres_in Is defined anywhere? Not in the 'unknown' version of RO. 
'OBO_REL:inheres_in' is narrower in the sense that it only relates ontological categories (concepts) that are a 'specifically_dependent_continuant' (snap:SpecificallyDependentContinuant) with ontological categories that are an 'independent_continuant' (snap:IndependentContinuant), and broader in the sense that it relates any borne subjects not just functions. true In very unusual cases. OBO_REL:function_of Is defined anywhere? Not in the 'unknown' version of RO. 'OBO_REL:function_of' only relates subjects that are a 'function' (snap:Function) with objects that are an 'independent_continuant' (snap:IndependentContinuant), so for example no processes. It does not define explicitly that the subject is a function of the object. is identifier of false false edam false relations Subject A can either be a concept that is an 'Identifier', or an entity outside of an ontology (or an ontology concept in a role of an entity being semantically annotated) that is an 'Identifier' or is in the role of an 'Identifier'. Object B can be any concept or entity outside of an ontology. In EDAM, only 'is_identifier_of' is explicitly defined between EDAM concepts (only 'Identifier' 'is_identifier_of' 'Data'). The inverse, 'has_identifier', is not explicitly defined. 'A is_identifier_of B' defines for the subject A, that it is an identifier of the object B. OBO_REL:is_a false is input of false http://wsio.org/is_input_of relations true false OBO_REL:participates_in OBO_REL:is_a "http://purl.obolibrary.org/obo/OBI_0000295" edam Subject A can be any concept or entity outside of an ontology (or an ontology concept in a role of an entity being semantically annotated). Object B can either be a concept that is or has an 'Operation' function, or an entity outside of an ontology that has an 'Operation' function or is an 'Operation'. In EDAM, 'is_input_of' is not explicitly defined between EDAM concepts, only the inverse 'has_input'. 
false 'A is_input_of B' defines for the subject A, that it as a necessary or actual input or input argument of the object B. 'OBO_REL:participates_in' is narrower in the sense that it only relates ontological categories (concepts) that are a 'continuant' (snap:Continuant) with ontological categories that are a 'process' (span:Process), and broader in the sense that it relates any participating subjects not just inputs or input arguments. OBO_REL:participates_in In very unusual cases. true is output of OBO_REL:is_a false false Subject A can be any concept or entity outside of an ontology (or an ontology concept in a role of an entity being semantically annotated). Object B can either be a concept that is or has an 'Operation' function, or an entity outside of an ontology that has an 'Operation' function or is an 'Operation'. In EDAM, 'is_output_of' is not explicitly defined between EDAM concepts, only the inverse 'has_output'. edam false 'A is_output_of B' defines for the subject A, that it as a necessary or actual output or output argument of the object B. OBO_REL:participates_in http://wsio.org/is_output_of true relations "http://purl.obolibrary.org/obo/OBI_0000312" In very unusual cases. true OBO_REL:participates_in 'OBO_REL:participates_in' is narrower in the sense that it only relates ontological categories (concepts) that are a 'continuant' (snap:Continuant) with ontological categories that are a 'process' (span:Process), and broader in the sense that it relates any participating subjects not just outputs or output arguments. It is also not clear whether an output (result) actually participates in the process that generates it. is topic of 'A is_topic_of B' defines for the subject A, that it is a topic of the object B (a topic A is the scope of B). 
relations OBO_REL:quality_of false true false Subject A can either be a concept that is a 'Topic', or in unexpected cases an entity outside of an ontology (or an ontology concept in a role of an entity being semantically annotated) that is a 'Topic' or is in the role of a 'Topic'. Object B can be any concept or entity outside of an ontology. In EDAM, 'is_topic_of' is not explicitly defined between EDAM concepts, only the inverse 'has_topic'. http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#inherent-in false OBO_REL:is_a edam OBO_REL:quality_of Is defined anywhere? Not in the 'unknown' version of RO. 'OBO_REL:quality_of' might be seen narrower in the sense that it only relates subjects that are a 'quality' (snap:Quality) with objects that are an 'independent_continuant' (snap:IndependentContinuant), and is broader in the sense that it relates any qualities of the object. In very unusual cases. true Resource type beta12orEarlier beta12orEarlier A type of computational resource used in bioinformatics. true Data Information, represented in an information artefact (data record) that is 'understandable' by dedicated computational tools that can use the data as input or produce it as output. http://www.onto-med.de/ontologies/gfo.owl#Perpetuant http://semanticscience.org/resource/SIO_000088 http://semanticscience.org/resource/SIO_000069 "http://purl.obolibrary.org/obo/IAO_0000030" "http://purl.obolibrary.org/obo/IAO_0000027" Data set Data record beta12orEarlier http://wsio.org/data_002 http://purl.org/biotop/biotop.owl#DigitalEntity http://www.ifomis.org/bfo/1.1/snap#Continuant Datum Data set EDAM does not distinguish the multiplicity of data, such as one data item (datum) versus a collection of data (data set). Datum EDAM does not distinguish the multiplicity of data, such as one data item (datum) versus a collection of data (data set). 
Data record EDAM does not distinguish a data record (a tool-understandable information artefact) from data or datum (its content, the tool-understandable encoding of an information). Tool beta12orEarlier A bioinformatics package or tool, e.g. a standalone application or web service. beta12orEarlier true Database A digital data archive typically based around a relational model but sometimes using an object-oriented, tree or graph-based model. beta12orEarlier true beta12orEarlier Ontology beta12orEarlier Ontologies An ontology of biological or bioinformatics concepts and relations, a controlled vocabulary, structured glossary etc. Directory metadata 1.5 A directory on disk from which files are read. beta12orEarlier true MeSH vocabulary beta12orEarlier true Controlled vocabulary from National Library of Medicine. The MeSH thesaurus is used to index articles in biomedical journals for the Medline/PubMED databases. beta12orEarlier HGNC vocabulary beta12orEarlier beta12orEarlier Controlled vocabulary for gene names (symbols) from HUGO Gene Nomenclature Committee. true UMLS vocabulary Compendium of controlled vocabularies for the biomedical domain (Unified Medical Language System). beta12orEarlier beta12orEarlier true Identifier http://semanticscience.org/resource/SIO_000115 beta12orEarlier ID "http://purl.org/dc/elements/1.1/identifier" http://wsio.org/data_005 A text token, number or something else which identifies an entity, but which may not be persistent (stable) or unique (the same identifier may identify multiple things). Almost exact but limited to identifying resources. Database entry beta12orEarlier beta12orEarlier An entry (retrievable via URL) from a biological database. true Molecular mass Mass of a molecule. beta12orEarlier Molecular charge Net charge of a molecule. beta12orEarlier PDBML:pdbx_formal_charge Chemical formula Chemical structure specification A specification of a chemical structure. 
beta12orEarlier QSAR descriptor A QSAR quantitative descriptor (name-value pair) of chemical structure. QSAR descriptors have numeric values that quantify chemical information encoded in a symbolic representation of a molecule. They are used in quantitative structure activity relationship (QSAR) applications. Many subtypes of individual descriptors (not included in EDAM) cover various types of protein properties. beta12orEarlier Raw sequence beta12orEarlier A raw molecular sequence (string of characters) which might include ambiguity, unknown positions and non-sequence characters. Non-sequence characters may be used for example for gaps and translation stop. Sequence record http://purl.bioontology.org/ontology/MSH/D058977 beta12orEarlier A molecular sequence and associated metadata. SO:2000061 Sequence set A collection of multiple molecular sequences and associated metadata that do not (typically) correspond to molecular sequence database records or entries and which (typically) are derived from some analytical method. This concept may be used for arbitrary sequence sets and associated data arising from processing. beta12orEarlier SO:0001260 Sequence mask character true beta12orEarlier 1.5 A character used to replace (mask) other characters in a molecular sequence. Sequence mask type A label (text token) describing the type of sequence masking to perform. Sequence masking is where specific characters or positions in a molecular sequence are masked (replaced) with another (mask character). The mask type indicates what is masked, for example regions that are not of interest or which are information-poor including acidic protein regions, basic protein regions, proline-rich regions, low compositional complexity regions, short-periodicity internal repeats, simple repeats and low complexity regions. Masked sequences are used in database search to eliminate statistically significant but biologically uninteresting hits. 
beta12orEarlier 1.5 true DNA sense specification DNA strand specification beta12orEarlier Strand The strand of a DNA sequence (forward or reverse). The forward or 'top' strand might specify a sequence is to be used as given, the reverse or 'bottom' strand specifying the reverse complement of the sequence is to be used. Sequence length specification true A specification of sequence length(s). beta12orEarlier 1.5 Sequence metadata beta12orEarlier Basic or general information concerning molecular sequences. This is used for such things as a report including the sequence identifier, type and length. 1.5 true Sequence feature source This might be the name and version of a software tool, the name of a database, or 'curated' to indicate a manual annotation (made by a human). How the annotation of a sequence feature (for example in EMBL or Swiss-Prot) was derived. beta12orEarlier Sequence search results beta12orEarlier Database hits (sequence) Sequence database hits Sequence search hits The score list includes the alignment score, percentage of the query sequence matched, length of the database sequence entry in this alignment, identifier of the database sequence entry, excerpt of the database sequence entry description etc. A report of sequence hits and associated data from searching a database of sequences (for example a BLAST search). This will typically include a list of scores (often with statistical evaluation) and a set of alignments for the hits. Sequence database search results Sequence signature matches Sequence motif matches Protein secondary database search results beta12orEarlier Report on the location of matches in one or more sequences to profiles, motifs (conserved or functional patterns) or other signatures. Sequence profile matches This includes reports of hits from a search of a protein secondary or domain database. Search results (protein secondary database) Sequence signature model Data files used by motif or profile methods. 
beta12orEarlier beta12orEarlier true Sequence signature data beta12orEarlier This can include metadata about a motif or sequence profile such as its name, length, technical details about the profile construction, and so on. Data concerning specific or conserved patterns in molecular sequences and the classifiers used for their identification, including sequence motifs, profiles or other diagnostic elements. Sequence alignment (words) 1.5 beta12orEarlier true Sequence word alignment Alignment of exact matches between subsequences (words) within two or more molecular sequences. Dotplot A dotplot of sequence similarities identified from word-matching or character comparison. beta12orEarlier Sequence alignment http://en.wikipedia.org/wiki/Sequence_alignment http://purl.bioontology.org/ontology/MSH/D016415 http://semanticscience.org/resource/SIO_010066 beta12orEarlier Alignment of multiple molecular sequences. Sequence alignment parameter Some simple value controlling a sequence alignment (or similar 'match') operation. true 1.5 beta12orEarlier Sequence similarity score A value representing molecular sequence similarity. beta12orEarlier Sequence alignment metadata Report of general information on a sequence alignment, typically including a description, sequence identifiers and alignment score. beta12orEarlier true 1.5 Sequence alignment report Use this for any computer-generated reports on sequence alignments, and for general information (metadata) on a sequence alignment, such as a description, sequence identifiers and alignment score. An informative report of molecular sequence alignment-derived data or metadata. beta12orEarlier Sequence profile alignment beta12orEarlier A profile-profile alignment (each profile typically representing a sequence alignment). Sequence-profile alignment beta12orEarlier Alignment of one or more molecular sequence(s) to one or more sequence profile(s) (each profile typically representing a sequence alignment). 
Data associated with the alignment might also be included, e.g. ranked list of best-scoring sequences and a graphical representation of scores. Sequence distance matrix beta12orEarlier Moby:phylogenetic_distance_matrix A matrix of estimated evolutionary distance between molecular sequences, such as is suitable for phylogenetic tree calculation. Phylogenetic distance matrix Methods might perform character compatibility analysis or identify patterns of similarity in an alignment or data matrix. Phylogenetic character data Basic character data from which a phylogenetic tree may be generated. As defined, this concept would also include molecular sequences, microsatellites, polymorphisms (RAPDs, RFLPs, or AFLPs), restriction sites and fragments. http://www.evolutionaryontology.org/cdao.owl#Character beta12orEarlier Phylogenetic tree Phylogeny Moby:Tree http://www.evolutionaryontology.org/cdao.owl#Tree A phylogenetic tree is usually constructed from a set of sequences from which an alignment (or data matrix) is calculated. See also 'Phylogenetic tree image'. http://purl.bioontology.org/ontology/MSH/D010802 Moby:phylogenetic_tree The raw data (not just an image) from which a phylogenetic tree is directly generated or plotted, such as topology, lengths (in time or in expected amounts of variance) and a confidence interval for each length. beta12orEarlier Moby:myTree Comparison matrix beta12orEarlier The comparison matrix might include matrix name, optional comment, height and width (or size) of matrix, an index row/column (of characters) and data rows/columns (of integers or floats). Matrix of integer or floating point numbers for amino acid or nucleotide sequence comparison. Substitution matrix Protein topology beta12orEarlier beta12orEarlier Predicted or actual protein topology represented as a string of protein secondary structure elements. true The location and size of the secondary structure elements and intervening loop regions are usually indicated. 
Protein features report (secondary structure) beta12orEarlier 1.8 true Secondary structure (predicted or real) of a protein. Protein features report (super-secondary) 1.8 Super-secondary structures include leucine zippers, coiled coils, Helix-Turn-Helix etc. true beta12orEarlier Super-secondary structure of protein sequence(s). Secondary structure alignment (protein) Alignment of the (1D representations of) secondary structure of two or more proteins. beta12orEarlier Secondary structure alignment metadata (protein) An informative report on protein secondary structure alignment-derived data or metadata. beta12orEarlier beta12orEarlier true RNA secondary structure An informative report of secondary structure (predicted or real) of an RNA molecule. This includes thermodynamically stable or evolutionarily conserved structures such as knots, pseudoknots etc. Moby:RNAStructML Secondary structure (RNA) beta12orEarlier Secondary structure alignment (RNA) Moby:RNAStructAlignmentML Alignment of the (1D representations of) secondary structure of two or more RNA molecules. beta12orEarlier Secondary structure alignment metadata (RNA) true beta12orEarlier An informative report of RNA secondary structure alignment-derived data or metadata. beta12orEarlier Structure beta12orEarlier Coordinate model Structure data The coordinate data may be predicted or real. http://purl.bioontology.org/ontology/MSH/D015394 3D coordinate and associated data for a macromolecular tertiary (3D) structure or part of a structure. Tertiary structure record true beta12orEarlier beta12orEarlier An entry from a molecular tertiary (3D) structure database. Structure database search results 1.8 Results (hits) from searching a database of tertiary structure. beta12orEarlier true Structure alignment Alignment (superimposition) of molecular tertiary (3D) structures. 
A tertiary structure alignment will include the untransformed coordinates of one macromolecule, followed by the second (or subsequent) structure(s) with all the coordinates transformed (by rotation / translation) to give a superposition. beta12orEarlier Structure alignment report beta12orEarlier This is a broad data type and is used as a placeholder for other, more specific types. An informative report of molecular tertiary structure alignment-derived data. Structure similarity score beta12orEarlier A value representing molecular structure similarity, measured from structure alignment or some other type of structure comparison. Structural profile beta12orEarlier 3D profile Some type of structural (3D) profile or template (representing a structure or structure alignment). Structural (3D) profile Structural (3D) profile alignment beta12orEarlier Structural profile alignment A 3D profile-3D profile alignment (each profile representing structures or a structure alignment). Sequence-3D profile alignment Sequence-structural profile alignment 1.5 An alignment of a sequence to a 3D profile (representing structures or a structure alignment). beta12orEarlier true Protein sequence-structure scoring matrix beta12orEarlier Matrix of values used for scoring sequence-structure compatibility. Sequence-structure alignment beta12orEarlier An alignment of molecular sequence to structure (from threading sequence(s) through 3D structure or representation of structure(s)). Amino acid annotation An informative report about a specific amino acid. 1.4 true beta12orEarlier Peptide annotation 1.4 true An informative report about a specific peptide. beta12orEarlier Protein report Gene product annotation beta12orEarlier An informative human-readable report about one or more specific protein molecules or protein structural domains, derived from analysis of primary (sequence or structural) data. 
Protein property Protein physicochemical property A report of primarily non-positional data describing intrinsic physical, chemical or other properties of a protein molecule or model. beta12orEarlier Protein sequence statistics Protein properties The report may be based on analysis of nucleic acid sequence or structural data. This is a broad data type and is used as a placeholder for other, more specific types. Protein structural motifs and surfaces true 1.8 3D structural motifs in a protein. beta12orEarlier Protein 3D motifs Protein domain classification true Data concerning the classification of the sequences and/or structures of protein structural domain(s). 1.5 beta12orEarlier Protein features report (domains) true Structural domains or 3D folds in a protein or polypeptide chain. 1.8 beta12orEarlier Protein architecture report 1.4 An informative report on architecture (spatial arrangement of secondary structure) of a protein structure. Protein property (architecture) Protein structure report (architecture) beta12orEarlier true Protein folding report beta12orEarlier A report on an analysis or model of protein folding properties, folding pathways, residues or sites that are key to protein folding, nucleation or stabilization centers etc. true 1.8 Protein features (mutation) This is a broad data type and is used as a placeholder for other, more specific types. It is primarily intended to help navigation of EDAM and would not typically be used for annotation. Data on the effect of (typically point) mutation on protein folding, stability, structure and function. true beta12orEarlier Protein property (mutation) Protein structure report (mutation) beta13 Protein report (mutation) Protein interaction raw data This is a broad data type and is used as a placeholder for other, more specific types. It is primarily intended to help navigation of EDAM and would not typically be used for annotation. 
Protein-protein interaction data from for example yeast two-hybrid analysis, protein microarrays, immunoaffinity chromatography followed by mass spectrometry, phage display etc. beta12orEarlier Protein interaction report beta12orEarlier Protein report (interaction) Protein interaction record An informative report on the interactions (predicted or known) of a protein, protein domain or part of a protein with some other molecule(s), which might be another protein, nucleic acid or some other ligand. Protein family report beta12orEarlier An informative report on a specific protein family or other classification or group of protein sequences or structures. Protein family annotation Protein classification data Vmax beta12orEarlier The maximum initial velocity or rate of a reaction. It is the limiting velocity as substrate concentrations get very large. Km Km is the concentration (usually in Molar units) of substrate that leads to half-maximal velocity of an enzyme-catalysed reaction. beta12orEarlier Nucleotide base annotation beta12orEarlier true An informative report about a specific nucleotide base. 1.4 Nucleic acid property A report of primarily non-positional data describing intrinsic physical, chemical or other properties of a nucleic acid molecule. The report may be based on analysis of nucleic acid sequence or structural data. This is a broad data type and is used as a placeholder for other, more specific types. Nucleic acid physicochemical property beta12orEarlier Codon usage data beta12orEarlier Data derived from analysis of codon usage (typically a codon usage table) of DNA sequences. This is a broad data type and is used as a placeholder for other, more specific types. Gene report Gene structure (report) A report on predicted or actual gene structure, regions which make an RNA product and features such as promoters, coding regions, splice sites etc. 
Gene and transcript structure (report) Gene features report Nucleic acid features (gene and transcript structure) Moby:gene This includes any report on a particular locus or gene. This might include the gene name, description, summary and so on. It can include details about the function of a gene, such as its encoded protein or a functional classification of the gene sequence according to the encoded protein(s). Gene annotation beta12orEarlier Moby_namespace:Human_Readable_Description Gene function (report) Moby:GeneInfo Gene classification beta12orEarlier true A report on the classification of nucleic acid / gene sequences according to the functional classification of their gene products. beta12orEarlier DNA variation Stable, naturally occurring mutations in a nucleotide sequence including alleles, naturally occurring mutations such as single base nucleotide substitutions, deletions and insertions, RFLPs and other polymorphisms. true 1.8 beta12orEarlier Chromosome report beta12orEarlier An informative report on a specific chromosome. This includes basic information, e.g. chromosome number, length, karyotype features, chromosome sequence etc. Genotype/phenotype report An informative report on the set of genes (or allelic forms) present in an individual, organism or cell and associated with a specific physical characteristic, or a report concerning an organism's traits and phenotypes. Genotype/phenotype annotation beta12orEarlier Nucleic acid features report (primers) true 1.8 beta12orEarlier PCR primers and hybridization oligos in a nucleic acid sequence. PCR experiment report true beta12orEarlier PCR experiments, e.g. quantitative real-time PCR. 1.8 Sequence trace Fluorescence trace data generated by an automated DNA sequencer, which can be interpreted as a molecular sequence (reads), given associated sequencing metadata such as base-call quality scores. This is the raw data produced by a DNA sequencing machine. 
beta12orEarlier Sequence assembly beta12orEarlier An assembly of fragments of a (typically genomic) DNA sequence. http://en.wikipedia.org/wiki/Sequence_assembly SO:0001248 Typically, an assembly is a collection of contigs (for example ESTs and genomic DNA fragments) that are ordered, aligned and merged. Annotation of the assembled sequence might be included. SO:0000353 SO:0001248 Perhaps surprisingly, the definition of 'SO:assembly' is narrower than the 'SO:sequence_assembly'. Radiation Hybrid (RH) scores beta12orEarlier Radiation Hybrid (RH) scores are used in Radiation Hybrid mapping. Radiation hybrid (RH) scores for one or more markers. Genetic linkage report beta12orEarlier Gene annotation (linkage) Linkage disequilibrium (report) An informative report on the linkage of alleles. This includes linkage disequilibrium; the non-random association of alleles or polymorphisms at two or more loci (not necessarily on the same chromosome). Gene expression profile Data quantifying the level of expression of (typically) multiple genes, derived for example from microarray experiments. beta12orEarlier Gene expression pattern Microarray experiment report true Microarray experiments including conditions, protocol, sample:data relationships etc. 1.8 beta12orEarlier Oligonucleotide probe data beta12orEarlier beta13 true Data on oligonucleotide probes (typically for use with DNA microarrays). SAGE experimental data beta12orEarlier true Output from a serial analysis of gene expression (SAGE) experiment. Serial analysis of gene expression (SAGE) experimental data beta12orEarlier MPSS experimental data beta12orEarlier Massively parallel signature sequencing (MPSS) data. beta12orEarlier Massively parallel signature sequencing (MPSS) experimental data true SBS experimental data beta12orEarlier beta12orEarlier true Sequencing by synthesis (SBS) experimental data Sequencing by synthesis (SBS) data. 
Sequence tag profile (with gene assignment) beta12orEarlier Tag to gene assignments (tag mapping) of SAGE, MPSS and SBS data. Typically this is the sequencing-based expression profile annotated with gene identifiers. Protein X-ray crystallographic data X-ray crystallography data. beta12orEarlier Protein NMR data Protein nuclear magnetic resonance (NMR) raw data. beta12orEarlier Protein circular dichroism (CD) spectroscopic data beta12orEarlier Protein secondary structure from protein coordinate or circular dichroism (CD) spectroscopic data. Electron microscopy volume map beta12orEarlier Volume map data from electron microscopy. EM volume map Electron microscopy model beta12orEarlier Annotation on a structural 3D model (volume map) from electron microscopy. This might include the location in the model of the known features of a particular macromolecule. 2D PAGE image beta12orEarlier Two-dimensional gel electrophoresis image Mass spectrometry spectra beta12orEarlier Spectra from mass spectrometry. Peptide mass fingerprint Peak list Protein fingerprint A set of peptide masses (peptide mass fingerprint) from mass spectrometry. beta12orEarlier Peptide identification Protein or peptide identifications with evidence supporting the identifications, typically from comparing a peptide mass fingerprint (from mass spectrometry) to a sequence database. beta12orEarlier Pathway or network annotation beta12orEarlier true An informative report about a specific biological pathway or network, typically including a map (diagram) of the pathway. beta12orEarlier Biological pathway map beta12orEarlier true A map (typically a diagram) of a biological pathway. beta12orEarlier Data resource definition beta12orEarlier true 1.5 A definition of a data resource serving one or more types of data, including metadata and links to the resource or data proper. Workflow metadata Basic information, annotation or documentation concerning a workflow (but not the workflow itself). 
beta12orEarlier Mathematical model Biological model beta12orEarlier A biological model represented in mathematical terms. Statistical estimate score beta12orEarlier A value representing estimated statistical significance of some observed data; typically sequence database hits. EMBOSS database resource definition beta12orEarlier Resource definition for an EMBOSS database. true 1.5 Version information "http://purl.obolibrary.org/obo/IAO_0000129" 1.5 Development status / maturity may be part of the version information, for example in case of tools, standards, or some data records. http://www.ebi.ac.uk/swo/maturity/SWO_9000061 beta12orEarlier Information on a version of software or data, for example name, version number and release date. http://semanticscience.org/resource/SIO_000653 true http://usefulinc.com/ns/doap#Version Database cross-mapping beta12orEarlier A mapping of the accession numbers (or other database identifier) of entries between (typically) two biological or biomedical databases. The cross-mapping is typically a table where each row is an accession number and each column is a database being cross-referenced. The cells give the accession number or identifier of the corresponding entry in a database. If a cell in the table is not filled then no mapping could be found for the database. Additional information might be given on version, date etc. Data index An index of data of biological relevance. beta12orEarlier Data index report A report of an analysis of an index of biological data. Database index annotation beta12orEarlier Database metadata Basic information on bioinformatics database(s) or other data sources such as name, type, description, URL etc. beta12orEarlier Tool metadata beta12orEarlier Basic information about one or more bioinformatics applications or packages, such as name, type, description, or other documentation. Job metadata beta12orEarlier true 1.5 Moby:PDGJOB Textual metadata on a submitted or completed job. 
User metadata beta12orEarlier Textual metadata on a software author or end-user, for example a person or other software. Small molecule report Small molecule annotation Small molecule report Chemical structure report An informative report on a specific chemical compound. beta12orEarlier Chemical compound annotation Cell line report Organism strain data Cell line annotation Report on a particular strain of organism cell line including plants, virus, fungi and bacteria. The data typically includes strain number, organism type, growth conditions, source and so on. beta12orEarlier Scent annotation beta12orEarlier An informative report about a specific scent. 1.4 true Ontology term Ontology class name beta12orEarlier A term (name) from an ontology. Ontology terms Ontology concept data beta12orEarlier Ontology class metadata Ontology term metadata Data concerning or derived from a concept from a biological ontology. Keyword Phrases Keyword(s) or phrase(s) used (typically) for text-searching purposes. Boolean operators (AND, OR and NOT) and wildcard characters may be allowed. Moby:QueryString beta12orEarlier Moby:BooleanQueryString Moby:Wildcard_Query Moby:Global_Keyword Terms Text Citation Bibliographic data that uniquely identifies a scientific article, book or other published material. A bibliographic reference might include information such as authors, title, journal name, date and (possibly) a link to the abstract or full-text of the article if available. Moby:GCP_SimpleCitation Reference Bibliographic reference Moby:Publication beta12orEarlier Article A document of scientific text, typically a full text article from a scientific journal. beta12orEarlier Text mining report An abstract of the results of text mining. beta12orEarlier Text mining output A text mining abstract will typically include an annotated list of words or sentences extracted from one or more scientific articles. 
Entity identifier beta12orEarlier true beta12orEarlier An identifier of a biological entity or phenomenon. Data resource identifier true An identifier of a data resource. beta12orEarlier beta12orEarlier Identifier (typed) beta12orEarlier This concept exists only to assist EDAM maintenance and navigation in graphical browsers. It does not add semantic information. This branch provides an alternative organisation of the concepts nested under 'Accession' and 'Name'. All concepts under here are already included under 'Accession' or 'Name'. An identifier that identifies a particular type of data. Tool identifier An identifier of a bioinformatics tool, e.g. an application or web service. beta12orEarlier Discrete entity identifier beta12orEarlier true beta12orEarlier Name or other identifier of a discrete entity (any biological thing with a distinct, discrete physical existence). Entity feature identifier true beta12orEarlier Name or other identifier of an entity feature (a physical part or region of a discrete biological entity, or a feature that can be mapped to such a thing). beta12orEarlier Entity collection identifier beta12orEarlier true beta12orEarlier Name or other identifier of a collection of discrete biological entities. Phenomenon identifier beta12orEarlier true beta12orEarlier Name or other identifier of a physical, observable biological occurrence or event. Molecule identifier Name or other identifier of a molecule. beta12orEarlier Atom ID Atom identifier Identifier (e.g. character symbol) of a specific atom. beta12orEarlier Molecule name Name of a specific molecule. beta12orEarlier Molecule type For example, 'Protein', 'DNA', 'RNA' etc. true 1.5 beta12orEarlier A label (text token) describing the type of a molecule. Protein|DNA|RNA Chemical identifier true beta12orEarlier beta12orEarlier Unique identifier of a chemical compound. Chromosome name beta12orEarlier Name of a chromosome. Peptide identifier Identifier of a peptide chain. 
beta12orEarlier Protein identifier beta12orEarlier Identifier of a protein. Compound name Chemical name Unique name of a chemical compound. beta12orEarlier Chemical registry number beta12orEarlier Unique registry number of a chemical compound. Ligand identifier true beta12orEarlier Code word for a ligand, for example from a PDB file. beta12orEarlier Drug identifier beta12orEarlier Identifier of a drug. Amino acid identifier Identifier of an amino acid. beta12orEarlier Residue identifier Nucleotide identifier beta12orEarlier Name or other identifier of a nucleotide. Monosaccharide identifier beta12orEarlier Identifier of a monosaccharide. Chemical name (ChEBI) ChEBI chemical name Unique name from Chemical Entities of Biological Interest (ChEBI) of a chemical compound. beta12orEarlier This is the recommended chemical name for use for example in database annotation. Chemical name (IUPAC) IUPAC recommended name of a chemical compound. IUPAC chemical name beta12orEarlier Chemical name (INN) INN chemical name beta12orEarlier International Non-proprietary Name (INN or 'generic name') of a chemical compound, assigned by the World Health Organization (WHO). Chemical name (brand) Brand name of a chemical compound. Brand chemical name beta12orEarlier Chemical name (synonymous) beta12orEarlier Synonymous chemical name Synonymous name of a chemical compound. Chemical registry number (CAS) CAS chemical registry number CAS registry number of a chemical compound. beta12orEarlier Chemical registry number (Beilstein) Beilstein chemical registry number beta12orEarlier Beilstein registry number of a chemical compound. Chemical registry number (Gmelin) Gmelin chemical registry number beta12orEarlier Gmelin registry number of a chemical compound. HET group name 3-letter code word for a ligand (HET group) from a PDB file, for example ATP. Short ligand name Component identifier code beta12orEarlier Amino acid name String of one or more ASCII characters representing an amino acid. 
beta12orEarlier Nucleotide code beta12orEarlier String of one or more ASCII characters representing a nucleotide. Polypeptide chain ID beta12orEarlier WHATIF: chain Chain identifier Identifier of a polypeptide chain from a protein. PDBML:pdbx_PDB_strand_id Protein chain identifier PDB strand id PDB chain identifier This is typically a character (for the chain) appended to a PDB identifier, e.g. 1cukA Polypeptide chain identifier Protein name Name of a protein. beta12orEarlier Enzyme identifier beta12orEarlier Name or other identifier of an enzyme or record from a database of enzymes. EC number [0-9]+\.-\.-\.-|[0-9]+\.[0-9]+\.-\.-|[0-9]+\.[0-9]+\.[0-9]+\.-|[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ EC code Moby:EC_Number An Enzyme Commission (EC) number of an enzyme. EC Moby:Annotated_EC_Number beta12orEarlier Enzyme Commission number Enzyme name Name of an enzyme. beta12orEarlier Restriction enzyme name Name of a restriction enzyme. beta12orEarlier Sequence position specification 1.5 A specification (partial or complete) of one or more positions or regions of a molecular sequence or map. beta12orEarlier true Sequence feature ID A unique identifier of a molecular sequence feature, for example an ID of a feature that is unique within the scope of the GFF file. beta12orEarlier Sequence position WHATIF: number WHATIF: PDBx_atom_site beta12orEarlier PDBML:_atom_site.id SO:0000735 A position of one or more points (base or residue) in a sequence, or part of such a specification. Sequence range beta12orEarlier Specification of range(s) of sequence positions. Nucleic acid feature identifier beta12orEarlier beta12orEarlier Name or other identifier of a nucleic acid feature. true Protein feature identifier Name or other identifier of a protein feature. true beta12orEarlier beta12orEarlier Sequence feature key Sequence feature method The type of a sequence feature, typically a term or accession from the Sequence Ontology, for example an EMBL or Swiss-Prot sequence feature key. 
Sequence feature type beta12orEarlier A feature key indicates the biological nature of the feature or information about changes to or versions of the sequence. Sequence feature qualifier beta12orEarlier Typically one of the EMBL or Swiss-Prot feature qualifiers. Feature qualifiers hold information about a feature beyond that provided by the feature key and location. Sequence feature label Sequence feature name Typically an EMBL or Swiss-Prot feature label. A feature label identifies a feature of a sequence database entry. When used with the database name and the entry's primary accession number, it is a unique identifier of that feature. beta12orEarlier EMBOSS Uniform Feature Object beta12orEarlier UFO The name of a sequence feature-containing entity adhering to the standard feature naming scheme used by all EMBOSS applications. Codon name beta12orEarlier beta12orEarlier String of one or more ASCII characters representing a codon. true Gene identifier Moby:GeneAccessionList An identifier of a gene, such as a name/symbol or a unique identifier of a gene in a database. beta12orEarlier Gene symbol Moby_namespace:Global_GeneSymbol beta12orEarlier Moby_namespace:Global_GeneCommonName The short name of a gene; a single word that does not contain white space characters. It is typically derived from the gene name. Gene ID (NCBI) NCBI geneid Gene identifier (NCBI) http://www.geneontology.org/doc/GO.xrf_abbs:NCBI_Gene Entrez gene ID Gene identifier (Entrez) http://www.geneontology.org/doc/GO.xrf_abbs:LocusID An NCBI unique identifier of a gene. NCBI gene ID beta12orEarlier Gene identifier (NCBI RefSeq) beta12orEarlier true beta12orEarlier An NCBI RefSeq unique identifier of a gene. Gene identifier (NCBI UniGene) beta12orEarlier An NCBI UniGene unique identifier of a gene. beta12orEarlier true Gene identifier (Entrez) An Entrez unique identifier of a gene. beta12orEarlier true [0-9]+ beta12orEarlier Gene ID (CGD) CGD ID Identifier of a gene or feature from the CGD database. 
- Gene ID (DictyBase): Identifier of a gene from DictyBase.
- Ensembl gene ID (synonym: Gene ID (Ensembl)): Unique identifier for a gene (or other feature) from the Ensembl database.
- Gene ID (SGD) (synonym: SGD identifier): Identifier of an entry from the SGD database. Pattern: S[0-9]+
- Gene ID (GeneDB) (synonym: GeneDB identifier; xref: Moby_namespace:GeneDB): Identifier of a gene from the GeneDB database. Pattern: [a-zA-Z_0-9\.-]*
- TIGR identifier: Identifier of an entry from the TIGR database.
- TAIR accession (gene): Identifier of a gene from the TAIR database. Pattern: Gene:[0-9]{7}
- Protein domain ID: Identifier of a protein structural domain. This is typically a character or string concatenated with a PDB identifier and a chain identifier.
- SCOP domain identifier: Identifier of a protein domain (or other node) from the SCOP database.
- CATH domain ID (synonym: CATH domain identifier): Identifier of a protein domain from CATH. Example: 1nr3A00
- SCOP concise classification string (sccs): A compact representation of a SCOP domain classification. An sccs includes the class (alphabetical), fold, superfamily and family (all numerical) to which a given domain belongs.
- SCOP sunid (synonyms: sunid, SCOP unique identifier): Unique identifier (number) of an entry in the SCOP hierarchy, for example 33229. A sunid uniquely identifies an entry in the SCOP hierarchy, including leaves (the SCOP domains) and higher level nodes including entries corresponding to the protein level. Example: 33229
- CATH node ID (synonyms: CATH code, CATH node identifier): A code number identifying a node from the CATH database. Example: 3.30.1190.10.1.1.1.1.1
- Kingdom name: The name of a biological kingdom (Bacteria, Archaea, or Eukaryotes).
- Species name (synonym: Organism species): The name of a species (typically a taxonomic group) of organism.
- Strain name: The name of a strain of an organism variant, typically a plant, virus or bacterium.
- URI (synonym: URIs): A string of characters that name or otherwise identify a resource on the Internet.
- Database ID (synonym: Database identifier): An identifier of a biological or bioinformatics database.
- Directory name: The name of a directory.
- File name: The name (or part of a name) of a file (of any type).
- Ontology name: Name of an ontology of biological or bioinformatics concepts and relations.
- URL (xref: Moby:URL, Moby:Link): A Uniform Resource Locator (URL).
- URN: A Uniform Resource Name (URN).
- LSID (synonym: Life Science Identifier): A Life Science Identifier (LSID), a unique identifier of some data. LSIDs provide a standard way to locate and describe data. An LSID is represented as a Uniform Resource Name (URN) with the following format: URN:LSID:<Authority>:<Namespace>:<ObjectID>[:<Version>]
- Database name: The name of a biological or bioinformatics database.
- Sequence database name (obsolete): The name of a molecular sequence database.
- Enumerated file name: The name of a file (of any type) with restricted possible values.
- File name extension: The extension of a file name, i.e. the characters appearing after the final '.' in the file name.
- File base name: The base name of a file, i.e. the file name stripped of its directory specification and extension.
- QSAR descriptor name: Name of a QSAR descriptor.
- Database entry identifier (obsolete): An identifier of an entry from a database where the same type of identifier is used for objects (data) of different semantic type. This concept is required for completeness; it should never have child concepts.
- Sequence identifier: An identifier of molecular sequence(s) or entries from a molecular sequence database.
- Sequence set ID: An identifier of a set of molecular sequence(s).
- Sequence signature identifier (obsolete): Identifier of a sequence signature (motif or profile), for example from a database of sequence patterns.
- Sequence alignment ID: Identifier of a molecular sequence alignment, for example a record from an alignment database.
- Phylogenetic distance matrix identifier (obsolete): Identifier of a phylogenetic distance matrix.
- Phylogenetic tree ID: Identifier of a phylogenetic tree, for example from a phylogenetic tree database.
- Comparison matrix identifier (synonym: Substitution matrix identifier): An identifier of a comparison matrix.
- Structure ID: A unique and persistent identifier of a molecular tertiary structure, typically an entry from a structure database.
- Structural (3D) profile ID (synonym: Structural profile identifier): Identifier or name of a structural (3D) profile or template (representing a structure or structure alignment).
- Structure alignment ID: Identifier of an entry from a database of tertiary structure alignments.
- Amino acid index ID: Identifier of an index of amino acid physicochemical and biochemical property data.
- Protein interaction ID (synonym: Molecular interaction ID): Identifier of a report of protein interactions from a protein interaction database (typically).
- Protein family identifier (synonym: Protein secondary database record identifier): Identifier of a protein family.
- Codon usage table name: Unique name of a codon usage table.
- Transcription factor identifier: Identifier of a transcription factor (or a TF binding site).
- Experiment annotation ID: Identifier of an entry from a database of microarray data.
- Electron microscopy model ID: Identifier of an entry from a database of electron microscopy data.
- Gene expression report ID (synonym: Gene expression profile identifier): Accession of a report of gene expression (e.g. a gene expression profile) from a database.
- Genotype and phenotype annotation ID: Identifier of an entry from a database of genotypes and phenotypes.
- Pathway or network identifier: Identifier of an entry from a database of biological pathways or networks.
- Workflow ID: Identifier of a biological or biomedical workflow, typically from a database of workflows.
- Data resource definition ID (synonym: Data resource definition identifier): Identifier of a data type definition from some provider.
- Biological model ID (synonym: Biological model identifier): Identifier of a mathematical model, typically an entry from a database.
- Compound identifier (synonyms: Chemical compound identifier, Small molecule identifier): Identifier of an entry from a database of chemicals.
- Ontology concept ID: A unique (typically numerical) identifier of a concept in an ontology of biological or bioinformatics concepts and relations.
- Article ID (synonym: Article identifier): Unique identifier of a scientific article.
- FlyBase ID: Identifier of an object from the FlyBase database. Pattern: FB[a-zA-Z_0-9]{2}[0-9]{7}
- WormBase name: Name of an object from the WormBase database, usually a human-readable name.
- WormBase class: Class of an object from the WormBase database. A WormBase class describes the type of object, such as 'sequence' or 'protein'.
- Sequence accession (synonym: Sequence accession number): A persistent, unique identifier of a molecular sequence database entry.
- Sequence type (obsolete): A label (text token) describing a type of molecular sequence. Sequence type might reflect the molecule (protein, nucleic acid etc) or the sequence itself (gapped, ambiguous etc).
- EMBOSS Uniform Sequence Address (synonym: EMBOSS USA): The name of a sequence-based entity adhering to the standard sequence naming scheme used by all EMBOSS applications.
- Sequence accession (protein) (synonym: Protein sequence accession number): Accession number of a protein sequence database entry.
- Sequence accession (nucleic acid) (synonym: Nucleotide sequence accession number): Accession number of a nucleotide sequence database entry.
- RefSeq accession (synonym: RefSeq ID): Accession number of a RefSeq database entry. Pattern: (NC|AC|NG|NT|NW|NZ|NM|NR|XM|XR|NP|AP|XP|YP|ZP)_[0-9]+
- UniProt accession (extended) (obsolete): Accession number of a UniProt (protein sequence) database entry. May contain version or isoform number. Pattern: [A-NR-Z][0-9][A-Z][A-Z0-9][A-Z0-9][0-9]|[OPQ][0-9][A-Z0-9][A-Z0-9][A-Z0-9][0-9]|[A-NR-Z][0-9][A-Z][A-Z0-9][A-Z0-9][0-9].[0-9]+|[OPQ][0-9][A-Z0-9][A-Z0-9][A-Z0-9][0-9].[0-9]+|[A-NR-Z][0-9][A-Z][A-Z0-9][A-Z0-9][0-9]-[0-9]+|[OPQ][0-9][A-Z0-9][A-Z0-9][A-Z0-9][0-9]-[0-9]+ Examples: Q7M1G0, P43353-2, P01012.107
- PIR identifier (synonyms: PIR ID, PIR accession number): An identifier of a PIR sequence database entry.
- TREMBL accession (obsolete): Identifier of a TREMBL sequence database entry.
- Gramene primary identifier (synonym: Gramene primary ID): Primary identifier of a Gramene database entry.
- EMBL/GenBank/DDBJ ID: Identifier of a (nucleic acid) entry from the EMBL/GenBank/DDBJ databases.
- Sequence cluster ID (UniGene) (synonyms: UniGene identifier, UniGene cluster ID, UniGene ID): A unique identifier of an entry (gene cluster) from the NCBI UniGene database.
- dbEST accession (synonym: dbEST ID): Identifier of a dbEST database entry.
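The "UniProt accession (extended)" entry pairs its pattern with three worked examples (Q7M1G0, P43353-2, P01012.107). A minimal sketch checking those examples against the pattern exactly as given; note the source pattern leaves the '.' before the version digits unescaped, so that position technically matches any character:

```python
import re

# UniProt accession (extended) pattern, verbatim from the entry above:
# a base accession, optionally followed by ".version" or "-isoform".
UNIPROT_EXTENDED = re.compile(
    r"[A-NR-Z][0-9][A-Z][A-Z0-9][A-Z0-9][0-9]"
    r"|[OPQ][0-9][A-Z0-9][A-Z0-9][A-Z0-9][0-9]"
    r"|[A-NR-Z][0-9][A-Z][A-Z0-9][A-Z0-9][0-9].[0-9]+"
    r"|[OPQ][0-9][A-Z0-9][A-Z0-9][A-Z0-9][0-9].[0-9]+"
    r"|[A-NR-Z][0-9][A-Z][A-Z0-9][A-Z0-9][0-9]-[0-9]+"
    r"|[OPQ][0-9][A-Z0-9][A-Z0-9][A-Z0-9][0-9]-[0-9]+"
)

# The three examples listed in the entry: bare, isoform, and versioned.
for acc in ("Q7M1G0", "P43353-2", "P01012.107"):
    print(acc, bool(UNIPROT_EXTENDED.fullmatch(acc)))  # all True
```

Because `fullmatch` backtracks across the six alternation branches, "P43353-2" is rejected by the bare-accession branch (which would leave "-2" unconsumed) and accepted by the isoform branch.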
- dbSNP ID (synonym: dbSNP identifier): Identifier of a dbSNP database entry.
- EMBOSS sequence type (obsolete): The EMBOSS type of a molecular sequence. See the EMBOSS documentation (http://emboss.sourceforge.net/) for a definition of what this includes.
- EMBOSS listfile (obsolete): List of EMBOSS Uniform Sequence Addresses (EMBOSS listfile).
- Sequence cluster ID: An identifier of a cluster of molecular sequence(s).
- Sequence cluster ID (COG) (synonym: COG ID): Unique identifier of an entry from the COG database.
- Sequence motif identifier: Identifier of a sequence motif, for example an entry from a motif database.
- Sequence profile ID: Identifier of a sequence profile. A sequence profile typically represents a sequence alignment.
- ELM ID: Identifier of an entry from the ELMdb database of protein functional sites.
- Prosite accession number (synonym: Prosite ID): Accession number of an entry from the Prosite database. Pattern: PS[0-9]{5}
- HMMER hidden Markov model ID: Unique identifier or name of a HMMER hidden Markov model.
- JASPAR profile ID: Unique identifier or name of a profile from the JASPAR database.
- Sequence alignment type (obsolete): A label (text token) describing the type of a sequence alignment. Possible values include for example the EMBOSS alignment types, BLAST alignment types and so on.
- BLAST sequence alignment type (obsolete): The type of a BLAST sequence alignment.
- Phylogenetic tree type (obsolete): A label (text token) describing the type of a phylogenetic tree, for example 'nj', 'upgmp' etc. Pattern: nj|upgmp
- TreeBASE study accession number: Accession number of an entry from the TreeBASE database.
- TreeFam accession number: Accession number of an entry from the TreeFam database.
- Comparison matrix type (obsolete) (synonym: Substitution matrix type): A label (text token) describing the type of a comparison matrix, for example 'blosum', 'pam', 'gonnet', 'id' etc. Comparison matrix type may be required where a series of matrices of a certain type are used. Pattern: blosum|pam|gonnet|id
- Comparison matrix name (synonym: Substitution matrix name): Unique name or identifier of a comparison matrix. See for example http://www.ebi.ac.uk/Tools/webservices/help/matrix.
- PDB ID (synonyms: PDBID, PDB identifier): An identifier of an entry from the PDB database. Pattern: [a-zA-Z_0-9]{4}
- AAindex ID: Identifier of an entry from the AAindex database.
- BIND accession number: Accession number of an entry from the BIND database.
- IntAct accession number: Accession number of an entry from the IntAct database. Pattern: EBI\-[0-9]+
- Protein family name: Name of a protein family.
- InterPro entry name: Name of an InterPro entry, usually indicating the type of protein matches for that entry.
- InterPro accession (synonyms: InterPro primary accession, InterPro primary accession number): Primary accession number of an InterPro entry. Every InterPro entry has a unique accession number to provide a persistent citation of database records. Pattern: IPR[0-9]{6} Example: IPR015590
- InterPro secondary accession (synonym: InterPro secondary accession number): Secondary accession number of an InterPro entry.
- Gene3D ID: Unique identifier of an entry from the Gene3D database.
- PIRSF ID: Unique identifier of an entry from the PIRSF database. Pattern: PIRSF[0-9]{6}
- PRINTS code: The unique identifier of an entry in the PRINTS database. Pattern: PR[0-9]{5}
- Pfam accession number: Accession number of a Pfam entry. Pattern: PF[0-9]{5}
- SMART accession number: Accession number of an entry from the SMART database. Pattern: SM[0-9]{5}
- Superfamily hidden Markov model number: Unique identifier (number) of a hidden Markov model from the Superfamily database.
- TIGRFam ID (synonym: TIGRFam accession number): Accession number of an entry (family) from the TIGRFam database.
- ProDom accession number: A ProDom domain family accession number. ProDom is a protein domain family database. Pattern: PD[0-9]+
- TRANSFAC accession number: Identifier of an entry from the TRANSFAC database.
- ArrayExpress accession number (synonym: ArrayExpress experiment ID): Accession number of an entry from the ArrayExpress database. Pattern: [AEP]-[a-zA-Z_0-9]{4}-[0-9]+
- PRIDE experiment accession number: PRIDE experiment accession number. Pattern: [0-9]+
- EMDB ID: Identifier of an entry from the EMDB electron microscopy database.
- GEO accession number: Accession number of an entry from the GEO database. Pattern: GDS[0-9]+
- GermOnline ID: Identifier of an entry from the GermOnline database.
- EMAGE ID: Identifier of an entry from the EMAGE database.
- Disease ID: Identifier of an entry from a database of disease.
- HGVbase ID: Identifier of an entry from the HGVbase database.
- HIVDB identifier (obsolete): Identifier of an entry from the HIVDB database.
- OMIM ID: Identifier of an entry from the OMIM database. Pattern: [*#+%^]?[0-9]{6}
- KEGG object identifier: Unique identifier of an object from one of the KEGG databases (excluding the GENES division).
- Pathway ID (reactome) (synonym: Reactome ID): Identifier of an entry from the Reactome database. Pattern: REACT_[0-9]+(\.[0-9]+)?
- Pathway ID (aMAZE) (obsolete) (synonym: aMAZE ID): Identifier of an entry from the aMAZE database.
- Pathway ID (BioCyc) (synonym: BioCyc pathway ID): Identifier of a pathway from the BioCyc biological pathways database.
- Pathway ID (INOH) (synonym: INOH identifier): Identifier of an entry from the INOH database.
- Pathway ID (PATIKA) (synonym: PATIKA ID): Identifier of an entry from the PATIKA database.
- Pathway ID (CPDB) (synonym: CPDB ID): Identifier of an entry from the CPDB (ConsensusPathDB) biological pathways database, which is an identifier from an external database integrated into CPDB. This concept refers to identifiers used by the databases collated in CPDB; CPDB identifiers are not independently defined.
- Pathway ID (Panther) (synonym: Panther Pathways ID): Identifier of a biological pathway from the Panther Pathways database. Pattern: PTHR[0-9]{5}
- MIRIAM identifier: Unique identifier of a MIRIAM data resource. This is the identifier used internally by MIRIAM for a data type. Pattern: MIR:[0-9]{8} Example: MIR:00100005
- MIRIAM data type name: The name of a data type from the MIRIAM database.
- MIRIAM URI (synonym: identifiers.org synonym): The URI (URL or URN) of a data entity from the MIRIAM database. A MIRIAM URI consists of the URI of the MIRIAM data type (PubMed, UniProt etc) followed by the identifier of an element of that data type, for example PMID for a publication or an accession number for a GO term. Examples: urn:miriam:pubmed:16333295, urn:miriam:obo.go:GO%3A0045202
- MIRIAM data type primary name: The primary name of a data type from the MIRIAM database, taken from a controlled vocabulary. A protein entity has the MIRIAM data type 'UniProt', and an enzyme has the MIRIAM data type 'Enzyme Nomenclature'. Examples: UniProt, Enzyme Nomenclature
- MIRIAM data type synonymous name: A synonymous name of a data type from the MIRIAM database, taken from a controlled vocabulary.
- Taverna workflow ID: Unique identifier of a Taverna workflow.
- Biological model name: Name of a biological (mathematical) model.
- BioModel ID: Unique identifier of an entry from the BioModel database. Pattern: (BIOMD|MODEL)[0-9]{10}
- PubChem CID (synonym: PubChem compound accession identifier): Chemical structure specified in PubChem Compound Identification (CID), a non-zero integer identifier for a unique chemical structure. Pattern: [0-9]+
- ChemSpider ID: Identifier of an entry from the ChemSpider database. Pattern: [0-9]+
- ChEBI ID (synonym: ChEBI identifier): Identifier of an entry from the ChEBI database. Pattern: CHEBI:[0-9]+
- BioPax concept ID: An identifier of a concept from the BioPax ontology.
- GO concept ID (synonym: GO concept identifier): An identifier of a concept from The Gene Ontology. Pattern: [0-9]{7}|GO:[0-9]{7}
- MeSH concept ID: An identifier of a concept from the MeSH vocabulary.
- HGNC concept ID: An identifier of a concept from the HGNC controlled vocabulary.
- NCBI taxonomy ID (synonyms: NCBI taxonomy identifier, NCBI tax ID): A stable unique identifier for each taxon (a species, a family, an order, or any other group) in the NCBI taxonomy database. Pattern: [1-9][0-9]{0,8} Examples: 9662, 3483, 182682
- Plant Ontology concept ID: An identifier of a concept from the Plant Ontology (PO).
- UMLS concept ID: An identifier of a concept from the UMLS vocabulary.
- FMA concept ID: An identifier of a concept from the Foundational Model of Anatomy. FMA classifies anatomical entities according to their shared characteristics (genus) and distinguishing characteristics (differentia). It specifies the part-whole and spatial relationships of the entities, morphological transformation of the entities during prenatal development and the postnatal life cycle, and principles, rules and definitions according to which classes and relationships in the other three components of FMA are represented. Pattern: FMA:[0-9]+
- EMAP concept ID: An identifier of a concept from the EMAP mouse ontology.
- ChEBI concept ID: An identifier of a concept from the ChEBI ontology.
- MGED concept ID: An identifier of a concept from the MGED ontology.
- myGrid concept ID: An identifier of a concept from the myGrid ontology. The ontology is provided as two components, the service ontology and the domain ontology. The domain ontology provides concepts for core bioinformatics data types and their relations; the service ontology describes the physical and operational features of web services.
- PubMed ID (synonym: PMID): PubMed unique identifier of an article. Pattern: [1-9][0-9]{0,8} Example: 4963447
- DOI (synonym: Digital Object Identifier): Digital Object Identifier (DOI) of a published article. Pattern: (doi\:)?[0-9]{2}\.[0-9]{4}/.*
- Medline UI (synonym: Medline unique identifier): Medline UI (unique identifier) of an article. The use of Medline UI has been replaced by the PubMed unique identifier.
- Tool name: The name of a computer package, application, method or function.
- Tool name (signature): The unique name of a signature (sequence classifier) method. Signature methods from http://www.ebi.ac.uk/Tools/InterProScan/help.html#results include BlastProDom, FPrintScan, HMMPIR, HMMPfam, HMMSmart, HMMTigr, ProfileScan, ScanRegExp, SuperFamily and HAMAP.
- Tool name (BLAST) (synonym: BLAST name): The name of a BLAST tool. This includes 'blastn', 'blastp', 'blastx', 'tblastn' and 'tblastx'.
- Tool name (FASTA): The name of a FASTA tool. This includes 'fasta3', 'fastx3', 'fasty3', 'fastf3', 'fasts3' and 'ssearch'.
- Tool name (EMBOSS): The name of an EMBOSS application.
- Tool name (EMBASSY package): The name of an EMBASSY package.
- QSAR descriptor (constitutional) (synonym: QSAR constitutional descriptor): A QSAR constitutional descriptor.
- QSAR descriptor (electronic) (synonym: QSAR electronic descriptor): A QSAR electronic descriptor.
- QSAR descriptor (geometrical) (synonym: QSAR geometrical descriptor): A QSAR geometrical descriptor.
- QSAR descriptor (topological) (synonym: QSAR topological descriptor): A QSAR topological descriptor.
- QSAR descriptor (molecular) (synonym: QSAR molecular descriptor): A QSAR molecular descriptor.
- Sequence set (protein): Any collection of multiple protein sequences and associated metadata that do not (typically) correspond to common sequence database records or database entries.
- Sequence set (nucleic acid): Any collection of multiple nucleotide sequences and associated metadata that do not (typically) correspond to common sequence database records or database entries.
- Sequence cluster: A set of sequences that have been clustered or otherwise classified as belonging to a group, including (typically) sequence cluster information. The cluster might include sequence identifiers, short descriptions, alignment and summary information.
- Psiblast checkpoint file (obsolete): A file of intermediate results from a PSIBLAST search that is used for priming the search in the next PSIBLAST iteration. A Psiblast checkpoint file uses ASN.1 Binary Format and usually has the extension '.asn'.
- HMMER synthetic sequences set (obsolete): Sequences generated by the HMMER package in FASTA-style format.
- Proteolytic digest: A protein sequence cleaved into peptide fragments (by enzymatic or chemical cleavage) with fragment masses.
- Restriction digest (xref: SO:0000412): Restriction digest fragments from digesting a nucleotide sequence with restriction sites using a restriction endonuclease.
- PCR primers: Oligonucleotide primer(s) for PCR and DNA amplification, for example a minimal primer set.
- vectorstrip cloning vector definition file (obsolete): File of sequence vectors used by the EMBOSS vectorstrip application, or any file in the same format.
- Primer3 internal oligo mishybridizing library (obsolete): A library of nucleotide sequences to avoid during hybridization events. Hybridization of the internal oligo to sequences in this library is avoided, rather than priming from them. The file is in a restricted FASTA format.
- Primer3 mispriming library file (obsolete): A nucleotide sequence library of sequences to avoid during amplification (for example repetitive sequences, or possibly the sequences of genes in a gene family that should not be amplified). The file is in a restricted FASTA format.
- primersearch primer pairs sequence record (obsolete): File of one or more pairs of primer sequences, as used by the EMBOSS primersearch application.
- Sequence cluster (protein) (synonym: Protein sequence cluster): A cluster of protein sequences. The sequences are typically related, for example a family of sequences.
- Sequence cluster (nucleic acid) (synonym: Nucleotide sequence cluster): A cluster of nucleotide sequences. The sequences are typically related, for example a family of sequences.
- Sequence length: The size (length) of a sequence, subsequence or region in a sequence, or range(s) of lengths.
- Word size (obsolete) (synonym: Word length): Size of a sequence word. Word size is used for example in word-based sequence database search methods.
- Window size (obsolete): Size of a sequence window. A window is a region of fixed size but not fixed position over a molecular sequence; it is typically moved (computationally) over a sequence during scoring.
- Sequence length range (obsolete): Specification of range(s) of length of sequences.
- Sequence information report (obsolete): Report on basic information about a molecular sequence such as name, accession number, type (nucleic or protein), length, description etc.
- Sequence property (synonym: Sequence properties report): An informative report about non-positional sequence features, typically a report on general molecular sequence properties derived from sequence analysis.
- Sequence features (synonyms: Sequence features report, General sequence features, Features, Feature record; xref: SO:0000110, http://purl.bioontology.org/ontology/MSH/D058977): Annotation of positional features of molecular sequence(s), i.e. features that can be mapped to position(s) in the sequence. This includes annotation of positional sequence features, organized into a standard feature table, or any other report of sequence features. General feature reports are a source of sequence feature table information, although internal conversion would be required.
- Sequence features (comparative) (obsolete): Comparative data on sequence features such as statistics, intersections (and data on intersections), differences etc. This is a broad data type, used as a placeholder for other, more specific types; it is primarily intended to help navigation of EDAM and would not typically be used for annotation.
- Sequence property (protein) (obsolete): A report of general sequence properties derived from protein sequence data.
- Sequence property (nucleic acid) (obsolete): A report of general sequence properties derived from nucleotide sequence data.
- Sequence complexity report (synonym: Sequence property (complexity)): A report on sequence complexity, for example low-complexity or repeat regions in sequences.
- Sequence ambiguity report (synonym: Sequence property (ambiguity)): A report on ambiguity in molecular sequence(s).
- Sequence composition report (synonym: Sequence property (composition)): A report (typically a table) on character or word composition / frequency of a molecular sequence(s).
- Peptide molecular weight hits: A report on peptide fragments of certain molecular weight(s) in one or more protein sequences.
- Base position variability plot: A plot of third base position variability in a nucleotide sequence.
- Sequence composition table (obsolete): A table of character or word composition / frequency of a molecular sequence.
- Base frequencies table: A table of base frequencies of a nucleotide sequence.
- Base word frequencies table: A table of word composition of a nucleotide sequence.
- Amino acid frequencies table (synonym: Sequence composition (amino acid frequencies)): A table of amino acid frequencies of a protein sequence.
- Amino acid word frequencies table (synonym: Sequence composition (amino acid words)): A table of amino acid word composition of a protein sequence.
- DAS sequence feature annotation (obsolete): Annotation of a molecular sequence in DAS format.
- Feature table (synonym: Sequence feature table): Annotation of positional sequence features, organized into a standard feature table.
- Map (synonym: DNA map): A map of (typically one) DNA sequence annotated with positional or non-positional features.
- Nucleic acid features (synonyms: Genome features, Genomic features, Nucleic acid feature table, Feature table (nucleic acid)): An informative report on intrinsic positional features of a nucleotide sequence. This includes nucleotide sequence feature annotation in any known sequence feature table format and any other report of nucleic acid features.
- Protein features (synonyms: Feature table (protein), Protein feature table): An informative report on intrinsic positional features of a protein sequence. This includes protein sequence feature annotation in any known sequence feature table format and any other report of protein features.
- Genetic map (synonym: Linkage map; xref: Moby:GeneticMap): A map showing the relative positions of genetic markers in a nucleic acid sequence, based on estimation of non-physical distance such as recombination frequencies. A genetic (linkage) map indicates the proximity of two genes on a chromosome, whether two genes are linked, and the frequency with which they are transmitted together to an offspring. Genetic maps are limited to genetic markers of traits observable only in whole organisms.
- Sequence map: A map of genetic markers in a contiguous, assembled genomic sequence, with the sizes and separation of markers measured in base pairs. A sequence map typically includes annotation on significant subsequences such as contigs, haplotypes and genes. The contigs shown will (typically) be a set of small overlapping clones representing a complete chromosomal segment.
- Physical map: A map of DNA (linear or circular) annotated with physical features or landmarks such as restriction sites, cloned DNA fragments, genes or genetic markers, along with the physical distances between them. Distance in a physical map is measured in base pairs. A physical map might be ordered relative to a reference map (typically a genetic map) in the process of genome sequencing.
- Sequence signature map (obsolete): Image of a sequence with matches to signatures, motifs or profiles.
- Cytogenetic map (synonyms: Cytologic map, Chromosome map, Cytogenic map): A map showing banding patterns derived from direct observation of a stained chromosome. This is the lowest-resolution physical map and can provide only rough estimates of physical (base pair) distances. Like a genetic map, it is limited to genetic markers of traits observable only in whole organisms.
- DNA transduction map: A gene map showing distances between loci based on relative cotransduction frequencies.
- Gene map: Sequence map of a single gene annotated with genetic features such as introns, exons, untranslated regions, polyA signals, promoters, enhancers and (possibly) mutations defining alleles of a gene.
- Plasmid map: Sequence map of a plasmid (circular DNA).
- Genome map: Sequence map of a whole genome.
- Restriction map: Image of the restriction enzyme cleavage sites (restriction sites) in a nucleic acid sequence.
- InterPro compact match image (obsolete): Image showing matches between protein sequence(s) and InterPro Entries. The sequence(s) might be screened against InterPro, or be the sequences from the InterPro entry itself. Each protein is represented as a scaled horizontal line with colored bars indicating the position of the matches.
- InterPro detailed match image (obsolete): Image showing detailed information on matches between protein sequence(s) and InterPro Entries. The sequence(s) might be screened against InterPro, or be the sequences from the InterPro entry itself.
- InterPro architecture image (obsolete): Image showing the architecture of InterPro domains in a protein sequence. The sequence(s) might be screened against InterPro, or be the sequences from the InterPro entry itself. Domain architecture is shown as a series of non-overlapping domains in the protein.
- SMART protein schematic (obsolete): SMART protein schematic in PNG format.
- GlobPlot domain image (obsolete): Images based on GlobPlot prediction of intrinsic disordered regions and globular domains in protein sequences.
- Sequence motif matches (obsolete): Report on the location of matches to profiles, motifs (conserved or functional patterns) or other signatures in one or more sequences.
1.8 true Sequence features (repeats) beta12orEarlier true 1.5 Repeat sequence map The report might include derived data map such as classification, annotation, organization, periodicity etc. Location of short repetitive subsequences (repeat sequences) in (typically nucleotide) sequences. Gene and transcript structure (report) 1.5 beta12orEarlier A report on predicted or actual gene structure, regions which make an RNA product and features such as promoters, coding regions, splice sites etc. true Mobile genetic elements true beta12orEarlier regions of a nucleic acid sequence containing mobile genetic elements. 1.8 Nucleic acid features report (PolyA signal or site) true regions or sites in a eukaryotic and eukaryotic viral RNA sequence which directs endonuclease cleavage or polyadenylation of an RNA transcript. 1.8 beta12orEarlier Nucleic acid features (quadruplexes) true 1.5 A report on quadruplex-forming motifs in a nucleotide sequence. beta12orEarlier Nucleic acid features report (CpG island and isochore) 1.8 CpG rich regions (isochores) in a nucleotide sequence. beta12orEarlier true Nucleic acid features report (restriction sites) beta12orEarlier true 1.8 restriction enzyme recognition sites (restriction sites) in a nucleic acid sequence. Nucleosome exclusion sequences beta12orEarlier true Report on nucleosome formation potential or exclusion sequence(s). 1.8 Nucleic acid features report (splice sites) splice sites in a nucleotide sequence or alternative RNA splicing events. beta12orEarlier true 1.8 Nucleic acid features report (matrix/scaffold attachment sites) 1.8 matrix/scaffold attachment regions (MARs/SARs) in a DNA sequence. true beta12orEarlier Gene features (exonic splicing enhancer) beta12orEarlier beta13 true A report on exonic splicing enhancers (ESE) in an exon. Nucleic acid features (microRNA) true beta12orEarlier A report on microRNA sequence (miRNA) or precursor, microRNA targets, miRNA binding sites in an RNA sequence etc. 
1.5 Gene features report (operon) true operons (operators, promoters and genes) from a bacterial genome. 1.8 beta12orEarlier Nucleic acid features report (promoters) 1.8 whole promoters or promoter elements (transcription start sites, RNA polymerase binding site, transcription factor binding sites, promoter enhancers etc) in a DNA sequence. true beta12orEarlier Coding region beta12orEarlier protein-coding regions including coding sequences (CDS), exons, translation initiation sites and open reading frames. 1.8 true Gene features (SECIS element) beta12orEarlier beta13 A report on selenocysteine insertion sequence (SECIS) element in a DNA sequence. true Transcription factor binding sites transcription factor binding sites (TFBS) in a DNA sequence. beta12orEarlier true 1.8 Protein features (sites) true beta12orEarlier Use this concept for collections of specific sites which are not necessarily contiguous, rather than contiguous stretches of amino acids. beta12orEarlier A report on predicted or known key residue positions (sites) in a protein sequence, such as binding or functional sites. Protein features report (signal peptides) true signal peptides or signal peptide cleavage sites in protein sequences. 1.8 beta12orEarlier Protein features report (cleavage sites) true 1.8 cleavage sites (for a proteolytic enzyme or agent) in a protein sequence. beta12orEarlier Protein features (post-translation modifications) true beta12orEarlier post-translation modifications in a protein sequence, typically describing the specific sites involved. 1.8 Protein features report (active sites) 1.8 true beta12orEarlier catalytic residues (active site) of an enzyme. Protein features report (binding sites) beta12orEarlier ligand-binding (non-catalytic) residues of a protein, such as sites that bind metal, prosthetic groups or lipids. true 1.8 Protein features (epitopes) A report on antigenic determinant sites (epitopes) in proteins, from sequence and / or structural data. 
beta13 beta12orEarlier Epitope mapping is commonly done during vaccine design. true Protein features report (nucleic acid binding sites) true beta12orEarlier 1.8 RNA and DNA-binding proteins and binding sites in protein sequences. MHC Class I epitopes report beta12orEarlier beta12orEarlier true A report on epitopes that bind to MHC class I molecules. MHC Class II epitopes report beta12orEarlier beta12orEarlier true A report on predicted epitopes that bind to MHC class II molecules. Protein features (PEST sites) beta12orEarlier A report or plot of PEST sites in a protein sequence. true beta13 'PEST' motifs target proteins for proteolytic degradation and reduce the half-lives of proteins dramatically. Sequence database hits scores list Scores from a sequence database search (for example a BLAST search). beta12orEarlier true beta12orEarlier Sequence database hits alignments list beta12orEarlier Alignments from a sequence database search (for example a BLAST search). beta12orEarlier true Sequence database hits evaluation data beta12orEarlier A report on the evaluation of the significance of sequence similarity scores from a sequence database search (for example a BLAST search). beta12orEarlier true MEME motif alphabet Alphabet for the motifs (patterns) that MEME will search for. beta12orEarlier beta12orEarlier true MEME background frequencies file MEME background frequencies file. true beta12orEarlier beta12orEarlier MEME motifs directive file beta12orEarlier true File of directives for ordering and spacing of MEME motifs. beta12orEarlier Dirichlet distribution Dirichlet distribution used by hidden Markov model analysis programs. beta12orEarlier HMM emission and transition counts Emission and transition counts of a hidden Markov model, generated once HMM has been determined, for example after residues/gaps have been assigned to match, delete and insert states. true 1.4 beta12orEarlier Regular expression Regular expression pattern. 
beta12orEarlier Sequence motif beta12orEarlier Any specific or conserved pattern (typically expressed as a regular expression) in a molecular sequence. Sequence profile Some type of statistical model representing a (typically multiple) sequence alignment. http://semanticscience.org/resource/SIO_010531 beta12orEarlier Protein signature An informative report about a specific or conserved protein sequence pattern. InterPro entry Protein repeat signature Protein region signature Protein site signature beta12orEarlier Protein family signature Protein domain signature Prosite nucleotide pattern A nucleotide regular expression pattern from the Prosite database. beta12orEarlier true beta12orEarlier Prosite protein pattern A protein regular expression pattern from the Prosite database. beta12orEarlier beta12orEarlier true Position frequency matrix beta12orEarlier PFM A profile (typically representing a sequence alignment) that is a simple matrix of nucleotide (or amino acid) counts per position. Position weight matrix PWM beta12orEarlier A profile (typically representing a sequence alignment) that is weighted matrix of nucleotide (or amino acid) counts per position. Contributions of individual sequences to the matrix might be uneven (weighted). Information content matrix beta12orEarlier ICM A profile (typically representing a sequence alignment) derived from a matrix of nucleotide (or amino acid) counts per position that reflects information content at each position. Hidden Markov model HMM beta12orEarlier A hidden Markov model representation of a set or alignment of sequences. Fingerprint beta12orEarlier One or more fingerprints (sequence classifiers) as used in the PRINTS database. Domainatrix signature A protein signature of the type used in the EMBASSY Signature package. true beta12orEarlier beta12orEarlier HMMER NULL hidden Markov model beta12orEarlier beta12orEarlier true NULL hidden Markov model representation used by the HMMER package. 
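The three profile data types above (position frequency matrix, position weight matrix, information content matrix) differ only in how the per-position counts are transformed. A minimal sketch, using hypothetical binding-site data, a uniform background and a +1 pseudocount (all of which are illustrative assumptions, not part of any particular tool):

```python
import math

# Hypothetical aligned DNA binding sites, all the same length.
sites = ["TACGAT", "TATAAT", "TATAAT", "GATACT", "TATGAT", "TATGTT"]
alphabet = "ACGT"
length = len(sites[0])

# Position frequency matrix (PFM): raw nucleotide counts per position.
pfm = [{b: 0 for b in alphabet} for _ in range(length)]
for site in sites:
    for i, b in enumerate(site):
        pfm[i][b] += 1

n = len(sites)
background = 0.25  # uniform background frequency (an assumption)

# Position weight matrix (PWM): per-position log-odds of the observed
# frequency (with a +1 pseudocount) against the background.
pwm = [
    {b: math.log2((pfm[i][b] + 1) / (n + len(alphabet)) / background)
     for b in alphabet}
    for i in range(length)
]

# Information content per position (the basis of an information content
# matrix, and of letter heights in sequence logos).
ic = []
for i in range(length):
    freqs = [(pfm[i][b] + 1) / (n + len(alphabet)) for b in alphabet]
    ic.append(sum(f * math.log2(f / background) for f in freqs))

def score(seq):
    """Score a candidate site by summing per-position log-odds."""
    return sum(pwm[i][b] for i, b in enumerate(seq))

print(score("TATAAT") > score("GGGGGG"))  # consensus-like site scores higher
```

A log-odds PWM of this kind is what motif scanners typically use to score candidate sites against a background model.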
- Protein family signature [beta12orEarlier; deprecated since 1.5]: A protein family signature (sequence classifier) from the InterPro database. Protein family signatures cover all domains in the matching proteins, span more than 80% of the protein length, and have no adjacent protein domain or protein region signatures.
- Protein domain signature [beta12orEarlier; deprecated since 1.5]: A protein domain signature (sequence classifier) from the InterPro database. Protein domain signatures identify structural or functional domains or other units with defined boundaries.
- Protein region signature [beta12orEarlier; deprecated since 1.5]: A protein region signature (sequence classifier) from the InterPro database. A protein region signature defines a region which cannot be described as a protein family or domain signature.
- Protein repeat signature [beta12orEarlier; deprecated since 1.5]: A protein repeat signature (sequence classifier) from the InterPro database. A protein repeat signature is a repeated protein motif that is not, in a single copy, expected to independently fold into a globular domain.
- Protein site signature [beta12orEarlier; deprecated since 1.5]: A protein site signature (sequence classifier) from the InterPro database. A protein site signature is a classifier for a specific site in a protein.
- Protein conserved site signature [beta12orEarlier; deprecated since 1.4]: A protein conserved site signature (sequence classifier) from the InterPro database. A protein conserved site signature is any short sequence pattern that may contain one or more unique residues and cannot be described as an active site, binding site or post-translational modification.
- Protein active site signature [beta12orEarlier; deprecated since 1.4]: A protein active site signature (sequence classifier) from the InterPro database. A protein active site signature corresponds to an enzyme catalytic pocket; residues involved in enzymatic reactions for which mutational data is typically available. An active site typically includes non-contiguous residues, therefore multiple signatures may be required to describe an active site.
- Protein binding site signature [beta12orEarlier; deprecated since 1.4]: A protein binding site signature (sequence classifier) from the InterPro database. A protein binding site signature corresponds to a site that reversibly binds chemical compounds which are not themselves substrates of the enzymatic reaction. This includes enzyme cofactors and residues involved in electron transport or protein structure modification.
- Protein post-translational modification signature [beta12orEarlier; deprecated since 1.4]: A protein post-translational modification signature (sequence classifier) from the InterPro database. It corresponds to sites that undergo modification of the primary structure, typically to activate or de-activate a function, for example methylation, sumoylation, glycosylation etc. The modification might be permanent or reversible.
- Sequence alignment (pair) [beta12orEarlier; see http://semanticscience.org/resource/SIO_010068]: Alignment of exactly two molecular sequences.
- Sequence alignment (multiple) [beta12orEarlier; deprecated]: Alignment of more than two molecular sequences.
- Sequence alignment (nucleic acid) [beta12orEarlier]: Alignment of multiple nucleotide sequences.
- Sequence alignment (protein) [beta12orEarlier]: Alignment of multiple protein sequences.
- Sequence alignment (hybrid) [beta12orEarlier]: Alignment of multiple molecular sequences of different types. Hybrid sequence alignments include for example genomic DNA to EST, cDNA or mRNA.
- Sequence alignment (nucleic acid pair) [beta12orEarlier]: Alignment of exactly two nucleotide sequences.
- Sequence alignment (protein pair) [beta12orEarlier]: Alignment of exactly two protein sequences.
- Hybrid sequence alignment (pair) [beta12orEarlier; deprecated]: Alignment of exactly two molecular sequences of different types.
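A pairwise sequence alignment is commonly summarised by the 'Sequence identity' statistic (the percentage of identical characters in aligned positions). A minimal sketch; the gap conventions used here (skip gap-gap columns, count columns with a gap in one sequence as non-matches) are an assumption, since tools differ on this point:

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over an aligned pair: identical residues as a
    percentage of aligned columns. Gap-gap columns are skipped; a column
    with a gap in only one sequence counts as a non-match."""
    assert len(aln_a) == len(aln_b), "aligned sequences must be equal length"
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if (x, y) != ("-", "-")]
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)

# Hypothetical aligned nucleotide pair: one gap, one mismatch, 9 columns.
print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 1))
```

For the pair shown, 7 of 9 aligned columns match, giving roughly 77.8% identity. 'Sequence similarity' differs in that it also credits conservative substitutions via a scoring matrix rather than counting exact matches only.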
- Multiple nucleotide sequence alignment [beta12orEarlier; deprecated]: Alignment of more than two nucleotide sequences.
- Multiple protein sequence alignment [beta12orEarlier; deprecated]: Alignment of more than two protein sequences.
- Alignment score or penalty [beta12orEarlier]: A simple floating point number defining the penalty for opening or extending a gap in an alignment.
- Score end gaps control [beta12orEarlier; deprecated]: Whether end gaps are scored or not.
- Aligned sequence order [beta12orEarlier; deprecated]: Controls the order of sequences in an output sequence alignment.
- Gap opening penalty [beta12orEarlier]: A penalty for opening a gap in an alignment.
- Gap extension penalty [beta12orEarlier]: A penalty for extending a gap in an alignment.
- Gap separation penalty [beta12orEarlier]: A penalty for gaps that are close together in an alignment.
- Terminal gap penalty [beta12orEarlier; deprecated]: A penalty for gaps at the termini of an alignment, either from the N/C terminal of protein or 5'/3' terminal of nucleotide sequences.
- Match reward score [beta12orEarlier]: The score for a 'match' used in various sequence database search applications with simple scoring schemes.
- Mismatch penalty score [beta12orEarlier]: The score (penalty) for a 'mismatch' used in various alignment and sequence database search applications with simple scoring schemes.
- Drop off score [beta12orEarlier]: The threshold drop in score at which extension of a word alignment is halted.
- Gap opening penalty (integer) [beta12orEarlier; deprecated]: A simple integer defining the penalty for opening a gap in an alignment.
- Gap opening penalty (float) [beta12orEarlier; deprecated]: A simple floating point number defining the penalty for opening a gap in an alignment.
- Gap extension penalty (integer) [beta12orEarlier; deprecated]: A simple integer defining the penalty for extending a gap in an alignment.
- Gap extension penalty (float) [beta12orEarlier; deprecated]: A simple floating point number defining the penalty for extending a gap in an alignment.
- Gap separation penalty (integer) [beta12orEarlier; deprecated]: A simple integer defining the penalty for gaps that are close together in an alignment.
- Gap separation penalty (float) [beta12orEarlier; deprecated]: A simple floating point number defining the penalty for gaps that are close together in an alignment.
- Terminal gap opening penalty [beta12orEarlier]: A number defining the penalty for opening gaps at the termini of an alignment, either from the N/C terminal of protein or 5'/3' terminal of nucleotide sequences.
- Terminal gap extension penalty [beta12orEarlier]: A number defining the penalty for extending gaps at the termini of an alignment, either from the N/C terminal of protein or 5'/3' terminal of nucleotide sequences.
- Sequence identity [beta12orEarlier]: Sequence identity is the number (%) of matches (identical characters) in positions from an alignment of two molecular sequences.
- Sequence similarity [beta12orEarlier]: Sequence similarity is the similarity (expressed as a percentage) of two molecular sequences calculated from their alignment, a scoring matrix for scoring character substitutions, and penalties for gap insertion and extension. (The data type is probably a float.)
- Sequence alignment metadata (quality report) [beta12orEarlier; deprecated]: Data on molecular sequence alignment quality (estimated accuracy).
- Sequence alignment report (site conservation) [beta12orEarlier; deprecated since 1.4]: Data on character conservation in a molecular sequence alignment. This is a broad data type used as a placeholder for other, more specific types; it is primarily intended to help navigation of EDAM and would not typically be used for annotation. Use this concept for calculated substitution rates, relative site variability, data on sites with biased properties, highly conserved or very poorly conserved sites, regions, blocks etc.
- Sequence alignment report (site correlation) [beta12orEarlier; deprecated since 1.4]: Data on correlations between sites in a molecular sequence alignment, typically to identify possible covarying positions and predict contacts or structural constraints in protein structures.
- Sequence-profile alignment (Domainatrix signature) [beta12orEarlier; deprecated]: Alignment of molecular sequences to a Domainatrix signature (representing a sequence alignment).
- Sequence-profile alignment (HMM) [beta12orEarlier; deprecated since 1.5]: Alignment of molecular sequence(s) to hidden Markov model(s).
- Sequence-profile alignment (fingerprint) [beta12orEarlier; deprecated since 1.5]: Alignment of molecular sequences to a protein fingerprint from the PRINTS database.
- Phylogenetic continuous quantitative data (synonyms: Phylogenetic continuous quantitative characters, Quantitative traits) [beta12orEarlier]: Continuous quantitative data that may be read during phylogenetic tree calculation.
- Phylogenetic discrete data (synonyms: Discrete characters, Phylogenetic discrete states, Discretely coded characters) [beta12orEarlier]: Character data with discrete states that may be read during phylogenetic tree calculation.
- Phylogenetic character cliques (synonym: Phylogenetic report (cliques)) [beta12orEarlier]: One or more cliques of mutually compatible characters that are generated, for example from analysis of discrete character data, and are used to generate a phylogeny.
- Phylogenetic invariants (synonym: Phylogenetic report (invariants)) [beta12orEarlier]: Phylogenetic invariants data for testing alternative tree topologies.
- Phylogenetic report [beta12orEarlier]: A report of data concerning or derived from a phylogenetic tree, or from comparing two or more phylogenetic trees.
- Phylogenetic report (synonyms: Phylogenetic tree report, Phylogenetic tree-derived report) [deprecated since 1.5]: A broad data type used for example for reports on confidence, shape or stratigraphic (age) data derived from phylogenetic tree analysis.
- DNA substitution model (synonyms: Substitution model, Phylogenetic tree report (DNA substitution model), Sequence alignment report (DNA substitution model)) [beta12orEarlier]: A model of DNA substitution that explains a DNA sequence alignment, derived from phylogenetic tree analysis.
- Phylogenetic tree report (tree shape) [beta12orEarlier; deprecated since 1.4]: Data about the shape of a phylogenetic tree.
- Phylogenetic tree report (tree evaluation) [beta12orEarlier; deprecated since 1.4]: Data on the confidence of a phylogenetic tree.
- Phylogenetic tree distances (synonym: Phylogenetic tree report (tree distances)) [beta12orEarlier]: Distances, such as Branch Score distance, between two or more phylogenetic trees.
- Phylogenetic tree report (tree stratigraphic) [beta12orEarlier; deprecated since 1.4]: Molecular clock and stratigraphic (age) data derived from phylogenetic tree analysis.
- Phylogenetic character contrasts (synonym: Phylogenetic report (character contrasts)) [beta12orEarlier]: Independent contrasts for characters used in a phylogenetic tree, or covariances, regressions and correlations between characters for those contrasts.
- Comparison matrix (integers) (synonym: Substitution matrix (integers)) [beta12orEarlier; deprecated]: Matrix of integer numbers for sequence comparison.
- Comparison matrix (floats) (synonym: Substitution matrix (floats)) [beta12orEarlier; deprecated]: Matrix of floating point numbers for sequence comparison.
- Comparison matrix (nucleotide) (synonym: Nucleotide substitution matrix) [beta12orEarlier]: Matrix of integer or floating point numbers for nucleotide comparison.
- Comparison matrix (amino acid) (synonyms: Amino acid comparison matrix, Amino acid substitution matrix) [beta12orEarlier]: Matrix of integer or floating point numbers for amino acid comparison.
- Nucleotide comparison matrix (integers) (synonym: Nucleotide substitution matrix (integers)) [beta12orEarlier; deprecated]: Matrix of integer numbers for nucleotide comparison.
- Nucleotide comparison matrix (floats) (synonym: Nucleotide substitution matrix (floats)) [beta12orEarlier; deprecated]: Matrix of floating point numbers for nucleotide comparison.
- Amino acid comparison matrix (integers) (synonym: Amino acid substitution matrix (integers)) [beta12orEarlier; deprecated]: Matrix of integer numbers for amino acid comparison.
- Amino acid comparison matrix (floats) (synonym: Amino acid substitution matrix (floats)) [beta12orEarlier; deprecated]: Matrix of floating point numbers for amino acid comparison.
- Protein features report (membrane regions) [beta12orEarlier; deprecated since 1.8]: Trans- or intra-membrane regions of a protein, typically describing physicochemical properties of the secondary structure elements.
- Nucleic acid structure [beta12orEarlier]: 3D coordinate and associated data for a nucleic acid tertiary (3D) structure.
- Protein structure (synonym: Protein structures) [beta12orEarlier]: 3D coordinate and associated data for a protein tertiary (3D) structure.
- Protein-ligand complex [beta12orEarlier]: The structure of a protein in complex with a ligand, typically a small molecule such as an enzyme substrate or cofactor, but possibly another macromolecule. This includes interactions of proteins with atoms, ions and small molecules or macromolecules such as nucleic acids or other polypeptides. For stable inter-polypeptide interactions use 'Protein complex' instead.
- Carbohydrate structure [beta12orEarlier]: 3D coordinate and associated data for a carbohydrate (3D) structure.
- Small molecule structure [beta12orEarlier; see CHEBI:23367]: 3D coordinate and associated data for the (3D) structure of a small molecule, such as any common chemical compound.
- DNA structure [beta12orEarlier]: 3D coordinate and associated data for a DNA tertiary (3D) structure.
- RNA structure [beta12orEarlier]: 3D coordinate and associated data for an RNA tertiary (3D) structure.
- tRNA structure [beta12orEarlier]: 3D coordinate and associated data for a tRNA tertiary (3D) structure, including tmRNA, snoRNAs etc.
- Protein chain [beta12orEarlier]: 3D coordinate and associated data for the tertiary (3D) structure of a polypeptide chain.
- Protein domain [beta12orEarlier]: 3D coordinate and associated data for the tertiary (3D) structure of a protein domain.
- Protein structure (all atoms) [beta12orEarlier; deprecated since 1.5]: 3D coordinate and associated data for a protein tertiary (3D) structure (all atoms).
- C-alpha trace (synonym: Protein structure (C-alpha atoms)) [beta12orEarlier]: 3D coordinate and associated data for a protein tertiary (3D) structure (typically C-alpha atoms only). C-beta atoms from amino acid side-chains may be included.
- Protein chain (all atoms) [beta12orEarlier; deprecated]: 3D coordinate and associated data for a polypeptide chain tertiary (3D) structure (all atoms).
- Protein chain (C-alpha atoms) [beta12orEarlier; deprecated]: 3D coordinate and associated data for a polypeptide chain tertiary (3D) structure (typically C-alpha atoms only). C-beta atoms from amino acid side-chains may be included.
- Protein domain (all atoms) [beta12orEarlier; deprecated]: 3D coordinate and associated data for a protein domain tertiary (3D) structure (all atoms).
- Protein domain (C-alpha atoms) [beta12orEarlier; deprecated]: 3D coordinate and associated data for a protein domain tertiary (3D) structure (typically C-alpha atoms only). C-beta atoms from amino acid side-chains may be included.
- Structure alignment (pair) (synonym: Pair structure alignment) [beta12orEarlier]: Alignment (superimposition) of exactly two molecular tertiary (3D) structures.
- Structure alignment (multiple) [beta12orEarlier; deprecated]: Alignment (superimposition) of more than two molecular tertiary (3D) structures.
- Structure alignment (protein) (synonym: Protein structure alignment) [beta12orEarlier]: Alignment (superimposition) of protein tertiary (3D) structures.
- Structure alignment (nucleic acid) (synonym: Nucleic acid structure alignment) [beta12orEarlier]: Alignment (superimposition) of nucleic acid tertiary (3D) structures.
- Structure alignment (protein pair) (synonym: Protein pair structural alignment) [beta12orEarlier]: Alignment (superimposition) of exactly two protein tertiary (3D) structures.
- Multiple protein tertiary structure alignment [beta12orEarlier; deprecated]: Alignment (superimposition) of more than two protein tertiary (3D) structures.
- Structure alignment (protein all atoms) [beta12orEarlier; deprecated since 1.5]: Alignment (superimposition) of protein tertiary (3D) structures (all atoms considered).
- Structure alignment (protein C-alpha atoms) (synonym: C-alpha trace) [beta12orEarlier; deprecated since 1.5]: Alignment (superimposition) of protein tertiary (3D) structures (typically only C-alpha atoms considered). C-beta atoms from amino acid side-chains may be considered.
- Pairwise protein tertiary structure alignment (all atoms) [beta12orEarlier; deprecated]: Alignment (superimposition) of exactly two protein tertiary (3D) structures (all atoms considered).
- Pairwise protein tertiary structure alignment (C-alpha atoms) [beta12orEarlier; deprecated]: Alignment (superimposition) of exactly two protein tertiary (3D) structures (typically only C-alpha atoms considered). C-beta atoms from amino acid side-chains may be included.
- Multiple protein tertiary structure alignment (all atoms) [beta12orEarlier; deprecated]: Alignment (superimposition) of more than two protein tertiary (3D) structures (all atoms considered).
- Multiple protein tertiary structure alignment (C-alpha atoms) [beta12orEarlier; deprecated]: Alignment (superimposition) of more than two protein tertiary (3D) structures (typically only C-alpha atoms considered). C-beta atoms from amino acid side-chains may be included.
- Structure alignment (nucleic acid pair) (synonym: Nucleic acid pair structure alignment) [beta12orEarlier]: Alignment (superimposition) of exactly two nucleic acid tertiary (3D) structures.
- Multiple nucleic acid tertiary structure alignment [beta12orEarlier; deprecated]: Alignment (superimposition) of more than two nucleic acid tertiary (3D) structures.
- Structure alignment (RNA) (synonym: RNA structure alignment) [beta12orEarlier]: Alignment (superimposition) of RNA tertiary (3D) structures.
- Structural transformation matrix [beta12orEarlier]: Matrix to transform (rotate/translate) 3D coordinates, typically the transformation necessary to superimpose two molecular structures.
- DaliLite hit table [beta12orEarlier; deprecated]: DaliLite hit table of protein chain tertiary structure alignment data. The significant and top-scoring hits for regions of the compared structures are shown, with data such as Z-scores, number of aligned residues, root-mean-square deviation (RMSD) of atoms and sequence identity.
- Molecular similarity score [beta12orEarlier; deprecated]: A score reflecting structural similarities of two molecules.
- Root-mean-square deviation (synonym: RMSD) [beta12orEarlier]: Root-mean-square deviation (RMSD) is calculated to measure the average distance between superimposed macromolecular coordinates.
- Tanimoto similarity score [beta12orEarlier]: A measure of the similarity between two ligand fingerprints. A ligand fingerprint is derived from ligand structural data from a Protein DataBank file. It reflects the elements or groups present or absent, covalent bonds and bond orders, and the bonded environment in terms of SATIS codes and BLEEP atom types.
- 3D-1D scoring matrix [beta12orEarlier]: A matrix of 3D-1D scores reflecting the probability of amino acids to occur in different tertiary structural environments.
- Amino acid index [beta12orEarlier]: A table of 20 numerical values which quantify a property (e.g. physicochemical or biochemical) of the common amino acids.
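Two of the scores above have simple closed forms: RMSD over superimposed coordinates, and the Tanimoto coefficient over fingerprint feature sets. A sketch with toy data; the coordinates and feature sets are hypothetical, real fingerprints encode SATIS/BLEEP-style atom environments rather than small integers, and no superposition fitting is performed here:

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length coordinate
    lists, assumed to be already superimposed (no fitting performed)."""
    assert len(coords_a) == len(coords_b)
    sq = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(sq / len(coords_a))

def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprints,
    represented here as sets of present features."""
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Toy C-alpha coordinates: identical except the last atom, shifted by 1 unit.
a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
print(round(rmsd(a, b), 3))  # sqrt(1/3)

# Toy fingerprints: features 2 and 3 shared out of 4 distinct features.
print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 2/4 = 0.5
```

In practice the RMSD is reported after an optimal superposition (e.g. the Kabsch algorithm), which is omitted here for brevity.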
- Amino acid index (chemical classes) (synonym: Chemical classes (amino acids)) [beta12orEarlier]: Chemical classification (small, aliphatic, aromatic, polar, charged etc) of amino acids.
- Amino acid pair-wise contact potentials (synonym: Contact potentials (amino acid pair-wise)) [beta12orEarlier]: Statistical protein contact potentials.
- Amino acid index (molecular weight) (synonym: Molecular weight (amino acids)) [beta12orEarlier]: Molecular weights of amino acids.
- Amino acid index (hydropathy) (synonym: Hydropathy (amino acids)) [beta12orEarlier]: Hydrophobic, hydrophilic or charge properties of amino acids.
- Amino acid index (White-Wimley data) (synonym: White-Wimley data (amino acids)) [beta12orEarlier]: Experimental free energy values for the water-interface and water-octanol transitions for the amino acids.
- Amino acid index (van der Waals radii) (synonym: van der Waals radii (amino acids)) [beta12orEarlier]: Van der Waals radii of atoms for different amino acid residues.
- Enzyme report (synonym: Protein report (enzyme)) [beta12orEarlier; deprecated since 1.5]: An informative report on a specific enzyme.
- Restriction enzyme report (synonyms: Protein report (restriction enzyme), Restriction enzyme pattern data) [beta12orEarlier; deprecated since 1.5]: An informative report on a specific restriction enzyme such as enzyme reference data. This might include name of enzyme, organism, isoschizomers, methylation, source, suppliers, literature references, or data on restriction enzyme patterns such as name of enzyme, recognition site, length of pattern, number of cuts made by enzyme, details of blunt or sticky end cut etc.
- Peptide molecular weights [beta12orEarlier]: List of molecular weight(s) of one or more proteins or peptides, for example cut by proteolytic enzymes or reagents. The report might include associated data such as frequency of peptide fragment molecular weights.
- Peptide hydrophobic moment [beta12orEarlier]: Report on the hydrophobic moment of a polypeptide sequence. Hydrophobic moment is a peptide's hydrophobicity measured for different angles of rotation.
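An amino acid index such as the hydropathy index is typically applied with a sliding window to produce a per-residue profile. The sketch below uses the standard Kyte-Doolittle hydropathy values; the peptide sequence and the window size of 9 are arbitrary choices for illustration:

```python
# Kyte-Doolittle hydropathy index: a standard amino acid index of
# hydrophobic (positive) / hydrophilic (negative) residue properties.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=9):
    """Mean hydropathy in a sliding window centred on each position
    (positions too close to either end are omitted)."""
    half = window // 2
    return [
        sum(KD[aa] for aa in seq[i - half:i + half + 1]) / window
        for i in range(half, len(seq) - half)
    ]

# Hypothetical peptide: a hydrophobic core followed by a charged tail.
profile = hydropathy_profile("MKTIIALSYIFCLVFADYKDDDDK")
print(max(profile) > 0 > min(profile))  # both hydrophobic and hydrophilic stretches
```

Plotting such a profile against residue position gives the kind of hydropathy plot described under 'Protein sequence hydropathy plot' below.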
Protein aliphatic index The aliphatic index of a protein. beta12orEarlier The aliphatic index is the relative protein volume occupied by aliphatic side chains. Protein sequence hydropathy plot Hydrophobic moment is a peptides hydrophobicity measured for different angles of rotation. A protein sequence with annotation on hydrophobic or hydrophilic / charged regions, hydrophobicity plot etc. beta12orEarlier Protein charge plot beta12orEarlier A plot of the mean charge of the amino acids within a window of specified length as the window is moved along a protein sequence. Protein solubility beta12orEarlier The solubility or atomic solvation energy of a protein sequence or structure. Protein solubility data Protein crystallizability beta12orEarlier Protein crystallizability data Data on the crystallizability of a protein sequence. Protein globularity Protein globularity data beta12orEarlier Data on the stability, intrinsic disorder or globularity of a protein sequence. Protein titration curve The titration curve of a protein. beta12orEarlier Protein isoelectric point beta12orEarlier The isoelectric point of one proteins. Protein pKa value The pKa value of a protein. beta12orEarlier Protein hydrogen exchange rate beta12orEarlier The hydrogen exchange rate of a protein. Protein extinction coefficient The extinction coefficient of a protein. beta12orEarlier Protein optical density The optical density of a protein. beta12orEarlier Protein subcellular localization Protein report (subcellular localization) An informative report on protein subcellular localization (nuclear, cytoplasmic, mitochondrial, chloroplast, plastid, membrane etc) or destination (exported / extracellular proteins). beta12orEarlier true beta13 Peptide immunogenicity data An report on allergenicity / immunogenicity of peptides and proteins. 
Peptide immunogenicity report beta12orEarlier Peptide immunogenicity This includes data on peptide ligands that elicit an immune response (immunogens), allergic cross-reactivity, predicted antigenicity (Hopp and Woods plot) etc. These data are useful in the development of peptide-specific antibodies or multi-epitope vaccines. Methods might use sequence data (for example motifs) and / or structural data. MHC peptide immunogenicity report A report on the immunogenicity of MHC class I or class II binding peptides. beta13 true beta12orEarlier Protein structure report Protein structural property Protein structure-derived report This includes for example reports on the surface properties (shape, hydropathy, electrostatic patches etc) of a protein structure, protein flexibility or motion, and protein architecture (spatial arrangement of secondary structure). Protein property (structural) Annotation on or structural information derived from one or more specific protein 3D structure(s) or structural domains. beta12orEarlier Protein report (structure) Protein structure report (domain) Protein structural quality report Report on the quality of a protein three-dimensional model. Protein structure report (quality evaluation) Protein structure validation report Protein property (structural quality) Model validation might involve checks for atomic packing, steric clashes, agreement with electron density maps etc. Protein report (structural quality) beta12orEarlier Protein residue interactions Residue interaction data Data on inter-atomic or inter-residue contacts, distances and interactions in protein structure(s) or on the interactions of protein atoms or residues with non-protein groups. beta12orEarlier Atom interaction data Protein flexibility or motion report This is a broad data type and is used a placeholder for other, more specific types. It is primarily intended to help navigation of EDAM and would not typically be used for annotation. 
Protein property (flexibility or motion) Informative report on flexibility or motion of a protein structure. Protein flexibility or motion beta12orEarlier true 1.4 Protein structure report (flexibility or motion) Protein solvent accessibility This is a broad data type and is used as a placeholder for other, more specific types. It is primarily intended to help navigation of EDAM and would not typically be used for annotation. This concept covers definitions of the protein surface, interior and interfaces, accessible and buried residues, surface accessible pockets, interior inaccessible cavities etc. beta12orEarlier Data on the solvent accessible or buried surface area of a protein structure. Protein surface report This is a broad data type and is used as a placeholder for other, more specific types. It is primarily intended to help navigation of EDAM and would not typically be used for annotation. Protein structure report (surface) 1.4 Data on the surface properties (shape, hydropathy, electrostatic patches etc) of a protein structure. beta12orEarlier true Ramachandran plot beta12orEarlier Phi/psi angle data or a Ramachandran plot of a protein structure. Protein dipole moment Data on the net charge distribution (dipole moment) of a protein structure. beta12orEarlier Protein distance matrix beta12orEarlier A matrix of distances between amino acid residues (for example the C-alpha atoms) in a protein structure. Protein contact map An amino acid residue contact map for a protein structure. beta12orEarlier Protein residue 3D cluster beta12orEarlier Report on clusters of contacting residues in protein structures such as a key structural residue network. Protein hydrogen bonds Patterns of hydrogen bonding in protein structures. beta12orEarlier Protein non-canonical interactions Protein non-canonical interactions report true Non-canonical atomic interactions in protein structures. 1.4 beta12orEarlier CATH node Information on a node from the CATH database.
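A protein distance matrix and contact map, as defined above, can be derived directly from C-alpha coordinates. A minimal sketch, assuming coordinates are supplied as (x, y, z) tuples in angstroms and using an assumed 8 Å contact cutoff (cutoffs of 6 to 12 Å are all in common use):

```python
import math

def distance_matrix(ca_coords):
    """Pairwise C-alpha distances (angstroms) between residues."""
    n = len(ca_coords)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = math.dist(ca_coords[i], ca_coords[j])
    return d

def contact_map(ca_coords, cutoff=8.0):
    """Binary residue contact map: 1 where the C-alpha distance is
    below the cutoff (diagonal excluded)."""
    d = distance_matrix(ca_coords)
    return [[1 if 0 < v < cutoff else 0 for v in row] for row in d]
```

The distance matrix is symmetric with a zero diagonal; thresholding it yields the contact map.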
The report (for example http://www.cathdb.info/cathnode/1.10.10.10) includes CATH code (of the node and upper levels in the hierarchy), classification text (of appropriate levels in hierarchy), list of child nodes, representative domain and other relevant data and links. 1.5 beta12orEarlier true CATH classification node report SCOP node true SCOP classification node Information on a node from the SCOP database. 1.5 beta12orEarlier EMBASSY domain classification beta12orEarlier beta12orEarlier true An EMBASSY domain classification file (DCF) of classification and other data for domains from SCOP or CATH, in EMBL-like format. CATH class beta12orEarlier 1.5 Information on a protein 'class' node from the CATH database. true CATH architecture beta12orEarlier 1.5 Information on a protein 'architecture' node from the CATH database. true CATH topology true 1.5 Information on a protein 'topology' node from the CATH database. beta12orEarlier CATH homologous superfamily 1.5 true beta12orEarlier Information on a protein 'homologous superfamily' node from the CATH database. CATH structurally similar group 1.5 true beta12orEarlier Information on a protein 'structurally similar group' node from the CATH database. CATH functional category Information on a protein 'functional category' node from the CATH database. true 1.5 beta12orEarlier Protein fold recognition report Methods use some type of mapping between sequence and fold, for example secondary structure prediction and alignment, profile comparison, sequence properties, homologous sequence search, kernel machines etc. Domains and folds might be taken from SCOP or CATH. beta12orEarlier A report on known protein structural domains or folds that are recognized (identified) in protein sequence(s). true beta12orEarlier Protein-protein interaction report protein-protein interaction(s), including interactions between protein domains. 
beta12orEarlier true 1.8 Protein-ligand interaction report beta12orEarlier An informative report on protein-ligand (small molecule) interaction(s). Protein-nucleic acid interactions report true protein-DNA/RNA interaction(s). beta12orEarlier 1.8 Nucleic acid melting profile Nucleic acid stability profile A melting (stability) profile calculated from the free energy required to unwind and separate the nucleic acid strands, plotted for sliding windows over a sequence. Data on the dissociation characteristics of a double-stranded nucleic acid molecule (DNA or a DNA/RNA hybrid) during heating. beta12orEarlier Nucleic acid enthalpy beta12orEarlier Enthalpy of hybridized or double stranded nucleic acid (DNA or RNA/DNA). Nucleic acid entropy Entropy of hybridized or double stranded nucleic acid (DNA or RNA/DNA). beta12orEarlier Nucleic acid melting temperature Melting temperature of hybridized or double stranded nucleic acid (DNA or RNA/DNA). beta12orEarlier beta12orEarlier true Nucleic acid stitch profile beta12orEarlier Stitch profile of hybridized or double stranded nucleic acid (DNA or RNA/DNA). A stitch profile diagram shows partly melted DNA conformations (with probabilities) at a range of temperatures. For example, a stitch profile might show possible loop openings with their location, size, probability and fluctuations at a given temperature. DNA base pair stacking energies data DNA base pair stacking energies data. beta12orEarlier DNA base pair twist angle data beta12orEarlier DNA base pair twist angle data. DNA base trimer roll angles data beta12orEarlier DNA base trimer roll angles data. Vienna RNA parameters RNA parameters used by the Vienna package. true beta12orEarlier beta12orEarlier Vienna RNA structure constraints true Structure constraints used by the Vienna package. beta12orEarlier beta12orEarlier Vienna RNA concentration data RNA concentration data used by the Vienna package.
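The nucleic acid melting temperature entry above can be illustrated with the simplest of the common estimators, the Wallace (2 + 4) rule. This is only a rough guide for short oligonucleotides (roughly under 14 bases); more accurate nearest-neighbour models use the enthalpy and entropy quantities also listed above:

```python
def wallace_tm(seq):
    """Rough melting temperature (deg C) of a short DNA oligo by the
    Wallace rule: 2 degrees per A/T base, 4 degrees per G/C base."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc
```

For example, a 4-mer with two A/T and two G/C bases gives an estimated Tm of 12 °C.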
beta12orEarlier true beta12orEarlier Vienna RNA calculated energy beta12orEarlier beta12orEarlier true RNA calculated energy data generated by the Vienna package. Base pairing probability matrix dotplot beta12orEarlier Such as generated by the Vienna package. Dotplot of RNA base pairing probability matrix. Nucleic acid folding report Nucleic acid report (folding) beta12orEarlier Nucleic acid report (folding model) RNA secondary structure folding probabilities A report on an analysis of RNA/DNA folding, minimum folding energies for DNA or RNA sequences, energy landscape of RNA mutants etc. This is a broad data type and is used as a placeholder for other, more specific types. It is primarily intended to help navigation of EDAM and would not typically be used for annotation. RNA secondary structure folding classification Codon usage table Table of codon usage data calculated from one or more nucleic acid sequences. A codon usage table might include the codon usage table name, optional comments and a table with columns for codons and corresponding codon usage data. A genetic code can be extracted from or represented by a codon usage table. beta12orEarlier Genetic code beta12orEarlier A genetic code for an organism. A genetic code need not include detailed codon usage information. Codon adaptation index true A simple measure of synonymous codon usage bias often used to predict gene expression levels. CAI beta12orEarlier beta12orEarlier Codon usage bias plot Synonymous codon usage statistic plot beta12orEarlier A plot of the synonymous codon usage calculated for windows over a nucleotide sequence. Nc statistic true beta12orEarlier The effective number of codons used in a gene sequence. This reflects how far codon usage of a gene departs from equal usage of synonymous codons. beta12orEarlier Codon usage fraction difference The differences in codon usage fractions between two codon usage tables.
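The codon usage table and codon usage fraction difference entries above can be sketched in a few lines. This is an illustrative implementation only (real codon usage tables also carry a name and optional comments, as noted above); `usage_difference` here is simply the sum of absolute differences between two fraction tables:

```python
from collections import Counter

def codon_usage(cds):
    """Codon usage fractions from a coding sequence.
    Returns {codon: fraction of all complete codons}; any trailing
    partial codon is ignored."""
    cds = cds.upper()
    end = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, end, 3))
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def usage_difference(table_a, table_b):
    """Sum of absolute differences in codon fractions between two tables."""
    codons = set(table_a) | set(table_b)
    return sum(abs(table_a.get(c, 0.0) - table_b.get(c, 0.0)) for c in codons)
```

Two identical sequences give a difference of zero; the maximum possible value is 2.0 (completely disjoint codon sets).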
beta12orEarlier Pharmacogenomic test report beta12orEarlier The report might correlate gene expression or single-nucleotide polymorphisms with drug efficacy or toxicity. Data on the influence of genotype on drug response. Disease report An informative report on a specific disease. For example, an informative report on a specific tumor including nature and origin of the sample, anatomic site, organ or tissue, tumor type, including morphology and/or histologic type, and so on. beta12orEarlier Disease report Linkage disequilibrium (report) true A report on linkage disequilibrium; the non-random association of alleles or polymorphisms at two or more loci (not necessarily on the same chromosome). 1.8 beta12orEarlier Heat map A graphical 2D tabular representation of gene expression data, typically derived from a DNA microarray experiment. beta12orEarlier A heat map is a table where rows and columns correspond to different genes and contexts (for example, cells or samples) and the cell color represents the level of expression of a gene in that context. Affymetrix probe sets library file true Affymetrix library file of information about which probes belong to which probe set. CDF file beta12orEarlier beta12orEarlier Affymetrix probe sets information library file true Affymetrix library file of information about the probe sets such as the gene name with which the probe set is associated. GIN file beta12orEarlier beta12orEarlier Molecular weights standard fingerprint beta12orEarlier Standard protonated molecular masses from trypsin (modified porcine trypsin, Promega) and keratin peptides, used in EMBOSS. Metabolic pathway report This includes carbohydrate, energy, lipid, nucleotide, amino acid, glycan, PK/NRP, cofactor/vitamin, secondary metabolite, xenobiotics etc. beta12orEarlier A report typically including a map (diagram) of a metabolic pathway. 1.8 true Genetic information processing pathway report beta12orEarlier 1.8 true genetic information processing pathways.
Environmental information processing pathway report true environmental information processing pathways. beta12orEarlier 1.8 Signal transduction pathway report A report typically including a map (diagram) of a signal transduction pathway. 1.8 true beta12orEarlier Cellular process pathways report 1.8 Topic concerning cellular process pathways. true beta12orEarlier Disease pathway or network report true beta12orEarlier disease pathways, typically of human disease. 1.8 Drug structure relationship map A report typically including a map (diagram) of drug structure relationships. beta12orEarlier Protein interaction networks 1.8 networks of protein interactions. true beta12orEarlier MIRIAM datatype A MIRIAM entry describes a MIRIAM data type including the official name, synonyms, root URI, identifier pattern (regular expression applied to a unique identifier of the data type) and documentation. Each data type can be associated with several resources. Each resource is a physical location of a service (typically a database) providing information on the elements of a data type. Several resources may exist for each data type, providing the same (mirrors) or different information. MIRIAM provides a stable and persistent reference to its data types. An entry (data type) from the Minimal Information Requested in the Annotation of Biochemical Models (MIRIAM) database of data resources. beta12orEarlier true 1.5 E-value An expectation value (E-Value) is the expected number of observations which are at least as extreme as observations expected to occur by random chance. The E-value describes the number of hits with a given score or better that are expected to occur at random when searching a database of a particular size. It decreases exponentially with the score (S) of a hit. A low E value indicates a more significant score. beta12orEarlier A simple floating point number defining the lower or upper limit of an expectation value (E-value).
Expectation value Z-value beta12orEarlier The z-value is the number of standard deviations a data value is above or below a mean value. A z-value might be specified as a threshold for reporting hits from database searches. P-value beta12orEarlier The P-value is the probability of obtaining by random chance a result that is at least as extreme as an observed result, assuming a NULL hypothesis is true. Database version information true Ontology version information 1.5 Information on a database (or ontology) version, for example name, version number and release date. beta12orEarlier Tool version information beta12orEarlier Information on an application version, for example name, version number and release date. true 1.5 CATH version information beta12orEarlier beta12orEarlier true Information on a version of the CATH database. Swiss-Prot to PDB mapping Cross-mapping of Swiss-Prot codes to PDB identifiers. beta12orEarlier true beta12orEarlier Sequence database cross-references Cross-references from a sequence record to other databases. beta12orEarlier true beta12orEarlier Job status Metadata on the status of a submitted job. beta12orEarlier 1.5 true Values for EBI services are 'DONE' (job has finished and the results can then be retrieved), 'ERROR' (the job failed or no results were found), 'NOT_FOUND' (the job id is no longer available; job results might be deleted), 'PENDING' (the job is in a queue awaiting processing), 'RUNNING' (the job is currently being processed). Job ID 1.0 The (typically numeric) unique identifier of a submitted job. beta12orEarlier true Job type 1.5 true beta12orEarlier A label (text token) describing the type of job, for example interactive or non-interactive. Tool log 1.5 A report of tool-specific metadata on some analysis or process performed, for example a log of diagnostic or error messages.
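The E-value, Z-value and P-value definitions above are related in a simple way: a z-value locates a score relative to a null distribution, a p-value is the tail probability of that score, and an E-value scales that probability by the size of the search. A sketch under the assumed simplification of a normal null distribution (real search tools such as BLAST use an extreme-value model instead):

```python
import math

def z_value(score, mean, stdev):
    """Standard deviations above/below the null-distribution mean."""
    return (score - mean) / stdev

def p_value(z):
    """Upper-tail probability of z under a standard normal null."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def e_value(p, database_size):
    """Expected number of chance hits at least this extreme when
    searching a database of the given size."""
    return p * database_size
```

For a hit scoring 80 against a null with mean 50 and standard deviation 10, `z_value` gives 3.0, and the resulting p-value (about 0.00135) times the database size gives the E-value.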
true beta12orEarlier DaliLite log file true beta12orEarlier DaliLite log file describing all the steps taken by a DaliLite alignment of two protein structures. beta12orEarlier STRIDE log file STRIDE log file. true beta12orEarlier beta12orEarlier NACCESS log file beta12orEarlier beta12orEarlier true NACCESS log file. EMBOSS wordfinder log file EMBOSS wordfinder log file. beta12orEarlier beta12orEarlier true EMBOSS domainatrix log file beta12orEarlier EMBOSS (EMBASSY) domainatrix application log file. beta12orEarlier true EMBOSS sites log file true beta12orEarlier beta12orEarlier EMBOSS (EMBASSY) sites application log file. EMBOSS supermatcher error file EMBOSS (EMBASSY) supermatcher error file. beta12orEarlier beta12orEarlier true EMBOSS megamerger log file beta12orEarlier beta12orEarlier EMBOSS megamerger log file. true EMBOSS whichdb log file beta12orEarlier true EMBOSS whichdb log file. beta12orEarlier EMBOSS vectorstrip log file true beta12orEarlier beta12orEarlier EMBOSS vectorstrip log file. Username A username on a computer system. beta12orEarlier Password beta12orEarlier A password on a computer system. Email address beta12orEarlier Moby:Email A valid email address of an end-user. Moby:EmailAddress Person name beta12orEarlier The name of a person. Number of iterations 1.5 Number of iterations of an algorithm. true beta12orEarlier Number of output entities Number of entities (for example database hits, sequences, alignments etc) to write to an output file. 1.5 beta12orEarlier true Hit sort order Controls the order of hits (reported matches) in an output file from a database search. beta12orEarlier beta12orEarlier true Drug report An informative report on a specific drug. beta12orEarlier Drug annotation Phylogenetic tree image beta12orEarlier An image (for viewing or printing) of a phylogenetic tree including (typically) a plot of rooted or unrooted phylogenies, cladograms, circular trees or phenograms and associated information.
See also 'Phylogenetic tree' RNA secondary structure image beta12orEarlier Image of RNA secondary structure, knots, pseudoknots etc. Protein secondary structure image Image of protein secondary structure. beta12orEarlier Structure image beta12orEarlier Image of one or more molecular tertiary (3D) structures. Sequence alignment image beta12orEarlier Image of two or more aligned molecular sequences possibly annotated with alignment features. Chemical structure image An image of the structure of a small chemical compound. The molecular identifier and formula are typically included. Small molecule structure image beta12orEarlier Fate map beta12orEarlier A fate map is a plan of an early stage of an embryo, such as a blastula, showing areas that are of significance to development. Microarray spots image beta12orEarlier An image of spots from a microarray experiment. BioPax term beta12orEarlier A term from the BioPax ontology. beta12orEarlier true GO beta12orEarlier Gene Ontology term Moby:Annotated_GO_Term Moby:Annotated_GO_Term_With_Probability true A term definition from The Gene Ontology (GO). beta12orEarlier Moby:GO_Term Moby:GOTerm MeSH true A term from the MeSH vocabulary. beta12orEarlier beta12orEarlier HGNC beta12orEarlier true A term from the HGNC controlled vocabulary. beta12orEarlier NCBI taxonomy vocabulary beta12orEarlier beta12orEarlier true A term from the NCBI taxonomy vocabulary. Plant ontology term beta12orEarlier true beta12orEarlier A term from the Plant Ontology (PO). UMLS beta12orEarlier beta12orEarlier A term from the UMLS vocabulary. true FMA beta12orEarlier Classifies anatomical entities according to their shared characteristics (genus) and distinguishing characteristics (differentia).
Specifies the part-whole and spatial relationships of the entities, morphological transformation of the entities during prenatal development and the postnatal life cycle and principles, rules and definitions according to which classes and relationships in the other three components of FMA are represented. beta12orEarlier A term from Foundational Model of Anatomy. true EMAP A term from the EMAP mouse ontology. true beta12orEarlier beta12orEarlier ChEBI beta12orEarlier A term from the ChEBI ontology. true beta12orEarlier MGED beta12orEarlier true A term from the MGED ontology. beta12orEarlier myGrid The ontology is provided as two components, the service ontology and the domain ontology. The domain ontology provides concepts for core bioinformatics data types and their relations. The service ontology describes the physical and operational features of web services. beta12orEarlier true A term from the myGrid ontology. beta12orEarlier GO (biological process) beta12orEarlier true beta12orEarlier Data Type is an enumerated string. A term definition for a biological process from the Gene Ontology (GO). GO (molecular function) A term definition for a molecular function from the Gene Ontology (GO). beta12orEarlier Data Type is an enumerated string. true beta12orEarlier GO (cellular component) beta12orEarlier true A term definition for a cellular component from the Gene Ontology (GO). beta12orEarlier Data Type is an enumerated string. Ontology relation type 1.5 beta12orEarlier true A relation type defined in an ontology. Ontology concept definition beta12orEarlier Ontology class definition The definition of a concept from an ontology. Ontology concept comment beta12orEarlier 1.4 true A comment on a concept from an ontology. Ontology concept reference beta12orEarlier true Reference for a concept from an ontology.
beta12orEarlier doc2loc document information beta12orEarlier true The doc2loc output includes the url, format, type and availability code of a document for every service provider. beta12orEarlier Information on a published article provided by the doc2loc program. PDB residue number WHATIF: pdb_number PDBML:PDB_residue_no beta12orEarlier A residue identifier (a string) from a PDB file. Atomic coordinate Cartesian coordinate of an atom (in a molecular structure). beta12orEarlier Cartesian coordinate Atomic x coordinate WHATIF: PDBx_Cartn_x Cartesian x coordinate beta12orEarlier PDBML:_atom_site.Cartn_x in PDBML Cartesian x coordinate of an atom (in a molecular structure). Atomic y coordinate WHATIF: PDBx_Cartn_y Cartesian y coordinate beta12orEarlier PDBML:_atom_site.Cartn_y in PDBML Cartesian y coordinate of an atom (in a molecular structure). Atomic z coordinate PDBML:_atom_site.Cartn_z WHATIF: PDBx_Cartn_z Cartesian z coordinate of an atom (in a molecular structure). beta12orEarlier Cartesian z coordinate PDB atom name WHATIF: PDBx_type_symbol beta12orEarlier WHATIF: PDBx_auth_atom_id WHATIF: alternate_atom PDBML:pdbx_PDB_atom_name WHATIF: atom_type Identifier (a string) of a specific atom from a PDB file for a molecular structure. Protein atom Atom data CHEBI:33250 This is a broad data type and is used as a placeholder for other, more specific types. It is primarily intended to help navigation of EDAM and would not typically be used for annotation. Data on a single atom from a protein structure. beta12orEarlier Protein residue beta12orEarlier Data on a single amino acid residue position in a protein structure. This is a broad data type and is used as a placeholder for other, more specific types. It is primarily intended to help navigation of EDAM and would not typically be used for annotation. Residue Atom name Name of an atom. beta12orEarlier PDB residue name Three-letter amino acid residue names as used in PDB files.
WHATIF: type beta12orEarlier PDB model number Identifier of a model structure from a PDB file. beta12orEarlier PDBML:pdbx_PDB_model_num Model number WHATIF: model_number CATH domain report beta12orEarlier true beta13 The report (for example http://www.cathdb.info/domain/1cukA01) includes CATH codes for levels in the hierarchy for the domain, level descriptions and relevant data and links. Summary of domain classification information for a CATH domain. CATH representative domain sequences (ATOM) beta12orEarlier beta12orEarlier FASTA sequence database (based on ATOM records in PDB) for CATH domains (clustered at different levels of sequence identity). true CATH representative domain sequences (COMBS) true FASTA sequence database (based on COMBS sequence data) for CATH domains (clustered at different levels of sequence identity). beta12orEarlier beta12orEarlier CATH domain sequences (ATOM) true FASTA sequence database for all CATH domains (based on PDB ATOM records). beta12orEarlier beta12orEarlier CATH domain sequences (COMBS) FASTA sequence database for all CATH domains (based on COMBS sequence data). beta12orEarlier true beta12orEarlier Sequence version beta12orEarlier Information on a molecular sequence version. Sequence version information Score A numerical value, that is, some type of scored value arising for example from a prediction method. beta12orEarlier Protein report (function) true For properties that can be mapped to a sequence, use 'Sequence report' instead. beta13 Report on general functional properties of specific protein(s). beta12orEarlier Gene name (ASPGD) 1.3 beta12orEarlier true Name of a gene from Aspergillus Genome Database. http://www.geneontology.org/doc/GO.xrf_abbs:ASPGD_LOCUS Gene name (CGD) Name of a gene from Candida Genome Database.
true http://www.geneontology.org/doc/GO.xrf_abbs:CGD_LOCUS beta12orEarlier 1.3 Gene name (dictyBase) http://www.geneontology.org/doc/GO.xrf_abbs:dictyBase beta12orEarlier 1.3 true Name of a gene from dictyBase database. Gene name (EcoGene primary) http://www.geneontology.org/doc/GO.xrf_abbs:ECOGENE_G Primary name of a gene from EcoGene Database. EcoGene primary gene name 1.3 true beta12orEarlier Gene name (MaizeGDB) http://www.geneontology.org/doc/GO.xrf_abbs:MaizeGDB_Locus 1.3 Name of a gene from MaizeGDB (maize genes) database. true beta12orEarlier Gene name (SGD) true 1.3 beta12orEarlier http://www.geneontology.org/doc/GO.xrf_abbs:SGD_LOCUS Name of a gene from Saccharomyces Genome Database. Gene name (TGD) beta12orEarlier 1.3 Name of a gene from Tetrahymena Genome Database. true http://www.geneontology.org/doc/GO.xrf_abbs:TGD_LOCUS Gene name (CGSC) beta12orEarlier 1.3 true http://www.geneontology.org/doc/GO.xrf_abbs: CGSC Symbol of a gene from E.coli Genetic Stock Center. Gene name (HGNC) beta12orEarlier HUGO symbol 1.3 true HGNC symbol Official gene name HUGO gene name http://www.geneontology.org/doc/GO.xrf_abbs: HGNC_gene HGNC gene name HUGO gene symbol HGNC:[0-9]{1,5} Gene name (HUGO) HGNC gene symbol Symbol of a gene approved by the HUGO Gene Nomenclature Committee. Gene name (MGD) MGI:[0-9]+ Symbol of a gene from the Mouse Genome Database. http://www.geneontology.org/doc/GO.xrf_abbs: MGD 1.3 true beta12orEarlier Gene name (Bacillus subtilis) http://www.geneontology.org/doc/GO.xrf_abbs: SUBTILISTG Symbol of a gene from Bacillus subtilis Genome Sequence Project. beta12orEarlier 1.3 true Gene ID (PlasmoDB) Identifier of a gene from PlasmoDB Plasmodium Genome Resource. beta12orEarlier http://www.geneontology.org/doc/GO.xrf_abbs: ApiDB_PlasmoDB Gene ID (EcoGene) Identifier of a gene from EcoGene Database. EcoGene Accession EcoGene ID beta12orEarlier Gene ID (FlyBase) beta12orEarlier Gene identifier from FlyBase database. 
http://www.geneontology.org/doc/GO.xrf_abbs: FB http://www.geneontology.org/doc/GO.xrf_abbs: FlyBase Gene ID (GeneDB Glossina morsitans) true http://www.geneontology.org/doc/GO.xrf_abbs: GeneDB_Gmorsitans beta13 Gene identifier from Glossina morsitans GeneDB database. beta12orEarlier Gene ID (GeneDB Leishmania major) Gene identifier from Leishmania major GeneDB database. true http://www.geneontology.org/doc/GO.xrf_abbs: GeneDB_Lmajor beta12orEarlier beta13 Gene ID (GeneDB Plasmodium falciparum) Gene identifier from Plasmodium falciparum GeneDB database. true http://www.geneontology.org/doc/GO.xrf_abbs: GeneDB_Pfalciparum beta13 beta12orEarlier Gene ID (GeneDB Schizosaccharomyces pombe) http://www.geneontology.org/doc/GO.xrf_abbs: GeneDB_Spombe beta12orEarlier true beta13 Gene identifier from Schizosaccharomyces pombe GeneDB database. Gene ID (GeneDB Trypanosoma brucei) Gene identifier from Trypanosoma brucei GeneDB database. true beta13 beta12orEarlier http://www.geneontology.org/doc/GO.xrf_abbs: GeneDB_Tbrucei Gene ID (Gramene) http://www.geneontology.org/doc/GO.xrf_abbs: GR_gene beta12orEarlier http://www.geneontology.org/doc/GO.xrf_abbs: GR_GENE Gene identifier from Gramene database. Gene ID (Virginia microbial) beta12orEarlier http://www.geneontology.org/doc/GO.xrf_abbs: PAMGO_VMD Gene identifier from Virginia Bioinformatics Institute microbial database. http://www.geneontology.org/doc/GO.xrf_abbs: VMD Gene ID (SGN) http://www.geneontology.org/doc/GO.xrf_abbs: SGN Gene identifier from Sol Genomics Network. beta12orEarlier Gene ID (WormBase) Gene identifier used by WormBase database. WBGene[0-9]{8} http://www.geneontology.org/doc/GO.xrf_abbs: WB http://www.geneontology.org/doc/GO.xrf_abbs: WormBase beta12orEarlier Gene synonym Gene name synonym true Any name (other than the recommended one) for a gene. beta12orEarlier beta12orEarlier ORF name beta12orEarlier The name of an open reading frame attributed by a sequencing project. 
Sequence assembly component A component of a larger sequence assembly. true beta12orEarlier beta12orEarlier Chromosome annotation (aberration) beta12orEarlier beta12orEarlier true A report on a chromosome aberration such as abnormalities in chromosome structure. Clone ID beta12orEarlier An identifier of a clone (cloned molecular sequence) from a database. PDB insertion code beta12orEarlier WHATIF: insertion_code PDBML:pdbx_PDB_ins_code An insertion code (part of the residue number) for an amino acid residue from a PDB file. Atomic occupancy WHATIF: PDBx_occupancy The fraction of an atom type present at a site in a molecular structure. beta12orEarlier The sum of the occupancies of all the atom types at a site should not normally significantly exceed 1.0. Isotropic B factor Isotropic B factor (atomic displacement parameter) for an atom from a PDB file. WHATIF: PDBx_B_iso_or_equiv beta12orEarlier Deletion map A cytogenetic map is built from a set of mutant cell lines with sub-chromosomal deletions and a reference wild-type line ('genome deletion panel'). The panel is used to map markers onto the genome by comparing mutant to wild-type banding patterns. Markers are linked (occur in the same deleted region) if they share the same banding pattern (presence or absence) as the deletion panel. beta12orEarlier A cytogenetic map showing chromosome banding patterns in mutant cell lines relative to the wild type. Deletion-based cytogenetic map QTL map A genetic map which shows the approximate location of quantitative trait loci (QTL) between two or more markers. beta12orEarlier Quantitative trait locus map Haplotype map beta12orEarlier Moby:Haplotyping_Study_obj A map of haplotypes in a genome or other sequence, describing common patterns of genetic variation. Map set data beta12orEarlier Data describing a set of multiple genetic or physical maps, typically sharing a common set of features which are mapped. 
Moby:GCP_CorrelatedLinkageMapSet Moby:GCP_CorrelatedMapSet Map feature beta12orEarlier true A feature which may be mapped (positioned) on a genetic or other type of map. Moby:MapFeature beta12orEarlier Mappable features may be based on Gramene's notion of map features; see http://www.gramene.org/db/cmap/feature_type_info. Map type A designation of the type of map (genetic map, physical map, sequence map etc) or map set. Map types may be based on Gramene's notion of a map type; see http://www.gramene.org/db/cmap/map_type_info. 1.5 true beta12orEarlier Protein fold name The name of a protein fold. beta12orEarlier Taxon Moby:PotentialTaxon Taxonomy rank beta12orEarlier Taxonomic rank For a complete list of taxonomic ranks see https://www.phenoscape.org/wiki/Taxonomic_Rank_Vocabulary. The name of a group of organisms belonging to the same taxonomic rank. Moby:BriefTaxonConcept Organism identifier beta12orEarlier A unique identifier of a (group of) organisms. Genus name beta12orEarlier The name of a genus of organism. Taxonomic classification Moby:TaxonName Moby:GCP_Taxon beta12orEarlier The full name for a group of organisms, reflecting their biological classification and (usually) conforming to a standard nomenclature. Moby:iANT_organism-xml Taxonomic name Name components correspond to levels in a taxonomic hierarchy (e.g. 'Genus', 'Species', etc.) Meta information such as a reference where the name was defined and a date might be included. Taxonomic information Moby:TaxonScientificName Moby:TaxonTCS iHOP organism ID beta12orEarlier Moby_namespace:iHOPorganism A unique identifier for an organism used in the iHOP database. Genbank common name Common name for an organism as used in the GenBank database. beta12orEarlier NCBI taxon The name of a taxon from the NCBI taxonomy database. beta12orEarlier Synonym beta12orEarlier Alternative name beta12orEarlier true An alternative for a word. Misspelling A common misspelling of a word.
beta12orEarlier true beta12orEarlier Acronym true An abbreviation of a phrase or word. beta12orEarlier beta12orEarlier Misnomer A term which is likely to be misleading as to its meaning. beta12orEarlier beta12orEarlier true Author ID Information on the authors of a published work. Moby:Author beta12orEarlier DragonDB author identifier An identifier representing an author in the DragonDB database. beta12orEarlier Annotated URI beta12orEarlier A URI along with annotation describing the data found at the address. Moby:DescribedLink UniProt keywords true beta12orEarlier beta12orEarlier A controlled vocabulary for words and phrases that can appear in the keywords field (KW line) of entries from the UniProt database. Gene ID (GeneFarm) Moby_namespace:GENEFARM_GeneID Identifier of a gene from the GeneFarm database. beta12orEarlier Blattner number beta12orEarlier Moby_namespace:Blattner_number The Blattner identifier for a gene. Gene ID (MIPS Maize) MIPS genetic element identifier (Maize) Identifier for genetic elements in MIPS Maize database. beta12orEarlier Moby_namespace:MIPS_GE_Maize beta13 true Gene ID (MIPS Medicago) MIPS genetic element identifier (Medicago) beta12orEarlier beta13 true Moby_namespace:MIPS_GE_Medicago Identifier for genetic elements in MIPS Medicago database. Gene name (DragonDB) true The name of an Antirrhinum Gene from the DragonDB database. beta12orEarlier Moby_namespace:DragonDB_Gene 1.3 Gene name (Arabidopsis) Moby_namespace:ArabidopsisGeneSymbol true A unique identifier for an Arabidopsis gene, which is an acronym or abbreviation of the gene name. beta12orEarlier 1.3 iHOP symbol A unique identifier of a protein or gene used in the iHOP database. Moby_namespace:iHOPsymbol beta12orEarlier Gene name (GeneFarm) 1.3 true Name of a gene from the GeneFarm database.
Moby_namespace:GENEFARM_GeneName GeneFarm gene ID beta12orEarlier Locus ID A unique name or other identifier of a genetic locus, typically conforming to a scheme that names loci (such as predicted genes) depending on their position in a molecular sequence, for example a completely sequenced genome or chromosome. Locus name beta12orEarlier Locus identifier Locus ID (AGI) AT[1-5]G[0-9]{5} AGI ID Locus identifier for Arabidopsis Genome Initiative (TAIR, TIGR and MIPS databases) http://www.geneontology.org/doc/GO.xrf_abbs:AGI_LocusCode Arabidopsis gene loci number AGI locus code beta12orEarlier AGI identifier Locus ID (ASPGD) beta12orEarlier http://www.geneontology.org/doc/GO.xrf_abbs: ASPGD http://www.geneontology.org/doc/GO.xrf_abbs: ASPGDID Identifier for loci from ASPGD (Aspergillus Genome Database). Locus ID (MGG) Identifier for loci from Magnaporthe grisea Database at the Broad Institute. http://www.geneontology.org/doc/GO.xrf_abbs: Broad_MGG beta12orEarlier Locus ID (CGD) Identifier for loci from CGD (Candida Genome Database). http://www.geneontology.org/doc/GO.xrf_abbs: CGDID beta12orEarlier CGDID CGD locus identifier http://www.geneontology.org/doc/GO.xrf_abbs: CGD Locus ID (CMR) http://www.geneontology.org/doc/GO.xrf_abbs: TIGR_CMR Locus identifier for Comprehensive Microbial Resource at the J. Craig Venter Institute. http://www.geneontology.org/doc/GO.xrf_abbs: JCVI_CMR beta12orEarlier NCBI locus tag beta12orEarlier Moby_namespace:LocusID Locus ID (NCBI) http://www.geneontology.org/doc/GO.xrf_abbs: NCBI_locus_tag Identifier for loci from NCBI database. Locus ID (SGD) Identifier for loci from SGD (Saccharomyces Genome Database). http://www.geneontology.org/doc/GO.xrf_abbs: SGDID beta12orEarlier http://www.geneontology.org/doc/GO.xrf_abbs: SGD SGDID Locus ID (MMP) Identifier of loci from Maize Mapping Project. Moby_namespace:MMP_Locus beta12orEarlier Locus ID (DictyBase) Moby_namespace:DDB_gene Identifier of locus from DictyBase (Dictyostelium discoideum). 
beta12orEarlier Locus ID (EntrezGene) Identifier of a locus from EntrezGene database. beta12orEarlier Moby_namespace:EntrezGene_ID Moby_namespace:EntrezGene_EntrezGeneID Locus ID (MaizeGDB) Identifier of locus from MaizeGDB (Maize genome database). Moby_namespace:MaizeGDB_Locus beta12orEarlier Quantitative trait locus QTL A QTL sometimes, but not necessarily, corresponds to a gene. true beta12orEarlier beta12orEarlier A stretch of DNA that is closely linked to the genes underlying a quantitative trait (a phenotype that varies in degree and depends upon the interactions between multiple genes and their environment). Moby:SO_QTL Gene ID (KOME) Identifier of a gene from the KOME database. beta12orEarlier Moby_namespace:GeneId Locus ID (Tropgene) Identifier of a locus from the Tropgene database. Moby:Tropgene_locus beta12orEarlier Alignment An alignment of molecular sequences, structures or profiles derived from them. beta12orEarlier Atomic property General atomic property Data for an atom (in a molecular structure). beta12orEarlier UniProt keyword beta12orEarlier A word or phrase that can appear in the keywords field (KW line) of entries from the UniProt database. Moby_namespace:SP_KW http://www.geneontology.org/doc/GO.xrf_abbs: SP_KW Ordered locus name beta12orEarlier true A name for a genetic locus conforming to a scheme that names loci (such as predicted genes) depending on their position in a molecular sequence, for example a completely sequenced genome or chromosome. beta12orEarlier Sequence coordinates Map position Moby:Position Locus Sequence co-ordinates A position in a map (for example a genetic map), either a single position (point) or a region / interval. Moby:GenePosition This includes positions in genomes based on a reference sequence. A position may be specified for any mappable object, i.e. anything that may have positional information such as a physical position in a chromosome.
Data might include sequence region name, strand, coordinate system name, assembly name, start position and end position. Moby:HitPosition beta12orEarlier Moby:MapPosition Moby:Locus Moby:GCP_MapInterval Moby:GCP_MapPosition Moby:GCP_MapPoint PDBML:_atom_site.id Amino acid property Data concerning the intrinsic physical (e.g. structural) or chemical properties of one, more or all amino acids. Amino acid data beta12orEarlier Annotation beta12orEarlier true beta13 This is a broad data type and is used as a placeholder for other, more specific types. A human-readable collection of information which (typically) is generated or collated by hand and which describes a biological entity, phenomena or associated primary (e.g. sequence or structural) data, as distinct from the primary data itself and computer-generated reports derived from it. Map data Map attribute beta12orEarlier An attribute of a molecular map (genetic or physical), or data extracted from or derived from the analysis of such a map. Vienna RNA structural data true Data used by the Vienna RNA analysis package. beta12orEarlier beta12orEarlier Sequence mask parameter beta12orEarlier 1.5 true Data used to replace (mask) characters in a molecular sequence. Enzyme kinetics data Data concerning chemical reaction(s) catalysed by enzyme(s). beta12orEarlier This is a broad data type and is used as a placeholder for other, more specific types. Michaelis Menten plot A plot giving an approximation of the kinetics of an enzyme-catalysed reaction, assuming simple kinetics (i.e. no intermediate or product inhibition, allostericity or cooperativity). It plots initial reaction rate against substrate concentration (S), from which the maximum rate (vmax) is apparent. beta12orEarlier Hanes Woolf plot beta12orEarlier A plot based on the Michaelis Menten equation of enzyme kinetics plotting the ratio of the initial substrate concentration (S) to the reaction velocity (v) against the substrate concentration (S).
Experimental data This is a broad data type and is used as a placeholder for other, more specific types. It is primarily intended to help navigation of EDAM and would not typically be used for annotation. true Raw data from or annotation on laboratory experiments. beta12orEarlier Experimental measurement data beta13 Genome version information beta12orEarlier true Information on a genome version. 1.5 Evidence Typically a statement about some data or results, including evidence or the source of a statement, which may include computational prediction, laboratory experiment, literature reference etc. beta12orEarlier Sequence record lite beta12orEarlier A molecular sequence and minimal metadata, typically an identifier of the sequence and/or a comment. true 1.8 Sequence http://purl.bioontology.org/ontology/MSH/D008969 Sequences http://purl.org/biotop/biotop.owl#BioMolecularSequenceInformation This concept is a placeholder of concepts for primary sequence data including raw sequences and sequence records. It should not normally be used for derivatives such as sequence alignments, motifs or profiles. beta12orEarlier One or more molecular sequences, possibly with associated annotation. Nucleic acid sequence record (lite) beta12orEarlier 1.8 true A nucleic acid sequence and minimal metadata, typically an identifier of the sequence and/or a comment. Protein sequence record (lite) 1.8 Sequence record lite (protein) beta12orEarlier A protein sequence and minimal metadata, typically an identifier of the sequence and/or a comment. true Report You can use this term by default for any textual report, in case you can't find another, more specific term. Reports may be generated automatically or collated by hand and can include metadata on the origin, source, history, ownership or location of some thing.
http://semanticscience.org/resource/SIO_000148 Document A human-readable collection of information including annotation on a biological entity or phenomena, computer-generated reports of analysis of primary data (e.g. sequence or structural), and metadata (data about primary data) or any other free (essentially unformatted) text, as distinct from the primary data itself. beta12orEarlier Molecular property (general) General molecular property General data for a molecule. beta12orEarlier Structural data This is a broad data type and is used as a placeholder for other, more specific types. beta12orEarlier true Data concerning molecular structure. beta13 Sequence motif (nucleic acid) Nucleic acid sequence motif DNA sequence motif A nucleotide sequence motif. beta12orEarlier RNA sequence motif Sequence motif (protein) beta12orEarlier An amino acid sequence motif. Protein sequence motif Search parameter beta12orEarlier 1.5 true Some simple value controlling a search operation, typically a search of a database. Database search results beta12orEarlier A report of hits from searching a database of some type. Search results Database hits Secondary structure 1.5 true beta12orEarlier The secondary structure assignment (predicted or real) of a nucleic acid or protein. Matrix beta12orEarlier Array This is a broad data type and is used as a placeholder for other, more specific types. An array of numerical values. Alignment data beta12orEarlier 1.8 true Data concerning, extracted from, or derived from the analysis of molecular alignment of some type. This is a broad data type and is used as a placeholder for other, more specific types. Alignment report Nucleic acid report An informative human-readable report about one or more specific nucleic acid molecules, derived from analysis of primary (sequence or structural) data. beta12orEarlier Structure report An informative report on general information, properties or features of one or more molecular tertiary (3D) structures.
beta12orEarlier Structure-derived report Nucleic acid structure data Nucleic acid property (structural) This includes reports on the stiffness, curvature, twist/roll data or other conformational parameters or properties. Nucleic acid structural property beta12orEarlier A report on nucleic acid structure-derived data, describing structural properties of a DNA molecule, or any other annotation or information about specific nucleic acid 3D structure(s). Molecular property beta12orEarlier SO:0000400 A report on the physical (e.g. structural) or chemical properties of molecules, or parts of a molecule. Physicochemical property DNA base structural data Structural data for DNA base pairs or runs of bases, such as energy or angle data. beta12orEarlier Database entry version information true beta12orEarlier 1.5 Information on a database (or ontology) entry version, such as name (or other identifier) or parent database, unique identifier of entry, data, author and so on. Accession beta12orEarlier http://semanticscience.org/resource/SIO_000731 A persistent (stable) and unique identifier, typically identifying an object (entry) from a database. http://semanticscience.org/resource/SIO_000675 SNP single nucleotide polymorphism (SNP) in a DNA sequence. true beta12orEarlier 1.8 Data reference A list of database accessions or identifiers is usually included. Reference to a dataset (or a cross-reference between two datasets), typically one or more entries in a biological database or ontology. beta12orEarlier Job identifier http://wsio.org/data_009 An identifier of a submitted job. beta12orEarlier Name http://semanticscience.org/resource/SIO_000116 http://usefulinc.com/ns/doap#name http://www.w3.org/2000/01/rdf-schema#label beta12orEarlier A name of a thing, which need not necessarily uniquely identify it. Symbolic name Closely related, but focusing on labeling and human readability but not on identification.
Type A label (text token) describing the type of a thing, typically an enumerated string (a string with one of a limited set of values). http://purl.org/dc/elements/1.1/type 1.5 beta12orEarlier true User ID An identifier of a software end-user (typically a person). beta12orEarlier KEGG organism code A three-letter code used in the KEGG databases to uniquely identify organisms. beta12orEarlier Gene name (KEGG GENES) beta12orEarlier KEGG GENES entry name [a-zA-Z_0-9]+:[a-zA-Z_0-9\.-]* Name of an entry (gene) from the KEGG GENES database. Moby_namespace:GeneId true 1.3 BioCyc ID Identifier of an object from one of the BioCyc databases. beta12orEarlier Compound ID (BioCyc) BioCyc compound identifier Identifier of a compound from the BioCyc chemical compounds database. BioCyc compound ID beta12orEarlier Reaction ID (BioCyc) beta12orEarlier Identifier of a biological reaction from the BioCyc reactions database. Enzyme ID (BioCyc) BioCyc enzyme ID beta12orEarlier Identifier of an enzyme from the BioCyc enzymes database. Reaction ID beta12orEarlier Identifier of a biological reaction from a database. Identifier (hybrid) An identifier that is re-used for data objects of fundamentally different types (typically served from a single database). beta12orEarlier This branch provides an alternative organisation of the concepts nested under 'Accession' and 'Name'. All concepts under here are already included under 'Accession' or 'Name'. Molecular property identifier beta12orEarlier Identifier of a molecular property. Codon usage table ID Identifier of a codon usage table, for example a genetic code. Codon usage table identifier beta12orEarlier FlyBase primary identifier beta12orEarlier Primary identifier of an object from the FlyBase database. WormBase identifier beta12orEarlier Identifier of an object from the WormBase database. WormBase wormpep ID Protein identifier used by WormBase database. 
CE[0-9]{5} beta12orEarlier Nucleic acid features (codon) beta12orEarlier true An informative report on a trinucleotide sequence that encodes an amino acid including the triplet sequence, the encoded amino acid or whether it is a start or stop codon. beta12orEarlier Map identifier An identifier of a map of a molecular sequence. beta12orEarlier Person identifier An identifier of a software end-user (typically a person). beta12orEarlier Nucleic acid identifier Name or other identifier of a nucleic acid molecule. beta12orEarlier Translation frame specification beta12orEarlier Frame for translation of DNA (3 forward and 3 reverse frames relative to a chromosome). Genetic code identifier An identifier of a genetic code. beta12orEarlier Genetic code name Informal name for a genetic code, typically an organism name. beta12orEarlier File format name Name of a file format such as HTML, PNG, PDF, EMBL, GenBank and so on. beta12orEarlier Sequence profile type true 1.5 A label (text token) describing a type of sequence profile such as frequency matrix, Gribskov profile, hidden Markov model etc. beta12orEarlier Operating system name beta12orEarlier Name of a computer operating system such as Linux, PC or Mac. Mutation type beta12orEarlier true beta12orEarlier A type of point or block mutation, including insertion, deletion, change, duplication and moves. Logical operator beta12orEarlier A logical operator such as OR, AND, XOR, and NOT. Results sort order Possible options including sorting by score, rank, by increasing P-value (probability, i.e. most statistically significant hits given first) and so on. beta12orEarlier true 1.5 A control of the order of data that is output, for example the order of sequences in an alignment. Toggle beta12orEarlier A simple parameter that is a toggle (boolean value), typically a control for a modal tool. true beta12orEarlier Sequence width true beta12orEarlier beta12orEarlier The width of an output sequence or alignment. 
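The "Translation frame specification" entry above describes the six reading frames of DNA (3 forward and 3 reverse). A minimal sketch of extracting those frames, assuming a plain ACGT string; the frame labels (+1..+3, -1..-3) are an illustrative convention, not mandated by the entry:

```python
# Sketch: enumerate the six reading frames (3 forward, 3 reverse)
# of a DNA sequence, as described by "Translation frame specification".
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reading_frames(dna: str) -> dict:
    """Return the six reading frames of a DNA string as lists of codons."""
    rev = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        # Forward frames read the sequence as-is; reverse frames read
        # the reverse complement. Incomplete trailing codons are dropped.
        frames[f"+{offset + 1}"] = [dna[i:i + 3] for i in range(offset, len(dna) - 2, 3)]
        frames[f"-{offset + 1}"] = [rev[i:i + 3] for i in range(offset, len(rev) - 2, 3)]
    return frames

frames = reading_frames("ATGGCCATTGTAATG")
print(frames["+1"])  # ['ATG', 'GCC', 'ATT', 'GTA', 'ATG']
```

Note that a full translation would additionally map each codon through a genetic code table (see the "Genetic code identifier" entry); this sketch stops at frame extraction.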
Gap penalty beta12orEarlier A penalty for introducing or extending a gap in an alignment. Nucleic acid melting temperature beta12orEarlier A temperature concerning nucleic acid denaturation, typically the temperature at which the two strands of a hybridized or double stranded nucleic acid (DNA or RNA/DNA) molecule separate. Melting temperature Concentration beta12orEarlier The concentration of a chemical compound. Window step size 1.5 beta12orEarlier true Size of the incremental 'step' a sequence window is moved over a sequence. EMBOSS graph beta12orEarlier true beta12orEarlier An image of a graph generated by the EMBOSS suite. EMBOSS report An application report generated by the EMBOSS suite. beta12orEarlier beta12orEarlier true Sequence offset true beta12orEarlier 1.5 An offset for a single-point sequence position. Threshold 1.5 beta12orEarlier true A value that serves as a threshold for a tool (usually to control scoring or output). Protein report (transcription factor) beta13 true This might include conformational or physicochemical properties, as well as sequence information for transcription factor(s) binding sites. An informative report on a transcription factor protein. Transcription factor binding site data beta12orEarlier Database category name true The name of a category of biological or bioinformatics database. beta12orEarlier beta12orEarlier Sequence profile name beta12orEarlier Name of a sequence profile. true beta12orEarlier Color Specification of one or more colors. beta12orEarlier true beta12orEarlier Rendering parameter true beta12orEarlier 1.5 A parameter that is used to control rendering (drawing) to a device or image. Graphics parameter Graphical parameter Sequence name Any arbitrary name of a molecular sequence. beta12orEarlier Date 1.5 A temporal date. beta12orEarlier true Word composition beta12orEarlier Word composition data for a molecular sequence. 
true beta12orEarlier Fickett testcode plot A plot of Fickett testcode statistic (identifying protein coding regions) in nucleotide sequences. beta12orEarlier Sequence similarity plot Use this concept for calculated substitution rates, relative site variability, data on sites with biased properties, highly conserved or very poorly conserved sites, regions, blocks etc. beta12orEarlier Sequence conservation report Sequence similarity plot A plot of sequence similarities identified from word-matching or character comparison. Helical wheel beta12orEarlier An image of a peptide sequence looking down the axis of the helix for highlighting amphipathicity and other properties. Helical net beta12orEarlier Useful for highlighting amphipathicity and other properties. An image of a peptide sequence in a simple 3,4,3,4 repeating pattern that emulates at a simple level the arrangement of residues around an alpha helix. Protein sequence properties plot true beta12orEarlier beta12orEarlier A plot of general physicochemical properties of a protein sequence. Protein ionization curve beta12orEarlier A plot of pK versus pH for a protein. Sequence composition plot beta12orEarlier A plot of character or word composition / frequency of a molecular sequence. Nucleic acid density plot beta12orEarlier Density plot (of base composition) for a nucleotide sequence. Sequence trace image Image of a sequence trace (nucleotide sequence versus probabilities of each of the 4 bases). beta12orEarlier Nucleic acid features (siRNA) true 1.5 beta12orEarlier A report on siRNA duplexes in mRNA. Sequence set (stream) beta12orEarlier true This concept may be used for sequence sets that are expected to be read and processed a single sequence at a time. A collection of multiple molecular sequences and (typically) associated metadata that is intended for sequential processing. beta12orEarlier FlyBase secondary identifier Secondary identifier of an object from the FlyBase database.
Secondary identifiers are used to handle entries that were merged with or split from other entries in the database. beta12orEarlier Cardinality The number of a certain thing. beta12orEarlier true beta12orEarlier Exactly 1 beta12orEarlier beta12orEarlier A single thing. true 1 or more One or more things. beta12orEarlier true beta12orEarlier Exactly 2 Exactly two things. beta12orEarlier true beta12orEarlier 2 or more Two or more things. beta12orEarlier beta12orEarlier true Sequence checksum A fixed-size datum calculated (by using a hash function) for a molecular sequence, typically for purposes of error detection or indexing. beta12orEarlier Hash code Hash sum Hash Hash value Protein features report (chemical modifications) 1.8 beta12orEarlier chemical modification of a protein. true Error beta12orEarlier Data on an error generated by a computer system or tool. 1.5 true Database entry metadata beta12orEarlier Basic information on any arbitrary database entry. Gene cluster beta13 true beta12orEarlier A cluster of similar genes. Sequence record full true beta12orEarlier A molecular sequence and comprehensive metadata (such as a feature table), typically corresponding to a full entry from a molecular sequence database. 1.8 Plasmid identifier An identifier of a plasmid in a database. beta12orEarlier Mutation ID beta12orEarlier A unique identifier of a specific mutation catalogued in a database. Mutation annotation (basic) Information describing the mutation itself, the organ site, tissue and type of lesion where the mutation has been identified, description of the patient origin and life-style. beta12orEarlier true beta12orEarlier Mutation annotation (prevalence) beta12orEarlier true An informative report on the prevalence of mutation(s), including data on samples and mutation prevalence (e.g. by tumour type).
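The "Sequence checksum" entry above describes a fixed-size datum computed by a hash function over a molecular sequence, for error detection or indexing. A minimal illustration follows; SHA-256 is an arbitrary choice here and does not claim to reproduce any particular database's checksum scheme:

```python
# Sketch: a sequence checksum as a fixed-size hash of a molecular sequence.
# The normalisation step (strip whitespace, uppercase) is an assumption,
# chosen so that trivially equivalent renderings hash identically.
import hashlib

def sequence_checksum(seq: str) -> str:
    canonical = "".join(seq.split()).upper()
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()

# Equivalent sequences produce the same fixed-size checksum.
assert sequence_checksum("acgt") == sequence_checksum("AC GT")
```

Such checksums allow a database to detect corrupted records or to index sequences without comparing full strings.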
beta12orEarlier Mutation annotation (prognostic) beta12orEarlier An informative report on mutation prognostic data, such as information on patient cohort, the study settings and the results of the study. beta12orEarlier true Mutation annotation (functional) An informative report on the functional properties of mutant proteins including transcriptional activities, promotion of cell growth and tumorigenicity, dominant negative effects, capacity to induce apoptosis, cell-cycle arrest or checkpoints in human cells and so on. true beta12orEarlier beta12orEarlier Codon number beta12orEarlier The number of a codon, for instance, at which a mutation is located. Tumor annotation true 1.4 An informative report on a specific tumor including nature and origin of the sample, anatomic site, organ or tissue, tumor type, including morphology and/or histologic type, and so on. beta12orEarlier Server metadata Basic information about a server on the web, such as an SRS server. beta12orEarlier 1.5 true Database field name The name of a field in a database. beta12orEarlier Sequence cluster ID (SYSTERS) SYSTERS cluster ID Unique identifier of a sequence cluster from the SYSTERS database. beta12orEarlier Ontology metadata beta12orEarlier Data concerning a biological ontology. Raw SCOP domain classification true beta12orEarlier Raw SCOP domain classification data files. beta13 These are the parsable data files provided by SCOP. Raw CATH domain classification Raw CATH domain classification data files. These are the parsable data files provided by CATH. true beta13 beta12orEarlier Heterogen annotation 1.4 true beta12orEarlier An informative report on the types of small molecules or 'heterogens' (non-protein groups) that are represented in PDB files. Phylogenetic property values beta12orEarlier Phylogenetic property values data. true beta12orEarlier Sequence set (bootstrapped) 1.5 beta12orEarlier Bootstrapping is often performed in phylogenetic analysis. 
true A collection of sequences output from a bootstrapping (resampling) procedure. Phylogenetic consensus tree true A consensus phylogenetic tree derived from comparison of multiple trees. beta12orEarlier beta12orEarlier Schema beta12orEarlier true A data schema for organising or transforming data of some type. 1.5 DTD A DTD (document type definition). true beta12orEarlier 1.5 XML Schema beta12orEarlier XSD An XML Schema. true 1.5 Relax-NG schema beta12orEarlier 1.5 A relax-NG schema. true XSLT stylesheet 1.5 beta12orEarlier An XSLT stylesheet. true Data resource definition name beta12orEarlier The name of a data type. OBO file format name Name of an OBO file format such as OBO-XML, plain and so on. beta12orEarlier Gene ID (MIPS) Identifier for genetic elements in MIPS database. beta12orEarlier MIPS genetic element identifier Sequence identifier (protein) An identifier of protein sequence(s) or protein sequence database entries. beta12orEarlier beta12orEarlier true Sequence identifier (nucleic acid) An identifier of nucleotide sequence(s) or nucleotide sequence database entries. beta12orEarlier true beta12orEarlier EMBL accession EMBL ID beta12orEarlier EMBL accession number EMBL identifier An accession number of an entry from the EMBL sequence database. UniProt ID UniProtKB identifier An identifier of a polypeptide in the UniProt database. UniProtKB entry name beta12orEarlier UniProt identifier UniProt entry name GenBank accession GenBank ID GenBank identifier Accession number of an entry from the GenBank sequence database. beta12orEarlier GenBank accession number Gramene secondary identifier beta12orEarlier Gramene internal identifier Gramene internal ID Secondary (internal) identifier of a Gramene database entry. Gramene secondary ID Sequence variation ID An identifier of an entry from a database of molecular sequence variation. 
beta12orEarlier Gene ID Gene accession beta12orEarlier A unique (and typically persistent) identifier of a gene in a database, that is (typically) different to the gene name/symbol. Gene code Gene name (AceView) AceView gene name 1.3 true Name of an entry (gene) from the AceView genes database. beta12orEarlier Gene ID (ECK) ECK accession beta12orEarlier E. coli K-12 gene identifier Identifier of an E. coli K-12 gene from EcoGene Database. http://www.geneontology.org/doc/GO.xrf_abbs: ECK Gene ID (HGNC) HGNC ID beta12orEarlier Identifier for a gene approved by the HUGO Gene Nomenclature Committee. Gene name The name of a gene, (typically) assigned by a person and/or according to a naming scheme. It may contain white space characters and is typically more intuitive and readable than a gene symbol. It (typically) may be used to identify similar genes in different species and to derive a gene symbol. Allele name beta12orEarlier Gene name (NCBI) beta12orEarlier 1.3 NCBI gene name Name of an entry (gene) from the NCBI genes database. true SMILES string A specification of a chemical structure in SMILES format. beta12orEarlier STRING ID Unique identifier of an entry from the STRING database of protein-protein interactions. beta12orEarlier Virus annotation An informative report on a specific virus. true 1.4 beta12orEarlier Virus annotation (taxonomy) An informative report on the taxonomy of a specific virus. beta12orEarlier true 1.4 Reaction ID (SABIO-RK) Identifier of a biological reaction from the SABIO-RK reactions database. beta12orEarlier [0-9]+ Carbohydrate report Annotation on or information derived from one or more specific carbohydrate 3D structure(s). beta12orEarlier GI number beta12orEarlier NCBI GI number gi number A series of digits that are assigned consecutively to each sequence record processed by NCBI. The GI number bears no resemblance to the Accession number of the sequence record. 
Nucleotide sequence GI number is shown in the VERSION field of the database record. Protein sequence GI number is shown in the CDS/db_xref field of a nucleotide database record, and the VERSION field of a protein database record. NCBI version beta12orEarlier NCBI accession.version Nucleotide sequence version contains two letters followed by six digits, a dot, and a version number (or for older nucleotide sequence records, the format is one letter followed by five digits, a dot, and a version number). Protein sequence version contains three letters followed by five digits, a dot, and a version number. An identifier assigned to sequence records processed by NCBI, made of the accession number of the database record followed by a dot and a version number. accession.version Cell line name beta12orEarlier The name of a cell line. Cell line name (exact) beta12orEarlier The name of a cell line. Cell line name (truncated) The name of a cell line. beta12orEarlier Cell line name (no punctuation) The name of a cell line. beta12orEarlier Cell line name (assonant) The name of a cell line. beta12orEarlier Enzyme ID beta12orEarlier A unique, persistent identifier of an enzyme. Enzyme accession REBASE enzyme number Identifier of an enzyme from the REBASE enzymes database. beta12orEarlier DrugBank ID beta12orEarlier DB[0-9]{5} Unique identifier of a drug from the DrugBank database. GI number (protein) beta12orEarlier protein gi number A unique identifier assigned to NCBI protein sequence records. Nucleotide sequence GI number is shown in the VERSION field of the database record. Protein sequence GI number is shown in the CDS/db_xref field of a nucleotide database record, and the VERSION field of a protein database record. protein gi Bit score A score derived from the alignment of two sequences, which is then normalized with respect to the scoring system. 
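The "NCBI accession.version" entry above spells out its formats in prose: nucleotide versions are two letters plus six digits (or, for older records, one letter plus five digits) followed by a dot and a version number, and protein versions are three letters plus five digits followed by a dot and a version number. A sketch that transcribes that prose directly into regular expressions (these are derived from the description here, not from an official NCBI specification):

```python
# Sketch: validate NCBI accession.version strings against the formats
# described in the "NCBI accession.version" entry.
import re

# Two letters + six digits, or (older records) one letter + five digits,
# then a dot and a version number.
NUCLEOTIDE = re.compile(r"^(?:[A-Z]{2}\d{6}|[A-Z]\d{5})\.\d+$")
# Three letters + five digits, then a dot and a version number.
PROTEIN = re.compile(r"^[A-Z]{3}\d{5}\.\d+$")

assert NUCLEOTIDE.match("AF123456.1")  # two-letter nucleotide form
assert NUCLEOTIDE.match("U12345.2")    # older one-letter nucleotide form
assert PROTEIN.match("AAA12345.1")     # protein form
```

The version suffix is what distinguishes an accession.version from a bare accession; a record updated in place keeps its accession but increments the number after the dot.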
Bit scores are normalized with respect to the scoring system and therefore can be used to compare alignment scores from different searches. beta12orEarlier Translation phase specification beta12orEarlier Phase for translation of DNA (0, 1 or 2) relative to a fragment of the coding sequence. Phase Resource metadata Data concerning or describing some core computational resource, as distinct from primary data. This includes metadata on the origin, source, history, ownership or location of some thing. This is a broad data type and is used as a placeholder for other, more specific types. Provenance metadata beta12orEarlier Ontology identifier beta12orEarlier Any arbitrary identifier of an ontology. Ontology concept name The name of a concept in an ontology. beta12orEarlier Genome build identifier beta12orEarlier An identifier of a build of a particular genome. Pathway or network name The name of a biological pathway or network. beta12orEarlier Pathway ID (KEGG) Identifier of a pathway from the KEGG pathway database. beta12orEarlier [a-zA-Z_0-9]{2,3}[0-9]{5} KEGG pathway ID Pathway ID (NCI-Nature) beta12orEarlier [a-zA-Z_0-9]+ Identifier of a pathway from the NCI-Nature pathway database. Pathway ID (ConsensusPathDB) beta12orEarlier Identifier of a pathway from the ConsensusPathDB pathway database. Sequence cluster ID (UniRef) Unique identifier of an entry from the UniRef database. UniRef cluster id UniRef entry accession beta12orEarlier Sequence cluster ID (UniRef100) UniRef100 cluster id beta12orEarlier UniRef100 entry accession Unique identifier of an entry from the UniRef100 database. Sequence cluster ID (UniRef90) UniRef90 entry accession beta12orEarlier UniRef90 cluster id Unique identifier of an entry from the UniRef90 database. Sequence cluster ID (UniRef50) beta12orEarlier UniRef50 cluster id UniRef50 entry accession Unique identifier of an entry from the UniRef50 database. Ontology data Data concerning or derived from an ontology.
Ontological data beta12orEarlier This is a broad data type and is used as a placeholder for other, more specific types. RNA family report beta12orEarlier An informative report on a specific RNA family or other group of classified RNA sequences. RNA family annotation RNA family identifier beta12orEarlier Identifier of an RNA family, typically an entry from a RNA sequence classification database. RFAM accession Stable accession number of an entry (RNA family) from the RFAM database. beta12orEarlier Protein signature type beta12orEarlier true A label (text token) describing a type of protein family signature (sequence classifier) from the InterPro database. 1.5 Domain-nucleic acid interaction report 1.5 true An informative report on protein domain-DNA/RNA interaction(s). beta12orEarlier Domain-domain interactions 1.8 An informative report on protein domain-protein domain interaction(s). beta12orEarlier true Domain-domain interaction (indirect) true beta12orEarlier beta12orEarlier Data on indirect protein domain-protein domain interaction(s). Sequence accession (hybrid) Accession number of a nucleotide or protein sequence database entry. beta12orEarlier 2D PAGE data This is a broad data type and is used as a placeholder for other, more specific types. It is primarily intended to help navigation of EDAM and would not typically be used for annotation. beta13 beta12orEarlier true Data concerning two-dimensional polyacrylamide gel electrophoresis (2D PAGE). 2D PAGE report beta12orEarlier two-dimensional gel electrophoresis experiments, gels or spots in a gel. 1.8 true Pathway or network accession A persistent, unique identifier of a biological pathway or network (typically a database entry). beta12orEarlier Secondary structure alignment Alignment of the (1D representations of) secondary structure of two or more molecules. beta12orEarlier ASTD ID beta12orEarlier Identifier of an object from the ASTD database. ASTD ID (exon) beta12orEarlier Identifier of an exon from the ASTD database.
ASTD ID (intron) beta12orEarlier Identifier of an intron from the ASTD database. ASTD ID (polya) Identifier of a polyA signal from the ASTD database. beta12orEarlier ASTD ID (tss) Identifier of a transcription start site from the ASTD database. beta12orEarlier 2D PAGE spot report 2D PAGE spot annotation beta12orEarlier An informative report on individual spot(s) from a two-dimensional (2D PAGE) gel. 1.8 true Spot ID beta12orEarlier Unique identifier of a spot from a two-dimensional (protein) gel. Spot serial number Unique identifier of a spot from a two-dimensional (protein) gel in the SWISS-2DPAGE database. beta12orEarlier Spot ID (HSC-2DPAGE) Unique identifier of a spot from a two-dimensional (protein) gel from a HSC-2DPAGE database. beta12orEarlier Protein-motif interaction beta13 true Data on the interaction of a protein (or protein domain) with specific structural (3D) and/or sequence motifs. beta12orEarlier Strain identifier Identifier of a strain of an organism variant, typically a plant, virus or bacterium. beta12orEarlier CABRI accession A unique identifier of an item from the CABRI database. beta12orEarlier Experiment report (genotyping) true Report of genotype experiment including case control, population, and family studies. These might use array based methods and re-sequencing methods. 1.8 beta12orEarlier Genotype experiment ID beta12orEarlier Identifier of an entry from a database of genotype experiment metadata. EGA accession beta12orEarlier Identifier of an entry from the EGA database. IPI protein ID Identifier of a protein entry catalogued in the International Protein Index (IPI) database. IPI[0-9]{8} beta12orEarlier RefSeq accession (protein) RefSeq protein ID Accession number of a protein from the RefSeq database. beta12orEarlier EPD ID beta12orEarlier Identifier of an entry (promoter) from the EPD database. EPD identifier TAIR accession beta12orEarlier Identifier of an entry from the TAIR database. 
TAIR accession (At gene) beta12orEarlier Identifier of an Arabidopsis thaliana gene from the TAIR database. UniSTS accession beta12orEarlier Identifier of an entry from the UniSTS database. UNITE accession beta12orEarlier Identifier of an entry from the UNITE database. UTR accession beta12orEarlier Identifier of an entry from the UTR database. UniParc accession beta12orEarlier UPI[A-F0-9]{10} Accession number of a UniParc (protein sequence) database entry. UniParc ID UPI mFLJ/mKIAA number beta12orEarlier Identifier of an entry from the Rouge or HUGE databases. Fungi annotation true beta12orEarlier 1.4 An informative report on a specific fungus. Fungi annotation (anamorph) beta12orEarlier An informative report on a specific fungus anamorph. 1.4 true Gene features report (exon) true exons in a nucleotide sequences. 1.8 beta12orEarlier Ensembl protein ID Ensembl ID (protein) beta12orEarlier Protein ID (Ensembl) Unique identifier for a protein from the Ensembl database. Gene transcriptional features report 1.8 beta12orEarlier transcription of DNA into RNA including the regulation of transcription. true Toxin annotation beta12orEarlier An informative report on a specific toxin. 1.4 true Protein report (membrane protein) beta12orEarlier true An informative report on a membrane protein. beta12orEarlier Protein-drug interaction report An informative report on tentative or known protein-drug interaction(s). beta12orEarlier Map data beta12orEarlier This is a broad data type and is used a placeholder for other, more specific types. true beta13 Data concerning a map of molecular sequence(s). Phylogenetic data Data concerning phylogeny, typically of molecular sequences, including reports of information concerning or derived from a phylogenetic tree, or from comparing two or more phylogenetic trees. Phylogenetic data This is a broad data type and is used a placeholder for other, more specific types. 
beta12orEarlier Protein data This is a broad data type and is used a placeholder for other, more specific types. beta13 Data concerning one or more protein molecules. true beta12orEarlier Nucleic acid data true Data concerning one or more nucleic acid molecules. beta13 beta12orEarlier This is a broad data type and is used a placeholder for other, more specific types. Article data beta12orEarlier This is a broad data type and is used a placeholder for other, more specific types. It is primarily intended to help navigation of EDAM and would not typically be used for annotation. It includes concepts that are best described as scientific text or closely concerned with or derived from text. Article report Data concerning, extracted from, or derived from the analysis of a scientific text (or texts) such as a full text article from a scientific journal. Parameter http://semanticscience.org/resource/SIO_000144 Tool-specific parameter beta12orEarlier http://www.e-lico.eu/ontologies/dmo/DMOP/DMOP.owl#Parameter Typically a simple numerical or string value that controls the operation of a tool. Parameters Tool parameter Molecular data Molecule-specific data true Data concerning a specific type of molecule. beta13 beta12orEarlier This is a broad data type and is used a placeholder for other, more specific types. Molecule report An informative report on a specific molecule. beta12orEarlier Molecular report 1.5 true Organism report An informative report on a specific organism. beta12orEarlier Organism annotation Experiment report Experiment metadata beta12orEarlier Experiment annotation Annotation on a wet lab experiment, such as experimental conditions. Nucleic acid features report (mutation) DNA mutation. 1.8 true beta12orEarlier Sequence attribute An attribute of a molecular sequence, possibly in reference to some other sequence. Sequence parameter beta12orEarlier Sequence tag profile SAGE, MPSS and SBS experiments are usually performed to study gene expression. 
The sequence tags are typically subsequently annotated (after a database search) with the mRNA (and therefore gene) the tag was extracted from. beta12orEarlier Sequencing-based expression profile Output from a serial analysis of gene expression (SAGE), massively parallel signature sequencing (MPSS) or sequencing by synthesis (SBS) experiment. In all cases this is a list of short sequence tags and the number of times it is observed. Mass spectrometry data beta12orEarlier Data concerning a mass spectrometry measurement. Protein structure raw data beta12orEarlier Raw data from experimental methods for determining protein structure. This is a broad data type and is used a placeholder for other, more specific types. It is primarily intended to help navigation of EDAM and would not typically be used for annotation. Mutation identifier An identifier of a mutation. beta12orEarlier Alignment data This is a broad data type and is used a placeholder for other, more specific types. This includes entities derived from sequences and structures such as motifs and profiles. true beta13 Data concerning an alignment of two or more molecular sequences, structures or derived data. beta12orEarlier Data index data true Data concerning an index of data. beta12orEarlier beta13 Database index This is a broad data type and is used a placeholder for other, more specific types. Amino acid name (single letter) beta12orEarlier Single letter amino acid identifier, e.g. G. Amino acid name (three letter) beta12orEarlier Three letter amino acid identifier, e.g. GLY. Amino acid name (full name) beta12orEarlier Full name of an amino acid, e.g. Glycine. Toxin identifier beta12orEarlier Identifier of a toxin. ArachnoServer ID Unique identifier of a toxin from the ArachnoServer database. beta12orEarlier Expressed gene list beta12orEarlier true 1.5 Gene annotation (expressed gene list) A simple summary of expressed genes. BindingDB Monomer ID Unique identifier of a monomer from the BindingDB database. 
beta12orEarlier GO concept name true beta12orEarlier beta12orEarlier The name of a concept from the GO ontology. GO concept ID (biological process) [0-9]{7}|GO:[0-9]{7} beta12orEarlier An identifier of a 'biological process' concept from the the Gene Ontology. GO concept ID (molecular function) beta12orEarlier [0-9]{7}|GO:[0-9]{7} An identifier of a 'molecular function' concept from the the Gene Ontology. GO concept name (cellular component) The name of a concept for a cellular component from the GO ontology. true beta12orEarlier beta12orEarlier Northern blot image beta12orEarlier An image arising from a Northern Blot experiment. Blot ID Unique identifier of a blot from a Northern Blot. beta12orEarlier BlotBase blot ID beta12orEarlier Unique identifier of a blot from a Northern Blot from the BlotBase database. Hierarchy beta12orEarlier Raw data on a biological hierarchy, describing the hierarchy proper, hierarchy components and possibly associated annotation. Hierarchy annotation Hierarchy identifier Identifier of an entry from a database of biological hierarchies. beta12orEarlier beta12orEarlier true Brite hierarchy ID beta12orEarlier Identifier of an entry from the Brite database of biological hierarchies. Cancer type true A type (represented as a string) of cancer. beta12orEarlier beta12orEarlier BRENDA organism ID A unique identifier for an organism used in the BRENDA database. beta12orEarlier UniGene taxon The name of a taxon using the controlled vocabulary of the UniGene database. UniGene organism abbreviation beta12orEarlier UTRdb taxon beta12orEarlier The name of a taxon using the controlled vocabulary of the UTRdb database. Catalogue ID beta12orEarlier An identifier of a catalogue of biological resources. Catalogue identifier CABRI catalogue name The name of a catalogue of biological resources from the CABRI database. 
beta12orEarlier Secondary structure alignment metadata An informative report on protein secondary structure alignment-derived data or metadata. beta12orEarlier beta12orEarlier true Molecule interaction report An informative report on the physical, chemical or other information concerning the interaction of two or more molecules (or parts of molecules). beta12orEarlier Molecular interaction report Molecular interaction data Pathway or network Network beta12orEarlier Pathway Primary data about a specific biological pathway or network (the nodes and connections within the pathway or network). Small molecule data true This is a broad data type and is used a placeholder for other, more specific types. beta12orEarlier beta13 Data concerning one or more small molecules. Genotype and phenotype data beta12orEarlier true beta13 Data concerning a particular genotype, phenotype or a genotype / phenotype relation. Gene expression data beta12orEarlier Image or hybridisation data for a microarray, typically a study of gene expression. Microarray data This is a broad data type and is used a placeholder for other, more specific types. See also http://edamontology.org/data_0931 Compound ID (KEGG) C[0-9]+ Unique identifier of a chemical compound from the KEGG database. beta12orEarlier KEGG compound ID KEGG compound identifier RFAM name Name (not necessarily stable) an entry (RNA family) from the RFAM database. beta12orEarlier Reaction ID (KEGG) Identifier of a biological reaction from the KEGG reactions database. R[0-9]+ beta12orEarlier Drug ID (KEGG) beta12orEarlier Unique identifier of a drug from the KEGG Drug database. D[0-9]+ Ensembl ID beta12orEarlier ENS[A-Z]*[FPTG][0-9]{11} Identifier of an entry (exon, gene, transcript or protein) from the Ensembl database. Ensembl IDs ICD identifier An identifier of a disease from the International Classification of Diseases (ICD) database. beta12orEarlier [A-Z][0-9]+(\.[-[0-9]+])? 
Sequence cluster ID (CluSTr) Unique identifier of a sequence cluster from the CluSTr database. [0-9A-Za-z]+:[0-9]+:[0-9]{1,5}(\.[0-9])? CluSTr ID beta12orEarlier CluSTr cluster ID KEGG Glycan ID G[0-9]+ Unique identifier of a glycan ligand from the KEGG GLYCAN database (a subset of KEGG LIGAND). beta12orEarlier TCDB ID beta12orEarlier OBO file for regular expression. TC number [0-9]+\.[A-Z]\.[0-9]+\.[0-9]+\.[0-9]+ A unique identifier of a family from the transport classification database (TCDB) of membrane transport proteins. MINT ID MINT\-[0-9]{1,5} Unique identifier of an entry from the MINT database of protein-protein interactions. beta12orEarlier DIP ID Unique identifier of an entry from the DIP database of protein-protein interactions. beta12orEarlier DIP[\:\-][0-9]{3}[EN] Signaling Gateway protein ID beta12orEarlier Unique identifier of a protein listed in the UCSD-Nature Signaling Gateway Molecule Pages database. A[0-9]{6} Protein modification ID beta12orEarlier Identifier of a protein modification catalogued in a database. RESID ID Identifier of a protein modification catalogued in the RESID database. AA[0-9]{4} beta12orEarlier RGD ID [0-9]{4,7} beta12orEarlier Identifier of an entry from the RGD database. TAIR accession (protein) AASequence:[0-9]{10} Identifier of a protein sequence from the TAIR database. beta12orEarlier Compound ID (HMDB) HMDB[0-9]{5} beta12orEarlier HMDB ID Identifier of a small molecule metabolite from the Human Metabolome Database (HMDB). LIPID MAPS ID beta12orEarlier LM ID Identifier of an entry from the LIPID MAPS database. LM(FA|GL|GP|SP|ST|PR|SL|PK)[0-9]{4}([0-9a-zA-Z]{4})? PeptideAtlas ID Identifier of a peptide from the PeptideAtlas peptide databases. PDBML:pdbx_PDB_strand_id beta12orEarlier PAp[0-9]{8} Molecular interaction ID Identifier of a report of molecular interactions from a database (typically). 
true beta12orEarlier 1.7 BioGRID interaction ID [0-9]+ beta12orEarlier A unique identifier of an interaction from the BioGRID database. Enzyme ID (MEROPS) MEROPS ID Unique identifier of a peptidase enzyme from the MEROPS database. beta12orEarlier S[0-9]{2}\.[0-9]{3} Mobile genetic element ID An identifier of a mobile genetic element. beta12orEarlier ACLAME ID beta12orEarlier mge:[0-9]+ An identifier of a mobile genetic element from the Aclame database. SGD ID PWY[a-zA-Z_0-9]{2}\-[0-9]{3} beta12orEarlier Identifier of an entry from the Saccharomyces genome database (SGD). Book ID beta12orEarlier Unique identifier of a book. ISBN beta12orEarlier (ISBN)?(-13|-10)?[:]?[ ]?([0-9]{2,3}[ -]?)?[0-9]{1,5}[ -]?[0-9]{1,7}[ -]?[0-9]{1,6}[ -]?([0-9]|X) The International Standard Book Number (ISBN) is for identifying printed books. Compound ID (3DMET) B[0-9]{5} 3DMET ID beta12orEarlier Identifier of a metabolite from the 3DMET database. MatrixDB interaction ID ([A-NR-Z][0-9][A-Z][A-Z0-9][A-Z0-9][0-9])_.*|([OPQ][0-9][A-Z0-9][A-Z0-9][A-Z0-9][0-9]_.*)|(GAG_.*)|(MULT_.*)|(PFRAG_.*)|(LIP_.*)|(CAT_.*) A unique identifier of an interaction from the MatrixDB database. beta12orEarlier cPath ID [0-9]+ These identifiers are unique within the cPath database, however, they are not stable between releases. beta12orEarlier A unique identifier for pathways, reactions, complexes and small molecules from the cPath (Pathway Commons) database. PubChem bioassay ID Identifier of an assay from the PubChem database. [0-9]+ beta12orEarlier PubChem ID PubChem identifier beta12orEarlier Identifier of an entry from the PubChem database. Reaction ID (MACie) beta12orEarlier M[0-9]{4} MACie entry number Identifier of an enzyme reaction mechanism from the MACie database. Gene ID (miRBase) beta12orEarlier miRNA name miRNA ID Identifier for a gene from the miRBase database. MI[0-9]{7} miRNA identifier Gene ID (ZFIN) Identifier for a gene from the Zebrafish information network genome (ZFIN) database. 
beta12orEarlier ZDB\-GENE\-[0-9]+\-[0-9]+ Reaction ID (Rhea) [0-9]{5} Identifier of an enzyme-catalysed reaction from the Rhea database. beta12orEarlier Pathway ID (Unipathway) UPA[0-9]{5} upaid beta12orEarlier Identifier of a biological pathway from the Unipathway database. Compound ID (ChEMBL) Identifier of a small molecular from the ChEMBL database. ChEMBL ID beta12orEarlier [0-9]+ LGICdb identifier Unique identifier of an entry from the Ligand-gated ion channel (LGICdb) database. beta12orEarlier [a-zA-Z_0-9]+ Reaction kinetics ID (SABIO-RK) Identifier of a biological reaction (kinetics entry) from the SABIO-RK reactions database. [0-9]+ beta12orEarlier PharmGKB ID beta12orEarlier Identifier of an entry from the pharmacogenetics and pharmacogenomics knowledge base (PharmGKB). PA[0-9]+ Pathway ID (PharmGKB) PA[0-9]+ Identifier of a pathway from the pharmacogenetics and pharmacogenomics knowledge base (PharmGKB). beta12orEarlier Disease ID (PharmGKB) Identifier of a disease from the pharmacogenetics and pharmacogenomics knowledge base (PharmGKB). beta12orEarlier PA[0-9]+ Drug ID (PharmGKB) beta12orEarlier Identifier of a drug from the pharmacogenetics and pharmacogenomics knowledge base (PharmGKB). PA[0-9]+ Drug ID (TTD) DAP[0-9]+ Identifier of a drug from the Therapeutic Target Database (TTD). beta12orEarlier Target ID (TTD) TTDS[0-9]+ Identifier of a target protein from the Therapeutic Target Database (TTD). beta12orEarlier Cell type identifier beta12orEarlier Cell type ID A unique identifier of a type or group of cells. NeuronDB ID [0-9]+ beta12orEarlier A unique identifier of a neuron from the NeuronDB database. NeuroMorpho ID beta12orEarlier A unique identifier of a neuron from the NeuroMorpho database. [a-zA-Z_0-9]+ Compound ID (ChemIDplus) Identifier of a chemical from the ChemIDplus database. ChemIDplus ID [0-9]+ beta12orEarlier Pathway ID (SMPDB) beta12orEarlier Identifier of a pathway from the Small Molecule Pathway Database (SMPDB). 
SMP[0-9]{5} BioNumbers ID Identifier of an entry from the BioNumbers database of key numbers and associated data in molecular biology. [0-9]+ beta12orEarlier T3DB ID beta12orEarlier T3D[0-9]+ Unique identifier of a toxin from the Toxin and Toxin Target Database (T3DB) database. Carbohydrate identifier beta12orEarlier Identifier of a carbohydrate. GlycomeDB ID Identifier of an entry from the GlycomeDB database. beta12orEarlier [0-9]+ LipidBank ID beta12orEarlier [a-zA-Z_0-9]+[0-9]+ Identifier of an entry from the LipidBank database. CDD ID beta12orEarlier cd[0-9]{5} Identifier of a conserved domain from the Conserved Domain Database. MMDB ID [0-9]{1,5} beta12orEarlier An identifier of an entry from the MMDB database. MMDB accession iRefIndex ID Unique identifier of an entry from the iRefIndex database of protein-protein interactions. beta12orEarlier [0-9]+ ModelDB ID Unique identifier of an entry from the ModelDB database. [0-9]+ beta12orEarlier Pathway ID (DQCS) [0-9]+ Identifier of a signaling pathway from the Database of Quantitative Cellular Signaling (DQCS). beta12orEarlier Ensembl ID (Homo sapiens) beta12orEarlier true beta12orEarlier ENS([EGTP])[0-9]{11} Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database (Homo sapiens division). Ensembl ID ('Bos taurus') beta12orEarlier Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Bos taurus' division). true beta12orEarlier ENSBTA([EGTP])[0-9]{11} Ensembl ID ('Canis familiaris') beta12orEarlier Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Canis familiaris' division). true ENSCAF([EGTP])[0-9]{11} beta12orEarlier Ensembl ID ('Cavia porcellus') ENSCPO([EGTP])[0-9]{11} true beta12orEarlier Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Cavia porcellus' division). 
beta12orEarlier Ensembl ID ('Ciona intestinalis') true Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Ciona intestinalis' division). beta12orEarlier beta12orEarlier ENSCIN([EGTP])[0-9]{11} Ensembl ID ('Ciona savignyi') Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Ciona savignyi' division). ENSCSAV([EGTP])[0-9]{11} beta12orEarlier beta12orEarlier true Ensembl ID ('Danio rerio') Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Danio rerio' division). true beta12orEarlier beta12orEarlier ENSDAR([EGTP])[0-9]{11} Ensembl ID ('Dasypus novemcinctus') Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Dasypus novemcinctus' division). beta12orEarlier beta12orEarlier ENSDNO([EGTP])[0-9]{11} true Ensembl ID ('Echinops telfairi') ENSETE([EGTP])[0-9]{11} true beta12orEarlier beta12orEarlier Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Echinops telfairi' division). Ensembl ID ('Erinaceus europaeus') true ENSEEU([EGTP])[0-9]{11} beta12orEarlier Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Erinaceus europaeus' division). beta12orEarlier Ensembl ID ('Felis catus') beta12orEarlier true ENSFCA([EGTP])[0-9]{11} Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Felis catus' division). beta12orEarlier Ensembl ID ('Gallus gallus') ENSGAL([EGTP])[0-9]{11} Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Gallus gallus' division). beta12orEarlier true beta12orEarlier Ensembl ID ('Gasterosteus aculeatus') beta12orEarlier Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Gasterosteus aculeatus' division). 
true ENSGAC([EGTP])[0-9]{11} beta12orEarlier Ensembl ID ('Homo sapiens') ENSHUM([EGTP])[0-9]{11} beta12orEarlier beta12orEarlier Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Homo sapiens' division). true Ensembl ID ('Loxodonta africana') beta12orEarlier true Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Loxodonta africana' division). ENSLAF([EGTP])[0-9]{11} beta12orEarlier Ensembl ID ('Macaca mulatta') Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Macaca mulatta' division). beta12orEarlier ENSMMU([EGTP])[0-9]{11} true beta12orEarlier Ensembl ID ('Monodelphis domestica') beta12orEarlier Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Monodelphis domestica' division). true ENSMOD([EGTP])[0-9]{11} beta12orEarlier Ensembl ID ('Mus musculus') ENSMUS([EGTP])[0-9]{11} true Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Mus musculus' division). beta12orEarlier beta12orEarlier Ensembl ID ('Myotis lucifugus') beta12orEarlier ENSMLU([EGTP])[0-9]{11} true beta12orEarlier Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Myotis lucifugus' division). Ensembl ID ("Ornithorhynchus anatinus") beta12orEarlier true Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Ornithorhynchus anatinus' division). ENSOAN([EGTP])[0-9]{11} beta12orEarlier Ensembl ID ('Oryctolagus cuniculus') beta12orEarlier ENSOCU([EGTP])[0-9]{11} true Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Oryctolagus cuniculus' division). beta12orEarlier Ensembl ID ('Oryzias latipes') ENSORL([EGTP])[0-9]{11} true beta12orEarlier Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Oryzias latipes' division). 
beta12orEarlier Ensembl ID ('Otolemur garnettii') beta12orEarlier Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Otolemur garnettii' division). true beta12orEarlier ENSSAR([EGTP])[0-9]{11} Ensembl ID ('Pan troglodytes') beta12orEarlier beta12orEarlier ENSPTR([EGTP])[0-9]{11} Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Pan troglodytes' division). true Ensembl ID ('Rattus norvegicus') beta12orEarlier true Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Rattus norvegicus' division). ENSRNO([EGTP])[0-9]{11} beta12orEarlier Ensembl ID ('Spermophilus tridecemlineatus') true beta12orEarlier ENSSTO([EGTP])[0-9]{11} Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Spermophilus tridecemlineatus' division). beta12orEarlier Ensembl ID ('Takifugu rubripes') beta12orEarlier beta12orEarlier Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Takifugu rubripes' division). ENSFRU([EGTP])[0-9]{11} true Ensembl ID ('Tupaia belangeri') beta12orEarlier beta12orEarlier Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Tupaia belangeri' division). true ENSTBE([EGTP])[0-9]{11} Ensembl ID ('Xenopus tropicalis') Identifier of an entry (exon, gene, transcript or protein) from the Ensembl 'core' database ('Xenopus tropicalis' division). beta12orEarlier beta12orEarlier true ENSXET([EGTP])[0-9]{11} CATH identifier beta12orEarlier Identifier of a protein domain (or other node) from the CATH database. CATH node ID (family) beta12orEarlier A code number identifying a family from the CATH database. 2.10.10.10 Enzyme ID (CAZy) Identifier of an enzyme from the CAZy enzymes database. beta12orEarlier CAZy ID Clone ID (IMAGE) I.M.A.G.E. cloneID IMAGE cloneID A unique identifier assigned by the I.M.A.G.E. 
consortium to a clone (cloned molecular sequence). beta12orEarlier GO concept ID (cellular compartment) An identifier of a 'cellular compartment' concept from the Gene Ontology. [0-9]{7}|GO:[0-9]{7} beta12orEarlier GO concept identifier (cellular compartment) Chromosome name (BioCyc) Name of a chromosome as used in the BioCyc database. beta12orEarlier CleanEx entry name beta12orEarlier An identifier of a gene expression profile from the CleanEx database. CleanEx dataset code beta12orEarlier An identifier of (typically a list of) gene expression experiments catalogued in the CleanEx database. Genome report An informative report of general information concerning a genome as a whole. beta12orEarlier Protein ID (CORUM) beta12orEarlier CORUM complex ID Unique identifier for a protein complex from the CORUM database. CDD PSSM-ID beta12orEarlier Unique identifier of a position-specific scoring matrix from the CDD database. Protein ID (CuticleDB) CuticleDB ID beta12orEarlier Unique identifier for a protein from the CuticleDB database. DBD ID Identifier of a predicted transcription factor from the DBD database. beta12orEarlier Oligonucleotide probe annotation beta12orEarlier General annotation on an oligonucleotide probe. Oligonucleotide ID Identifier of an oligonucleotide from a database. beta12orEarlier dbProbe ID Identifier of an oligonucleotide probe from the dbProbe database. beta12orEarlier Dinucleotide property beta12orEarlier Physicochemical property data for one or more dinucleotides. DiProDB ID beta12orEarlier Identifier of an dinucleotide property from the DiProDB database. Protein features report (disordered structure) 1.8 true beta12orEarlier disordered structure in a protein. Protein ID (DisProt) DisProt ID beta12orEarlier Unique identifier for a protein from the DisProt database. Embryo report Annotation on an embryo or concerning embryological development. 
true Embryo annotation beta12orEarlier 1.5 Ensembl transcript ID beta12orEarlier Transcript ID (Ensembl) Unique identifier for a gene transcript from the Ensembl database. Inhibitor annotation 1.4 beta12orEarlier An informative report on one or more small molecules that are enzyme inhibitors. true Promoter ID beta12orEarlier An identifier of a promoter of a gene that is catalogued in a database. Moby:GeneAccessionList EST accession Identifier of an EST sequence. beta12orEarlier COGEME EST ID beta12orEarlier Identifier of an EST sequence from the COGEME database. COGEME unisequence ID Identifier of a unisequence from the COGEME database. A unisequence is a single sequence assembled from ESTs. beta12orEarlier Protein family ID (GeneFarm) GeneFarm family ID beta12orEarlier Accession number of an entry (family) from the TIGRFam database. Family name beta12orEarlier The name of a family of organism. Genus name (virus) true The name of a genus of viruses. beta13 beta12orEarlier Family name (virus) beta13 The name of a family of viruses. true beta12orEarlier Database name (SwissRegulon) true beta13 The name of a SwissRegulon database. beta12orEarlier Sequence feature ID (SwissRegulon) beta12orEarlier A feature identifier as used in the SwissRegulon database. This can be name of a gene, the ID of a TFBS, or genomic coordinates in form "chr:start..end". FIG ID A FIG ID consists of four parts: a prefix, genome id, locus type and id number. A unique identifier of gene in the NMPDR database. beta12orEarlier Gene ID (Xenbase) A unique identifier of gene in the Xenbase database. beta12orEarlier Gene ID (Genolist) beta12orEarlier A unique identifier of gene in the Genolist database. Gene name (Genolist) beta12orEarlier true Genolist gene name 1.3 Name of an entry (gene) from the Genolist genes database. ABS ID ABS identifier beta12orEarlier Identifier of an entry (promoter) from the ABS database. AraC-XylS ID Identifier of a transcription factor from the AraC-XylS database. 
beta12orEarlier Gene name (HUGO) beta12orEarlier beta12orEarlier true Name of an entry (gene) from the HUGO database. Locus ID (PseudoCAP) beta12orEarlier Identifier of a locus from the PseudoCAP database. Locus ID (UTR) beta12orEarlier Identifier of a locus from the UTR database. MonosaccharideDB ID Unique identifier of a monosaccharide from the MonosaccharideDB database. beta12orEarlier Database name (CMD) beta12orEarlier true The name of a subdivision of the Collagen Mutation Database (CMD) database. beta13 Database name (Osteogenesis) beta12orEarlier true beta13 The name of a subdivision of the Osteogenesis database. Genome identifier An identifier of a particular genome. beta12orEarlier GenomeReviews ID beta12orEarlier An identifier of a particular genome. GlycoMap ID [0-9]+ beta12orEarlier Identifier of an entry from the GlycosciencesDB database. Carbohydrate conformational map beta12orEarlier A conformational energy map of the glycosidic linkages in a carbohydrate molecule. Gene features report (intron) introns in a nucleotide sequences. true beta12orEarlier 1.8 Transcription factor name The name of a transcription factor. beta12orEarlier TCID Identifier of a membrane transport proteins from the transport classification database (TCDB). beta12orEarlier Pfam domain name beta12orEarlier Name of a domain from the Pfam database. PF[0-9]{5} Pfam clan ID beta12orEarlier CL[0-9]{4} Accession number of a Pfam clan. Gene ID (VectorBase) VectorBase ID beta12orEarlier Identifier for a gene from the VectorBase database. UTRSite ID Identifier of an entry from the UTRSite database of regulatory motifs in eukaryotic UTRs. beta12orEarlier Sequence signature report Sequence motif report Sequence profile report An informative report about a specific or conserved pattern in a molecular sequence, such as its context in genes or proteins, its role, origin or method of construction, etc. 
beta12orEarlier Locus annotation Locus report true beta12orEarlier An informative report on a particular locus. beta12orEarlier Protein name (UniProt) Official name of a protein as used in the UniProt database. beta12orEarlier Term ID list One or more terms from one or more controlled vocabularies which are annotations on an entity. beta12orEarlier true The concepts are typically provided as a persistent identifier or some other link the source ontologies. Evidence of the validity of the annotation might be included. 1.5 HAMAP ID Name of a protein family from the HAMAP database. beta12orEarlier Identifier with metadata Basic information concerning an identifier of data (typically including the identifier itself). For example, a gene symbol with information concerning its provenance. beta12orEarlier Gene symbol annotation true beta12orEarlier Annotation about a gene symbol. beta12orEarlier Transcript ID Identifier of a RNA transcript. beta12orEarlier HIT ID Identifier of an RNA transcript from the H-InvDB database. beta12orEarlier HIX ID A unique identifier of gene cluster in the H-InvDB database. beta12orEarlier HPA antibody id beta12orEarlier Identifier of a antibody from the HPA database. IMGT/HLA ID Identifier of a human major histocompatibility complex (HLA) or other protein from the IMGT/HLA database. beta12orEarlier Gene ID (JCVI) A unique identifier of gene assigned by the J. Craig Venter Institute (JCVI). beta12orEarlier Kinase name beta12orEarlier The name of a kinase protein. ConsensusPathDB entity ID Identifier of a physical entity from the ConsensusPathDB database. beta12orEarlier ConsensusPathDB entity name beta12orEarlier Name of a physical entity from the ConsensusPathDB database. CCAP strain number The number of a strain of algae and protozoa from the CCAP database. beta12orEarlier Stock number beta12orEarlier An identifier of stock from a catalogue of biological resources. 
Stock number (TAIR) beta12orEarlier A stock number from The Arabidopsis information resource (TAIR). REDIdb ID beta12orEarlier Identifier of an entry from the RNA editing database (REDIdb). SMART domain name Name of a domain from the SMART database. beta12orEarlier Protein family ID (PANTHER) beta12orEarlier Panther family ID Accession number of an entry (family) from the PANTHER database. RNAVirusDB ID beta12orEarlier Could list (or reference) other taxa here from https://www.phenoscape.org/wiki/Taxonomic_Rank_Vocabulary. A unique identifier for a virus from the RNAVirusDB database. Virus ID beta12orEarlier An accession of annotation on a (group of) viruses (catalogued in a database). NCBI Genome Project ID An identifier of a genome project assigned by NCBI. beta12orEarlier NCBI genome accession A unique identifier of a whole genome assigned by the NCBI. beta12orEarlier Sequence profile data 1.8 Data concerning, extracted from, or derived from the analysis of a sequence profile, such as its name, length, technical details about the profile or its construction, the biological role or annotation, and so on. true beta12orEarlier Protein ID (TopDB) beta12orEarlier TopDB ID Unique identifier for a membrane protein from the TopDB database. Gel ID Gel identifier Identifier of a two-dimensional (protein) gel. beta12orEarlier Reference map name (SWISS-2DPAGE) beta12orEarlier Name of a reference map gel from the SWISS-2DPAGE database. Protein ID (PeroxiBase) PeroxiBase ID beta12orEarlier Unique identifier for a peroxidase protein from the PeroxiBase database. SISYPHUS ID beta12orEarlier Identifier of an entry from the SISYPHUS database of tertiary structure alignments. ORF ID beta12orEarlier Accession of an open reading frame (catalogued in a database). ORF identifier An identifier of an open reading frame. beta12orEarlier Linucs ID Identifier of an entry from the GlycosciencesDB database. 
beta12orEarlier Protein ID (LGICdb) beta12orEarlier LGICdb ID Unique identifier for a ligand-gated ion channel protein from the LGICdb database. MaizeDB ID beta12orEarlier Identifier of an EST sequence from the MaizeDB database. Gene ID (MfunGD) beta12orEarlier A unique identifier of a gene in the MfunGD database. Orpha number beta12orEarlier An identifier of a disease from the Orpha database. Protein ID (EcID) beta12orEarlier Unique identifier for a protein from the EcID database. Clone ID (RefSeq) A unique identifier of a cDNA molecule catalogued in the RefSeq database. beta12orEarlier Protein ID (ConoServer) beta12orEarlier Unique identifier for a cone snail toxin protein from the ConoServer database. GeneSNP ID Identifier of a GeneSNP database entry. beta12orEarlier Lipid identifier Identifier of a lipid. beta12orEarlier Databank true beta12orEarlier A flat-file (textual) data archive. beta12orEarlier Web portal A web site providing data (web pages) on a common theme to an HTTP client. beta12orEarlier true beta12orEarlier Gene ID (VBASE2) Identifier for a gene from the VBASE2 database. beta12orEarlier VBASE2 ID DPVweb ID DPVweb virus ID beta12orEarlier A unique identifier for a virus from the DPVweb database. Pathway ID (BioSystems) beta12orEarlier Identifier of a pathway from the BioSystems pathway database. [0-9]+ Experimental data (proteomics) true Data concerning a proteomics experiment. beta12orEarlier beta12orEarlier Abstract beta12orEarlier An abstract of a scientific article. Lipid structure beta12orEarlier 3D coordinate and associated data for a lipid structure. Drug structure beta12orEarlier 3D coordinate and associated data for the (3D) structure of a drug. Toxin structure 3D coordinate and associated data for the (3D) structure of a toxin. 
beta12orEarlier Position-specific scoring matrix beta12orEarlier PSSM A simple matrix of numbers, where each value (or column of values) is derived from analysis of the corresponding position in a sequence alignment. Distance matrix A matrix of distances between molecular entities, where a value (distance) is (typically) derived from comparison of two entities and reflects their similarity. beta12orEarlier Structural distance matrix Distances (values representing similarity) between a group of molecular structures. beta12orEarlier Article metadata true beta12orEarlier Bibliographic data concerning scientific article(s). 1.5 Ontology concept beta12orEarlier This includes any fields from the concept definition such as concept name, definition, comments and so on. A concept from a biological ontology. Codon usage bias A numerical measure of differences in the frequency of occurrence of synonymous codons in DNA sequences. beta12orEarlier Northern blot report true beta12orEarlier 1.8 Northern Blot experiments. Nucleic acid features report (VNTR) 1.8 beta12orEarlier true variable number of tandem repeat (VNTR) polymorphism in a DNA sequence. Nucleic acid features report (microsatellite) true microsatellite polymorphism in a DNA sequence. 1.8 beta12orEarlier Nucleic acid features report (RFLP) beta12orEarlier true 1.8 restriction fragment length polymorphisms (RFLP) in a DNA sequence. Radiation hybrid map The radiation method can break very closely linked markers providing a more detailed map. Most genetic markers and subsequences may be located to a defined map position and with more precise estimates of distance than a linkage map. A map showing distance between genetic markers estimated by radiation-induced breaks in a chromosome. beta12orEarlier RH map ID list A simple list of data identifiers (such as database accessions), possibly with additional basic information on the addressed data. 
beta12orEarlier Phylogenetic gene frequencies data beta12orEarlier Gene frequencies data that may be read during phylogenetic tree calculation. Sequence set (polymorphic) beta13 beta12orEarlier true A set of sub-sequences displaying some type of polymorphism, typically indicating the sequence in which they occur, their position and other metadata. DRCAT resource 1.5 An entry (resource) from the DRCAT bioinformatics resource catalogue. beta12orEarlier true Protein complex beta12orEarlier 3D coordinate and associated data for a multi-protein complex; two or more polypeptide chains in a stable, functional association with one another. Protein structural motif beta12orEarlier 3D coordinate and associated data for a protein (3D) structural motif; any group of contiguous or non-contiguous amino acid residues but typically those forming a feature with a structural or functional role. Lipid report beta12orEarlier Annotation on or information derived from one or more specific lipid 3D structure(s). Secondary structure image 1.4 beta12orEarlier Image of one or more molecular secondary structures. true Secondary structure report Secondary structure-derived report beta12orEarlier true An informative report on general information, properties or features of one or more molecular secondary structures. 1.5 DNA features beta12orEarlier DNA sequence-specific feature annotation (not in a feature table). true beta12orEarlier RNA features report true beta12orEarlier 1.5 Features concerning RNA or regions of DNA that encode an RNA molecule. RNA features Nucleic acid features (RNA features) Plot beta12orEarlier true beta12orEarlier Biological data that has been plotted as a graph of some type. Nucleic acid features report (polymorphism) true DNA polymorphism. beta12orEarlier Protein sequence record A protein sequence and associated metadata. 
beta12orEarlier Protein sequence record Sequence record (protein) Nucleic acid sequence record RNA sequence record Nucleotide sequence record A nucleic acid sequence and associated metadata. beta12orEarlier DNA sequence record Sequence record (nucleic acid) Protein sequence record (full) A protein sequence and comprehensive metadata (such as a feature table), typically corresponding to a full entry from a molecular sequence database. 1.8 beta12orEarlier true Nucleic acid sequence record (full) true A nucleic acid sequence and comprehensive metadata (such as a feature table), typically corresponding to a full entry from a molecular sequence database. beta12orEarlier 1.8 Biological model accession beta12orEarlier Accession of a mathematical model, typically an entry from a database. Cell type name The name of a type or group of cells. beta12orEarlier Cell type accession beta12orEarlier Accession of a type or group of cells (catalogued in a database). Compound accession Small molecule accession Accession of an entry from a database of chemicals. beta12orEarlier Chemical compound accession Drug accession Accession of a drug. beta12orEarlier Toxin name Name of a toxin. beta12orEarlier Toxin accession beta12orEarlier Accession of a toxin (catalogued in a database). Monosaccharide accession Accession of a monosaccharide (catalogued in a database). beta12orEarlier Drug name beta12orEarlier Common name of a drug. Carbohydrate accession Accession of an entry from a database of carbohydrates. beta12orEarlier Molecule accession Accession of a specific molecule (catalogued in a database). beta12orEarlier Data resource definition accession beta12orEarlier Accession of a data definition (catalogued in a database). Genome accession An accession of a particular genome (in a database). beta12orEarlier Map accession An accession of a map of a molecular sequence (deposited in a database). beta12orEarlier Lipid accession beta12orEarlier Accession of an entry from a database of lipids. 
Peptide ID beta12orEarlier Accession of a peptide deposited in a database. Protein accession Protein accessions beta12orEarlier Accession of a protein deposited in a database. Organism accession An accession of annotation on a (group of) organisms (catalogued in a database). beta12orEarlier Organism name Moby:Organism_Name Moby:OrganismsShortName Moby:OccurrenceRecord Moby:BriefOccurrenceRecord Moby:FirstEpithet Moby:InfraspecificEpithet beta12orEarlier Moby:OrganismsLongName The name of an organism (or group of organisms). Protein family accession beta12orEarlier Accession of a protein family (that is deposited in a database). Transcription factor accession beta12orEarlier Accession of an entry from a database of transcription factors or binding sites. Strain accession beta12orEarlier Identifier of a strain of an organism variant, typically a plant, virus or bacterium. Virus identifier An accession of annotation on a (group of) viruses (catalogued in a database). beta12orEarlier Sequence features metadata beta12orEarlier Metadata on sequence features. Gramene identifier beta12orEarlier Identifier of a Gramene database entry. DDBJ accession beta12orEarlier DDBJ accession number DDBJ identifier DDBJ ID An identifier of an entry from the DDBJ sequence database. ConsensusPathDB identifier beta12orEarlier An identifier of an entity from the ConsensusPathDB database. Sequence data This is a broad data type and is used as a placeholder for other, more specific types. 1.8 beta12orEarlier true Data concerning, extracted from, or derived from the analysis of molecular sequence(s). Codon usage beta12orEarlier true beta13 Data concerning codon usage. This is a broad data type and is used as a placeholder for other, more specific types. Article report beta12orEarlier 1.5 Data derived from the analysis of a scientific text such as a full text article from a scientific journal. 
true Sequence report An informative report of information about molecular sequence(s), including basic information (metadata), and reports generated from molecular sequence analysis, including positional features and non-positional properties. beta12orEarlier Sequence-derived report Protein secondary structure report An informative report about the properties or features of one or more protein secondary structures. beta12orEarlier Hopp and Woods plot A Hopp and Woods plot of predicted antigenicity of a peptide or protein. beta12orEarlier Nucleic acid melting curve Shows the proportion of nucleic acid which is double-stranded versus temperature. A melting curve of a double-stranded nucleic acid molecule (DNA or DNA/RNA). beta12orEarlier Nucleic acid probability profile A probability profile of a double-stranded nucleic acid molecule (DNA or DNA/RNA). beta12orEarlier Shows the probability of a base pair not being melted (i.e. remaining as double-stranded DNA) at a specified temperature. Nucleic acid temperature profile A temperature profile of a double-stranded nucleic acid molecule (DNA or DNA/RNA). Plots melting temperature versus base position. beta12orEarlier Melting map Gene regulatory network report 1.8 A report typically including a map (diagram) of a gene regulatory network. true beta12orEarlier 2D PAGE gel report An informative report on a two-dimensional (2D PAGE) gel. 2D PAGE image report 1.8 true 2D PAGE gel annotation beta12orEarlier 2D PAGE image annotation Oligonucleotide probe sets annotation beta12orEarlier General annotation on a set of oligonucleotide probes, such as the gene name with which the probe set is associated and which probes belong to the set. Microarray image 1.5 beta12orEarlier Gene expression image An image from a microarray experiment which (typically) allows a visualisation of probe hybridisation and gene-expression data. 
true Image http://semanticscience.org/resource/SIO_000081 Biological or biomedical data that has been rendered into an image, typically for display on screen. http://semanticscience.org/resource/SIO_000079 Image data beta12orEarlier Sequence image Image of a molecular sequence, possibly with sequence features or properties shown. beta12orEarlier Protein hydropathy data Protein hydropathy report A report on protein properties concerning hydropathy. beta12orEarlier Workflow data beta12orEarlier beta13 Data concerning a computational workflow. true Workflow true beta12orEarlier 1.5 A computational workflow. Secondary structure data beta13 true beta12orEarlier Data concerning molecular secondary structure. Protein sequence (raw) Raw protein sequence beta12orEarlier Raw sequence (protein) A raw protein sequence (string of characters). Nucleic acid sequence (raw) Nucleic acid raw sequence beta12orEarlier Nucleotide sequence (raw) Raw sequence (nucleic acid) A raw nucleic acid sequence. Protein sequence One or more protein sequences, possibly with associated annotation. Protein sequences beta12orEarlier http://purl.org/biotop/biotop.owl#AminoAcidSequenceInformation Nucleic acid sequence One or more nucleic acid sequences, possibly with associated annotation. beta12orEarlier DNA sequence Nucleotide sequence Nucleotide sequences Nucleic acid sequences http://purl.org/biotop/biotop.owl#NucleotideSequenceInformation Reaction data Enzyme kinetics annotation This is a broad data type and is used as a placeholder for other, more specific types. beta12orEarlier Reaction annotation Data concerning a biochemical reaction, typically data and more general annotation on the kinetics of an enzyme-catalysed reaction. Peptide property beta12orEarlier Peptide data Data concerning small peptides. Protein classification This is a broad data type and is used as a placeholder for other, more specific types. 
Protein classification data An informative report concerning the classification of protein sequences or structures. beta12orEarlier Sequence motif data true 1.8 Data concerning specific or conserved patterns in molecular sequences. beta12orEarlier This is a broad data type and is used as a placeholder for other, more specific types. Sequence profile data beta12orEarlier true This is a broad data type and is used as a placeholder for other, more specific types. beta13 Data concerning models representing a (typically multiple) sequence alignment. Pathway or network data Data concerning a specific biological pathway or network. beta13 true beta12orEarlier Pathway or network report beta12orEarlier An informative report concerning or derived from the analysis of a biological pathway or network, such as a map (diagram) or annotation. Nucleic acid thermodynamic data Nucleic acid property (thermodynamic or kinetic) A thermodynamic or kinetic property of a nucleic acid molecule. Nucleic acid thermodynamic property beta12orEarlier Nucleic acid classification This is a broad data type and is used as a placeholder for other, more specific types. beta12orEarlier Data concerning the classification of nucleic acid sequences or structures. Nucleic acid classification data Classification report This can include an entire classification, components such as classifiers, assignments of entities to a classification and so on. beta12orEarlier true Classification data A report on a classification of molecular sequences, structures or other entities. 1.5 Protein features report (key folding sites) beta12orEarlier key residues involved in protein folding. 1.8 true Protein torsion angle data Torsion angle data Torsion angle data for a protein structure. beta12orEarlier Protein structure image An image of a protein structure. beta12orEarlier Structure image (protein) Phylogenetic character weights Weights for sequence positions or characters in phylogenetic analysis where zero is defined as unweighted. 
beta12orEarlier Annotation track beta12orEarlier Genomic track Annotation of one particular positional feature on a biomolecular (typically genome) sequence, suitable for import and display in a genome browser. Genome annotation track Genome-browser track Genome track Sequence annotation track UniProt accession UniProtKB accession number beta12orEarlier P43353|Q7M1G0|Q9C199|A5A6J6 UniProt entry accession [OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2} Swiss-Prot entry accession TrEMBL entry accession Accession number of a UniProt (protein sequence) database entry. UniProtKB accession UniProt accession number NCBI genetic code ID Identifier of a genetic code in the NCBI list of genetic codes. [1-9][0-9]? 16 beta12orEarlier Ontology concept identifier Identifier of a concept in an ontology of biological or bioinformatics concepts and relations. beta12orEarlier GO concept name (biological process) true The name of a concept for a biological process from the GO ontology. beta12orEarlier beta12orEarlier GO concept name (molecular function) true beta12orEarlier The name of a concept for a molecular function from the GO ontology. beta12orEarlier Taxonomy This is a broad data type and is used as a placeholder for other, more specific types. beta12orEarlier Data concerning the classification, identification and naming of organisms. Taxonomic data Protein ID (EMBL/GenBank/DDBJ) beta13 EMBL/GENBANK/DDBJ coding feature protein identifier, issued by International collaborators. This qualifier consists of a stable ID portion (3+5 format with 3 position letters and 5 numbers) plus a version number after the decimal point. When the protein sequence encoded by the CDS changes, only the version number of the /protein_id value is incremented; the stable part of the /protein_id remains unchanged and as a result will permanently be associated with a given protein; this qualifier is valid only on CDS features which translate into a valid protein. 
Core data Core data entities typically have a format and may be identified by an accession number. A type of data that (typically) corresponds to entries from the primary biological databases and which is (typically) the primary input or output of a tool, i.e. the data the tool processes or generates, as distinct from metadata and identifiers which describe and identify such core data, parameters that control the behaviour of tools, reports of derivative data generated by tools and annotation. 1.5 true beta13 Sequence feature identifier beta13 Name or other identifier of molecular sequence feature(s). Structure identifier beta13 An identifier of a molecular tertiary structure, typically an entry from a structure database. Matrix identifier An identifier of an array of numerical values, such as a comparison matrix. beta13 Protein sequence composition beta13 1.8 true A report (typically a table) on character or word composition / frequency of protein sequence(s). Nucleic acid sequence composition (report) 1.8 A report (typically a table) on character or word composition / frequency of nucleic acid sequence(s). true beta13 Protein domain classification node beta13 A node from a classification of protein structural domain(s). true 1.5 CAS number beta13 CAS registry number Unique numerical identifier of chemicals in the scientific literature, as assigned by the Chemical Abstracts Service. ATC code Unique identifier of a drug conforming to the Anatomical Therapeutic Chemical (ATC) Classification System, a drug classification system controlled by the WHO Collaborating Centre for Drug Statistics Methodology (WHOCC). beta13 UNII beta13 A unique, unambiguous, alphanumeric identifier of a chemical substance as catalogued by the Substance Registration System of the Food and Drug Administration (FDA). Unique Ingredient Identifier Geotemporal metadata 1.5 beta13 true Basic information concerning geographical location or time. 
System metadata Metadata concerning the software, hardware or other aspects of a computer system. beta13 Sequence feature name A name of a sequence feature, e.g. the name of a feature to be displayed to an end-user. beta13 Experimental measurement beta13 Raw data such as measurements or other results from laboratory experiments, as generated from laboratory hardware. Experimental measurement data Measurement This is a broad data type and is used as a placeholder for other, more specific types. It is primarily intended to help navigation of EDAM and would not typically be used for annotation. Measured data Experimentally measured data Measurement metadata Measurement data Raw experimental data Raw microarray data beta13 Raw data (typically MIAME-compliant) for hybridisations from a microarray experiment. Such data as found in Affymetrix CEL or GPR files. Processed microarray data Data generated from processing and analysis of probe set data from a microarray experiment. Gene annotation (expression) Microarray probe set data beta13 Gene expression report Such data as found in Affymetrix .CHP files or data from other software such as RMA or dChip. Gene expression matrix This combines data from all hybridisations. beta13 Normalised microarray data The final processed (normalised) data for a set of hybridisations in a microarray experiment. Gene expression data matrix Sample annotation Annotation on a biological sample, for example experimental factors and their values. This might include compound and dose in a dose response experiment. beta13 Microarray metadata This might include gene identifiers, genomic coordinates, probe oligonucleotide sequences etc. Annotation on the array itself used in a microarray experiment. beta13 Microarray protocol annotation true This might describe e.g. the normalisation methods used to process the raw data. beta13 1.8 Annotation on laboratory and/or data processing protocols used in a microarray experiment. 
Microarray hybridisation data Data concerning the hybridisations measured during a microarray experiment. beta13 Protein features report (topological domains) 1.8 beta13 topological domains such as cytoplasmic regions in a protein. true Sequence features (compositionally-biased regions) 1.5 beta13 true A report of regions in a molecular sequence that are biased to certain characters. Nucleic acid features (difference and change) beta13 A report on features in a nucleic acid sequence that indicate changes to or differences between sequences. 1.5 true Nucleic acid features report (expression signal) true beta13 regions within a nucleic acid sequence containing a signal that alters a biological function. 1.8 Nucleic acid features report (binding) nucleic acids binding to some other molecule. 1.8 true beta13 This includes ribosome binding sites (Shine-Dalgarno sequence in prokaryotes). Nucleic acid repeats (report) true repetitive elements within a nucleic acid sequence. 1.8 beta13 Nucleic acid features report (replication and recombination) beta13 true 1.8 DNA replication or recombination. Nucleic acid structure report A report on regions within a nucleic acid sequence which form secondary or tertiary (3D) structures. Stem loop (report) d-loop (report) Nucleic acid features (structure) Quadruplexes (report) beta13 Protein features report (repeats) 1.8 short repetitive subsequences (repeat sequences) in a protein sequence. beta13 true Sequence motif matches (protein) Report on the location of matches to profiles, motifs (conserved or functional patterns) or other signatures in one or more protein sequences. 1.8 beta13 true Sequence motif matches (nucleic acid) Report on the location of matches to profiles, motifs (conserved or functional patterns) or other signatures in one or more nucleic acid sequences. beta13 true 1.8 Nucleic acid features (d-loop) beta13 true 1.5 A report on displacement loops in a mitochondrial DNA sequence. 
A displacement loop is a region of mitochondrial DNA in which one of the strands is displaced by an RNA molecule. Nucleic acid features (stem loop) beta13 true A report on stem loops in a DNA sequence. 1.5 A stem loop is a hairpin structure; a double-helical structure formed when two complementary regions of a single strand of RNA or DNA molecule form base-pairs. Gene transcript report This includes 5'untranslated region (5'UTR), coding sequences (CDS), exons, intervening sequences (intron) and 3'untranslated regions (3'UTR). Nucleic acid features (mRNA features) beta13 Transcript (report) mRNA features Gene transcript annotation Clone or EST (report) mRNA (report) An informative report on features of messenger RNA (mRNA) molecules including precursor RNA, primary (unprocessed) transcript and fully processed molecules. This includes reports on a specific gene transcript, clone or EST. Nucleic acid features report (signal or transit peptide) true coding sequences for a signal or transit peptide. 1.8 beta13 Non-coding RNA beta13 true features of non-coding or functional RNA molecules, including tRNA and rRNA. 1.8 Transcriptional features (report) 1.5 true This includes promoters, CAAT signals, TATA signals, -35 signals, -10 signals, GC signals, primer binding sites for initiation of transcription or reverse transcription, enhancer, attenuator, terminators and ribosome binding sites. Features concerning transcription of DNA into RNA including the regulation of transcription. beta13 Nucleic acid features report (STS) sequence tagged sites (STS) in nucleic acid sequences. 1.8 true beta13 Nucleic acid features (immunoglobulin gene structure) true beta13 1.5 A report on predicted or actual immunoglobulin gene structure including constant, switch and variable regions and diversity, joining and variable segments. SCOP class 1.5 beta13 true Information on a 'class' node from the SCOP database. SCOP fold beta13 Information on a 'fold' node from the SCOP database. 
1.5 true SCOP superfamily beta13 Information on a 'superfamily' node from the SCOP database. 1.5 true SCOP family 1.5 true Information on a 'family' node from the SCOP database. beta13 SCOP protein Information on a 'protein' node from the SCOP database. true beta13 1.5 SCOP species 1.5 true beta13 Information on a 'species' node from the SCOP database. Mass spectrometry experiment 1.8 true mass spectrometry experiments. beta13 Gene family report An informative report on a particular family of genes, typically a set of genes with similar sequence that originate from duplication of a common ancestor gene, or any other classification of nucleic acid sequences or structures that reflects gene structure. This includes reports on gene homologues between species. beta13 Gene annotation (homology information) Homology information Gene annotation (homology) Nucleic acid classification Gene family annotation Gene homology (report) Protein image beta13 An image of a protein. Protein alignment An alignment of protein sequences and/or structures. beta13 NGS experiment 1.8 1.0 sequencing experiment, including samples, sampling, preparation, sequencing, and analysis. true Sequence assembly report An informative report about a DNA sequence assembly. 1.1 This might include an overall quality assessment of the assembly and summary statistics including counts, average length and number of bases for reads, matches and non-matches, contigs, reads in pairs etc. Assembly report Genome index 1.1 Many sequence alignment tasks involving many or very large sequences rely on a precomputed index of the sequence to accelerate the alignment. An index of a genome sequence. GWAS report 1.8 1.1 Report concerning genome-wide association study experiments. true Genome-wide association study Cytoband position 1.2 The position of a cytogenetic band in a genome. Information might include start and end position in a chromosome sequence, chromosome identifier, name of band and so on. 
Cell type ontology ID CL ID Cell type ontology concept ID. CL_[0-9]{7} 1.2 beta12orEarlier Kinetic model 1.2 Mathematical model of a network that contains biochemical kinetics. COSMIC ID COSMIC identifier cosmic ID Identifier of a COSMIC database entry. cosmic identifier cosmic id 1.3 HGMD ID Identifier of a HGMD database entry. hgmd ID hgmd identifier beta12orEarlier hgmd id HGMD identifier Sequence assembly ID Sequence assembly version Unique identifier of sequence assembly. 1.3 Sequence feature type true A label (text token) describing a type of sequence feature such as gene, transcript, cds, exon, repeat, simple, misc, variation, somatic variation, structural variation, somatic structural variation, constrained or regulatory. 1.3 1.5 Gene homology (report) beta12orEarlier true An informative report on gene homologues between species. 1.5 Ensembl gene tree ID ENSGT00390000003602 Ensembl ID (gene tree) Unique identifier for a gene tree from the Ensembl database. 1.3 Gene tree 1.3 A phylogenetic tree that is an estimate of the character's phylogeny. Species tree A phylogenetic tree that reflects phylogeny of the taxa from which the characters (used in calculating the tree) were sampled. 1.3 Sample ID 1.3 Sample accession Name or other identifier of an entry from a biosample database. MGI accession Identifier of an object from the MGI database. 1.3 Phenotype name 1.3 Name of a phenotype. Phenotypes Phenotype Transition matrix An HMM transition matrix contains the probabilities of switching from one HMM state to another. Consider for example an HMM with two states (AT-rich and GC-rich). The transition matrix will hold the probabilities of switching from the AT-rich to the GC-rich state, and vice versa. HMM transition matrix 1.4 Emission matrix An HMM emission matrix holds the probabilities of choosing the four nucleotides (A, C, G and T) in each of the states of an HMM. 1.4 Consider for example an HMM with two states (AT-rich and GC-rich). 
The emission matrix holds the probabilities of choosing each of the four nucleotides (A, C, G and T) in the AT-rich state and in the GC-rich state. HMM emission matrix Hidden Markov model A statistical Markov model of a system which is assumed to be a Markov process with unobserved (hidden) states. 1.4 Format identifier An identifier of a data format. 1.4 Raw image 1.5 Amino acid data http://semanticscience.org/resource/SIO_000081 beta12orEarlier Image data Raw biological or biomedical image generated by some experimental technique. Carbohydrate property Carbohydrate data Data concerning the intrinsic physical (e.g. structural) or chemical properties of one, more or all carbohydrates. 1.5 Proteomics experiment report true 1.8 Report concerning proteomics experiments. 1.5 RNAi report 1.5 RNAi experiments. true 1.8 Simulation experiment report 1.5 biological computational model experiments (simulation), for example the minimum information required in order to permit its correct interpretation and reproduction. true 1.8 MRI image MRT image 1.7 Magnetic resonance tomography image Nuclear magnetic resonance imaging image Magnetic resonance imaging image NMRI image An imaging technique that uses magnetic fields and radiowaves to form images, typically to investigate the anatomy and physiology of the human body. Cell migration track image 1.7 An image from a cell migration track assay. Rate of association kon 1.7 Rate of association of a protein with another protein or some other molecule. Gene order Such data are often used for genome rearrangement tools and phylogenetic tree labeling. Multiple gene identifiers in a specific order. 1.7 Spectrum 1.7 The spectrum of frequencies of electromagnetic radiation emitted from a molecule as a result of some spectroscopy experiment. Spectra NMR spectrum Spectral information for a molecule from a nuclear magnetic resonance experiment. 
1.7 NMR spectra Chemical structure sketch Chemical structure sketches are used for presentational purposes but also as inputs to various analysis software. 1.8 Small molecule sketch A sketch of a small molecule made with some specialised drawing package. Nucleic acid signature 1.8 An informative report about a specific or conserved nucleic acid sequence pattern. DNA sequence DNA sequences 1.8 A DNA sequence. RNA sequence An RNA sequence. DNA sequences RNA sequences 1.8 RNA sequence (raw) Raw sequence (RNA) 1.8 A raw RNA sequence. RNA raw sequence DNA sequence (raw) Raw sequence (DNA) A raw DNA sequence. 1.8 DNA raw sequence Sequence variations 1.8 Data on gene sequence variations resulting from large-scale genotyping and DNA sequencing projects. Gene sequence variations Variations are stored along with a reference genome. Bibliography 1.8 A list of publications such as scientific papers or books. Ontology mapping A mapping of supplied textual terms or phrases to ontology concepts (URIs). beta12orEarlier Image metadata Image-associated data This can include basic provenance and technical information about the image, scientific annotation and so on. Any data concerning a specific biological or biomedical image. 1.9 Image data Image-related data Clinical trial report Clinical trial information A report concerning a clinical trial. 1.9 Reference sample report 1.10 A report about a biosample. Biosample report Gene Expression Atlas Experiment ID Accession number of an entry from the Gene Expression Atlas. 1.10 SMILES Chemical structure specified in Simplified Molecular Input Line Entry System (SMILES) line notation. beta12orEarlier InChI Chemical structure specified in IUPAC International Chemical Identifier (InChI) line notation. beta12orEarlier mf Chemical structure specified by Molecular Formula (MF), including a count of each element in a compound. beta12orEarlier The general MF query format consists of a series of valid atomic symbols, with an optional number or range. 
inchikey The InChIKey (hashed InChI) is a fixed length (25 character) condensed digital representation of an InChI chemical structure specification. It uniquely identifies a chemical compound. beta12orEarlier An InChI identifier is not human-readable but is more suitable for web searches than an InChI chemical structure specification. smarts SMILES ARbitrary Target Specification (SMARTS) format for chemical structure specification, which is a subset of the SMILES line notation. beta12orEarlier unambiguous pure beta12orEarlier Alphabet for a molecular sequence with possible unknown positions but without ambiguity or non-sequence characters. nucleotide Non-sequence characters may be used for example for gaps. http://onto.eva.mpg.de/ontologies/gfo-bio.owl#Nucleotide_sequence beta12orEarlier Alphabet for a nucleotide sequence with possible ambiguity, unknown positions and non-sequence characters. protein Alphabet for a protein sequence with possible ambiguity, unknown positions and non-sequence characters. beta12orEarlier Non-sequence characters may be used for gaps and translation stop. http://onto.eva.mpg.de/ontologies/gfo-bio.owl#Amino_acid_sequence consensus beta12orEarlier Alphabet for the consensus of two or more molecular sequences. pure nucleotide beta12orEarlier Alphabet for a nucleotide sequence with possible ambiguity and unknown positions but without non-sequence characters. unambiguous pure nucleotide beta12orEarlier Alphabet for a nucleotide sequence (characters ACGTU only) with possible unknown positions but without ambiguity or non-sequence characters . dna beta12orEarlier http://onto.eva.mpg.de/ontologies/gfo-bio.owl#DNA_sequence Alphabet for a DNA sequence with possible ambiguity, unknown positions and non-sequence characters. rna Alphabet for an RNA sequence with possible ambiguity, unknown positions and non-sequence characters. 
http://onto.eva.mpg.de/ontologies/gfo-bio.owl#RNA_sequence beta12orEarlier unambiguous pure dna Alphabet for a DNA sequence (characters ACGT only) with possible unknown positions but without ambiguity or non-sequence characters. beta12orEarlier pure dna Alphabet for a DNA sequence with possible ambiguity and unknown positions but without non-sequence characters. beta12orEarlier unambiguous pure rna sequence Alphabet for an RNA sequence (characters ACGU only) with possible unknown positions but without ambiguity or non-sequence characters. beta12orEarlier pure rna Alphabet for an RNA sequence with possible ambiguity and unknown positions but without non-sequence characters. beta12orEarlier unambiguous pure protein beta12orEarlier Alphabet for any protein sequence with possible unknown positions but without ambiguity or non-sequence characters. pure protein beta12orEarlier Alphabet for any protein sequence with possible ambiguity and unknown positions but without non-sequence characters. UniGene entry format beta12orEarlier Format of an entry from UniGene. A UniGene entry includes a set of transcript sequences assigned to the same transcription locus (gene or expressed pseudogene), with information on protein similarities, gene expression, cDNA clone reagents, and genomic location. beta12orEarlier true COG sequence cluster format beta12orEarlier true beta12orEarlier Format of an entry from the COG database of clusters of (related) protein sequences. EMBL feature location beta12orEarlier Feature location Format for sequence positions (feature location) as used in DDBJ/EMBL/GenBank database. quicktandem Report format for tandem repeats in a nucleotide sequence (format generated by the Sanger Centre quicktandem program). beta12orEarlier Sanger inverted repeats beta12orEarlier Report format for inverted repeats in a nucleotide sequence (format generated by the Sanger Centre inverted program). 
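The sequence alphabet definitions above (for example "unambiguous pure dna", characters ACGT only, or "unambiguous pure rna", characters ACGU only) translate directly into simple membership checks. This is a minimal illustration under the stated character sets, not a normative implementation; the IUPAC ambiguity set shown is a common extension.

```python
# Minimal validators for some of the sequence alphabets described above.
# The strict sets follow the definitions in the text (ACGT-only DNA,
# ACGU-only RNA); the ambiguity set uses the standard IUPAC codes.

UNAMBIGUOUS_PURE_DNA = set("ACGT")
UNAMBIGUOUS_PURE_RNA = set("ACGU")
DNA_WITH_AMBIGUITY = set("ACGTRYSWKMBDHVN")  # IUPAC nucleotide ambiguity codes

def matches_alphabet(sequence, alphabet):
    """True if every character of `sequence` belongs to `alphabet`."""
    return set(sequence.upper()) <= alphabet

assert matches_alphabet("gattaca", UNAMBIGUOUS_PURE_DNA)
assert not matches_alphabet("GAUUACA", UNAMBIGUOUS_PURE_DNA)  # U is RNA-only
assert matches_alphabet("GATN", DNA_WITH_AMBIGUITY)
```

Gap and stop characters (mentioned above as "non-sequence characters") would be handled by stripping or by a further extended set, depending on the alphabet in question.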
EMBOSS repeat Report format for tandem repeats in a sequence (an EMBOSS report format). beta12orEarlier est2genome format beta12orEarlier Format of a report on exon-intron structure generated by EMBOSS est2genome. restrict format Report format for restriction enzyme recognition sites used by EMBOSS restrict program. beta12orEarlier restover format beta12orEarlier Report format for restriction enzyme recognition sites used by EMBOSS restover program. REBASE restriction sites beta12orEarlier Report format for restriction enzyme recognition sites used by REBASE database. FASTA search results format Format of results of a sequence database search using FASTA. beta12orEarlier This includes (typically) score data, alignment data and a histogram (of observed and expected distribution of E values.) BLAST results Format of results of a sequence database search using some variant of BLAST. beta12orEarlier This includes score data, alignment data and summary table. mspcrunch beta12orEarlier Format of results of a sequence database search using some variant of MSPCrunch. Smith-Waterman format beta12orEarlier Format of results of a sequence database search using some variant of Smith Waterman. dhf The hits are relatives to a SCOP or CATH family and are found from a search of a sequence database. beta12orEarlier Format of EMBASSY domain hits file (DHF) of hits (sequences) with domain classification information. lhf beta12orEarlier Format of EMBASSY ligand hits file (LHF) of database hits (sequences) with ligand classification information. The hits are putative ligand-binding sequences and are found from a search of a sequence database. InterPro hits format Results format for searches of the InterPro database. beta12orEarlier InterPro protein view report format Format of results of a search of the InterPro database showing matches of query protein sequence(s) to InterPro entries. 
The report includes a classification of regions in a query protein sequence which are assigned to a known InterPro protein family or group. beta12orEarlier InterPro match table format Format of results of a search of the InterPro database showing matches between protein sequence(s) and signatures for an InterPro entry. beta12orEarlier The table presents matches between query proteins (rows) and signature methods (columns) for this entry. Alternatively the sequence(s) might be from the InterPro entry itself. The match position in the protein sequence and match status (true positive, false positive etc) are indicated. HMMER Dirichlet prior beta12orEarlier Dirichlet distribution HMMER format. MEME Dirichlet prior beta12orEarlier Dirichlet distribution MEME format. HMMER emission and transition Format of a report from the HMMER package on the emission and transition counts of a hidden Markov model. beta12orEarlier prosite-pattern Format of a regular expression pattern from the Prosite database. beta12orEarlier EMBOSS sequence pattern Format of an EMBOSS sequence pattern. beta12orEarlier meme-motif A motif in the format generated by the MEME program. beta12orEarlier prosite-profile Sequence profile (sequence classifier) format used in the PROSITE database. beta12orEarlier JASPAR format beta12orEarlier A profile (sequence classifier) in the format used in the JASPAR database. MEME background Markov model Format of the model of random sequences used by MEME. beta12orEarlier HMMER format Format of a hidden Markov model representation used by the HMMER package. beta12orEarlier HMMER-aln beta12orEarlier FASTA-style format for multiple sequences aligned by HMMER package to an HMM. DIALIGN format Format of multiple sequences aligned by DIALIGN package. beta12orEarlier daf The format is clustal-like and includes annotation of domain family classification information. 
EMBASSY 'domain alignment file' (DAF) format, containing a sequence alignment of protein domains belonging to the same SCOP or CATH family. beta12orEarlier Sequence-MEME profile alignment beta12orEarlier Format for alignment of molecular sequences to MEME profiles (position-dependent scoring matrices) as generated by the MAST tool from the MEME package. HMMER profile alignment (sequences versus HMMs) Format used by the HMMER package for an alignment of a sequence against a hidden Markov model database. beta12orEarlier HMMER profile alignment (HMM versus sequences) Format used by the HMMER package for an alignment of a hidden Markov model against a sequence database. beta12orEarlier Phylip distance matrix Data Type must include the distance matrix, probably as pairs of sequence identifiers with a distance (integer or float). beta12orEarlier Format of PHYLIP phylogenetic distance matrix data. ClustalW dendrogram beta12orEarlier Dendrogram (tree file) format generated by ClustalW. Phylip tree raw Raw data file format used by Phylip from which a phylogenetic tree is directly generated or plotted. beta12orEarlier Phylip continuous quantitative characters beta12orEarlier PHYLIP file format for continuous quantitative character data. Phylogenetic property values format Format of phylogenetic property data. beta12orEarlier beta12orEarlier true Phylip character frequencies format beta12orEarlier PHYLIP file format for phylogenetics character frequency data. Phylip discrete states format Format of PHYLIP discrete states data. beta12orEarlier Phylip cliques format beta12orEarlier Format of PHYLIP cliques data. Phylip tree format Phylogenetic tree data format used by the PHYLIP program. beta12orEarlier TreeBASE format beta12orEarlier The format of an entry from the TreeBASE database of phylogenetic data. TreeFam format beta12orEarlier The format of an entry from the TreeFam database of phylogenetic data. 
Phylip tree distance format Format for distances, such as Branch Score distance, between two or more phylogenetic trees as used by the Phylip package. beta12orEarlier dssp beta12orEarlier The DSSP database is built using the DSSP application which defines secondary structure, geometrical features and solvent exposure of proteins, given atomic coordinates in PDB format. Format of an entry from the DSSP database (Dictionary of Secondary Structure in Proteins). hssp Entry format of the HSSP database (Homology-derived Secondary Structure in Proteins). beta12orEarlier Dot-bracket format beta12orEarlier Format of RNA secondary structure in dot-bracket notation, originally generated by the Vienna RNA package/server. Vienna RNA secondary structure format Vienna RNA format Vienna local RNA secondary structure format Format of local RNA secondary structure components with free energy values, generated by the Vienna RNA package/server. beta12orEarlier PDB database entry format beta12orEarlier PDB entry format Format of an entry (or part of an entry) from the PDB database. PDB PDB format beta12orEarlier Entry format of PDB database in PDB format. mmCIF Chemical MIME (http://www.ch.ic.ac.uk/chemime): chemical/x-mmcif Entry format of PDB database in mmCIF format. beta12orEarlier mmcif PDBML Entry format of PDB database in PDBML (XML) format. beta12orEarlier Domainatrix 3D-1D scoring matrix format beta12orEarlier true beta12orEarlier Format of a matrix of 3D-1D scores used by the EMBOSS Domainatrix applications. aaindex Amino acid index format used by the AAindex database. beta12orEarlier IntEnz enzyme report format beta12orEarlier beta12orEarlier Format of an entry from IntEnz (The Integrated Relational Enzyme Database). IntEnz is the master copy of the Enzyme Nomenclature, the recommendations of the NC-IUBMB on the Nomenclature and Classification of Enzyme-Catalysed Reactions. true BRENDA enzyme report format true Format of an entry from the BRENDA enzyme database. 
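The dot-bracket (Vienna) notation described above pairs each '(' with the matching ')' to denote a base pair, with '.' marking unpaired positions. A stack-based scan recovers the pair indices; the function name here is a hypothetical helper for illustration.

```python
def dot_bracket_pairs(structure):
    """Return base-pair index tuples from a dot-bracket string.

    '(' opens a pair, ')' closes the most recently opened one,
    and '.' marks an unpaired position. Raises ValueError on
    unbalanced or unexpected input.
    """
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unmatched ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unmatched '(' at position %d" % stack[-1])
    return pairs

# A tiny hairpin: the three opening brackets pair with the three closing ones.
print(dot_bracket_pairs("(((...)))"))  # [(2, 6), (1, 7), (0, 8)]
```

Pseudoknots, which need additional bracket types such as '[' and ']', are outside the scope of this sketch.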
beta12orEarlier beta12orEarlier KEGG REACTION enzyme report format true beta12orEarlier Format of an entry from the KEGG REACTION database of biochemical reactions. beta12orEarlier KEGG ENZYME enzyme report format beta12orEarlier true Format of an entry from the KEGG ENZYME database. beta12orEarlier REBASE proto enzyme report format Format of an entry from the proto section of the REBASE enzyme database. true beta12orEarlier beta12orEarlier REBASE withrefm enzyme report format beta12orEarlier true beta12orEarlier Format of an entry from the withrefm section of the REBASE enzyme database. Pcons report format Format of output of the Pcons Model Quality Assessment Program (MQAP). beta12orEarlier Pcons ranks protein models by assessing their quality based on the occurrence of recurring common three-dimensional structural patterns. Pcons returns a score reflecting the overall global quality and a score for each individual residue in the protein reflecting the local residue quality. ProQ report format beta12orEarlier ProQ is a neural network-based predictor that predicts the quality of a protein model based on the number of structural features. Format of output of the ProQ protein model quality predictor. SMART domain assignment report format beta12orEarlier true Format of SMART domain assignment data. The SMART output file includes data on genetically mobile domains / analysis of domain architectures, including phyletic distributions, functional class, tertiary structures and functionally important residues. beta12orEarlier BIND entry format Entry format for the BIND database of protein interaction. beta12orEarlier true beta12orEarlier IntAct entry format beta12orEarlier beta12orEarlier Entry format for the IntAct database of protein interaction. true InterPro entry format Entry format for the InterPro database of protein signatures (sequence classifiers) and classified sequences. 
true beta12orEarlier This includes signature metadata, sequence references and a reference to the signature itself. There is normally a header (entry accession numbers and name), abstract, taxonomy information, example proteins etc. Each entry also includes a match list which give a number of different views of the signature matches for the sequences in each InterPro entry. beta12orEarlier InterPro entry abstract format true beta12orEarlier References are included and a functional inference is made where possible. beta12orEarlier Entry format for the textual abstract of signatures in an InterPro entry and its protein matches. Gene3D entry format Entry format for the Gene3D protein secondary database. true beta12orEarlier beta12orEarlier PIRSF entry format beta12orEarlier Entry format for the PIRSF protein secondary database. true beta12orEarlier PRINTS entry format beta12orEarlier beta12orEarlier true Entry format for the PRINTS protein secondary database. Panther Families and HMMs entry format beta12orEarlier beta12orEarlier Entry format for the Panther library of protein families and subfamilies. true Pfam entry format Entry format for the Pfam protein secondary database. true beta12orEarlier beta12orEarlier SMART entry format true beta12orEarlier Entry format for the SMART protein secondary database. beta12orEarlier Superfamily entry format Entry format for the Superfamily protein secondary database. beta12orEarlier beta12orEarlier true TIGRFam entry format beta12orEarlier true Entry format for the TIGRFam protein secondary database. beta12orEarlier ProDom entry format Entry format for the ProDom protein domain classification database. beta12orEarlier beta12orEarlier true FSSP entry format Entry format for the FSSP database. beta12orEarlier true beta12orEarlier findkm beta12orEarlier A report format for the kinetics of enzyme-catalysed reaction(s) in a format generated by EMBOSS findkm. 
This includes Michaelis Menten plot, Hanes Woolf plot, Michaelis Menten constant (Km) and maximum velocity (Vmax). Ensembl gene report format beta12orEarlier Entry format of Ensembl genome database. beta12orEarlier true DictyBase gene report format true beta12orEarlier Entry format of DictyBase genome database. beta12orEarlier CGD gene report format beta12orEarlier true beta12orEarlier Entry format of Candida Genome database. DragonDB gene report format beta12orEarlier Entry format of DragonDB genome database. beta12orEarlier true EcoCyc gene report format Entry format of EcoCyc genome database. beta12orEarlier beta12orEarlier true FlyBase gene report format true beta12orEarlier beta12orEarlier Entry format of FlyBase genome database. Gramene gene report format beta12orEarlier beta12orEarlier Entry format of Gramene genome database. true KEGG GENES gene report format true beta12orEarlier Entry format of KEGG GENES genome database. beta12orEarlier MaizeGDB gene report format beta12orEarlier beta12orEarlier true Entry format of the Maize genetics and genomics database (MaizeGDB). MGD gene report format Entry format of the Mouse Genome Database (MGD). beta12orEarlier beta12orEarlier true RGD gene report format true beta12orEarlier Entry format of the Rat Genome Database (RGD). beta12orEarlier SGD gene report format true beta12orEarlier beta12orEarlier Entry format of the Saccharomyces Genome Database (SGD). GeneDB gene report format Entry format of the Sanger GeneDB genome database. true beta12orEarlier beta12orEarlier TAIR gene report format beta12orEarlier beta12orEarlier Entry format of The Arabidopsis Information Resource (TAIR) genome database. true WormBase gene report format Entry format of the WormBase genomes database. beta12orEarlier beta12orEarlier true ZFIN gene report format beta12orEarlier beta12orEarlier true Entry format of the Zebrafish Information Network (ZFIN) genome database. TIGR gene report format true Entry format of the TIGR genome database. 
beta12orEarlier beta12orEarlier dbSNP polymorphism report format beta12orEarlier Entry format for the dbSNP database. true beta12orEarlier OMIM entry format beta12orEarlier true beta12orEarlier Format of an entry from the OMIM database of genotypes and phenotypes. HGVbase entry format true Format of a record from the HGVbase database of genotypes and phenotypes. beta12orEarlier beta12orEarlier HIVDB entry format beta12orEarlier beta12orEarlier true Format of a record from the HIVDB database of genotypes and phenotypes. KEGG DISEASE entry format beta12orEarlier Format of an entry from the KEGG DISEASE database. true beta12orEarlier Primer3 primer Report format on PCR primers and hybridization oligos as generated by Whitehead primer3 program. beta12orEarlier ABI A format of raw sequence read data from an Applied Biosystems sequencing machine. beta12orEarlier mira Format of MIRA sequence trace information file. beta12orEarlier CAF Common Assembly Format (CAF). A sequence assembly format including contigs, base-call qualities, and other metadata. beta12orEarlier exp Sequence assembly project file EXP format. beta12orEarlier SCF Staden Chromatogram Files format (SCF) of base-called sequence reads, qualities, and other metadata. beta12orEarlier PHD beta12orEarlier PHD sequence trace format to store serialised chromatogram data (reads). dat beta12orEarlier Format of Affymetrix data file of raw image data. Affymetrix image data file format cel beta12orEarlier Affymetrix probe raw data format Format of Affymetrix data file of information about (raw) expression levels of the individual probes. affymetrix Format of affymetrix gene cluster files (hc-genes.txt, hc-chips.txt) from hierarchical clustering. beta12orEarlier ArrayExpress entry format beta12orEarlier true Entry format for the ArrayExpress microarrays database. beta12orEarlier affymetrix-exp Affymetrix data file format for information about experimental conditions and protocols. 
Affymetrix experimental conditions data file format beta12orEarlier CHP Affymetrix probe normalised data format beta12orEarlier Format of Affymetrix data file of information about (normalised) expression levels of the individual probes. EMDB entry format beta12orEarlier Format of an entry from the Electron Microscopy DataBase (EMDB). true beta12orEarlier KEGG PATHWAY entry format beta12orEarlier beta12orEarlier The format of an entry from the KEGG PATHWAY database of pathway maps for molecular interactions and reaction networks. true MetaCyc entry format true beta12orEarlier The format of an entry from the MetaCyc metabolic pathways database. beta12orEarlier HumanCyc entry format The format of a report from the HumanCyc metabolic pathways database. true beta12orEarlier beta12orEarlier INOH entry format beta12orEarlier true The format of an entry from the INOH signal transduction pathways database. beta12orEarlier PATIKA entry format beta12orEarlier The format of an entry from the PATIKA biological pathways database. beta12orEarlier true Reactome entry format beta12orEarlier The format of an entry from the reactome biological pathways database. true beta12orEarlier aMAZE entry format beta12orEarlier true The format of an entry from the aMAZE biological pathways and molecular interactions database. beta12orEarlier CPDB entry format The format of an entry from the CPDB database. beta12orEarlier true beta12orEarlier Panther Pathways entry format beta12orEarlier true beta12orEarlier The format of an entry from the Panther Pathways database. Taverna workflow format Format of Taverna workflows. beta12orEarlier BioModel mathematical model format beta12orEarlier beta12orEarlier Format of mathematical models from the BioModel database. true Models are annotated and linked to relevant data resources, such as publications, databases of compounds and pathways, controlled vocabularies, etc. KEGG LIGAND entry format The format of an entry from the KEGG LIGAND chemical database. 
beta12orEarlier beta12orEarlier true KEGG COMPOUND entry format beta12orEarlier The format of an entry from the KEGG COMPOUND database. true beta12orEarlier KEGG PLANT entry format beta12orEarlier beta12orEarlier The format of an entry from the KEGG PLANT database. true KEGG GLYCAN entry format true beta12orEarlier The format of an entry from the KEGG GLYCAN database. beta12orEarlier PubChem entry format beta12orEarlier The format of an entry from PubChem. true beta12orEarlier ChemSpider entry format beta12orEarlier The format of an entry from a database of chemical structures and property predictions. beta12orEarlier true ChEBI entry format beta12orEarlier beta12orEarlier The format of an entry from Chemical Entities of Biological Interest (ChEBI). true ChEBI includes an ontological classification defining relations between entities or classes of entities. MSDchem ligand dictionary entry format The format of an entry from the MSDchem ligand dictionary. beta12orEarlier true beta12orEarlier HET group dictionary entry format The format of an entry from the HET group dictionary (HET groups from PDB files). beta12orEarlier KEGG DRUG entry format The format of an entry from the KEGG DRUG database. true beta12orEarlier beta12orEarlier PubMed citation beta12orEarlier Format of bibliographic reference as used by the PubMed database. Medline Display Format beta12orEarlier Format for abstracts of scientific articles from the Medline database. Bibliographic reference information including citation information is included CiteXplore-core beta12orEarlier CiteXplore 'core' citation format including title, journal, authors and abstract. CiteXplore-all CiteXplore 'all' citation format includes all known details such as Mesh terms and cross-references. beta12orEarlier pmc beta12orEarlier Article format of the PubMed Central database. iHOP text mining abstract format beta12orEarlier iHOP abstract format. Oscar3 Oscar 3 performs chemistry-specific parsing of chemical documents. 
It attempts to identify chemical names, ontology concepts and chemical data from a document. Text mining abstract format from the Oscar 3 application. beta12orEarlier PDB atom record format true beta13 beta12orEarlier Format of an ATOM record (describing data for an individual atom) from a PDB file. CATH chain report format The report (for example http://www.cathdb.info/chain/1cukA) includes chain identifiers, domain identifiers and CATH codes for domains in a given protein chain. beta12orEarlier Format of CATH domain classification information for a polypeptide chain. beta12orEarlier true CATH PDB report format beta12orEarlier beta12orEarlier true Format of CATH domain classification information for a protein PDB file. The report (for example http://www.cathdb.info/pdb/1cuk) includes chain identifiers, domain identifiers and CATH codes for domains in a given PDB file. NCBI gene report format true Entry (gene) format of the NCBI database. beta12orEarlier beta12orEarlier GeneIlluminator gene report format Report format for biological functions associated with a gene name and its alternative names (synonyms, homonyms), as generated by the GeneIlluminator service. This includes a gene name and abbreviation of the name which may be in a name space indicating the gene status and relevant organisation. beta12orEarlier beta12orEarlier Moby:GI_Gene true BacMap gene card format Format of a report on the DNA and protein sequences for a given gene label from a bacterial chromosome maps from the BacMap database. true beta12orEarlier beta12orEarlier Moby:BacMapGeneCard ColiCard report format Format of a report on Escherichia coli genes, proteins and molecules from the CyberCell Database (CCDB). true beta12orEarlier Moby:ColiCard beta12orEarlier PlasMapper TextMap beta12orEarlier Map of a plasmid (circular DNA) in PlasMapper TextMap format. newick nh beta12orEarlier Phylogenetic tree Newick (text) format. TreeCon format beta12orEarlier Phylogenetic tree TreeCon (text) format. 
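The Newick (nh) phylogenetic tree format listed above encodes tree topology with nested parentheses, optional branch lengths after ':', and a terminating ';'. As a small illustration under those conventions, this sketch extracts leaf names from a simple Newick string; it is a heuristic scanner, not a full parser (quoted labels and internal node labels are not supported).

```python
def newick_leaves(tree):
    """Extract leaf names from a simple Newick string.

    Handles names with branch lengths, e.g. '((A:0.1,B:0.2):0.3,C);'.
    Quoted names and internal node labels are not supported here.
    """
    leaves = []
    token = ""
    prev = None
    for ch in tree:
        if ch in "(),;":
            # A token ending here is a leaf name unless it follows ')',
            # in which case it would be an internal node label or length.
            if token and prev != ")":
                leaves.append(token.split(":")[0])
            token = ""
            prev = ch
        else:
            token += ch
    return [name for name in leaves if name]

print(newick_leaves("((A:0.1,B:0.2):0.3,C);"))  # ['A', 'B', 'C']
```

A production tool would use a real parser (for example, the ones in Biopython or DendroPy) rather than this character scan.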
Nexus format Phylogenetic tree Nexus (text) format. beta12orEarlier Format http://en.wikipedia.org/wiki/File_format http://purl.org/biotop/biotop.owl#MachineLanguage File format Data model http://www.onto-med.de/ontologies/gfo.owl#Symbol_structure Exchange format "http://purl.obolibrary.org/obo/IAO_0000098" http://semanticscience.org/resource/SIO_000612 http://semanticscience.org/resource/SIO_000618 beta12orEarlier http://www.ifomis.org/bfo/1.1/snap#Continuant http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#quality "http://purl.org/dc/elements/1.1/format" http://wsio.org/compression_004 A defined way or layout of representing and structuring data in a computer file, blob, string, message, or elsewhere. http://en.wikipedia.org/wiki/List_of_file_formats http://www.ifomis.org/bfo/1.1/snap#Quality Data format http://purl.org/biotop/biotop.owl#Quality The main focus in EDAM lies on formats as means of structuring data exchanged between different tools or resources. The serialisation, compression, or encoding of concrete data formats/models is not in scope of EDAM. Format 'is format of' Data. http://www.onto-med.de/ontologies/gfo.owl#Perpetuant A defined data format has its implicit or explicit data model, and EDAM does not distinguish the two. Some data models however do not have any standard way of serialisation into an exchange format, and those are thus not considered formats in EDAM. (Remark: even broader - or closely related - term to 'Data model' would be an 'Information model'.) Data model File format denotes only formats of a computer file, but the same formats apply also to data blobs or exchanged messages. File format Atomic data format beta12orEarlier beta13 Data format for an individual atom. true Sequence record format Data format for a molecular sequence record. beta12orEarlier Sequence feature annotation format beta12orEarlier Data format for molecular sequence feature information. Alignment format Data format for molecular sequence alignment information. 
beta12orEarlier acedb beta12orEarlier ACEDB sequence format. clustal sequence format true beta12orEarlier Clustalw output format. beta12orEarlier codata Codata entry format. beta12orEarlier dbid beta12orEarlier Fasta format variant with database name before ID. EMBL format EMBL entry format. EMBL sequence format EMBL beta12orEarlier Staden experiment format Staden experiment file format. beta12orEarlier FASTA beta12orEarlier FASTA format FASTA sequence format FASTA format including NCBI-style IDs. FASTQ FASTQ short read format ignoring quality scores. beta12orEarlier FASTAQ fq FASTQ-illumina FASTQ Illumina 1.3 short read format. beta12orEarlier FASTQ-sanger FASTQ short read format with phred quality. beta12orEarlier FASTQ-solexa FASTQ Solexa/Illumina 1.0 short read format. beta12orEarlier fitch program Fitch program format. beta12orEarlier GCG GCG SSF beta12orEarlier GCG SSF (single sequence file) file format. GCG sequence file format. GenBank format beta12orEarlier Genbank entry format. genpept beta12orEarlier Genpept protein entry format. Currently identical to refseqp format GFF2-seq GFF feature file format with sequence in the header. beta12orEarlier GFF3-seq GFF3 feature file format with sequence. beta12orEarlier giFASTA format FASTA sequence format including NCBI-style GIs. beta12orEarlier hennig86 beta12orEarlier Hennig86 output sequence format. ig Intelligenetics sequence format. beta12orEarlier igstrict beta12orEarlier Intelligenetics sequence format (strict version). jackknifer Jackknifer interleaved and non-interleaved sequence format. beta12orEarlier mase format beta12orEarlier Mase program sequence format. mega-seq beta12orEarlier Mega interleaved and non-interleaved sequence format. MSF GCG MSF beta12orEarlier GCG MSF (multiple sequence file) file format. nbrf/pir NBRF/PIR entry sequence format. nbrf beta12orEarlier pir nexus-seq beta12orEarlier Nexus/paup interleaved sequence format. pdbatom pdb format in EMBOSS. 
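Several of the sequence record formats above are FASTA variants. The common core is simple: a record starts at a '>' header line, and all following lines up to the next header are the sequence. A minimal parser sketch, assuming plain (Pearson-style) FASTA with no comment lines:

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs.

    A record begins at a line starting with '>'; subsequent lines up to
    the next header are concatenated into the sequence. Blank lines and
    text before the first header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

fasta = ">seq1 demo\nACGT\nACGT\n>seq2\nTTTT\n"
print(parse_fasta(fasta))  # [('seq1 demo', 'ACGTACGT'), ('seq2', 'TTTT')]
```

The variants listed above (NCBI-style IDs, GI numbers, dbid) differ mainly in how the header token after '>' is structured, not in this line-level layout.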
beta12orEarlier PDB sequence format (ATOM lines). pdbatomnuc beta12orEarlier pdbnuc format in EMBOSS. PDB nucleotide sequence format (ATOM lines). pdbseqresnuc pdbnucseq format in EMBOSS. PDB nucleotide sequence format (SEQRES lines). beta12orEarlier pdbseqres PDB sequence format (SEQRES lines). beta12orEarlier pdbseq format in EMBOSS. Pearson format beta12orEarlier Plain old FASTA sequence format (unspecified format for IDs). phylip sequence format beta12orEarlier Phylip interleaved sequence format. true beta12orEarlier phylipnon sequence format true Phylip non-interleaved sequence format. beta12orEarlier beta12orEarlier raw beta12orEarlier Raw sequence format with no non-sequence characters. refseqp beta12orEarlier Refseq protein entry sequence format. Currently identical to genpept format selex sequence format beta12orEarlier true beta12orEarlier Selex sequence format. Staden format beta12orEarlier Staden suite sequence format. Stockholm format Stockholm multiple sequence alignment format (used by Pfam and Rfam). beta12orEarlier strider format DNA strider output sequence format. beta12orEarlier UniProtKB format UniProt format SwissProt format beta12orEarlier UniProtKB entry sequence format. plain text format (unformatted) beta12orEarlier Plain text sequence format (essentially unformatted). treecon sequence format true beta12orEarlier beta12orEarlier Treecon output sequence format. ASN.1 sequence format NCBI ASN.1-based sequence format. beta12orEarlier DAS format das sequence format DAS sequence (XML) format (any type). beta12orEarlier dasdna beta12orEarlier DAS sequence (XML) format (nucleotide-only). The use of this format is deprecated. debug-seq EMBOSS debugging trace sequence format of full internal data content. beta12orEarlier jackknifernon beta12orEarlier Jackknifer output sequence non-interleaved format. meganon sequence format beta12orEarlier beta12orEarlier Mega non-interleaved output sequence format. 
- NCBI format: NCBI FASTA sequence format with NCBI-style IDs. There are several variants of this.
- nexusnon: Nexus/paup non-interleaved sequence format.
- GFF2: General Feature Format (GFF) of sequence features.
- GFF3: Generic Feature Format version 3 (GFF3) of sequence features.
- pir: PIR feature format.
- swiss feature: Swiss-Prot feature format.
- DASGFF (synonyms: das feature, DASGFF feature): DAS GFF (XML) feature format.
- debug-feat: EMBOSS debugging trace feature format of full internal data content.
- EMBL feature: EMBL feature format.
- GenBank feature: GenBank feature format.
- ClustalW format (synonym: clustal): ClustalW format for (aligned) sequences.
- debug: EMBOSS alignment format for debugging trace of full internal data content.
- FASTA-aln: FASTA format for (aligned) sequences.
- markx0: Pearson MARKX0 alignment format.
- markx1: Pearson MARKX1 alignment format.
- markx10: Pearson MARKX10 alignment format.
- markx2: Pearson MARKX2 alignment format.
- markx3: Pearson MARKX3 alignment format.
- match: Alignment format for start and end of matches between sequence pairs.
- mega: Mega format for (typically aligned) sequences.
- meganon: Mega non-interleaved format for (typically aligned) sequences.
- msf alignment format: MSF format for (aligned) sequences.
- nexus alignment format: Nexus/paup format for (aligned) sequences.
- nexusnon alignment format: Nexus/paup non-interleaved format for (aligned) sequences.
- pair: EMBOSS simple sequence pair alignment format.
- PHYLIP format (synonyms: phy, ph; also: PHYLIP interleaved format): Phylip format for (aligned) sequences. http://www.bioperl.org/wiki/PHYLIP_multiple_alignment_format
- phylipnon (synonym: PHYLIP sequential format): Phylip non-interleaved format for (aligned) sequences. http://www.bioperl.org/wiki/PHYLIP_multiple_alignment_format
- scores format: Alignment format for score values for pairs of sequences.
- selex: SELEX format for (aligned) sequences.
- EMBOSS simple format: EMBOSS simple multiple alignment format.
- srs format: Simple multiple sequence (alignment) format for SRS.
- srspair: Simple sequence pair (alignment) format for SRS.
- T-Coffee format: T-Coffee program alignment format.
- TreeCon-seq: Treecon format for (aligned) sequences.
- Phylogenetic tree format: Data format for a phylogenetic tree.
- Biological pathway or network format: Data format for a biological pathway or network.
- Sequence-profile alignment format: Data format for a sequence-profile alignment.
- Sequence-profile alignment (HMM) format: Data format for a sequence-HMM profile alignment.
- Amino acid index format: Data format for an amino acid index.
- Article format (synonym: Literature format): Data format for a full-text scientific article.
- Text mining report format: Data format for an abstract (report) from text mining.
- Enzyme kinetics report format: Data format for reports on enzyme kinetics.
- Small molecule report format (synonym: Chemical compound annotation format): Format of a report on a chemical compound.
- Gene annotation format (synonym: Gene features format): Format of a report on a particular locus, gene, gene system or group of genes.
- Workflow format: Format of a workflow.
- Tertiary structure format: Data format for a molecular tertiary structure.
- Biological model format: Data format for a biological model.
- Chemical formula format: Text format of a chemical formula.
- Phylogenetic character data format: Format of raw (unplotted) phylogenetic data.
- Phylogenetic continuous quantitative character format: Format of phylogenetic continuous quantitative character data.
- Phylogenetic discrete states format: Format of phylogenetic discrete states data.
- Phylogenetic tree report (cliques) format: Format of phylogenetic cliques data.
- Phylogenetic tree report (invariants) format: Format of phylogenetic invariants data.
- Electron microscopy model format: Annotation format for electron microscopy models.
- Phylogenetic tree report (tree distances) format: Format for phylogenetic tree distance data.
- Polymorphism report format: Format for sequence polymorphism data.
- Protein family report format: Format for reports on a protein family.
- Protein interaction format (synonym: Molecular interaction format): Format for molecular interaction data.
- Sequence assembly format: Format for sequence assembly data.
- Microarray experiment data format: Format for information about a microarray experiment per se (not the data generated from that experiment).
- Sequence trace format: Format for sequence trace data (i.e. including base call information).
- Gene expression report format (synonym: Gene expression data format): Format of a file of gene expression data, e.g. a gene expression matrix or profile.
- Genotype and phenotype annotation format: Format of a report on genotype / phenotype information.
- Map format: Format of a map of (typically one) molecular sequence annotated with features.
- Nucleic acid features (primers) format: Format of a report on PCR primers or hybridization oligos in a nucleic acid sequence.
- Protein report format: Format of a report of general information about a specific protein.
- Protein report (enzyme) format: Format of a report of general information about a specific enzyme.
- 3D-1D scoring matrix format: Format of a matrix of 3D-1D scores (amino acid environment probabilities).
- Protein structure report (quality evaluation) format: Format of a report on the quality of a protein three-dimensional model.
- Database hits (sequence) format: Format of a report on sequence hits and associated data from searching a sequence database.
- Sequence distance matrix format: Format of a matrix of genetic distances between molecular sequences.
- Sequence motif format: Format of a sequence motif.
- Sequence profile format: Format of a sequence profile.
- Hidden Markov model format: Format of a hidden Markov model.
- Dirichlet distribution format: Data format of a Dirichlet distribution.
- HMM emission and transition counts format: Data format for the emission and transition counts of a hidden Markov model.
- RNA secondary structure format: Format for secondary structure (predicted or real) of an RNA molecule.
- Protein secondary structure format: Format for secondary structure (predicted or real) of a protein molecule.
- Sequence range format: Format used to specify range(s) of sequence positions.
- pure: Alphabet for a molecular sequence with possible unknown positions but without non-sequence characters.
- unpure: Alphabet for a molecular sequence with possible unknown positions and possibly with non-sequence characters.
- unambiguous sequence: Alphabet for a molecular sequence with possible unknown positions but without ambiguity characters.
- ambiguous: Alphabet for a molecular sequence with possible unknown positions and possible ambiguity characters.
- Sequence features (repeats) format: Format used for a map of repeats in molecular (typically nucleotide) sequences.
- Nucleic acid features (restriction sites) format: Format used for a report on restriction enzyme recognition sites in nucleotide sequences.
- Gene features (coding region) format: Format used for a report on coding regions in nucleotide sequences.
- Sequence cluster format: Format used for clusters of molecular sequences.
- Sequence cluster format (protein): Format used for clusters of protein sequences.
- Sequence cluster format (nucleic acid): Format used for clusters of nucleotide sequences.
- Gene cluster format: Format used for clusters of genes.
- EMBL-like (text): A text format resembling EMBL entry format. This concept may be used for the many non-standard EMBL-like text formats.
- FASTQ-like format (text): A text format resembling FASTQ short read format. This concept may be used for non-standard FASTQ short read-like formats.
- EMBLXML: XML format for EMBL entries.
- cdsxml: XML format for EMBL entries.
- insdxml: XML format for EMBL entries.
- geneseq: Geneseq sequence format.
- UniProt-like (text): A text sequence format resembling UniProtKB entry format.
- UniProt format: UniProt entry sequence format.
- ipi: ipi sequence format.
- medline: Abstract format used by the MedLine database.
- Ontology format: Format used for ontologies.
- OBO format: A serialisation format conforming to the Open Biomedical Ontologies (OBO) model.
- OWL format: A serialisation format conforming to the Web Ontology Language (OWL) model.
- FASTA-like (text): A text format resembling FASTA format. This concept may also be used for the many non-standard FASTA-like formats. http://filext.com/file-extension/FASTA
- Sequence record full format: Data format for a molecular sequence record, typically corresponding to a full entry from a molecular sequence database.
- Sequence record lite format: Data format for a molecular sequence record 'lite', typically molecular sequence and minimal metadata, such as an identifier of the sequence and/or a comment.
- EMBL format (XML): An XML format for EMBL entries. This is a placeholder for other more specific concepts. It should not normally be used for annotation.
- GenBank-like format (text): A text format resembling GenBank entry (plain text) format. This concept may be used for the non-standard GenBank-like text formats.
- Sequence feature table format (text): Text format for a sequence feature table.
- Strain data format: Format of a report on organism strain data / cell line.
- CIP strain data format: Format for a report of strain data as used for CIP database entries.
- phylip property values: PHYLIP file format for phylogenetic property data.
- STRING entry format (HTML): Entry format (HTML) for the STRING database of protein interaction.
- STRING entry format (XML): Entry format (XML) for the STRING database of protein interaction.
- GFF: GFF feature format (of indeterminate version).
- GTF: Gene Transfer Format (GTF), a restricted version of GFF.
- FASTA-HTML: FASTA format wrapped in HTML elements.
- EMBL-HTML: EMBL entry format wrapped in HTML elements.
- BioCyc enzyme report format: Format of an entry from the BioCyc enzyme database.
- ENZYME enzyme report format: Format of an entry from the Enzyme nomenclature database (ENZYME).
- PseudoCAP gene report format: Format of a report on a gene from the PseudoCAP database.
- GeneCards gene report format: Format of a report on a gene from the GeneCards database.
- Textual format (synonyms: txt, Plain text): Textual format. Data in text format can be compressed into binary format, or can be a value of an XML element or attribute. Markup formats are not considered textual (or more precisely, not plain-textual). http://filext.com/file-extension/TXT http://www.iana.org/assignments/media-types/text/plain http://www.iana.org/assignments/media-types/media-types.xhtml#text
- HTML (synonym: Hypertext Markup Language): HTML format. http://filext.com/file-extension/HTML
- XML (synonym: Extensible Markup Language): eXtensible Markup Language (XML) format. Data in XML format can be serialised into text, or binary format. http://filext.com/file-extension/XML
- Binary format: Binary format. Only specific native binary formats are listed under 'Binary format' in EDAM. Generic binary formats - such as any data being zipped, or any XML data being serialised into the Efficient XML Interchange (EXI) format - are not modelled in EDAM. Refer to http://wsio.org/compression_004.
- URI format: Typical textual representation of a URI.
- NCI-Nature pathway entry format: The format of an entry from the NCI-Nature pathways database.
- Format (typed): A broad class of format distinguished by the scientific nature of the data that is identified. This concept exists only to assist EDAM maintenance and navigation in graphical browsers. It does not add semantic information. The concept branch under 'Format (typed)' provides an alternative organisation of the concepts nested under the other top-level branches ('Binary', 'HTML', 'RDF', 'Text' and 'XML'). All concepts under here are already included under those branches.
- BioXSD (synonym: BioXSD XML format): BioXSD XML format of basic bioinformatics types of data (sequence records, alignments, feature records, references to resources, and more).
- RDF format: A serialisation format conforming to the Resource Description Framework (RDF) model.
- GenBank-HTML: GenBank entry format wrapped in HTML elements.
- Protein features (domains) format: Format of a report on protein features (domain composition).
- EMBL-like format: A format resembling EMBL entry (plain text) format. This concept may be used for the many non-standard EMBL-like formats.
- FASTQ-like format: A format resembling FASTQ short read format. This concept may be used for non-standard FASTQ short read-like formats.
- FASTA-like: A format resembling FASTA format. This concept may be used for the many non-standard FASTA-like formats.
- uniprotkb-like format: A sequence format resembling UniProtKB entry format.
- Sequence feature table format: Format for a sequence feature table.
- OBO: OBO ontology text format.
- OBO-XML: OBO ontology XML format.
- Sequence record format (text): Data format for a molecular sequence record.
- Sequence record format (XML): Data format for a molecular sequence record.
- Sequence feature table format (XML): XML format for a sequence feature table.
- Alignment format (text): Text format for molecular sequence alignment information.
- Alignment format (XML): XML format for molecular sequence alignment information.
- Phylogenetic tree format (text): Text format for a phylogenetic tree.
- Phylogenetic tree format (XML): XML format for a phylogenetic tree.
- EMBL-like (XML): An XML format resembling EMBL entry format. This concept may be used for any non-standard EMBL-like XML formats.
- GenBank-like format: A format resembling GenBank entry (plain text) format. This concept may be used for the non-standard GenBank-like formats.
- STRING entry format: Entry format for the STRING database of protein interaction.
- Sequence assembly format (text): Text format for sequence assembly data.
- Amino acid identifier format: Text format (representation) of amino acid residues.
- completely unambiguous: Alphabet for a molecular sequence without any unknown positions or ambiguity characters.
- completely unambiguous pure: Alphabet for a molecular sequence without unknown positions, ambiguity or non-sequence characters.
- completely unambiguous pure nucleotide: Alphabet for a nucleotide sequence (characters ACGTU only) without unknown positions, ambiguity or non-sequence characters.
- completely unambiguous pure dna: Alphabet for a DNA sequence (characters ACGT only) without unknown positions, ambiguity or non-sequence characters.
- completely unambiguous pure rna sequence: Alphabet for an RNA sequence (characters ACGU only) without unknown positions, ambiguity or non-sequence characters.
- Raw sequence format: Format of a raw molecular sequence (i.e. the alphabet used). http://www.onto-med.de/ontologies/gfo.owl#Symbol_sequence
- BAM: BAM format, the binary, BGZF-formatted compressed version of SAM format for alignment of nucleotide sequences (e.g. sequencing reads) to (a) reference sequence(s). May contain base-call and alignment qualities and other data.
- SAM: Sequence Alignment/Map (SAM) format for alignment of nucleotide sequences (e.g. sequencing reads) to (a) reference sequence(s). May contain base-call and alignment qualities and other data. The format supports short and long reads (up to 128 Mbp) produced by different sequencing platforms and is used to hold mapped data within the GATK and across the Broad Institute, the Sanger Centre, and throughout the 1000 Genomes project.
- SBML: Systems Biology Markup Language (SBML), the standard XML format for models of biological processes such as, for example, metabolism, cell signaling, and gene regulation.
- completely unambiguous pure protein: Alphabet for any protein sequence without unknown positions, ambiguity or non-sequence characters.
- Bibliographic reference format: Format of a bibliographic reference.
- Sequence annotation track format: Format of a sequence annotation track.
- Alignment format (pair only): Data format for molecular sequence alignment information that can hold sequence alignment(s) of only 2 sequences.
- Sequence variation annotation format: Format of sequence variation annotation.
- markx0 variant: Some variant of Pearson MARKX alignment format.
- mega variant: Some variant of Mega format for (typically aligned) sequences.
- Phylip format variant: Some variant of Phylip format for (aligned) sequences.
- AB1: AB1 binary format of raw DNA sequence reads (output of Applied Biosystems' sequencing analysis software). Contains an electropherogram and the DNA base sequence. AB1 uses the generic binary Applied Biosystems, Inc. Format (ABIF).
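The SAM entry above describes a tab-separated text format with eleven mandatory columns per alignment line. A sketch of a parser for one such line, assuming the column order from the SAM specification (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL); the function and field names are illustrative:

```python
SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq",
              "cigar", "rnext", "pnext", "tlen", "seq", "qual"]

def parse_sam_line(line):
    """Split one SAM alignment line into its 11 mandatory fields."""
    parts = line.rstrip("\n").split("\t")
    rec = dict(zip(SAM_FIELDS, parts[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        rec[key] = int(rec[key])        # numeric columns
    rec["tags"] = parts[11:]            # optional TAG:TYPE:VALUE fields
    return rec

rec = parse_sam_line("r001\t99\tref\t7\t30\t8M\t=\t37\t39\tTTAGATAA\t*")
```

Header lines (starting with '@') and BAM's binary encoding are out of scope for this sketch.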
- ACE: ACE sequence assembly format including contigs, base-call qualities, and other metadata (version Aug 1998 and onwards).
- BED: Browser Extensible Data (BED) format of sequence annotation track, typically to be displayed in a genome browser. BED detail format includes 2 additional columns (http://genome.ucsc.edu/FAQ/FAQformat#format1.7) and BED 15 includes 3 additional columns for experiment scores (http://genomewiki.ucsc.edu/index.php/Microarray_track).
- bigBed: bigBed format for large sequence annotation tracks, similar to textual BED format.
- WIG: Wiggle format (WIG) of a sequence annotation track that consists of a value for each sequence position. Typically to be displayed in a genome browser.
- bigWig: bigWig format for large sequence annotation tracks that consist of a value for each sequence position. Similar to textual WIG format.
- PSL: PSL format of alignments, typically generated by BLAT or psLayout. Can be displayed in a genome browser like a sequence annotation track.
- MAF: Multiple Alignment Format (MAF) supporting alignments of whole genomes with rearrangements, directions, multiple pieces to the alignment, and so forth. Typically generated by the Multiz and TBA aligners; can be displayed in a genome browser like a sequence annotation track. This should not be confused with MIRA Assembly Format or Mutation Annotation Format.
- 2bit: 2bit binary format of nucleotide sequences using 2 bits per nucleotide. In addition encodes unknown nucleotides and lower-case 'masking'.
- .nib: .nib (nibble) binary format of a nucleotide sequence using 4 bits per nucleotide (including unknown) and its lower-case 'masking'.
- genePred: genePred table format for gene prediction tracks. genePred format has 3 main variations (http://genome.ucsc.edu/FAQ/FAQformat#format9 http://www.broadinstitute.org/software/igv/genePred). They reflect UCSC Browser DB tables.
- pgSnp: Personal Genome SNP (pgSnp) format for sequence variation tracks (indels and polymorphisms), supported by the UCSC Genome Browser.
- axt: axt format of alignments, typically produced from BLASTZ.
- LAV: LAV format of alignments generated by BLASTZ and LASTZ.
- Pileup: Pileup format of alignment of sequences (e.g. sequencing reads) to (a) reference sequence(s). Contains aligned bases per base of the reference sequence(s).
- VCF: Variant Call Format (VCF) for sequence variation (indels, polymorphisms, structural variation).
- SRF: Sequence Read Format (SRF) of sequence trace data. Supports submission to the NCBI Short Read Archive.
- ZTR: ZTR format for storing chromatogram data from DNA sequencing instruments.
- GVF: Genome Variation Format (GVF). A GFF3-compatible format with defined header and attribute tags for sequence variation.
- BCF: BCF, the binary version of Variant Call Format (VCF) for sequence variation (indels, polymorphisms, structural variation).
- Matrix format: Format of a matrix (array) of numerical values.
- Protein domain classification format: Format of data concerning the classification of the sequences and/or structures of protein structural domain(s).
- Raw SCOP domain classification format: Format of raw SCOP domain classification data files. These are the parsable data files provided by SCOP.
- Raw CATH domain classification format: Format of raw CATH domain classification data files. These are the parsable data files provided by CATH.
- CATH domain report format: Format of a summary of domain classification information for a CATH domain. The report (for example http://www.cathdb.info/domain/1cukA01) includes CATH codes for levels in the hierarchy for the domain, level descriptions and relevant data and links.
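The 2bit entry above describes packing nucleotides at 2 bits each. A sketch of just the core packing step, assuming the T=0, C=1, A=2, G=3 code assignment used by the UCSC .2bit format; it omits the container's header, N-block, and masking-block records, so it is an illustration of the bit-packing idea rather than a .2bit writer:

```python
CODE = {"T": 0, "C": 1, "A": 2, "G": 3}  # .2bit's two-bit code per base

def pack(seq):
    """Pack an upper-case DNA string into bytes at 2 bits per base."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        b = 0
        # pad the final chunk with 'T' (code 0), as trailing filler
        for base in seq[i:i + 4].ljust(4, "T"):
            b = (b << 2) | CODE[base]
        out.append(b)
    return bytes(out)
```

Four bases fit in one byte, so a genome shrinks to roughly a quarter of its FASTA size before any compression.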
- SBRML: Systems Biology Result Markup Language (SBRML), the standard XML format for simulated or calculated results (e.g. trajectories) of systems biology models.
- BioPAX: BioPAX is an exchange format for pathway data, with its data model defined in OWL.
- EBI Application Result XML: EBI Application Result XML is a format returned by sequence similarity search Web services at EBI.
- PSI MI XML (MIF) (synonym: MIF): XML Molecular Interaction Format (MIF), standardised by HUPO PSI MI.
- phyloXML: phyloXML is a standardised XML format for phylogenetic trees, networks, and associated data.
- NeXML: NeXML is a standardised XML format for rich phyloinformatic data.
- MAGE-ML: MAGE-ML XML format for microarray expression data, standardised by MGED (now FGED).
- MAGE-TAB: MAGE-TAB textual format for microarray expression data, standardised by MGED (now FGED).
- GCDML: GCDML XML format for genome and metagenome metadata according to MIGS/MIMS/MIMARKS information standards, standardised by the Genomic Standards Consortium (GSC).
- GTrack: GTrack is an optimised tabular format for genome/sequence feature tracks unifying the power of other tabular formats (e.g. GFF3, BED, WIG).
- Biological pathway or network report format: Data format for a report of information derived from a biological pathway or network.
- Experiment annotation format: Data format for annotation on a laboratory experiment.
- Cytoband format: Cytoband format for chromosome cytobands. Reflects a UCSC Browser DB table.
- CopasiML: CopasiML, the native format of COPASI.
- CellML: CellML, the format for mathematical models of biological and other networks.
- PSI MI TAB (MITAB): Tabular Molecular Interaction format (MITAB), standardised by HUPO PSI MI.
- PSI-PAR: Protein affinity format (PSI-PAR), standardised by HUPO PSI MI. It is compatible with PSI MI XML (MIF) and uses the same XML Schema.
- mzML: mzML format for raw spectrometer output data, standardised by HUPO PSI MSS. mzML is the successor and unifier of the mzData format developed by PSI and mzXML developed at the Seattle Proteome Center.
- Mass spectrometry data format: Format for mass spectrometry data.
- TraML: TraML (Transition Markup Language) is the format for mass spectrometry transitions, standardised by HUPO PSI MSS.
- mzIdentML: mzIdentML is the exchange format for peptides and proteins identified from mass spectra, standardised by HUPO PSI PI. It can be used for outputs of proteomics search engines.
- mzQuantML: mzQuantML is the format for quantitation values associated with peptides, proteins and small molecules from mass spectra, standardised by HUPO PSI PI. It can be used for outputs of quantitation software for proteomics.
- GelML: GelML is the format for describing the process of gel electrophoresis, standardised by HUPO PSI PS.
- spML: spML is the format for describing proteomics sample processing, other than using gels, prior to mass spectrometric protein identification, standardised by HUPO PSI PS. It may also be applicable for metabolomics.
- OWL Functional Syntax: A human-readable encoding for the Web Ontology Language (OWL).
- Manchester OWL Syntax: A syntax for writing OWL class expressions. This format was influenced by the OWL Abstract Syntax and the DL style syntax.
- KRSS2 Syntax: A superset of the "Description-Logic Knowledge Representation System Specification from the KRSS Group of the ARPA Knowledge Sharing Effort". This format is used in Protege 4.
- Turtle: The Terse RDF Triple Language (Turtle) is a human-friendly serialization format for RDF (Resource Description Framework) graphs. The SPARQL Query Language incorporates a very similar syntax.
- N-Triples: A plain text serialisation format for RDF (Resource Description Framework) graphs, and a subset of the Turtle (Terse RDF Triple Language) format. N-Triples should not be confused with Notation 3, which is a superset of Turtle.
- Notation3 (synonym: N3): A shorthand non-XML serialization of the Resource Description Framework model, designed with human-readability in mind.
- RDF/XML (synonym: RDF): Resource Description Framework (RDF) XML format. RDF/XML is a serialization syntax for OWL DL, but not for OWL Full. http://www.ebi.ac.uk/SWO/data/SWO_3000006
- OWL/XML (synonym: OWL): OWL ontology XML serialisation format.
- A2M: The A2M format is used as the primary format for multiple alignments of protein or nucleic-acid sequences in the SAM suite of tools. It is a small modification of FASTA format for sequences and is compatible with most tools that read FASTA.
- SFF (synonym: Standard flowgram format): Standard flowgram format (SFF) is a binary file format used to encode results of pyrosequencing from the 454 Life Sciences platform for high-throughput sequencing.
- MAP (synonym: Plink MAP): The MAP file describes SNPs and is used by the Plink package.
- PED (synonym: Plink PED): The PED file describes individuals and genetic data and is used by the Plink package.
- Individual genetic data format: Data format for metadata on an individual and their genetic data.
- PED/MAP (synonym: Plink PED/MAP): The PED/MAP file describes data used by the Plink package.
- CT (synonyms: Connect format, Connectivity Table file format): File format of a CT (Connectivity Table) file from the RNAstructure package.
- SS: XRNA old input style format.
- RNAML: RNA Markup Language.
- GDE: Format for the Genetic Data Environment (GDE).
- BLC (synonym: Block file format): A multiple alignment in vertical format, as used in the AMPS (Alignment of Multiple Protein Sequences) package.
- Data index format.
- BAI: BAM indexing format.
- HMMER2: HMMER profile HMM file for HMMER versions 2.x.
- HMMER3: HMMER profile HMM file for HMMER versions 3.x.
- PO: EMBOSS simple sequence pair alignment format.
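N-Triples, as described above, is a line-oriented subset of Turtle: exactly one triple per line, each term written in full and the line terminated by " .". A minimal sketch of emitting one such line, restricted to IRI terms (literal quoting and escaping are omitted; the helper name is illustrative):

```python
def ntriple(subject, predicate, obj):
    """Serialise one RDF triple as an N-Triples line (IRI terms only)."""
    return "<%s> <%s> <%s> ." % (subject, predicate, obj)

line = ntriple(
    "http://example.org/a",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://example.org/B",
)
```

Because every line stands alone, N-Triples files can be concatenated, split, and streamed without a parser state, which is why the format is popular for bulk RDF dumps.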
- BLAST XML results format: XML format as produced by the NCBI Blast package.
- CRAM: Reference-based compression of alignment format. http://www.ebi.ac.uk/ena/software/cram-usage#format_specification http://samtools.github.io/hts-specs/CRAMv2.1.pdf
- JSON: JavaScript Object Notation format; a lightweight, text-based format to represent structured data using key-value pairs.
- EPS: Encapsulated PostScript format.
- GIF: Graphics Interchange Format.
- xls (synonym: Microsoft Excel format): Microsoft Excel spreadsheet format.
- TSV (synonyms: Tabular format, CSV): Tabular data represented as tab-separated values in a text file. http://filext.com/file-extension/TSV http://filext.com/file-extension/CSV http://www.iana.org/assignments/media-types/text/csv
- Gene expression data format: Format of a file of gene expression data, e.g. a gene expression matrix or profile.
- Cytoscape input file format: Format of the Cytoscape input file in which gene expression ratios or values are specified over one or more experiments.
- ebwt (synonym: Bowtie index format): Bowtie format for indexed reference genome for "small" genomes. https://github.com/BenLangmead/bowtie/blob/master/MANUAL
- RSF (synonym: GCG RSF): Rich sequence format. RSF-format files contain one or more sequences that may or may not be related. In addition to the sequence data, each sequence can be annotated with descriptive sequence information (from the GCG manual). http://www.molbiol.ox.ac.uk/tutorials/Seqlab_GCG.pdf
- GCG format variant: Some format based on the GCG format.
- BSML: Bioinformatics Sequence Markup Language format. http://rothlab.ucdavis.edu/genhelp/chapter_2_using_sequences.html#_Creating_and_Editing_Single_Sequenc
- ebwtl (synonym: Bowtie long index format): Bowtie format for indexed reference genome for "large" genomes. https://github.com/BenLangmead/bowtie/blob/master/MANUAL
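The TSV entry above defines the format as tab-separated values in a text file. Python's standard csv module round-trips it by switching the delimiter to a tab (a minimal sketch):

```python
import csv
import io

rows = [["gene", "score"], ["BRCA1", "0.97"]]

# write TSV to an in-memory buffer
buf = io.StringIO()
csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
tsv_text = buf.getvalue()

# read it back
back = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
```

Using the csv module rather than str.split handles quoted fields containing tabs, which naive splitting would break on.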
- Ensembl variation file format: Ensembl standard format for variation data.
- docx (synonym: doc): Microsoft Word format.
- Document format: Format of documents including word processor, spreadsheet and presentation.
- PDF: Portable Document Format.
- Image format: Format used for images and image metadata.
- DICOM format: Medical image format corresponding to the Digital Imaging and Communications in Medicine (DICOM) standard.
- nii (synonym: NIfTI-1 format): Medical image and metadata format of the Neuroimaging Informatics Technology Initiative.
- mhd (synonym: MetaImage format): Text-based tagged file format for medical images generated using the MetaImage software package.
- nrrd: Nearly Raw Raster Data format designed to support scientific visualization and image processing involving N-dimensional raster data.
- R file format: File format used for scripts written in the R programming language for execution within the R software environment, typically for statistical computation and graphics.
- SPSS: File format used for scripts for the Statistical Package for the Social Sciences.
- MHT (synonym: MHTML): MIME HTML format for Web pages, which can include external resources, including images, Flash animations and so on.
- IDAT: Proprietary file format for (raw) BeadArray data used by genome-wide profiling platforms from Illumina Inc. This format is output directly from the scanner and stores summary intensities for each probe-type on an array.
- JPG: Joint Picture Group file format for lossy graphics files. A sequence of segments with markers; each segment begins with a 0xFF byte followed by the marker type.
- rcc: Reporter Code Count, a data file (.csv) output by the NanoString nCounter Digital Analyzer, which contains gene sample information, probe information and probe counts.
arff ARFF (Attribute-Relation File Format) is an ASCII text file format that describes a list of instances sharing a set of attributes. 1.11 This file format is for machine learning. afg 1.11 AFG is a single text-based file assembly format that holds read and consensus information together bedgraph Holds a tab-delimited chromosome /start /end / datavalue dataset. 1.11 The bedGraph format allows display of continuous-valued data in track format. This display type is useful for probability scores and transcriptome data bedstrict Browser Extensible Data (BED) format of sequence annotation track that strictly does not contain non-standard fields beyond the first 3 columns. Galaxy allows BED files to contain non-standard fields beyond the first 3 columns, some other implementations do not. 1.11 bed6 Tab delimited data in strict BED format - no non-standard columns allowed; column count forced to 6 BED file format where each feature is described by chromosome, start, end, name, score, and strand. 1.11 bed12 1.11 Tab delimited data in strict BED format - no non-standard columns allowed; column count forced to 12 A BED file where each feature is described by all twelve columns. chrominfo 1.11 Tabular format of chromosome names and sizes used by Galaxy. Galaxy allows BED files to contain non-standard fields beyond the first 3 columns, some other implementations do not. customtrack 1.11 Custom Sequence annotation track format used by Galaxy. Used for tracks/track views within galaxy. csfasta Color space FASTA format sequence variant. 1.3 FASTA format extended for color space information. hdf5 An HDF5 file appears to the user as a directed graph. The nodes of this graph are the higher-level HDF5 objects that are exposed by the HDF5 APIs: Groups, Datasets, Named datatypes. H5py uses straightforward NumPy and Python metaphors, like dictionary and NumPy array syntax. 1.11 h5 Binary format used by Galaxy for hierarchical data. 
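The bedgraph entry above describes a tab-delimited chromosome / start / end / datavalue dataset. A minimal sketch of splitting one such data line into typed fields (the coordinates and value are hypothetical):

```python
def parse_bedgraph_line(line):
    """Split one bedGraph data line into (chrom, start, end, value)."""
    chrom, start, end, value = line.rstrip("\n").split("\t")
    # Coordinates are integers; the track value is continuous.
    return chrom, int(start), int(end), float(value)

record = parse_bedgraph_line("chr1\t100\t200\t0.75\n")
```

A full parser would also skip `track` and comment lines before handing rows to this function.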
tiff The TIFF format is perhaps the most versatile and diverse bitmap format in existence. Its extensible nature and support for numerous data compression schemes allow developers to customize the TIFF format to fit any peculiar data storage needs. A versatile bitmap format. 1.11 bmp Standard bitmap storage format in the Microsoft Windows environment. 1.11 Although it is based on Windows internal bitmap data structures, it is supported by many non-Windows and non-PC applications. im IM is a format used by LabEye and other applications based on the IFUNC image processing library. IFUNC library reads and writes most uncompressed interchange versions of this format. 1.11 pcd PCD was developed by Kodak. A PCD file contains five different resolutions (ranging from low to high) of a slide or film negative. Because of this, PCD is often used by photographers and graphics professionals for high-end print applications. 1.11 Photo CD format, which is the highest resolution format for images on a CD. pcx 1.11 PCX is an image file format that uses a simple form of run-length encoding. It is lossless. ppm The PPM format is a lowest common denominator color image file format. 1.11 psd 1.11 PSD (Photoshop Document) is a proprietary file that allows the user to work with the images’ individual layers even after the file has been saved. xbm The XBM format was replaced by XPM for X11 in 1989. 1.11 X BitMap is a plain-text image format used by the X Window System for storing cursor and icon bitmaps in the X GUI. xpm 1.11 X PixMap (XPM) is an image file format used by the X Window System; it is intended primarily for creating icon pixmaps, and supports transparent pixels. rgb 1.11 RGB file format is the native raster graphics file format for Silicon Graphics workstations. pbm 1.11 The PBM format is a lowest common denominator monochrome file format.
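The ppm entry above describes a lowest common denominator color image format. A minimal sketch of writing and reading back the plain-text (P3) variant of PPM; the pixel values are hypothetical:

```python
def make_ppm(width, height, pixels):
    """Serialise RGB triples into a plain-text (P3) PPM image string."""
    header = f"P3\n{width} {height}\n255\n"
    body = "\n".join(" ".join(map(str, px)) for px in pixels)
    return header + body + "\n"

def ppm_size(text):
    """Read width and height back out of a P3 PPM header."""
    lines = text.splitlines()
    assert lines[0] == "P3", "not a plain-text PPM"
    width, height = map(int, lines[1].split())
    return width, height

# A 2x1 image: one red pixel, one blue pixel.
image = make_ppm(2, 1, [(255, 0, 0), (0, 0, 255)])
```

Real PPM files may also contain `#` comment lines in the header, which this sketch does not handle.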
It serves as the common language of a large family of bitmap image conversion filters. pgm It is designed to be extremely easy to learn and write programs for. The PGM format is a lowest common denominator grayscale file format. 1.11 png 1.11 PNG is a file format for image compression. It is expected to replace the Graphics Interchange Format (GIF). svg The SVG specification is an open standard developed by the World Wide Web Consortium (W3C) since 1999. Scalable Vector Graphics (SVG) is an XML-based vector image format for two-dimensional graphics with support for interactivity and animation. 1.11 rast Sun Raster is a raster graphics file format used on SunOS by Sun Microsystems. 1.11 Sequence quality report format (text) Textual report format for sequence quality for reports from sequencing machines. 1.11 qual http://en.wikipedia.org/wiki/Phred_quality_score 1.11 Phred quality scores are defined as a property which is logarithmically related to the base-calling error probabilities. FASTQ format subset for Phred sequencing quality score data only (no sequences). qualsolexa Solexa/Illumina 1.0 format can encode a Solexa/Illumina quality score from -5 to 62 using ASCII 59 to 126 (although in raw read data Solexa scores from -5 to 40 only are expected). 1.11 FASTQ format subset for Phred sequencing quality score data only (no sequences) for Solexa/Illumina 1.0 format. qualillumina Starting in Illumina 1.5 and before Illumina 1.8, the Phred scores 0 to 2 have a slightly different meaning. The values 0 and 1 are no longer used and the value 2, encoded by ASCII 66 "B", is used also at the end of reads as a Read Segment Quality Control Indicator. FASTQ format subset for Phred sequencing quality score data only (no sequences) from Illumina 1.5 and before Illumina 1.8.
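The qual entries above note that Phred quality scores are logarithmically related to base-calling error probabilities. A short sketch of that relationship and of the Sanger/FASTQ character encoding (ASCII offset 33; the Solexa/Illumina variants described above use different offsets):

```python
import math

def phred_quality(error_probability):
    """Phred score: Q = -10 * log10(P_error)."""
    return -10 * math.log10(error_probability)

def sanger_encode(q):
    """Encode an integer Phred score as a Sanger/FASTQ quality character."""
    return chr(33 + q)

# A 1-in-1000 base-calling error probability corresponds to Q30.
q = round(phred_quality(0.001))
```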
1.11 http://en.wikipedia.org/wiki/Phred_quality_score qualsolid For SOLiD data, the sequence is in color space, except the first position. The quality values are those of the Sanger format. FASTQ format subset for Phred sequencing quality score data only (no sequences) for SOLiD data. 1.11 http://en.wikipedia.org/wiki/Phred_quality_score qual454 http://en.wikipedia.org/wiki/Phred_quality_score 1.11 FASTQ format subset for Phred sequencing quality score data only (no sequences) from 454 sequencers. ENCODE peak format 1.11 Human ENCODE peak format. Format that covers both the broad peak format and narrow peak format from ENCODE. ENCODE narrow peak format 1.11 Human ENCODE narrow peak format. Format that covers both the broad peak format and narrow peak format from ENCODE. ENCODE broad peak format 1.11 Human ENCODE broad peak format. bgzip BAM files are compressed using a variant of GZIP (GNU ZIP), into a format called BGZF (Blocked GNU Zip Format). Blocked GNU Zip format. 1.11 tabix TAB-delimited genome position file index format. 1.11 Graph format Data format for graph data. 1.11 xgmml XML-based format used to store graph descriptions within Galaxy. 1.11 sif 1.11 SIF (simple interaction file) Format - a network/pathway format used for instance in cytoscape. xlsx 1.11 MS Excel spreadsheet format consisting of a set of XML documents stored in a ZIP-compressed file. SQLite https://www.sqlite.org/fileformat2.html Data format used by the SQLite database. 1.11 GeminiSQLite https://gemini.readthedocs.org/en/latest/content/quick_start.html 1.11 Data format used by the SQLite database conformant to the Gemini schema. Index format Format of a data index of some type. 1.11 snpeffdb An index of a genome database, indexed for use by the snpeff tool. 1.11 Operation http://www.onto-med.de/ontologies/gfo.owl#Perpetuant Computational tool A function that processes a set of inputs and results in a set of outputs, or associates arguments (inputs) with values (outputs). 
Special cases are: a) An operation that consumes no input (has no input arguments). Such operation is either a constant function, or an operation depending only on the underlying state. b) An operation that may modify the underlying state but has no output. c) The singular-case operation with no input or output, that still may modify the underlying state. Function http://purl.org/biotop/biotop.owl#Function http://www.ifomis.org/bfo/1.1/snap#Function http://en.wikipedia.org/wiki/Function_(mathematics) Computational method http://semanticscience.org/resource/SIO_000017 http://www.ebi.ac.uk/swo/SWO_0000003 Mathematical operation sumo:Function beta12orEarlier Process Computational operation Computational subroutine http://semanticscience.org/resource/SIO_000649 http://www.ifomis.org/bfo/1.1/span#Process http://www.ifomis.org/bfo/1.1/snap#Continuant http://onto.eva.mpg.de/ontologies/gfo-bio.owl#Method Computational procedure Mathematical function Lambda abstraction Function (programming) http://www.onto-med.de/ontologies/gfo.owl#Process http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#quality http://wsio.org/operation_001 http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#process http://www.ifomis.org/bfo/1.1/snap#Quality http://www.onto-med.de/ontologies/gfo.owl#Function http://en.wikipedia.org/wiki/Function_(computer_science) http://en.wikipedia.org/wiki/Subroutine Process Process can have a function (as its quality/attribute), and can also perform an operation with inputs and outputs. Computational tool provides one or more operations. Computational tool Function Operation is a function that is computational. It typically has input(s) and output(s), which are always data. Query and retrieval beta12orEarlier Query Retrieval Search or query a data resource and retrieve entries and / or annotation. Database retrieval Search Data retrieval (database cross-reference) beta12orEarlier Search database to retrieve all relevant references to a particular entity or entry. 
true beta13 Annotation Annotate an entity (typically a biological or biomedical database entity) with terms from a controlled vocabulary. beta12orEarlier This is a broad concept and is used as a placeholder for other, more specific concepts. Indexing Data indexing beta12orEarlier Generate an index of (typically a file of) biological data. Database indexing Data index analysis Database index analysis Analyse an index of biological data. beta12orEarlier true 1.6 Annotation retrieval (sequence) true beta12orEarlier Retrieve basic information about a molecular sequence. beta12orEarlier Sequence generation beta12orEarlier Generate a molecular sequence by some means. Sequence editing Edit or change a molecular sequence, either randomly or specifically. beta12orEarlier Sequence merging beta12orEarlier Merge two or more (typically overlapping) molecular sequences. Sequence splicing Sequence conversion Convert a molecular sequence from one type to another. beta12orEarlier Sequence complexity calculation beta12orEarlier Calculate sequence complexity, for example to find low-complexity regions in sequences. Sequence ambiguity calculation Calculate sequence ambiguity, for example identify regions in protein or nucleotide sequences with many ambiguity codes. beta12orEarlier Sequence composition calculation beta12orEarlier Calculate character or word composition or frequency of a molecular sequence. Repeat sequence analysis Find and/or analyse repeat sequences in (typically nucleotide) sequences. beta12orEarlier Repeat sequences include tandem repeats, inverted or palindromic repeats, DNA microsatellites (Simple Sequence Repeats or SSRs), interspersed repeats, maximal duplications and reverse, complemented and reverse complemented repeats etc. Repeat units can be exact or imperfect, in tandem or dispersed, of specified or unspecified length. Sequence motif discovery Motifs and patterns might be conserved or over-represented (occur with improbable frequency).
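The "Sequence composition calculation" concept above covers character or word composition of a molecular sequence. A minimal sketch of the character-level case (the example sequence is hypothetical):

```python
from collections import Counter

def composition(seq):
    """Fractional character composition of a molecular sequence."""
    counts = Counter(seq.upper())
    total = len(seq)
    return {base: n / total for base, n in counts.items()}

fractions = composition("AACGT")
```

Word (k-mer) composition follows the same pattern, counting `seq[i:i+k]` slices instead of single characters.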
beta12orEarlier Discover new motifs or conserved patterns in sequences or sequence alignments (de-novo discovery). Motif discovery Sequence signature recognition beta12orEarlier Motif search Sequence motif search Protein secondary database search Motif detection Sequence motif recognition Sequence signature detection Sequence profile search Find (scan for) known motifs, patterns and regular expressions in molecular sequence(s). Sequence motif detection Motif recognition Sequence motif comparison beta12orEarlier Find motifs shared by molecular sequences. Transcription regulatory sequence analysis beta12orEarlier beta13 Analyse the sequence, conformational or physicochemical properties of transcription regulatory elements in DNA sequences. For example transcription factor binding sites (TFBS) analysis to predict accessibility of DNA to binding factors. true Conserved transcription regulatory sequence identification For example cross-species comparison of transcription factor binding sites (TFBS). Methods might analyse co-regulated or co-expressed genes, or sets of oppositely expressed genes. beta12orEarlier Identify common, conserved (homologous) or synonymous transcriptional regulatory motifs (transcription factor binding sites). Protein property calculation (from structure) This might be a residue-level search for properties such as solvent accessibility, hydropathy, secondary structure, ligand-binding etc. Extract, calculate or predict non-positional (physical or chemical) properties of a protein from processing a protein (3D) structure. beta12orEarlier Protein structural property calculation Protein flexibility and motion analysis beta12orEarlier Analyse flexibility and motion in protein structure. Use this concept for analysis of flexible and rigid residues, local chain deformability, regions undergoing conformational change, molecular vibrations or fluctuational dynamics, domain motions or other large-scale structural transitions in a protein structure. 
Protein structural motif recognition Identify or screen for 3D structural motifs in protein structure(s). This includes conserved substructures and conserved geometry, such as spatial arrangement of secondary structure or protein backbone. Methods might use structure alignment, structural templates, searches for similar electrostatic potential and molecular surface shape, surface-mapping of phylogenetic information etc. beta12orEarlier Protein structural feature identification Protein domain recognition beta12orEarlier Identify structural domains in a protein structure from first principles (for example calculations on structural compactness). Protein architecture analysis beta12orEarlier Analyse the architecture (spatial arrangement of secondary structure) of protein structure(s). Residue interaction calculation WHATIF: SymShellTenXML WHATIF:ListContactsRelaxed WHATIF: SymShellTwoXML WHATIF:ListSideChainContactsRelaxed beta12orEarlier WHATIF:ListSideChainContactsNormal WHATIF:ListContactsNormal Calculate or extract inter-atomic, inter-residue or residue-atom contacts, distances and interactions in protein structure(s). WHATIF: SymShellFiveXML WHATIF: SymShellOneXML Torsion angle calculation beta12orEarlier Calculate, visualise or analyse phi/psi angles of a protein structure. Protein property calculation Calculate (or predict) physical or chemical properties of a protein, including any non-positional properties of the molecular sequence, from processing a protein sequence. This includes methods to render and visualise the properties of a protein sequence. beta12orEarlier Protein property rendering Peptide immunogenicity prediction beta12orEarlier This is usually done in the development of peptide-specific antibodies or multi-epitope vaccines. Methods might use sequence data (for example motifs) and / or structural data. Predict antigenicity, allergenicity / immunogenicity, allergic cross-reactivity etc of peptides and proteins. 
Sequence feature detection Sequence feature prediction Predict, recognise and identify positional features in molecular sequences such as key functional sites or regions. Sequence feature recognition beta12orEarlier Motif database search SO:0000110 Data retrieval (feature table) beta13 Extract a sequence feature table from a sequence database entry. true beta12orEarlier Feature table query 1.6 beta12orEarlier true Query the features (in a feature table) of molecular sequence(s). Sequence feature comparison beta12orEarlier Compare the feature tables of two or more molecular sequences. Feature comparison Feature table comparison Data retrieval (sequence alignment) beta12orEarlier true beta13 Display basic information about a sequence alignment. Sequence alignment analysis Analyse a molecular sequence alignment. beta12orEarlier Sequence alignment comparison Compare (typically by aligning) two molecular sequence alignments. beta12orEarlier See also 'Sequence profile alignment'. Sequence alignment conversion beta12orEarlier Convert a molecular sequence alignment from one type to another (for example amino acid to coding nucleotide sequence). Nucleic acid property processing beta12orEarlier true Process (read and / or write) physicochemical property data of nucleic acids. beta13 Nucleic acid property calculation beta12orEarlier Calculate or predict physical or chemical properties of nucleic acid molecules, including any non-positional properties of the molecular sequence. Splice transcript prediction beta12orEarlier Predict splicing alternatives or transcript isoforms from analysis of sequence data. Frameshift detection Detect frameshifts in DNA sequences, including frameshift sites and signals, and frameshift errors from sequencing projects. Frameshift error detection beta12orEarlier Methods include sequence alignment (if related sequences are available) and word-based sequence comparison. 
Vector sequence detection beta12orEarlier Detect vector sequences in nucleotide sequence, typically by comparison to a set of known vector sequences. Protein secondary structure prediction Methods might use amino acid composition, local sequence information, multiple sequence alignments, physicochemical features, estimated energy content, statistical algorithms, hidden Markov models, support vector machines, kernel machines, neural networks etc. Predict secondary structure of protein sequences. Secondary structure prediction (protein) beta12orEarlier Protein super-secondary structure prediction beta12orEarlier Predict super-secondary structure of protein sequence(s). Super-secondary structures include leucine zippers, coiled coils, Helix-Turn-Helix etc. Transmembrane protein prediction Predict and/or classify transmembrane proteins or transmembrane (helical) domains or regions in protein sequences. beta12orEarlier Transmembrane protein analysis beta12orEarlier Analyse transmembrane protein(s), typically by processing sequence and / or structural data, and write an informative report for example about the protein and its transmembrane domains / regions. Use this (or child) concept for analysis of transmembrane domains (buried and exposed faces), transmembrane helices, helix topology, orientation, inter-helical contacts, membrane dipping (re-entrant) loops and other secondary structure etc. Methods might use pattern discovery, hidden Markov models, sequence alignment, structural profiles, amino acid property analysis, comparison to known domains or some combination (hybrid methods). Structure prediction Predict tertiary structure of a molecular (biopolymer) sequence. beta12orEarlier Residue interaction prediction Methods usually involve multiple sequence alignment analysis. Predict contacts, non-covalent interactions and distance (constraints) between amino acids in protein sequences. 
beta12orEarlier Protein interaction raw data analysis Analyse experimental protein-protein interaction data from for example yeast two-hybrid analysis, protein microarrays, immunoaffinity chromatography followed by mass spectrometry, phage display etc. beta12orEarlier Protein-protein interaction prediction (from protein sequence) beta12orEarlier Identify or predict protein-protein interactions, interfaces, binding sites etc in protein sequences. Protein-protein interaction prediction (from protein structure) beta12orEarlier Identify or predict protein-protein interactions, interfaces, binding sites etc in protein structures. Protein interaction network analysis beta12orEarlier Analyse a network of protein interactions. Protein interaction network comparison beta12orEarlier Compare two or more networks of protein interactions. RNA secondary structure prediction Predict RNA secondary structure (for example knots, pseudoknots, alternative structures etc). beta12orEarlier Methods might use RNA motifs, predicted intermolecular contacts, or RNA sequence-structure compatibility (inverse RNA folding). Nucleic acid folding prediction beta12orEarlier Analyse some aspect of RNA/DNA folding, typically by processing sequence and/or structural data. Nucleic acid folding modelling Nucleic acid folding Data retrieval (restriction enzyme annotation) beta13 Restriction enzyme information retrieval true Retrieve information on restriction enzymes or restriction enzyme sites. beta12orEarlier Genetic marker identification true beta12orEarlier beta13 Identify genetic markers in DNA sequences. A genetic marker is any DNA sequence of known chromosomal location that is associated with and specific to a particular gene or trait. This includes short sequences surrounding a SNP, Sequence-Tagged Sites (STS) which are well suited for PCR amplification, a longer minisatellites sequence etc. 
Genetic mapping beta12orEarlier QTL mapping This includes mapping of the genetic architecture of dynamic complex traits (functional mapping), e.g. by characterization of the underlying quantitative trait loci (QTLs) or nucleotides (QTNs). Linkage mapping Genetic map generation Mapping involves ordering genetic loci along a chromosome and estimating the physical distance between loci. A genetic map shows the relative (not physical) position of known genes and genetic markers. Generate a genetic (linkage) map of a DNA sequence (typically a chromosome) showing the relative positions of genetic markers based on estimation of non-physical distances. Genetic map construction Functional mapping Linkage analysis beta12orEarlier For example, estimate how close two genes are on a chromosome by calculating how often they are transmitted together to an offspring, ascertain whether two genes are linked and parental linkage, calculate linkage map distance etc. Analyse genetic linkage. Codon usage table generation Calculate codon usage statistics and create a codon usage table. beta12orEarlier Codon usage table construction Codon usage table comparison beta12orEarlier Compare two or more codon usage tables. Codon usage analysis beta12orEarlier synon: Codon usage data analysis Process (read and / or write) codon usage data, e.g. analyse codon usage tables or codon usage in molecular sequences. synon: Codon usage table analysis Base position variability plotting Identify and plot third base position variability in a nucleotide sequence. beta12orEarlier Sequence word comparison Find exact character or word matches between molecular sequences without full sequence alignment. beta12orEarlier Sequence distance matrix generation Sequence distance matrix construction Phylogenetic distance matrix generation beta12orEarlier Calculate a sequence distance matrix or otherwise estimate genetic distances between molecular sequences. 
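The "Codon usage table generation" concept above covers calculating codon usage statistics from sequence data. A minimal sketch that counts codons in a coding sequence read in frame (the sequence is hypothetical):

```python
from collections import Counter

def codon_usage(cds):
    """Count codon occurrences in a coding sequence, read in frame from position 0."""
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return Counter(codons)

table = codon_usage("ATGGCCATG")
```

A full codon usage table would normalise these counts per amino acid to give relative synonymous codon usage.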
Sequence redundancy removal beta12orEarlier Compare two or more molecular sequences, identify and remove redundant sequences based on some criteria. Sequence clustering The clusters may be output or used internally for some other purpose. Sequence cluster construction beta12orEarlier Build clusters of similar sequences, typically using scores from pair-wise alignment or other comparison of the sequences. Sequence cluster generation Sequence alignment Sequence alignment construction beta12orEarlier Align (identify equivalent sites within) molecular sequences. Sequence alignment generation Sequence alignment computation Hybrid sequence alignment construction Hybrid sequence alignment true beta13 beta12orEarlier Align two or more molecular sequences of different types (for example genomic DNA to EST, cDNA or mRNA). Hybrid sequence alignment generation Structure-based sequence alignment Structure-based sequence alignment Sequence alignment generation (structure-based) Structure-based sequence alignment construction beta12orEarlier Sequence alignment (structure-based) Structure-based sequence alignment generation Align molecular sequences using sequence and structural information. Structure alignment Align (superimpose) molecular tertiary structures. Structure alignment generation Structure alignment construction beta12orEarlier Multiple structure alignment construction Multiple structure alignment generation Sequence profile generation Sequence profile construction beta12orEarlier Generate some type of sequence profile (for example a hidden Markov model) from a sequence alignment. 3D profile generation Structural profile generation Generate some type of structural (3D) profile or template from a structure or structure alignment. Structural profile construction beta12orEarlier Profile-to-profile alignment Sequence profile alignment beta12orEarlier See also 'Sequence alignment comparison'. 
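Sequence redundancy removal, clustering, and distance matrix generation (described above) all rest on pairwise sequence comparison. A minimal sketch of an uncorrected p-distance matrix over equal-length sequences (the sequences are hypothetical; real tools work from alignments and apply evolutionary corrections):

```python
def p_distance(a, b):
    """Proportion of differing sites between two equal-length sequences."""
    assert len(a) == len(b), "sequences must be aligned / equal length"
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_matrix(seqs):
    """Square matrix of pairwise p-distances."""
    return [[p_distance(s, t) for t in seqs] for s in seqs]

m = distance_matrix(["ACGT", "ACGA", "TTTT"])
```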
Sequence profile alignment construction Align sequence profiles (representing sequence alignments). Sequence profile alignment generation 3D profile-to-3D profile alignment beta12orEarlier 3D profile alignment (multiple) 3D profile alignment Multiple 3D profile alignment construction Structural profile alignment construction (multiple) Structural profile alignment Structural profile alignment generation Structural profile alignment construction Align structural (3D) profiles or templates (representing structures or structure alignments). Sequence-to-profile alignment Sequence-profile alignment construction Sequence-profile alignment generation beta12orEarlier Align molecular sequence(s) to sequence profile(s). Sequence-profile alignment A sequence profile typically represents a sequence alignment. Methods might perform one-to-one, one-to-many or many-to-many comparisons. Sequence-to-3D-profile alignment beta12orEarlier Sequence-3D profile alignment construction Align molecular sequence(s) to structural (3D) profile(s) or template(s) (representing a structure or structure alignment). Sequence-3D profile alignment generation Methods might perform one-to-one, one-to-many or many-to-many comparisons. Sequence-3D profile alignment Protein threading beta12orEarlier Align molecular sequence to structure in 3D space (threading). Use this concept for methods that evaluate sequence-structure compatibility by assessing residue interactions in 3D. Methods might perform one-to-one, one-to-many or many-to-many comparisons. Sequence-structure alignment Protein fold recognition beta12orEarlier Protein domain prediction Methods use some type of mapping between sequence and fold, for example secondary structure prediction and alignment, profile comparison, sequence properties, homologous sequence search, kernel machines etc. Domains and folds might be taken from SCOP or CATH. Recognize (predict and identify) known protein structural domains or folds in protein sequence(s). 
Protein fold prediction Metadata retrieval Data retrieval (documentation) Search for and retrieve data concerning or describing some core data, as distinct from the primary data that is being described. Data retrieval (metadata) beta12orEarlier This includes documentation, general information and other metadata on entities such as databases, database entries and tools. Literature search beta12orEarlier Query the biomedical and informatics literature. Text mining Text data mining beta12orEarlier Process and analyse text (typically the biomedical and informatics literature) to extract information from it. Virtual PCR beta12orEarlier Perform in-silico (virtual) PCR. PCR primer design PCR primer prediction Primer design involves predicting or selecting primers that are specific to a provided PCR template. Primers can be designed with certain properties such as size of product desired, primer size etc. The output might be a minimal or overlapping primer set. Design or predict oligonucleotide primers for PCR and DNA amplification etc. beta12orEarlier Microarray probe design Predict and/or optimize oligonucleotide probes for DNA microarrays, for example for transcription profiling of genes, or for genomes and gene families. beta12orEarlier Microarray probe prediction Sequence assembly beta12orEarlier For example, assemble overlapping reads from paired-end sequencers into contigs (a contiguous sequence corresponding to read overlaps). Or assemble contigs, for example ESTs and genomic DNA fragments, depending on the detected fragment overlaps. Combine (align and merge) overlapping fragments of a DNA sequence to reconstruct the original sequence. Microarray data standardization and normalization beta12orEarlier Standardize or normalize microarray data. This includes statistical analysis, for example of variability amongst microarrays experiments, comparison of heterogeneous microarray platforms etc. 
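The PCR primer design concept above mentions selecting primers with certain properties. One classic quick estimate of a primer property is the Wallace rule for melting temperature; this is a hedged sketch of that rule only (real primer design tools use nearest-neighbour thermodynamics and many more criteria):

```python
def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C).

    Reasonable only for short oligonucleotides (roughly 14-20 nt).
    """
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

tm = wallace_tm("ATGCATGC")  # hypothetical 8-mer, for illustration
```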
Sequencing-based expression profile data processing Process (read and / or write) SAGE, MPSS or SBS experimental data. true beta12orEarlier beta12orEarlier Gene expression profile clustering beta12orEarlier Perform cluster analysis of gene expression (microarray) data, for example clustering of similar gene expression profiles. Gene expression profiling Expression profiling Gene expression profile construction Functional profiling Generate a gene expression profile or pattern, for example from microarray data. beta12orEarlier Gene expression profile generation Gene expression profile comparison beta12orEarlier Compare gene expression profiles or patterns. Functional profiling true beta12orEarlier Interpret (in functional terms) and annotate gene expression data. beta12orEarlier EST and cDNA sequence analysis Analyse EST or cDNA sequences. For example, identify full-length cDNAs from EST sequences or detect potential EST antisense transcripts. beta12orEarlier beta12orEarlier true Structural genomics target selection beta12orEarlier Identify and select targets for protein structural determination. beta12orEarlier Methods will typically navigate a graph of protein families of known structure. true Protein secondary structure assignment beta12orEarlier Assign secondary structure from protein coordinate or experimental data. Protein structure assignment beta12orEarlier Assign a protein tertiary structure (3D coordinates) from raw experimental data. Protein model validation Evaluate the quality or correctness of a protein three-dimensional model. Model validation might involve checks for atomic packing, steric clashes (bumps), volume irregularities, agreement with electron density maps, number of amino acid residues, percentage of residues with missing or bad atoms, irregular Ramachandran Z-scores, irregular Chi-1 / Chi-2 normality scores, RMS-Z score on bonds and angles etc.
WHATIF: CorrectedPDBasXML Protein structure validation WHATIF: UseFileDB The PDB file format has had difficulties, inconsistencies and errors. Corrections can include identifying a meaningful sequence, removal of alternate atoms, correction of nomenclature problems, removal of incomplete residues and spurious waters, addition or removal of water, modelling of missing side chains, optimisation of cysteine bonds, regularisation of bond lengths, bond angles and planarities etc. beta12orEarlier Molecular model refinement Protein model refinement WHATIF: CorrectedPDBasXML beta12orEarlier Refine (after evaluation) a model of a molecular structure (typically a protein structure) to reduce steric clashes, volume irregularities etc. Phylogenetic tree generation Phylogenetic trees are usually constructed from a set of sequences from which an alignment (or data matrix) is calculated. Phylogenetic tree construction Construct a phylogenetic tree. beta12orEarlier Phylogenetic tree analysis beta12orEarlier Analyse an existing phylogenetic tree or trees, typically to detect features or make predictions. Phylogenetic tree comparison beta12orEarlier Compare two or more phylogenetic trees. For example, to produce a consensus tree, subtrees, supertrees, calculate distances between trees or test topological similarity between trees (e.g. a congruence index) etc. Phylogenetic tree editing Edit a phylogenetic tree. beta12orEarlier Phylogenetic footprinting / shadowing A phylogenetic 'shadow' represents the additive differences between individual sequences. By masking or 'shadowing' variable positions a conserved sequence is produced with few or none of the variations, which is then compared to the sequences of interest to identify significant regions of conservation. beta12orEarlier Infer a phylogenetic tree by comparing orthologous sequences in different species, particularly many closely related species (phylogenetic shadowing). 
Protein folding simulation beta12orEarlier Simulate the folding of a protein. Protein folding pathway prediction Predict the folding pathway(s) or non-native structural intermediates of a protein. beta12orEarlier Protein SNP mapping beta12orEarlier Map and model the effects of single nucleotide polymorphisms (SNPs) on protein structure(s). Protein modelling (mutation) Methods might predict silent or pathological mutations. Protein mutation modelling Predict the effect of a point mutation on a protein structure, in terms of structural effects and protein folding, stability and function. beta12orEarlier Immunogen design true Design molecules that elicit an immune response (immunogens). beta12orEarlier beta12orEarlier Zinc finger prediction Predict and optimise zinc finger protein domains for DNA/RNA binding (for example for transcription factors and nucleases). beta12orEarlier Enzyme kinetics calculation beta12orEarlier Calculate Km, Vmax and derived data for an enzyme reaction. Formatting beta12orEarlier Reformat a file of data (or equivalent entity in memory). Format conversion File formatting Reformatting File reformatting File format conversion Format validation Test and validate the format and content of a data file. File format validation beta12orEarlier Visualisation beta12orEarlier Visualise, plot or render (graphically) biomolecular data such as molecular sequences or structures. Rendering Sequence database search Search a sequence database by sequence comparison and retrieve similar sequences, or sequences matching a given sequence motif or pattern, such as a Prosite pattern or regular expression. beta12orEarlier This excludes direct retrieval methods (e.g. the dbfetch program). Structure database search beta12orEarlier Search a tertiary structure database, typically by sequence and/or structure comparison, or some other means, and retrieve structures and associated data.
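The "Enzyme kinetics calculation" operation above (estimate Km and Vmax for an enzyme reaction) can be sketched with a Lineweaver-Burk (double-reciprocal) fit, one classical approach; the function name and parameters are illustrative.

```python
def michaelis_menten_fit(S, v):
    """Estimate Km and Vmax from substrate concentrations S and
    measured reaction rates v via a Lineweaver-Burk fit:
    1/v = (Km/Vmax) * (1/S) + 1/Vmax."""
    x = [1.0 / s for s in S]   # 1/[S]
    y = [1.0 / r for r in v]   # 1/v
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx   # = 1/Vmax
    vmax = 1.0 / intercept
    km = slope * vmax             # slope = Km/Vmax
    return km, vmax

# Noise-free data generated with Km=2.0, Vmax=10.0 is recovered exactly
S = [0.5, 1.0, 2.0, 4.0, 8.0]
v = [10.0 * s / (2.0 + s) for s in S]
print(michaelis_menten_fit(S, v))
```

Real kinetics tools usually prefer direct nonlinear regression of v = Vmax*S/(Km+S), since the double-reciprocal transform amplifies measurement error at low substrate concentrations.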
Protein secondary database search 1.8 beta12orEarlier true Search a secondary protein database (of classification information) to assign a protein sequence(s) to a known protein family or group. Motif database search beta12orEarlier Screen a sequence against a motif or pattern database. true 1.8 Sequence profile database search true beta12orEarlier Search a database of sequence profiles with a query sequence. 1.4 Transmembrane protein database search true beta12orEarlier Search a database of transmembrane proteins, for example for sequence or structural similarities. beta12orEarlier Sequence retrieval (by code) Query a database and retrieve sequences with a given entry code or accession number. true 1.6 beta12orEarlier Sequence retrieval (by keyword) true Query a database and retrieve sequences containing a given keyword. beta12orEarlier 1.6 Sequence similarity search Structure database search (by sequence) Sequence database search (by sequence) beta12orEarlier Search a sequence database and retrieve sequences that are similar to a query sequence. Sequence database search (by motif or pattern) 1.8 Search a sequence database and retrieve sequences matching a given sequence motif or pattern, such as a Prosite pattern or regular expression. beta12orEarlier true Sequence database search (by amino acid composition) true Search a sequence database and retrieve sequences of a given amino acid composition. 1.6 beta12orEarlier Sequence database search (by property) Search a sequence database and retrieve sequences with a specified property, typically a physicochemical or compositional property. beta12orEarlier Sequence database search (by sequence using word-based methods) beta12orEarlier Word-based methods (for example BLAST, gapped BLAST, MEGABLAST, WU-BLAST etc.) are usually quicker than alignment-based methods. They may or may not handle gaps. 
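The word-based database search methods mentioned above (BLAST and relatives) rest on a simple idea: index every k-letter word of the database, then look up the query's words to find exact "seed" matches that are later extended into alignments. A minimal sketch of the seeding step, with illustrative function names:

```python
def kmer_index(seq, k=3):
    """Index the position of every k-mer (word) in a database sequence."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index

def find_seeds(query, index, k=3):
    """Return (query_pos, db_pos) pairs where a k-word matches exactly.
    Real tools extend such seeds into gapped or ungapped alignments."""
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds

idx = kmer_index("ACGTACGTGG")
print(find_seeds("TACG", idx))  # [(0, 3), (1, 0), (1, 4)]
```

Seeding is what makes word-based methods faster than full alignment-based search: only regions sharing an exact word are ever considered for extension.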
1.6 true Sequence similarity search (word-based methods) Search a sequence database and retrieve sequences that are similar to a query sequence using a word-based method. Sequence database search (by sequence using profile-based methods) true Sequence similarity search (profile-based methods) Search a sequence database and retrieve sequences that are similar to a query sequence using a sequence profile-based method, or with a supplied profile as query. beta12orEarlier This includes tools based on PSI-BLAST. 1.6 Sequence database search (by sequence using local alignment-based methods) Search a sequence database for sequences that are similar to a query sequence using a local alignment-based method. 1.6 beta12orEarlier true Sequence similarity search (local alignment-based methods) This includes tools based on the Smith-Waterman algorithm or FASTA. Sequence database search (by sequence using global alignment-based methods) This includes tools based on the Needleman and Wunsch algorithm. Search sequence(s) or a sequence database for sequences that are similar to a query sequence using a global alignment-based method. 1.6 Sequence similarity search (global alignment-based methods) beta12orEarlier true Sequence database search (by sequence for primer sequences) true beta12orEarlier Search a DNA database (for example a database of conserved sequence tags) for matches to Sequence-Tagged Site (STS) primer sequences. 1.6 STSs are genetic markers that are easily detected by the polymerase chain reaction (PCR) using specific primers. Sequence similarity search (primer sequences) Sequence database search (by molecular weight) Search sequence(s) or a sequence database for sequences which match a set of peptide masses, for example a peptide mass fingerprint from mass spectrometry. 
1.6 Protein fingerprinting true beta12orEarlier Peptide mass fingerprinting Sequence database search (by isoelectric point) 1.6 beta12orEarlier Search sequence(s) or a sequence database for sequences of a given isoelectric point. true Structure retrieval (by code) Query a tertiary structure database and retrieve entries with a given entry code or accession number. 1.6 beta12orEarlier true Structure retrieval (by keyword) true 1.6 Query a tertiary structure database and retrieve entries containing a given keyword. beta12orEarlier Structure database search (by sequence) beta12orEarlier true Search a tertiary structure database and retrieve structures with a sequence similar to a query sequence. 1.8 Structural similarity search beta12orEarlier Search a database of molecular structures and retrieve structures that are similar to a query structure. Structure database search (by structure) Structure retrieval by structure Sequence annotation beta12orEarlier Annotate a molecular sequence record with terms from a controlled vocabulary. Genome annotation beta12orEarlier Annotate a genome sequence with terms from a controlled vocabulary. Nucleic acid sequence reverse and complement beta12orEarlier Generate the reverse and / or complement of a nucleotide sequence. Random sequence generation Generate a random sequence, for example, with a specific character composition. beta12orEarlier Nucleic acid restriction digest beta12orEarlier Generate digest fragments for a nucleotide sequence containing restriction sites. Protein sequence cleavage beta12orEarlier Cleave a protein sequence into peptide fragments (by enzymatic or chemical cleavage) and calculate the fragment masses. Sequence mutation and randomization beta12orEarlier Mutate a molecular sequence by a specified amount or shuffle it to produce a randomized sequence with the same overall composition. Sequence masking Mask characters in a molecular sequence (replacing those characters with a mask character).
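The "Nucleic acid sequence reverse and complement" operation above is one of the simplest computable concepts in this vocabulary; a minimal sketch (illustrative names, DNA alphabet only):

```python
# Translation table pairing each base with its Watson-Crick complement
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def complement(seq):
    """Complement a DNA sequence without reversing it."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq):
    """Reverse and complement a DNA sequence (the opposite strand,
    read 5' to 3')."""
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGC"))  # GCAT
```

Tools covering the full IUPAC alphabet extend the translation table with ambiguity codes (R/Y, S/W, K/M, etc.).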
For example, SNPs or repeats in a DNA sequence might be masked. beta12orEarlier Sequence cutting Cut (remove) characters or a region from a molecular sequence. beta12orEarlier Restriction site creation Create (or remove) restriction sites in sequences, for example using silent mutations. beta12orEarlier DNA translation beta12orEarlier Translate a DNA sequence into protein. DNA transcription beta12orEarlier Transcribe a nucleotide sequence into mRNA sequence(s). Sequence composition calculation (nucleic acid) true Calculate base frequency or word composition of a nucleotide sequence. 1.8 beta12orEarlier Sequence composition calculation (protein) 1.8 Calculate amino acid frequency or word composition of a protein sequence. beta12orEarlier true Repeat sequence detection beta12orEarlier Find (and possibly render) short repetitive subsequences (repeat sequences) in (typically nucleotide) sequences. Repeat sequence organisation analysis beta12orEarlier Analyse repeat sequence organization such as periodicity. Protein hydropathy calculation (from structure) Analyse the hydrophobic, hydrophilic or charge properties of a protein structure. beta12orEarlier Protein solvent accessibility calculation beta12orEarlier Calculate solvent accessible or buried surface areas in protein structures. Protein hydropathy cluster calculation beta12orEarlier Identify clusters of hydrophobic or charged residues in a protein structure. Protein dipole moment calculation beta12orEarlier Calculate whether a protein structure has an unusually large net charge (dipole moment). Protein surface and interior calculation beta12orEarlier Identify the protein surface and interior, surface accessible pockets, interior inaccessible cavities etc. Protein binding site prediction (from structure) Identify or predict catalytic residues, active sites or other ligand-binding sites in protein structures. 
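The "DNA translation" operation above (translate a DNA sequence into protein) can be sketched with the standard genetic code; the compact table construction below is a common idiom, and the function name is illustrative.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(dna):
    """Translate a DNA coding sequence into protein (standard code),
    stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper().replace("U", "T")]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCCTAA"))  # MA
```

Transcription (the neighbouring operation) is even simpler at the sequence level: replace T with U on the coding strand, or reverse-complement first when starting from the template strand.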
beta12orEarlier Ligand-binding and active site prediction (from structure) Binding site prediction (from structure) Protein-nucleic acid binding site analysis Analyse RNA or DNA-binding sites in protein structures. beta12orEarlier Protein peeling beta12orEarlier Decompose a structure into compact or globular fragments (protein peeling). Protein distance matrix calculation beta12orEarlier Calculate a matrix of distances between residues (for example the C-alpha atoms) in a protein structure. Protein contact map calculation beta12orEarlier Calculate a residue contact map (typically all-versus-all inter-residue contacts) for a protein structure. Protein residue cluster calculation Clusters of contacting residues might be key structural residues. Calculate clusters of contacting residues in protein structures. beta12orEarlier Hydrogen bond calculation WHATIF:ShowHydrogenBonds WHATIF:HasHydrogenBonds The output might include the atoms involved in the bond, bond geometric parameters and bond enthalpy. beta12orEarlier WHATIF:ShowHydrogenBondsM Identify potential hydrogen bonds between amino acids and other groups. Residue non-canonical interaction detection beta12orEarlier Calculate non-canonical atomic interactions in protein structures. Ramachandran plot calculation Calculate a Ramachandran plot of a protein structure. beta12orEarlier Ramachandran plot validation beta12orEarlier Validate a Ramachandran plot of a protein structure. Protein molecular weight calculation Calculate the molecular weight of a protein sequence or fragments. beta12orEarlier Protein extinction coefficient calculation beta12orEarlier Predict extinction coefficients or optical density of a protein sequence. Protein pH-dependent property calculation Calculate pH-dependent properties from pKa calculations of a protein sequence. beta12orEarlier Protein hydropathy calculation (from sequence) Hydropathy calculation on a protein sequence.
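The "Protein molecular weight calculation" operation above reduces to summing per-residue masses. A minimal sketch, assuming approximate average (not monoisotopic) residue masses; values are rounded to two decimals for illustration.

```python
# Approximate average residue masses (Da); the water lost in each
# peptide bond is already subtracted from these values.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water per chain for the free N- and C-termini

def protein_mw(seq):
    """Approximate average molecular weight (Da) of a protein sequence."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER

print(round(protein_mw("G"), 2))  # 75.07 (free glycine)
```

Production tools use higher-precision mass tables and can report monoisotopic masses for mass-spectrometry workflows.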
beta12orEarlier Protein titration curve plotting beta12orEarlier Plot a protein titration curve. Protein isoelectric point calculation beta12orEarlier Calculate isoelectric point of a protein sequence. Protein hydrogen exchange rate calculation Estimate hydrogen exchange rate of a protein sequence. beta12orEarlier Protein hydrophobic region calculation Calculate hydrophobic or hydrophilic / charged regions of a protein sequence. beta12orEarlier Protein aliphatic index calculation beta12orEarlier Calculate aliphatic index (relative volume occupied by aliphatic side chains) of a protein. Protein hydrophobic moment plotting beta12orEarlier Hydrophobic moment is a peptide's hydrophobicity measured for different angles of rotation. Calculate the hydrophobic moment of a peptide sequence and recognize amphiphilicity. Protein globularity prediction Predict the stability or globularity of a protein sequence, whether it is intrinsically unfolded etc. beta12orEarlier Protein solubility prediction Predict the solubility or atomic solvation energy of a protein sequence. beta12orEarlier Protein crystallizability prediction beta12orEarlier Predict crystallizability of a protein sequence. Protein signal peptide detection (eukaryotes) beta12orEarlier Detect or predict signal peptides (and typically predict subcellular localization) of eukaryotic proteins. Protein signal peptide detection (bacteria) Detect or predict signal peptides (and typically predict subcellular localization) of bacterial proteins. beta12orEarlier MHC peptide immunogenicity prediction Predict MHC class I or class II binding peptides, promiscuous binding peptides, immunogenicity etc. beta12orEarlier Protein feature prediction (from sequence) Methods typically involve scanning for known motifs, patterns and regular expressions.
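The "Protein hydrophobic moment plotting" operation above has a compact closed form (the Eisenberg mean hydrophobic moment): sum each residue's hydrophobicity as a vector rotated by a fixed angle per residue, then take the magnitude. A sketch using the Kyte-Doolittle hydropathy scale for illustration (other scales are equally valid choices):

```python
import math

# Kyte-Doolittle hydropathy values (one common choice of scale)
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def hydrophobic_moment(seq, angle=100.0):
    """Mean hydrophobic moment of a peptide; angle=100 degrees is the
    rotation per residue of an ideal alpha-helix."""
    delta = math.radians(angle)
    sin_sum = sum(KD[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(KD[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)
```

Plotting the moment over a range of angles (a "hydrophobic moment plot") highlights the rotation at which a peptide is most amphiphilic; a peak near 100 degrees suggests an amphipathic helix.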
beta12orEarlier true Sequence feature detection (protein) 1.6 Predict, recognise and identify positional features in protein sequences such as functional sites or regions and secondary structure. Nucleic acid feature detection Sequence feature detection (nucleic acid) Predict, recognise and identify features in nucleotide sequences such as functional sites or regions, typically by scanning for known motifs, patterns and regular expressions. Methods typically involve scanning for known motifs, patterns and regular expressions. beta12orEarlier Nucleic acid feature recognition Nucleic acid feature prediction Epitope mapping beta12orEarlier Predict antigenic determinant sites (epitopes) in protein sequences. Epitope mapping is commonly done during vaccine design. Protein post-translation modification site prediction Predict post-translation modification sites in protein sequences. beta12orEarlier Methods might predict sites of methylation, N-terminal myristoylation, N-terminal acetylation, sumoylation, palmitoylation, phosphorylation, sulfation, glycosylation, glycosylphosphatidylinositol (GPI) modification sites (GPI lipid anchor signals) etc. Protein signal peptide detection beta12orEarlier Methods might use sequence motifs and features, amino acid composition, profiles, machine-learned classifiers, etc. Detect or predict signal peptides and signal peptide cleavage sites in protein sequences. Protein binding site prediction (from sequence) Binding site prediction (from sequence) Predict catalytic residues, active sites or other ligand-binding sites in protein sequences. Ligand-binding and active site prediction (from sequence) Protein binding site detection beta12orEarlier Protein-nucleic acid binding prediction beta12orEarlier Predict RNA- and DNA-binding sites in protein sequences. Protein folding site prediction Predict protein sites that are key to protein folding, such as possible sites of nucleation or stabilization.
beta12orEarlier Protein cleavage site prediction beta12orEarlier Detect or predict cleavage sites (enzymatic or chemical) in protein sequences. Epitope mapping (MHC Class I) 1.8 true beta12orEarlier Predict epitopes that bind to MHC class I molecules. Epitope mapping (MHC Class II) Predict epitopes that bind to MHC class II molecules. 1.8 true beta12orEarlier Whole gene prediction beta12orEarlier Detect, predict and identify whole gene structure in DNA sequences. This includes protein coding regions, exon-intron structure, regulatory regions etc. Gene component prediction Methods for gene prediction might be ab initio, based on phylogenetic comparisons, use motifs, sequence features, support vector machine, alignment etc. beta12orEarlier Detect, predict and identify genetic elements such as promoters, coding regions, splice sites, etc in DNA sequences. Transposon prediction beta12orEarlier Detect or predict transposons, retrotransposons / retrotransposition signatures etc. PolyA signal detection Detect polyA signals in nucleotide sequences. beta12orEarlier Quadruplex formation site detection beta12orEarlier Quadruplex structure prediction Detect quadruplex-forming motifs in nucleotide sequences. Quadruplex (4-stranded) structures are formed by guanine-rich regions and are implicated in various important biological processes and as therapeutic targets. CpG island and isochore detection An isochore is a long region (> 3 kb) of DNA with very uniform GC content, in contrast to the rest of the genome. Isochores tend to have more genes, higher local melting or denaturation temperatures, and different flexibility. Methods might calculate fractional GC content or variation of GC content, predict methylation status of CpG islands etc. This includes methods that visualise CpG rich regions in a nucleotide sequence, for example plot isochores in a genome sequence. beta12orEarlier Find CpG rich regions in a nucleotide sequence or isochores in genome sequences.
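The "CpG island and isochore detection" operation above mentions calculating fractional GC content and the methylation-relevant CpG statistics. A minimal sketch of the commonly cited Gardiner-Garden & Frommer criteria (length >= 200 bp, GC > 50%, observed/expected CpG > 0.6); function names are illustrative:

```python
def gc_content(seq):
    """Fraction of G+C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def cpg_obs_exp(seq):
    """Observed/expected CpG dinucleotide ratio:
    count(CG) * length / (count(C) * count(G))."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g == 0 or c == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)

def looks_like_cpg_island(seq):
    """Apply the classic threshold criteria to a candidate window."""
    return (len(seq) >= 200
            and gc_content(seq) > 0.5
            and cpg_obs_exp(seq) > 0.6)
```

Genome-scale tools apply these statistics in sliding windows and then merge or trim overlapping qualifying windows into island calls.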
CpG island and isochores rendering CpG island and isochores detection Restriction site recognition beta12orEarlier Find and identify restriction enzyme cleavage sites (restriction sites) in (typically) DNA sequences, for example to generate a restriction map. Nucleosome formation or exclusion sequence prediction beta12orEarlier Identify or predict nucleosome exclusion sequences (nucleosome free regions) in DNA. Splice site prediction beta12orEarlier Identify, predict or analyse splice sites in nucleotide sequences. Methods might require a pre-mRNA or genomic DNA sequence. Integrated gene prediction Predict whole gene structure using a combination of multiple methods to achieve better predictions. beta12orEarlier Operon prediction Find operons (operators, promoters and genes) in bacterial genomes. beta12orEarlier Coding region prediction Predict protein-coding regions (CDS or exon) or open reading frames in nucleotide sequences. ORF prediction ORF finding beta12orEarlier Selenocysteine insertion sequence (SECIS) prediction Predict selenocysteine insertion sequence (SECIS) in a DNA sequence. SECIS elements are around 60 nucleotides in length, with a stem-loop structure that directs the cell to translate UGA codons as selenocysteines. beta12orEarlier Regulatory element prediction Identify or predict transcription regulatory motifs, patterns, elements or regions in DNA sequences. Translational regulatory element prediction Transcription regulatory element prediction This includes promoters, enhancers, silencers and boundary elements / insulators, regulatory protein or transcription factor binding sites etc. Methods might be specific to a particular genome and use motifs, word-based / grammatical methods, position-specific frequency matrices, discriminative pattern analysis etc. beta12orEarlier Translation initiation site prediction Predict translation initiation sites, possibly by searching a database of sites.
beta12orEarlier Promoter prediction Identify or predict whole promoters or promoter elements (transcription start sites, RNA polymerase binding site, transcription factor binding sites, promoter enhancers etc) in DNA sequences. Methods might recognize CG content, CpG islands, splice sites, polyA signals etc. beta12orEarlier Transcription regulatory element prediction (DNA-cis) beta12orEarlier Cis-regulatory elements (cis-elements) regulate the expression of genes located on the same strand. Cis-elements are found in the 5' promoter region of the gene, in an intron, or in the 3' untranslated region. Cis-elements are often binding sites of one or more trans-acting factors. Identify, predict or analyse cis-regulatory elements (TATA box, Pribnow box, SOS box, CAAT box, CCAAT box, operator etc.) in DNA sequences. Transcription regulatory element prediction (RNA-cis) Cis-regulatory elements (cis-elements) regulate genes located on the same strand from which the element was transcribed. A riboswitch is a region of an mRNA molecule that binds a small target molecule that regulates the gene's activity. Identify, predict or analyse cis-regulatory elements (for example riboswitches) in RNA sequences. beta12orEarlier Transcription regulatory element prediction (trans) beta12orEarlier Trans-regulatory elements regulate genes distant from the gene from which they were transcribed. Identify or predict functional RNA sequences with a gene regulatory role (trans-regulatory elements) or targets. Functional RNA identification Matrix/scaffold attachment site prediction MAR/SAR sites often flank a gene or gene cluster and are found nearby cis-regulatory sequences. They might contribute to transcription regulation. Identify matrix/scaffold attachment regions (MARs/SARs) in DNA sequences. beta12orEarlier Transcription factor binding site prediction beta12orEarlier Identify or predict transcription factor binding sites in DNA sequences.
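Several regulatory-element operations above mention position-specific frequency matrices. The "Transcription factor binding site prediction" operation, in particular, is often implemented as a position weight matrix (PWM) scan: build log-odds scores per position from known sites, then score every window of a target sequence. A minimal sketch, assuming a uniform background and illustrative function names:

```python
import math

def pwm_from_sites(sites, pseudocount=0.5):
    """Build a position weight matrix (log2-odds vs a uniform 0.25
    background) from aligned example binding sites."""
    pwm = []
    for pos in range(len(sites[0])):
        col = [s[pos] for s in sites]
        pwm.append({base: math.log2(
            ((col.count(base) + pseudocount)
             / (len(sites) + 4 * pseudocount)) / 0.25)
            for base in "ACGT"})
    return pwm

def scan(seq, pwm):
    """Score every window of the sequence against the PWM;
    returns (position, score) pairs, best first."""
    w = len(pwm)
    hits = [(i, sum(pwm[k][seq[i + k]] for k in range(w)))
            for i in range(len(seq) - w + 1)]
    return sorted(hits, key=lambda h: -h[1])

pwm = pwm_from_sites(["TATA", "TATA", "TACA"])
print(scan("GGTATAGG", pwm)[0][0])  # best-scoring window starts at 2
```

Real predictors add a background model estimated from genomic composition and a score threshold calibrated to a false-positive rate, since short PWMs match abundantly by chance.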
Exonic splicing enhancer prediction An exonic splicing enhancer (ESE) is a 6-base DNA sequence motif in an exon that enhances or directs splicing of pre-mRNA or hetero-nuclear RNA (hnRNA) into mRNA. Identify or predict exonic splicing enhancers (ESE) in exons. beta12orEarlier Sequence alignment validation Evaluation might be purely sequence-based or use structural information. Sequence alignment quality evaluation Evaluate molecular sequence alignment accuracy. beta12orEarlier Sequence alignment analysis (conservation) beta12orEarlier Analyse character conservation in a molecular sequence alignment, for example to derive a consensus sequence. Residue conservation analysis Use this concept for methods that calculate substitution rates, estimate relative site variability, identify sites with biased properties, derive a consensus sequence, or identify highly conserved or very poorly conserved sites, regions, blocks etc. Sequence alignment analysis (site correlation) Analyse correlations between sites in a molecular sequence alignment. This is typically done to identify possible covarying positions and predict contacts or structural constraints in protein structures. beta12orEarlier Chimeric sequence detection beta12orEarlier A chimera includes regions from two or more phylogenetically distinct sequences. They are usually artifacts of PCR and are thought to occur when a prematurely terminated amplicon reanneals to another DNA strand and is subsequently copied to completion in later PCR cycles. Detect chimeric sequences (chimeras) from a sequence alignment. Sequence alignment analysis (chimeric sequence detection) Recombination detection Sequence alignment analysis (recombination detection) beta12orEarlier Detect recombination (hotspots and coldspots) and identify recombination breakpoints in a sequence alignment. Tools might use a genetic algorithm, quartet-mapping, bootscanning, graphical methods, random forest model and so on.
Indel detection beta12orEarlier Sequence alignment analysis (indel detection) Tools might use a genetic algorithm, quartet-mapping, bootscanning, graphical methods, random forest model and so on. Identify insertion, deletion and duplication events from a sequence alignment. Nucleosome formation potential prediction true beta12orEarlier Predict nucleosome formation potential of DNA sequences. beta12orEarlier Nucleic acid thermodynamic property calculation Calculate a thermodynamic property of DNA or DNA/RNA, such as melting temperature, enthalpy and entropy. beta12orEarlier Nucleic acid melting profile plotting Calculate and plot a DNA or DNA/RNA melting profile. A melting profile is used to visualise and analyse partly melted DNA conformations. beta12orEarlier Nucleic acid stitch profile plotting A stitch profile represents the alternative conformations that partly melted DNA can adopt in a temperature range. beta12orEarlier Calculate and plot a DNA or DNA/RNA stitch profile. Nucleic acid melting curve plotting Calculate and plot a DNA or DNA/RNA melting curve. beta12orEarlier Nucleic acid probability profile plotting beta12orEarlier Calculate and plot a DNA or DNA/RNA probability profile. Nucleic acid temperature profile plotting Calculate and plot a DNA or DNA/RNA temperature profile. beta12orEarlier Nucleic acid curvature calculation Calculate curvature and flexibility / stiffness of a nucleotide sequence. beta12orEarlier microRNA detection Identify or predict microRNA sequences (miRNA) and precursors or microRNA targets / binding sites in a DNA sequence. beta12orEarlier tRNA gene prediction Identify or predict tRNA genes in genomic sequences (tRNA). beta12orEarlier siRNA binding specificity prediction beta12orEarlier Assess binding specificity of putative siRNA sequence(s), for example for a functional assay, typically with respect to designing specific siRNA sequences.
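The "Nucleic acid thermodynamic property calculation" operation above lists melting temperature as an example. For short oligonucleotides a well-known rule of thumb (the Wallace rule) suffices as a sketch; the function name is illustrative.

```python
def wallace_tm(oligo):
    """Wallace rule-of-thumb melting temperature (degrees C) for short
    oligos (roughly < 14 nt): Tm = 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc

print(wallace_tm("ATGCATGC"))  # 2*4 + 4*4 = 24
```

For longer sequences, melting-profile tools instead use nearest-neighbour thermodynamic parameters with salt and strand-concentration corrections, which the simple counting rule does not capture.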
Protein secondary structure prediction (integrated) Predict secondary structure of protein sequence(s) using multiple methods to achieve better predictions. beta12orEarlier Protein secondary structure prediction (helices) beta12orEarlier Predict helical secondary structure of protein sequences. Protein secondary structure prediction (turns) Predict turn structure (for example beta hairpin turns) of protein sequences. beta12orEarlier Protein secondary structure prediction (coils) beta12orEarlier Predict open coils, non-regular secondary structure and intrinsically disordered / unstructured regions of protein sequences. Protein secondary structure prediction (disulfide bonds) beta12orEarlier Predict cysteine bonding state and disulfide bond partners in protein sequences. GPCR prediction beta12orEarlier G protein-coupled receptor (GPCR) prediction Predict G protein-coupled receptors (GPCR). GPCR analysis Analyse G-protein coupled receptor proteins (GPCRs). beta12orEarlier G protein-coupled receptor (GPCR) analysis Protein structure prediction beta12orEarlier Predict tertiary structure (backbone and side-chain conformation) of protein sequences. Nucleic acid structure prediction beta12orEarlier Methods might identify thermodynamically stable or evolutionarily conserved structures. Predict tertiary structure of DNA or RNA. Ab initio structure prediction Predict tertiary structure of protein sequence(s) without homologs of known structure. de novo structure prediction beta12orEarlier Protein modelling Comparative modelling beta12orEarlier Build a three-dimensional protein model based on known (for example homologs) structures. The model might be of a whole, part or aspect of protein structure. Molecular modelling methods might use sequence-structure alignment, structural templates, molecular dynamics, energy minimization etc. 
Homology modelling Homology structure modelling Protein structure comparative modelling Molecular docking Model the structure of a protein in complex with a small molecule or another macromolecule. beta12orEarlier This includes protein-protein interactions, protein-nucleic acid, protein-ligand binding etc. Methods might predict whether the molecules are likely to bind in vivo, their conformation when bound, the strength of the interaction, possible mutations to achieve bonding and so on. Docking simulation Protein docking Protein modelling (backbone) Model protein backbone conformation. Methods might require a preliminary C(alpha) trace. beta12orEarlier Protein modelling (side chains) beta12orEarlier Methods might use a residue rotamer library. Model, analyse or edit amino acid side chain conformation in protein structure, optimize side-chain packing, hydrogen bonding etc. Protein modelling (loops) beta12orEarlier Model loop conformation in protein structures. Protein-ligand docking beta12orEarlier Methods aim to predict the position and orientation of a ligand bound to a protein receptor or enzyme. Ligand-binding simulation Model protein-ligand (for example protein-peptide) binding using comparative modelling or other techniques. Virtual ligand screening Structured RNA prediction and optimisation Nucleic acid folding family identification RNA inverse folding beta12orEarlier Predict or optimise RNA sequences (sequence pools) with likely secondary and tertiary structure for in vitro selection. SNP detection Find single nucleotide polymorphisms (SNPs) between sequences. Single nucleotide polymorphism detection beta12orEarlier This includes functional SNPs for large-scale genotyping purposes, disease-associated non-synonymous SNPs etc. Radiation Hybrid Mapping Generate a physical (radiation hybrid) map of genetic markers in a DNA sequence using provided radiation hybrid (RH) scores for one or more markers. 
beta12orEarlier Functional mapping beta12orEarlier true This can involve characterization of the underlying quantitative trait loci (QTLs) or nucleotides (QTNs). Map the genetic architecture of dynamic complex traits. beta12orEarlier Haplotype mapping Haplotype map generation Haplotype inference Infer haplotypes, either alleles at multiple loci that are transmitted together on the same chromosome, or a set of single nucleotide polymorphisms (SNPs) on a single chromatid that are statistically associated. beta12orEarlier Haplotype inference can help in population genetic studies and the identification of complex disease genes, and is typically based on aligned single nucleotide polymorphism (SNP) fragments. Haplotype comparison is a useful way to characterize the genetic variation between individuals. An individual's haplotype describes which nucleotide base occurs at each position for a set of common SNPs. Tools might use combinatorial functions (for example parsimony) or a likelihood function or model with optimization such as minimum error correction (MEC) model, expectation-maximization algorithm (EM), genetic algorithm or Markov chain Monte Carlo (MCMC). Haplotype reconstruction Linkage disequilibrium calculation beta12orEarlier Linkage disequilibrium is identified where a combination of alleles (or genetic markers) occurs more or less frequently in a population than expected by chance formation of haplotypes. Calculate linkage disequilibrium; the non-random association of alleles or polymorphisms at two or more loci (not necessarily on the same chromosome). Genetic code prediction beta12orEarlier Predict genetic code from analysis of codon usage data. Dotplot plotting beta12orEarlier Draw a dotplot of sequence similarities identified from word-matching or character comparison. Pairwise sequence alignment Pairwise sequence alignment generation Pairwise sequence alignment Methods might perform one-to-one, one-to-many or many-to-many comparisons.
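The "Linkage disequilibrium calculation" operation above has standard summary statistics for two biallelic loci: D (deviation of a haplotype frequency from the product of its allele frequencies), the normalized D', and r-squared. A minimal sketch from known haplotype frequencies; the function name and the 'AB'/'Ab'/'aB'/'ab' keys are illustrative.

```python
def linkage_disequilibrium(hap_freqs):
    """Return (D, D', r^2) for two biallelic loci, given the population
    frequencies of the four haplotypes 'AB', 'Ab', 'aB', 'ab'."""
    p_a = hap_freqs["AB"] + hap_freqs["Ab"]   # frequency of allele A
    p_b = hap_freqs["AB"] + hap_freqs["aB"]   # frequency of allele B
    d = hap_freqs["AB"] - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Complete LD: only the AB and ab haplotypes occur
print(linkage_disequilibrium({"AB": 0.5, "Ab": 0.0, "aB": 0.0, "ab": 0.5}))
```

In practice haplotype frequencies are themselves estimated (e.g. by EM from genotype data, as the haplotype-inference entry above notes) before these statistics are computed.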
Align exactly two molecular sequences. Pairwise sequence alignment construction beta12orEarlier Multiple sequence alignment Multiple sequence alignment construction Align two or more molecular sequences. This includes methods that use an existing alignment, for example to incorporate sequences into an alignment, or combine several multiple alignments into a single, improved alignment. Multiple sequence alignment beta12orEarlier Multiple sequence alignment generation Pairwise sequence alignment generation (local) beta12orEarlier Local pairwise sequence alignment construction Locally align exactly two molecular sequences. Pairwise sequence alignment (local) true Local alignment methods identify regions of local similarity. 1.6 Pairwise sequence alignment construction (local) Pairwise sequence alignment generation (global) Pairwise sequence alignment construction (global) Global pairwise sequence alignment construction 1.6 true Globally align exactly two molecular sequences. beta12orEarlier Global alignment methods identify similarity across the entire length of the sequences. Pairwise sequence alignment (global) Local sequence alignment Multiple sequence alignment (local) Local multiple sequence alignment construction beta12orEarlier Local alignment methods identify regions of local similarity. Multiple sequence alignment construction (local) Sequence alignment generation (local) Sequence alignment (local) Locally align two or more molecular sequences. Global sequence alignment Global multiple sequence alignment construction Multiple sequence alignment (global) beta12orEarlier Sequence alignment (global) Multiple sequence alignment construction (global) Globally align two or more molecular sequences. Sequence alignment generation (global) Global alignment methods identify similarity across the entire length of the sequences. Constrained sequence alignment beta12orEarlier Align two or more molecular sequences with user-defined constraints. 
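The global pairwise alignment operations defined above are classically implemented with Needleman-Wunsch dynamic programming. A minimal score-only sketch with illustrative scoring parameters (real tools use substitution matrices and affine gap penalties, and keep a traceback to emit the alignment itself):

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score of two sequences
    via dynamic programming; traceback omitted for brevity."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # leading gaps in b
        score[i][0] = i * gap
    for j in range(1, cols):          # leading gaps in a
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1]
                                          else mismatch)
            score[i][j] = max(diag,
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[-1][-1]

print(needleman_wunsch("GATTACA", "GATTACA"))  # 7 (all matches)
```

Local (Smith-Waterman) alignment, the sibling concept in this vocabulary, differs only in clamping each cell at zero and taking the best cell anywhere in the matrix rather than the corner.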
Multiple sequence alignment construction (constrained) Sequence alignment generation (constrained) Multiple sequence alignment (constrained) Sequence alignment (constrained) Constrained multiple sequence alignment construction Consensus-based sequence alignment Consensus multiple sequence alignment construction Sequence alignment (consensus) beta12orEarlier Align two or more molecular sequences using multiple methods to achieve higher quality. Sequence alignment generation (consensus) Multiple sequence alignment construction (consensus) Multiple sequence alignment (consensus) Tree-based sequence alignment Sequence alignment generation (phylogenetic tree-based) This is supposed to give a more biologically meaningful alignment than standard alignments. beta12orEarlier Phylogenetic tree-based multiple sequence alignment construction Align multiple sequences using relative gap costs calculated from neighbors in a supplied phylogenetic tree. Sequence alignment (phylogenetic tree-based) Multiple sequence alignment construction (phylogenetic tree-based) Multiple sequence alignment (phylogenetic tree-based) Secondary structure alignment generation beta12orEarlier 1.6 Secondary structure alignment construction Secondary structure alignment true Align molecular secondary structure (represented as a 1D string). Protein secondary structure alignment generation Protein secondary structure alignment construction Align protein secondary structures. beta12orEarlier Secondary structure alignment (protein) Protein secondary structure alignment RNA secondary structure alignment RNA secondary structure alignment generation RNA secondary structure alignment Align RNA secondary structures. RNA secondary structure alignment construction Secondary structure alignment (RNA) beta12orEarlier Pairwise structure alignment beta12orEarlier Pairwise structure alignment generation Pairwise structure alignment construction Align (superimpose) exactly two molecular tertiary structures. 
Multiple structure alignment construction Align (superimpose) two or more molecular tertiary structures. This includes methods that use an existing alignment. 1.6 true Multiple structure alignment beta12orEarlier Structure alignment (protein) beta13 true beta12orEarlier Align protein tertiary structures. Structure alignment (RNA) beta13 true Align RNA tertiary structures. beta12orEarlier Pairwise structure alignment generation (local) Locally align (superimpose) exactly two molecular tertiary structures. Pairwise structure alignment (local) Local alignment methods identify regions of local similarity, common substructures etc. Pairwise structure alignment construction (local) 1.6 true Local pairwise structure alignment construction beta12orEarlier Pairwise structure alignment generation (global) Global pairwise structure alignment construction Global alignment methods identify similarity across the structures as a whole. true beta12orEarlier 1.6 Pairwise structure alignment construction (global) Globally align (superimpose) exactly two molecular tertiary structures. Pairwise structure alignment (global) Local structure alignment Local multiple structure alignment construction Local alignment methods identify regions of local similarity, common substructures etc. Structure alignment construction (local) beta12orEarlier Locally align (superimpose) two or more molecular tertiary structures. Multiple structure alignment construction (local) Multiple structure alignment (local) Structure alignment generation (local) Global structure alignment Structure alignment construction (global) Multiple structure alignment (global) Structure alignment generation (global) Multiple structure alignment construction (global) beta12orEarlier Global alignment methods identify similarity across the structures as a whole. Global multiple structure alignment construction Globally align (superimpose) two or more molecular tertiary structures. 
Profile-to-profile alignment (pairwise) Sequence alignment generation (pairwise profile) Methods might perform one-to-one, one-to-many or many-to-many comparisons. Pairwise sequence profile alignment construction Sequence profile alignment construction (pairwise) Sequence profile alignment (pairwise) beta12orEarlier Align exactly two molecular profiles. Sequence profile alignment generation (pairwise) Sequence alignment generation (multiple profile) Align two or more molecular profiles. 1.6 true Sequence profile alignment generation (multiple) beta12orEarlier Sequence profile alignment (multiple) Sequence profile alignment construction (multiple) Multiple sequence profile alignment construction 3D profile-to-3D profile alignment (pairwise) Methods might perform one-to-one, one-to-many or many-to-many comparisons. Pairwise structural (3D) profile alignment construction Structural (3D) profile alignment (pairwise) Structural profile alignment construction (pairwise) Align exactly two molecular structural (3D) profiles. beta12orEarlier Structural profile alignment generation (pairwise) Structural profile alignment generation (multiple) true Structural profile alignment construction (multiple) Align two or more molecular 3D profiles. Multiple structural (3D) profile alignment construction beta12orEarlier Structural (3D) profile alignment (multiple) 1.6 Data retrieval (tool metadata) Data retrieval (tool annotation) 1.6 Search and retrieve names of or documentation on bioinformatics tools, for example by keyword or by the function they perform. beta12orEarlier true Tool information retrieval Data retrieval (database metadata) beta12orEarlier true Data retrieval (database annotation) Search and retrieve names of or documentation on bioinformatics databases or query terms, for example by keyword. Database information retrieval 1.6 PCR primer design (for large scale sequencing) Predict primers for large scale sequencing. 
beta12orEarlier PCR primer design (for genotyping polymorphisms) beta12orEarlier Predict primers for genotyping polymorphisms, for example single nucleotide polymorphisms (SNPs). PCR primer design (for gene transcription profiling) Predict primers for gene transcription profiling. beta12orEarlier PCR primer design (for conserved primers) Predict primers that are conserved across multiple genomes or species. beta12orEarlier PCR primer design (based on gene structure) Predict primers based on gene structure, promoters, exon-exon junctions etc. beta12orEarlier PCR primer design (for methylation PCRs) beta12orEarlier Predict primers for methylation PCRs. Sequence assembly (mapping assembly) Sequence assembly by combining fragments using an existing backbone sequence, typically a reference genome. beta12orEarlier The final sequence will resemble the backbone sequence. Mapping assemblers are usually much faster and less memory intensive than de-novo assemblers. Sequence assembly (de-novo assembly) Sequence assembly by combining fragments without the aid of a reference sequence or genome. De-novo assemblers are much slower and more memory intensive than mapping assemblers. beta12orEarlier Sequence assembly (genome assembly) Sequence assembly on a very large scale, such as assembly of whole genomes. beta12orEarlier Sequence assembly (EST assembly) beta12orEarlier Sequence assembly for EST sequences (transcribed mRNA). Assemblers must handle (or be complicated by) alternative splicing, trans-splicing, single-nucleotide polymorphism (SNP), recoding, and post-transcriptional modification. Tag mapping Tag mapping might assign experimentally obtained tags to known transcripts or annotate potential virtual tags in a genome. Tag to gene assignment Make gene-to-tag assignments (tag mapping) of SAGE, MPSS and SBS data, by annotating tags with ontology concepts. 
beta12orEarlier SAGE data processing beta12orEarlier Serial analysis of gene expression data processing beta12orEarlier Process (read and / or write) serial analysis of gene expression (SAGE) data. true MPSS data processing beta12orEarlier Process (read and / or write) massively parallel signature sequencing (MPSS) data. true Massively parallel signature sequencing data processing beta12orEarlier SBS data processing beta12orEarlier Sequencing by synthesis data processing beta12orEarlier Process (read and / or write) sequencing by synthesis (SBS) data. true Heat map generation beta12orEarlier The heat map usually uses a coloring scheme to represent clusters. Heat maps can show how expression of mRNA by a set of genes was influenced by experimental conditions. Heat map construction Generate a heat map of gene expression from microarray data. Gene expression profile analysis true Functional profiling beta12orEarlier Analyse one or more gene expression profiles, typically to interpret them in functional terms. 1.6 Gene expression profile pathway mapping beta12orEarlier Map a gene expression profile to known biological pathways, for example, to identify or reconstruct a pathway. Protein secondary structure assignment (from coordinate data) beta12orEarlier Assign secondary structure from protein coordinate data. Protein secondary structure assignment (from CD data) Assign secondary structure from circular dichroism (CD) spectroscopic data. beta12orEarlier Protein structure assignment (from X-ray crystallographic data) true 1.7 Assign a protein tertiary structure (3D coordinates) from raw X-ray crystallography data. beta12orEarlier Protein structure assignment (from NMR data) beta12orEarlier Assign a protein tertiary structure (3D coordinates) from raw NMR spectroscopy data. true 1.7 Phylogenetic tree generation (data centric) Phylogenetic tree construction (data centric) beta12orEarlier Construct a phylogenetic tree from a specific type of data. 
Phylogenetic tree generation (method centric) Phylogenetic tree construction (method centric) Construct a phylogenetic tree using a specific method. beta12orEarlier Phylogenetic tree generation (from molecular sequences) Phylogenetic tree construction from molecular sequences. beta12orEarlier Phylogenetic tree construction (from molecular sequences) Methods typically compare multiple molecular sequences and estimate evolutionary distances and relationships to infer gene families or make functional predictions. Phylogenetic tree generation (from continuous quantitative characters) Phylogenetic tree construction (from continuous quantitative characters) beta12orEarlier Phylogenetic tree construction from continuous quantitative character data. Phylogenetic tree generation (from gene frequencies) Phylogenetic tree construction (from gene frequencies) Phylogenetic tree construction from gene frequency data. beta12orEarlier Phylogenetic tree construction (from polymorphism data) Phylogenetic tree construction from polymorphism data including microsatellites, RFLP (restriction fragment length polymorphisms), RAPD (random-amplified polymorphic DNA) and AFLP (amplified fragment length polymorphisms) data. Phylogenetic tree generation (from polymorphism data) beta12orEarlier Phylogenetic species tree construction Construct a phylogenetic species tree, for example, from a genome-wide sequence comparison. Phylogenetic species tree generation beta12orEarlier Phylogenetic tree generation (parsimony methods) Phylogenetic tree construction (parsimony methods) Construct a phylogenetic tree by computing a sequence alignment and searching for the tree with the fewest character-state changes from the alignment. This includes evolutionary parsimony (invariants) methods. beta12orEarlier Phylogenetic tree generation (minimum distance methods) This includes the neighbor joining (NJ) clustering method. 
beta12orEarlier Phylogenetic tree construction (minimum distance methods) Construct a phylogenetic tree by computing (or using precomputed) distances between sequences and searching for the tree with minimal discrepancies between pairwise distances. Phylogenetic tree generation (maximum likelihood and Bayesian methods) Phylogenetic tree construction (maximum likelihood and Bayesian methods) Construct a phylogenetic tree by relating sequence data to a hypothetical tree topology using a model of sequence evolution. Maximum likelihood methods search for a tree that maximizes a likelihood function, i.e. that is most likely given the data and model. Bayesian analysis estimates the probability of a tree for branch lengths and topology, typically using a Monte Carlo algorithm. beta12orEarlier Phylogenetic tree generation (quartet methods) beta12orEarlier Phylogenetic tree construction (quartet methods) Construct a phylogenetic tree by computing four-taxon trees (4-trees) and searching for the phylogeny that matches most closely. Phylogenetic tree generation (AI methods) Construct a phylogenetic tree by using artificial-intelligence methods, for example genetic algorithms. Phylogenetic tree construction (AI methods) beta12orEarlier DNA substitution modelling Sequence alignment analysis (phylogenetic modelling) beta12orEarlier Identify a plausible model of DNA substitution that explains a DNA sequence alignment. Phylogenetic tree analysis (shape) Phylogenetic tree topology analysis Analyse the shape (topology) of a phylogenetic tree. beta12orEarlier Phylogenetic tree bootstrapping Apply bootstrapping or other measures to estimate confidence of a phylogenetic tree. beta12orEarlier Phylogenetic tree analysis (gene family prediction) Predict families of genes and gene function based on their position in a phylogenetic tree. 
beta12orEarlier Phylogenetic tree analysis (natural selection) beta12orEarlier Stabilizing/purifying (directional) selection favors a single phenotype and tends to decrease genetic diversity as a population stabilizes on a particular trait, selecting out trait extremes or deleterious mutations. In contrast, balancing selection maintains genetic polymorphisms (or multiple alleles), whereas disruptive (or diversifying) selection favors individuals at both extremes of a trait. Analyse a phylogenetic tree to identify allele frequency distribution and change that is subject to evolutionary pressures (natural selection, genetic drift, mutation and gene flow). Identify the type of natural selection (such as stabilizing, balancing or disruptive). Phylogenetic tree generation (consensus) Compare two or more phylogenetic trees to produce a consensus tree. Methods typically test for topological similarity between trees using for example a congruence index. beta12orEarlier Phylogenetic tree construction (consensus) Phylogenetic sub/super tree detection beta12orEarlier Compare two or more phylogenetic trees to detect subtrees or supertrees. Phylogenetic tree distances calculation beta12orEarlier Compare two or more phylogenetic trees to calculate distances between trees. Phylogenetic tree annotation beta12orEarlier http://www.evolutionaryontology.org/cdao.owl#CDAOAnnotation Annotate a phylogenetic tree with terms from a controlled vocabulary. Immunogenicity prediction beta12orEarlier Peptide immunogen prediction Predict and optimise peptide ligands that elicit an immunological response. DNA vaccine design beta12orEarlier Predict or optimise DNA to elicit (via DNA vaccination) an immunological response. Sequence formatting beta12orEarlier Reformat (a file or other report of) molecular sequence(s). Sequence file format conversion Sequence alignment formatting Reformat (a file or other report of) molecular sequence alignment(s). 
beta12orEarlier Codon usage table formatting Reformat a codon usage table. beta12orEarlier Sequence visualisation beta12orEarlier Visualise, format or render a molecular sequence, possibly with sequence features or properties shown. Sequence rendering Sequence alignment visualisation Sequence alignment rendering Visualise, format or print a molecular sequence alignment. beta12orEarlier Sequence cluster visualisation Sequence cluster rendering beta12orEarlier Visualise, format or render sequence clusters. Phylogenetic tree visualisation Render or visualise a phylogenetic tree. Phylogenetic tree rendering beta12orEarlier RNA secondary structure visualisation RNA secondary structure rendering Visualise RNA secondary structure, knots, pseudoknots etc. beta12orEarlier Protein secondary structure rendering Protein secondary structure visualisation Render and visualise protein secondary structure. beta12orEarlier Structure visualisation Structure rendering Visualise or render a molecular tertiary structure, for example a high-quality static picture or animation. beta12orEarlier Microarray data rendering Visualise microarray data. beta12orEarlier Protein interaction network rendering Protein interaction network visualisation beta12orEarlier Identify and analyse networks of protein interactions. Map drawing beta12orEarlier DNA map drawing Map rendering Draw or visualise a DNA map. Sequence motif rendering Render a sequence with motifs. true beta12orEarlier beta12orEarlier Restriction map drawing Draw or visualise restriction maps in DNA sequences. beta12orEarlier DNA linear map rendering beta12orEarlier beta12orEarlier true Draw a linear map of DNA. Plasmid map drawing beta12orEarlier DNA circular map rendering Draw a circular map of DNA, for example a plasmid map. Operon drawing Visualise operon structure etc. beta12orEarlier Operon rendering Nucleic acid folding family identification true beta12orEarlier Identify folding families of related RNAs. 
beta12orEarlier Nucleic acid folding energy calculation beta12orEarlier Compute energies of nucleic acid folding, e.g. minimum folding energies for DNA or RNA sequences or energy landscape of RNA mutants. Annotation retrieval beta12orEarlier Use this concept for tools which retrieve pre-existing annotations, not for example prediction methods that might make annotations. Retrieve existing annotation (or documentation), typically annotation on a database entity. beta12orEarlier true Protein function prediction beta12orEarlier Predict general functional properties of a protein. For functional properties that can be mapped to a sequence, use 'Sequence feature detection (protein)' instead. Protein function comparison Compare the functional properties of two or more proteins. beta12orEarlier Sequence submission Submit a molecular sequence to a database. beta12orEarlier 1.6 true Gene regulatory network analysis beta12orEarlier Analyse a known network of gene regulation. Loading Data loading WHATIF:UploadPDB Prepare or load a user-specified data file so that it is available for use. beta12orEarlier Sequence retrieval This includes direct retrieval methods (e.g. the dbfetch program) but not those that perform calculations on the sequence. Data retrieval (sequences) 1.6 Query a sequence data resource (typically a database) and retrieve sequences and / or annotation. beta12orEarlier true Structure retrieval true WHATIF:EchoPDB beta12orEarlier WHATIF:DownloadPDB This includes direct retrieval methods but not those that perform calculations on the sequence or structure. Query a tertiary structure data resource (typically a database) and retrieve structures, structure-related data and annotation. 1.6 Surface rendering beta12orEarlier WHATIF:GetSurfaceDots Calculate the positions of dots that are homogeneously distributed over the surface of a molecule. A dot has three coordinates (x,y,z) and (typically) a color. 
Protein atom surface calculation (accessible) beta12orEarlier WHATIF:AtomAccessibilitySolventPlus WHATIF:AtomAccessibilitySolvent Calculate the solvent accessibility ('accessible surface') for each atom in a structure. Waters are not considered. Protein atom surface calculation (accessible molecular) beta12orEarlier Calculate the solvent accessibility ('accessible molecular surface') for each atom in a structure. Waters are not considered. WHATIF:AtomAccessibilityMolecular WHATIF:AtomAccessibilityMolecularPlus Protein residue surface calculation (accessible) WHATIF:ResidueAccessibilitySolvent beta12orEarlier Solvent accessibility might be calculated for the backbone, sidechain and total (backbone plus sidechain). Calculate the solvent accessibility ('accessible surface') for each residue in a structure. Protein residue surface calculation (vacuum accessible) Solvent accessibility might be calculated for the backbone, sidechain and total (backbone plus sidechain). Calculate the solvent accessibility ('vacuum accessible surface') for each residue in a structure. This is the accessibility of the residue when taken out of the protein together with the backbone atoms of any residue it is covalently bound to. WHATIF:ResidueAccessibilityVacuum beta12orEarlier Protein residue surface calculation (accessible molecular) Calculate the solvent accessibility ('accessible molecular surface') for each residue in a structure. WHATIF:ResidueAccessibilityMolecular Solvent accessibility might be calculated for the backbone, sidechain and total (backbone plus sidechain). beta12orEarlier Protein residue surface calculation (vacuum molecular) Solvent accessibility might be calculated for the backbone, sidechain and total (backbone plus sidechain). beta12orEarlier Calculate the solvent accessibility ('vacuum molecular surface') for each residue in a structure. 
This is the accessibility of the residue when taken out of the protein together with the backbone atoms of any residue it is covalently bound to. WHATIF:ResidueAccessibilityVacuumMolecular Protein surface calculation (accessible molecular) WHATIF:TotAccessibilityMolecular beta12orEarlier Calculate the solvent accessibility ('accessible molecular surface') for a structure as a whole. Protein surface calculation (accessible) WHATIF:TotAccessibilitySolvent Calculate the solvent accessibility ('accessible surface') for a structure as a whole. beta12orEarlier Backbone torsion angle calculation beta12orEarlier WHATIF:ResidueTorsionsBB Calculate for each residue in a protein structure all its backbone torsion angles. Full torsion angle calculation beta12orEarlier Calculate for each residue in a protein structure all its torsion angles. WHATIF:ResidueTorsions Cysteine torsion angle calculation beta12orEarlier Calculate for each cysteine (bridge) all its torsion angles. WHATIF:CysteineTorsions Tau angle calculation WHATIF:ShowTauAngle beta12orEarlier Tau is the backbone angle N-Calpha-C (angle over the C-alpha). For each amino acid in a protein structure calculate the backbone angle tau. Cysteine bridge detection WHATIF:ShowCysteineBridge Detect cysteine bridges (from coordinate data) in a protein structure. beta12orEarlier Free cysteine detection beta12orEarlier A free cysteine is neither involved in a cysteine bridge, nor functions as a ligand to a metal. Detect free cysteines in a protein structure. WHATIF:ShowCysteineFree Metal-bound cysteine detection beta12orEarlier WHATIF:ShowCysteineMetal Detect cysteines that are bound to metal in a protein structure. Residue contact calculation (residue-nucleic acid) beta12orEarlier WHATIF:ShowProteiNucleicContacts Calculate protein residue contacts with nucleic acids in a structure. 
WHATIF:HasNucleicContacts Residue contact calculation (residue-metal) WHATIF:HasMetalContacts beta12orEarlier Calculate protein residue contacts with metal in a structure. WHATIF:HasMetalContactsPlus Residue contact calculation (residue-negative ion) Calculate ion contacts in a structure (all ions for all side chain atoms). WHATIF:HasNegativeIonContactsPlus beta12orEarlier WHATIF:HasNegativeIonContacts Residue bump detection WHATIF:ShowBumps beta12orEarlier Detect 'bumps' between residues in a structure, i.e. those with pairs of atoms whose Van der Waals' radii interpenetrate more than a defined distance. Residue symmetry contact calculation Calculate the number of symmetry contacts made by residues in a protein structure. WHATIF:SymmetryContact A symmetry contact is a contact between two atoms in different asymmetric units. beta12orEarlier Residue contact calculation (residue-ligand) beta12orEarlier Calculate contacts between residues and ligands in a protein structure. WHATIF:ShowDrugContactsShort WHATIF:ShowLigandContacts WHATIF:ShowDrugContacts Salt bridge calculation Salt bridges are interactions between oppositely charged atoms in different residues. The output might include the inter-atomic distance. WHATIF:HasSaltBridgePlus WHATIF:ShowSaltBridges beta12orEarlier WHATIF:HasSaltBridge WHATIF:ShowSaltBridgesH Calculate (and possibly score) salt bridges in a protein structure. Rotamer likelihood prediction WHATIF:ShowLikelyRotamers WHATIF:ShowLikelyRotamers500 Predict rotamer likelihoods for all 20 amino acid types at each position in a protein structure. WHATIF:ShowLikelyRotamers800 WHATIF:ShowLikelyRotamers600 WHATIF:ShowLikelyRotamers900 Output typically includes, for each residue position, the likelihoods for the 20 amino acid types with estimated reliability of the 20 likelihoods. 
WHATIF:ShowLikelyRotamers700 WHATIF:ShowLikelyRotamers400 WHATIF:ShowLikelyRotamers300 WHATIF:ShowLikelyRotamers200 WHATIF:ShowLikelyRotamers100 beta12orEarlier Proline mutation value calculation Calculate for each position in a protein structure the chance that a proline, when introduced at this position, would increase the stability of the whole protein. WHATIF:ProlineMutationValue beta12orEarlier Residue packing validation beta12orEarlier Identify poorly packed residues in protein structures. WHATIF: PackingQuality Dihedral angle validation WHATIF: ImproperQualitySum Identify for each residue in a protein structure any improper dihedral (phi/psi) angles. beta12orEarlier WHATIF: ImproperQualityMax PDB file sequence retrieval Extract a molecular sequence from a PDB file. beta12orEarlier WHATIF: PDB_sequence true beta12orEarlier HET group detection Identify HET groups in PDB files. WHATIF: HETGroupNames beta12orEarlier A HET group usually corresponds to ligands or lipids, but might also (not consistently) include groups that are attached to amino acids. Each HET group is supposed to have a unique three-letter code and a unique name which might be given in the output. DSSP secondary structure assignment Determine for each residue the DSSP-determined secondary structure in three-state (HSC). beta12orEarlier WHATIF: ResidueDSSP beta12orEarlier true Structure formatting Reformat (a file or other report of) tertiary structure data. beta12orEarlier WHATIF: PDBasXML Protein cysteine and disulfide bond assignment Assign cysteine bonding state and disulfide bond partners in protein structures. beta12orEarlier Residue validation Identify poor quality amino acid positions in protein structures. 
beta12orEarlier WHATIF: UseResidueDB The scoring function to identify poor quality residues may consider residues with bad atoms or atoms with high B-factor, residues in the N- or C-terminal position, adjacent to an unstructured residue, non-canonical residues, glycine and proline (or residues adjacent to these). Structure retrieval (water) beta12orEarlier 1.6 WHATIF:MovedWaterPDB true Query a tertiary structure database and retrieve water molecules. siRNA duplex prediction beta12orEarlier Identify or predict siRNA duplexes in RNA. Sequence alignment refinement Refine an existing sequence alignment. beta12orEarlier Listfile processing 1.6 Process an EMBOSS listfile (list of EMBOSS Uniform Sequence Addresses). true beta12orEarlier Sequence file editing beta12orEarlier Perform basic (non-analytical) operations on a report or file of sequences (which might include features), such as file concatenation, removal or ordering of sequences, or creation of a subset or a new file of sequences. Sequence alignment file processing beta12orEarlier Perform basic (non-analytical) operations on a sequence alignment file, such as copying or removal and ordering of sequences. 1.6 true Small molecule data processing beta13 true beta12orEarlier Process (read and / or write) physicochemical property data for small molecules. Data retrieval (ontology annotation) beta13 Ontology information retrieval true Search and retrieve documentation on a bioinformatics ontology. beta12orEarlier Data retrieval (ontology concept) Query an ontology and retrieve concepts or relations. true beta13 beta12orEarlier Ontology retrieval Representative sequence identification Identify a representative sequence from a set of sequences, typically using scores from pair-wise alignment or other comparison of the sequences. beta12orEarlier Structure file processing Perform basic (non-analytical) operations on a file of molecular tertiary structural data. 
1.6 beta12orEarlier true Data retrieval (sequence profile) Query a profile data resource and retrieve one or more profile(s) and / or associated annotation. true This includes direct retrieval methods that retrieve a profile by, e.g., the profile name. beta13 beta12orEarlier Statistical calculation Statistical analysis Perform a statistical data operation of some type, e.g. calibration or validation. beta12orEarlier true 3D-1D scoring matrix generation beta12orEarlier 3D-1D scoring matrix construction A 3D-1D scoring matrix scores the probability of amino acids occurring in different structural environments. Calculate a 3D-1D scoring matrix from analysis of protein sequence and structural data. Transmembrane protein visualisation Visualise transmembrane proteins, typically the transmembrane regions within a sequence. beta12orEarlier Transmembrane protein rendering Demonstration beta12orEarlier true An operation performed for purely illustrative (pedagogical) purposes. beta13 Data retrieval (pathway or network) beta12orEarlier true Query a biological pathways database and retrieve annotation on one or more pathways. beta13 Data retrieval (identifier) beta12orEarlier Query a database and retrieve one or more data identifiers. beta13 true Nucleic acid density plotting beta12orEarlier Calculate a density plot (of base composition) for a nucleotide sequence. Sequence analysis Analyse one or more known molecular sequences. beta12orEarlier Sequence analysis (general) Sequence motif processing true 1.6 Process (read and / or write) molecular sequence motifs. beta12orEarlier Protein interaction data processing 1.6 Process (read and / or write) protein interaction data. true beta12orEarlier Protein structure analysis Structure analysis (protein) beta12orEarlier Analyse protein tertiary structural data. 
Annotation processing true beta12orEarlier beta12orEarlier Process (read and / or write) annotation of some type, typically annotation on an entry from a biological or biomedical database entity. Sequence feature analysis beta12orEarlier true Analyse features in molecular sequences. beta12orEarlier Utility operation Basic (non-analytical) operations on some data, either a file or equivalent entity in memory. File processing beta12orEarlier Report handling File handling Data file processing Gene expression analysis Analyse gene expression and regulation data. beta12orEarlier true beta12orEarlier Structural profile processing beta12orEarlier 1.6 Process (read and / or write) one or more structural (3D) profile(s) or template(s) of some type. 3D profile processing true Data index processing Database index processing true Process (read and / or write) an index of (typically a file of) biological data. 1.6 beta12orEarlier Sequence profile processing true beta12orEarlier Process (read and / or write) some type of sequence profile. 1.6 Protein function analysis This is a broad concept and is used as a placeholder for other, more specific concepts. beta12orEarlier Analyse protein function, typically by processing protein sequence and/or structural data, and generate an informative report. Protein folding analysis This is a broad concept and is used as a placeholder for other, more specific concepts. Analyse protein folding, typically by processing sequence and / or structural data, and write an informative report. Protein folding modelling beta12orEarlier Protein secondary structure analysis Analyse known protein secondary structure data. beta12orEarlier Secondary structure analysis (protein) Physicochemical property data processing beta13 true Process (read and / or write) data on the physicochemical property of a molecule. beta12orEarlier Primer and probe design Primer and probe prediction beta12orEarlier Predict oligonucleotide primers or probes. 
Operation (typed) Computation Calculation Processing Process (read and / or write) data of a specific type, for example applying analytical methods. beta12orEarlier Database search beta12orEarlier Typically the query is compared to each entry and high scoring matches (hits) are returned. For example, a BLAST search of a sequence database. Search a database (or other data resource) with a supplied query and retrieve entries (or parts of entries) that are similar to the query. Data retrieval Information retrieval beta12orEarlier Retrieve an entry (or part of an entry) from a data resource that matches a supplied query. This might include some primary data and annotation. The query is a data identifier or other indexed term. For example, retrieve a sequence record with the specified accession number, or matching supplied keywords. Prediction and recognition beta12orEarlier Recognition Prediction Predict, recognise, detect or identify some properties of a biomolecule. Detection Comparison beta12orEarlier Compare two or more things to identify similarities. Optimisation and refinement beta12orEarlier Refine or optimise some data model. Modelling and simulation beta12orEarlier Model or simulate some biological entity or system. Data handling true beta12orEarlier Perform basic operations on some data or a database. beta12orEarlier Validation beta12orEarlier Validation and standardisation Validate some data. Mapping This is a broad concept and is used as a placeholder for other, more specific concepts. Map properties to positions on a biological entity (typically a molecular sequence or structure), or assemble such an entity from constituent parts. beta12orEarlier Design beta12orEarlier Design a biological entity (typically a molecular sequence or structure) with specific properties. true Microarray data processing beta12orEarlier Process (read and / or write) microarray data.
beta12orEarlier true Codon usage table processing Process (read and / or write) a codon usage table. beta12orEarlier Data retrieval (codon usage table) Retrieve a codon usage table and / or associated annotation. beta12orEarlier true beta13 Gene expression profile processing 1.6 Process (read and / or write) a gene expression profile. true beta12orEarlier Functional enrichment beta12orEarlier Gene expression profile annotation The Gene Ontology (GO) is invariably used, the input is a set of Gene IDs and the output of the analysis is typically a ranked list of GO terms, each associated with a p-value. Analyse a set of genes (genes corresponding to an expression profile, or any other set) with respect to concepts from an ontology of gene functions. GO term enrichment Gene regulatory network prediction Predict a network of gene regulation. beta12orEarlier Pathway or network processing Generate, analyse or handle a biological pathway or network. beta12orEarlier RNA secondary structure analysis beta12orEarlier Process (read and / or write) RNA secondary structure data. Structure processing (RNA) Process (read and / or write) RNA tertiary structure data. beta12orEarlier beta13 true RNA structure prediction beta12orEarlier Predict RNA tertiary structure. DNA structure prediction Predict DNA tertiary structure. beta12orEarlier Phylogenetic tree processing beta12orEarlier Process (read and / or write) a phylogenetic tree. Protein secondary structure processing Process (read and / or write) protein secondary structure data. 1.6 true beta12orEarlier Protein interaction network processing true beta12orEarlier Process (read and / or write) a network of protein interactions. 1.6 Sequence processing Sequence processing (general) Process (read and / or write) one or more molecular sequences and associated annotation. true beta12orEarlier 1.6 Sequence processing (protein) Process (read and / or write) a protein sequence and associated annotation. 
beta12orEarlier true 1.6 Sequence processing (nucleic acid) 1.6 true beta12orEarlier Process (read and / or write) a nucleotide sequence and associated annotation. Sequence comparison Compare two or more molecular sequences. beta12orEarlier Sequence cluster processing Process (read and / or write) a sequence cluster. true beta12orEarlier 1.6 Feature table processing Process (read and / or write) a sequence feature table. 1.6 true beta12orEarlier Gene prediction Gene and gene component prediction beta12orEarlier Detect, predict and identify genes or components of genes in DNA sequences. Gene finding GPCR classification beta12orEarlier G protein-coupled receptor (GPCR) classification Classify G-protein coupled receptors (GPCRs) into families and subfamilies. GPCR coupling selectivity prediction Predict G-protein coupled receptor (GPCR) coupling selectivity. beta12orEarlier Structure processing (protein) true 1.6 beta12orEarlier Process (read and / or write) a protein tertiary structure. Protein atom surface calculation Waters are not considered. Calculate the solvent accessibility for each atom in a structure. beta12orEarlier Protein residue surface calculation beta12orEarlier Calculate the solvent accessibility for each residue in a structure. Protein surface calculation beta12orEarlier Calculate the solvent accessibility of a structure as a whole. Sequence alignment processing beta12orEarlier true Process (read and / or write) a molecular sequence alignment. 1.6 Protein-protein interaction prediction Identify or predict protein-protein interactions, interfaces, binding sites etc. beta12orEarlier Structure processing true 1.6 Process (read and / or write) a molecular tertiary structure. beta12orEarlier Map annotation Annotate a DNA map of some type with terms from a controlled vocabulary. true beta12orEarlier 1.6 Data retrieval (protein annotation) Retrieve information on a protein. 
beta13 true Protein information retrieval beta12orEarlier Data retrieval (phylogenetic tree) beta12orEarlier beta13 Retrieve a phylogenetic tree from a data resource. true Data retrieval (protein interaction annotation) Retrieve information on a protein interaction. true beta13 beta12orEarlier Data retrieval (protein family annotation) beta12orEarlier Protein family information retrieval beta13 Retrieve information on a protein family. true Data retrieval (RNA family annotation) true Retrieve information on an RNA family. RNA family information retrieval beta12orEarlier beta13 Data retrieval (gene annotation) beta12orEarlier Gene information retrieval Retrieve information on a specific gene. true beta13 Data retrieval (genotype and phenotype annotation) Retrieve information on a specific genotype or phenotype. Genotype and phenotype information retrieval beta12orEarlier beta13 true Protein architecture comparison Compare the architecture of two or more protein structures. beta12orEarlier Protein architecture recognition beta12orEarlier Includes methods that try to suggest the most likely biological unit for a given protein X-ray crystal structure based on crystal symmetry and scoring of putative protein-protein interfaces. Identify the architecture of a protein structure. Molecular dynamics simulation Simulate molecular (typically protein) conformation using a computational model of physical forces and computer simulation. beta12orEarlier Nucleic acid sequence analysis Analyse a nucleic acid sequence (using methods that are only applicable to nucleic acid sequences). beta12orEarlier Sequence analysis (nucleic acid) Protein sequence analysis Analyse a protein sequence (using methods that are only applicable to protein sequences). Sequence analysis (protein) beta12orEarlier Structure analysis beta12orEarlier Analyse known molecular tertiary structures. Nucleic acid structure analysis Analyse nucleic acid tertiary structural data. 
beta12orEarlier Secondary structure processing 1.6 Process (read and / or write) a molecular secondary structure. true beta12orEarlier Structure comparison beta12orEarlier Compare two or more molecular tertiary structures. Helical wheel drawing Helical wheel rendering beta12orEarlier Render a helical wheel representation of protein secondary structure. Topology diagram drawing Topology diagram rendering beta12orEarlier Render a topology diagram of protein secondary structure. Protein structure comparison beta12orEarlier Structure comparison (protein) Methods might identify structural neighbors, find structural similarities or define a structural core. Compare protein tertiary structures. Protein secondary structure comparison Compare protein secondary structures. beta12orEarlier Secondary structure comparison (protein) Protein secondary structure Protein subcellular localization prediction The prediction might include subcellular localization (nuclear, cytoplasmic, mitochondrial, chloroplast, plastid, membrane etc) or export (extracellular proteins) of a protein. Predict the subcellular localization of a protein sequence. Protein targeting prediction beta12orEarlier Residue contact calculation (residue-residue) beta12orEarlier Calculate contacts between residues in a protein structure. Hydrogen bond calculation (inter-residue) Identify potential hydrogen bonds between amino acid residues. beta12orEarlier Protein interaction prediction Predict the interactions of proteins with other molecules. beta12orEarlier Codon usage data processing beta12orEarlier beta13 Process (read and / or write) codon usage data. 
true Gene expression data analysis beta12orEarlier Gene expression profile analysis Gene expression (microarray) data processing Microarray data processing Gene expression data processing Process (read and / or write) gene expression (typically microarray) data, including analysis of one or more gene expression profiles, typically to interpret them in functional terms. Gene regulatory network processing 1.6 beta12orEarlier Process (read and / or write) a network of gene regulation. true Pathway or network analysis Analyse a known biological pathway or network. Pathway analysis Network analysis beta12orEarlier Sequencing-based expression profile data analysis Analyse SAGE, MPSS or SBS experimental data, typically to identify or quantify mRNA transcripts. beta12orEarlier beta12orEarlier true Splicing model analysis Analyse, characterize and model alternative splicing events by comparing multiple nucleic acid sequences. Splicing analysis beta12orEarlier Microarray raw data analysis beta12orEarlier beta12orEarlier true Analyse raw microarray data. Nucleic acid analysis Process (read and / or write) nucleic acid sequence or structural data. Nucleic acid data processing beta12orEarlier Protein analysis beta12orEarlier Protein data processing Process (read and / or write) protein sequence or structural data. Sequence data processing beta12orEarlier Process (read and / or write) molecular sequence data. beta13 true Structural data processing Process (read and / or write) molecular structural data. beta13 true beta12orEarlier Text processing true beta12orEarlier Process (read and / or write) text. 1.6 Protein sequence alignment analysis Analyse a protein sequence alignment, typically to detect features or make predictions. beta12orEarlier Sequence alignment analysis (protein) Nucleic acid sequence alignment analysis beta12orEarlier Sequence alignment analysis (nucleic acid) Analyse a nucleic acid sequence alignment, typically to detect features or make predictions.
Nucleic acid sequence comparison Sequence comparison (nucleic acid) Compare two or more nucleic acid sequences. beta12orEarlier Protein sequence comparison beta12orEarlier Sequence comparison (protein) Compare two or more protein sequences. DNA back-translation beta12orEarlier Back-translate a protein sequence into DNA. Sequence editing (nucleic acid) 1.8 true Edit or change a nucleic acid sequence, either randomly or specifically. beta12orEarlier Sequence editing (protein) Edit or change a protein sequence, either randomly or specifically. beta12orEarlier true 1.8 Sequence generation (nucleic acid) Generate a nucleic acid sequence by some means. beta12orEarlier Sequence generation (protein) Generate a protein sequence by some means. beta12orEarlier Nucleic acid sequence visualisation Visualise, format or render a nucleic acid sequence. true Various nucleic acid sequence analysis methods might generate a sequence rendering but are not (for brevity) listed under here. 1.8 beta12orEarlier Protein sequence visualisation true beta12orEarlier Visualise, format or render a protein sequence. 1.8 Various protein sequence analysis methods might generate a sequence rendering but are not (for brevity) listed under here. Nucleic acid structure comparison Compare nucleic acid tertiary structures. beta12orEarlier Structure comparison (nucleic acid) Structure processing (nucleic acid) 1.6 beta12orEarlier true Process (read and / or write) nucleic acid tertiary structure data. DNA mapping beta12orEarlier Generate a map of a DNA sequence annotated with positional or non-positional features of some type. Map data processing DNA map data processing Process (read and / or write) a DNA map of some type. beta12orEarlier true 1.6 Protein hydropathy calculation beta12orEarlier Analyse the hydrophobic, hydrophilic or charge properties of a protein (from analysis of sequence or structural information). 
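The 'DNA back-translation' entry above describes back-translating a protein sequence into DNA. Because most amino acids are encoded by several codons, back-translation is ambiguous; real tools weight codons by an organism-specific usage table. A minimal sketch that simply picks one representative codon per residue (the particular codon choices below are arbitrary):

```python
# One representative codon per amino acid (choices are arbitrary; real tools
# select codons using organism-specific codon usage tables).
REPRESENTATIVE_CODON = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}

def back_translate(protein):
    """Return one possible DNA coding sequence for a protein sequence."""
    return "".join(REPRESENTATIVE_CODON[aa] for aa in protein.upper())
```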
Protein binding site prediction Ligand-binding and active site prediction beta12orEarlier Binding site prediction Identify or predict catalytic residues, active sites or other ligand-binding sites in protein sequences or structures. Sequence tagged site (STS) mapping beta12orEarlier Sequence mapping An STS is a short subsequence of known sequence and location that occurs only once in the chromosome or genome that is being mapped. Sources of STSs include expressed sequence tags (ESTs), simple sequence length polymorphisms (SSLPs), and random genomic sequences from cloned genomic DNA or database sequences. Generate a physical DNA map (sequence map) from analysis of sequence tagged sites (STS). Alignment Compare two or more entities, typically the sequence or structure (or derivatives) of macromolecules, to identify equivalent subunits. Alignment Alignment generation beta12orEarlier Alignment construction Protein fragment weight comparison beta12orEarlier Calculate the molecular weight of a protein (or fragments) and compare it to another protein or reference data. Protein property comparison Compare the physicochemical properties of two or more proteins (or reference data). beta12orEarlier Secondary structure comparison Compare two or more molecular secondary structures. beta12orEarlier Hopp and Woods plotting beta12orEarlier Generate a Hopp and Woods plot of antigenicity of a protein. Microarray cluster textual view generation beta12orEarlier Visualise gene clusters with gene names. Microarray wave graph plotting Microarray wave graph rendering Microarray cluster temporal graph rendering beta12orEarlier This view can be rendered as a pie graph. The distance matrix is sorted by cluster number and typically represented as a diagonal matrix with distance values displayed in different color shades. Visualise clustered gene expression data as a set of waves, where each wave corresponds to a gene across samples on the X-axis.
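The 'Alignment' entry above describes comparing sequences to identify equivalent subunits; the classic way to do this is dynamic programming. A minimal, score-only sketch of global alignment in the Needleman-Wunsch style (the scoring parameters are illustrative defaults, not from any particular tool):

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (score only, no traceback)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]
```

Production aligners add traceback to recover the alignment itself, affine gap penalties, and substitution matrices.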
Microarray dendrograph plotting Microarray dendrograph rendering Generate a dendrograph of raw, preprocessed or clustered microarray data. beta12orEarlier Microarray checks view rendering Microarray view rendering Microarray proximity map plotting beta12orEarlier Microarray distance map rendering Generate a plot of distances (distance matrix) between genes. Microarray proximity map rendering Microarray tree or dendrogram rendering Microarray 2-way dendrogram rendering beta12orEarlier Visualise clustered gene expression data using a gene tree, array tree and color coded band of gene expression. Microarray matrix tree plot rendering Microarray principal component plotting beta12orEarlier Microarray principal component rendering Generate a line graph drawn as sum of principal components (Eigen value) and individual expression values. Microarray scatter plot plotting Generate a scatter plot of microarray data, typically after principal component analysis. beta12orEarlier Microarray scatter plot rendering Whole microarray graph plotting Visualise gene expression data where each band (or line graph) corresponds to a sample. beta12orEarlier Whole microarray graph rendering Microarray tree-map rendering beta12orEarlier Visualise gene expression data after hierarchical clustering for representing hierarchical relationships. Microarray Box-Whisker plot plotting beta12orEarlier Visualise raw and pre-processed gene expression data, via a plot showing over- and under-expression along with mean, upper and lower quartiles. Physical mapping beta12orEarlier Generate a physical (sequence) map of a DNA sequence showing the physical distance (base pairs) between features or landmarks such as restriction sites, cloned DNA fragments, genes and other genetic markers. Analysis Apply analytical methods to existing data of a specific type. For non-analytical operations, see the 'Processing' branch. 
beta12orEarlier Alignment analysis Process or analyse an alignment of molecular sequences or structures. true beta12orEarlier 1.8 Article analysis Analyse a body of scientific text (typically a full text article from a scientific journal). beta12orEarlier Article analysis Molecular interaction analysis Analyse the interactions of two or more molecules (or parts of molecules) that are known to interact. beta12orEarlier beta13 true Protein interaction analysis beta12orEarlier Analyse known protein-protein, protein-DNA/RNA or protein-ligand interactions. Residue contact calculation Calculate contacts between residues and some other group in a protein structure. beta12orEarlier Alignment processing true Process (read and / or write) an alignment of two or more molecular sequences, structures or derived data. 1.6 beta12orEarlier Structure alignment processing Process (read and / or write) a molecular tertiary (3D) structure alignment. 1.6 beta12orEarlier true Codon usage bias calculation Calculate codon usage bias. beta12orEarlier Codon usage bias plotting beta12orEarlier Generate a codon usage bias plot. Codon usage fraction calculation Calculate the differences in codon usage fractions between two sequences, sets of sequences, codon usage tables etc. beta12orEarlier Classification beta12orEarlier Assign molecular sequences, structures or other biological data to a specific group or category according to qualities it shares with that group or category. Molecular interaction data processing beta13 true beta12orEarlier Process (read and / or write) molecular interaction data. Sequence classification beta12orEarlier Assign molecular sequence(s) to a group or category. Structure classification Assign molecular structure(s) to a group or category. beta12orEarlier Protein comparison Compare two or more proteins (or some aspect) to identify similarities. beta12orEarlier Nucleic acid comparison beta12orEarlier Compare two or more nucleic acids to identify similarities.
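The 'Codon usage fraction calculation' entry above describes calculating differences in codon usage fractions between sequences. A minimal sketch: count codons in an in-frame coding sequence, normalise to fractions, and compare two sequences by summed absolute difference (the difference measure is a simple illustrative choice, not a standard statistic):

```python
from collections import Counter

def codon_fractions(cds):
    """Fraction of each codon among all codons of an in-frame coding sequence."""
    seq = cds.upper()
    # Truncate any trailing partial codon, then split into triplets.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def fraction_difference(cds_a, cds_b):
    """Sum of absolute differences in codon fractions between two sequences."""
    fa, fb = codon_fractions(cds_a), codon_fractions(cds_b)
    return sum(abs(fa.get(c, 0.0) - fb.get(c, 0.0)) for c in set(fa) | set(fb))
```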
Prediction and recognition (protein) beta12orEarlier Predict, recognise, detect or identify some properties of proteins. Prediction and recognition (nucleic acid) beta12orEarlier Predict, recognise, detect or identify some properties of nucleic acids. Structure editing beta13 Edit, convert or otherwise change a molecular tertiary structure, either randomly or specifically. Sequence alignment editing Edit, convert or otherwise change a molecular sequence alignment, either randomly or specifically. beta13 Pathway or network visualisation Render (visualise) a biological pathway or network. Pathway or network rendering beta13 Protein function prediction (from sequence) beta13 true Predict general (non-positional) functional properties of a protein from analysing its sequence. For functional properties that are positional, use 'Protein site detection' instead. 1.6 Protein sequence feature detection Protein site recognition Predict, recognise and identify functional or other key sites within protein sequences, typically by scanning for known motifs, patterns and regular expressions. Protein site prediction Sequence profile database search Protein site detection Protein secondary database search Sequence feature detection (protein) beta13 Protein property calculation (from sequence) beta13 Calculate (or predict) physical or chemical properties of a protein, including any non-positional properties of the molecular sequence, from processing a protein sequence. Protein feature prediction (from structure) beta13 1.6 true Predict, recognise and identify positional features in proteins from analysing protein structure. Protein feature detection Features includes functional sites or regions, secondary structure, structural domains and so on. Methods might use fingerprints, motifs, profiles, hidden Markov models, sequence alignment etc to provide a mapping of a query protein sequence to a discriminatory element. 
This includes methods that search a secondary protein database (Prosite, Blocks, ProDom, Prints, Pfam etc.) to assign a protein sequence(s) to a known protein family or group. Predict, recognise and identify positional features in proteins from analysing protein sequences or structures. beta13 Protein feature recognition Protein feature prediction Database search (by sequence) Sequence screening true 1.6 Screen a molecular sequence(s) against a database (of some type) to identify similarities between the sequence and database entries. beta13 Protein interaction network prediction beta13 Predict a network of protein interactions. Nucleic acid design beta13 Design (or predict) nucleic acid sequences with specific chemical or physical properties. Editing beta13 Edit a data entity, either randomly or specifically. Sequence assembly validation 1.1 Evaluate a DNA sequence assembly, typically for purposes of quality control. Genome alignment Align two or more (typically huge) molecular sequences that represent genomes. Genome alignment construction 1.1 Genome alignment Localized reassembly Reconstruction of a sequence assembly in a localised area. 1.1 Sequence assembly visualisation Assembly rendering Sequence assembly rendering Render and visualise a DNA sequence assembly. 1.1 Assembly visualisation Base-calling Phred base calling 1.1 Identify base (nucleobase) sequence from fluorescence 'trace' data generated by an automated DNA sequencer. Base calling Phred base-calling Bisulfite mapping 1.1 Bisulfite mapping follows high-throughput sequencing of DNA which has undergone bisulfite treatment followed by PCR amplification; unmethylated cytosines are specifically converted to thymine, allowing the methylation status of cytosine in the DNA to be detected. The mapping of methylation sites in a DNA (genome) sequence.
Bisulfite sequence alignment Bisulfite sequence mapping Sequence contamination filtering beta12orEarlier Identify and filter a (typically large) sequence data set to remove sequences from contaminants in the sample that was sequenced. Trim ends 1.1 Trim sequences (typically from an automated DNA sequencer) to remove misleading ends. For example trim polyA tails, introns and primer sequence flanking the sequence of amplified exons, or other unwanted sequence. Trim vector Trim sequences (typically from an automated DNA sequencer) to remove sequence-specific end regions, typically contamination from vector sequences. 1.1 Trim to reference 1.1 Trim sequences (typically from an automated DNA sequencer) to remove the sequence ends that extend beyond an assembled reference sequence. Sequence trimming 1.1 Cut (remove) the end from a molecular sequence. Genome feature comparison Genomic elements that might be compared include genes, indels, single nucleotide polymorphisms (SNPs), retrotransposons, tandem repeats and so on. Compare the features of two genome sequences. 1.1 Sequencing error detection Short read error correction Short-read error correction beta12orEarlier Detect errors in DNA sequences generated from sequencing projects. Genotyping 1.1 Methods might consider cytogenetic analyses, copy number polymorphism (and calculate copy number calls for copy-number variation(CNV) regions), single nucleotide polymorphism (SNP), rare copy number variation (CNV) identification, loss of heterozygosity data and so on. Analyse DNA sequence data to identify differences between the genetic composition (genotype) of an individual compared to other individuals or a reference sequence. Genetic variation analysis 1.1 Sequence variation analysis Genetic variation annotation provides contextual interpretation of coding SNP consequences in transcripts. It allows comparisons to be made between variation data in different populations or strains for the same transcript.
Genetic variation annotation Analyse a genetic variation, for example to annotate its location, alleles, classification, and effects on individual transcripts predicted for a gene model. Read mapping Short oligonucleotide alignment Oligonucleotide mapping Oligonucleotide alignment generation Short read mapping Oligonucleotide alignment construction The purpose of read mapping is to identify the location of sequenced fragments within a reference genome and assumes that there is, in fact, at least local similarity between the fragment and reference sequences. Oligonucleotide alignment Read alignment 1.1 Short read alignment Align short oligonucleotide sequences (reads) to a larger (genomic) sequence. Short sequence read mapping Split read mapping A variant of oligonucleotide mapping where a read is mapped to two separate locations because of possible structural variation. 1.1 DNA barcoding Analyse DNA sequences in order to identify a DNA barcode; short fragment(s) of DNA that are useful to diagnose the taxa of biological organisms. 1.1 Sample barcoding SNP calling Identify single nucleotide change in base positions in sequencing data that differ from a reference genome and which might, especially by reference to population frequency or functional data, indicate a polymorphism. Operations usually score confidence in the prediction or some other statistical measure of evidence. 1.1 Mutation detection Polymorphism detection Detect mutations in multiple DNA sequences, for example, from the alignment and comparison of the fluorescent traces produced by DNA sequencing hardware. 1.1 Chromatogram visualisation Visualise, format or render an image of a chromatogram. Chromatogram viewing 1.1 Methylation analysis 1.1 Determine cytosine methylation states in nucleic acid sequences. Methylation calling 1.1 Determine cytosine methylation status of specific positions in a nucleic acid sequence.
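The 'SNP calling' entry above describes identifying positions where sequencing data differ from a reference genome, with some confidence measure. A toy sketch over per-position base pileups, using a simple depth and allele-fraction filter (the thresholds and function names are illustrative; real callers such as those described in the entry use probabilistic models and base quality scores):

```python
from collections import Counter

def call_snps(reference, pileups, min_depth=5, min_alt_fraction=0.8):
    """Return (position, ref_base, alt_base) tuples for confident mismatches.

    `pileups` maps 0-based reference positions to the list of read bases
    observed there. Thresholds are illustrative, not a validated model.
    """
    calls = []
    for pos, bases in sorted(pileups.items()):
        if len(bases) < min_depth:
            continue  # too little coverage to call anything
        alt, count = Counter(bases).most_common(1)[0]
        if alt != reference[pos] and count / len(bases) >= min_alt_fraction:
            calls.append((pos, reference[pos], alt))
    return calls
```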
Methylation level analysis (global) 1.1 Global methylation analysis Measure the overall level of methyl cytosines in a genome from analysis of experimental data, typically from chromatographic methods and methyl accepting capacity assay. Methylation level analysis (gene-specific) Gene-specific methylation analysis Many different techniques are available for this. Measure the level of methyl cytosines in specific genes. 1.1 Genome visualisation 1.1 Genome visualization Visualise, format or render a nucleic acid sequence that is part of (and in context of) a complete genome sequence. Genome rendering Genome visualisation Genome viewing Genome browsing Genome comparison Compare the sequence or features of two or more genomes, for example, to find matching regions. 1.1 Genomic region matching Genome indexing Many sequence alignment tasks involving many or very large sequences rely on a precomputed index of the sequence to accelerate the alignment. Generate an index of a genome sequence. 1.1 Genome indexing (Burrows-Wheeler) The Burrows-Wheeler Transform (BWT) is a permutation of the genome based on a suffix array algorithm. Generate an index of a genome sequence using the Burrows-Wheeler algorithm. 1.1 Genome indexing (suffix arrays) 1.1 Generate an index of a genome sequence using a suffix array algorithm. suffix arrays A suffix array consists of the lexicographically sorted list of suffixes of a genome. Spectral analysis Spectral analysis 1.1 Spectrum analysis Analyse a spectrum from a mass spectrometry (or other) experiment. Mass spectrum analysis Peak detection 1.1 Peak finding Peak assignment Identify peaks in a spectrum from a mass spectrometry, NMR, or some other spectrum-generating experiment. Scaffolding Scaffold construction Link together a non-contiguous series of genomic sequences into a scaffold, consisting of sequences separated by gaps of known length.
The sequences that are linked are typically contigs; contiguous sequences corresponding to read overlaps. 1.1 Scaffolds may be positioned along a chromosome physical map to create a "golden path". Scaffold generation Scaffold gap completion Fill the gaps in a sequence assembly (scaffold) by merging in additional sequences. Different techniques are used to generate gap sequences to connect contigs, depending on the size of the gap. For small (5-20kb) gaps, PCR amplification and sequencing is used. For large (>20kb) gaps, fragments are cloned (e.g. in BAC (Bacterial artificial chromosomes) vectors) and then sequenced. 1.1 Sequencing quality control Raw sequence data quality control. Analyse raw sequence data from a sequencing pipeline and identify problems. Sequencing QC 1.1 Read pre-processing Sequence read pre-processing This is a broad concept and is used as a placeholder for other, more specific concepts. For example process paired end reads to trim low quality ends, remove short sequences, identify sequence inserts, detect chimeric reads, or remove low quality sequences including vector, adaptor, low complexity and contaminant sequences. Sequences might come from genomic DNA library, EST libraries, SSH library and so on. Pre-process sequence reads to ensure (or improve) quality and reliability. 1.1 Species frequency estimation Estimate the frequencies of different species from analysis of the molecular sequences, typically of DNA recovered from environmental samples. 1.1 Peak calling Chip-sequencing combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to generate a set of reads, which are aligned to a genome sequence. The enriched areas contain the binding sites of DNA-associated proteins. For example, a transcription factor binding site. ChIP-on-chip in contrast combines chromatin immunoprecipitation ('ChIP') with microarray ('chip').
Identify putative protein-binding regions in a genome sequence from analysis of Chip-sequencing data or ChIP-on-chip data. Protein binding peak detection 1.1 Differential expression analysis Identify (typically from analysis of microarray or RNA-seq data) genes whose expression levels are significantly different between two sample groups. Differentially expressed gene identification Differential expression analysis is used, for example, to identify which genes are up-regulated (increased expression) or down-regulated (decreased expression) between a group treated with a drug and a control group. 1.1 Gene set testing 1.1 Gene sets can be defined beforehand by biological function, chromosome locations and so on. Analyse gene expression patterns (typically from DNA microarray datasets) to identify sets of genes that are associated with a specific trait, condition, clinical outcome etc. Variant classification Classify variants based on their potential effect on genes, especially functional effects on the expressed proteins. 1.1 Variants are typically classified by their position (intronic, exonic, etc.) in a gene transcript and (for variants in coding exons) by their effect on the protein sequence (synonymous, non-synonymous, frameshifting, etc.) Variant prioritization Variant prioritization can be used for example to produce a list of variants responsible for 'knocking out' genes in specific genomes. Methods include amino acid substitution, aggregative approaches, probabilistic approaches, inheritance and unified likelihood-frameworks. Identify biologically interesting variants by prioritizing individual variants, for example, homozygous variants absent in control genomes. 1.1 Variant calling Variant mapping 1.1 Identify and map genomic alterations, including single nucleotide polymorphisms, short indels and structural variants, in a genome sequence. Methods often utilise a database of aligned reads.
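The 'Peak calling' entry above describes identifying protein-binding regions where aligned ChIP-seq reads are enriched. A toy sketch over a per-position coverage track, reporting maximal runs where coverage meets a background threshold (the threshold is illustrative; real peak callers model background enrichment statistically):

```python
def call_peaks(coverage, threshold=10):
    """Return (start, end) half-open intervals where coverage >= threshold."""
    peaks, start = [], None
    for pos, depth in enumerate(coverage):
        if depth >= threshold and start is None:
            start = pos                 # entering an enriched region
        elif depth < threshold and start is not None:
            peaks.append((start, pos))  # leaving an enriched region
            start = None
    if start is not None:
        peaks.append((start, len(coverage)))  # region runs to the end
    return peaks
```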
Structural variation discovery Detect large regions in a genome subject to copy-number variation, or other structural variations in genome(s). 1.1 Methods might involve analysis of whole-genome array comparative genome hybridization or single-nucleotide polymorphism arrays, paired-end mapping of sequencing data, or from analysis of short reads from new sequencing technologies. Exome analysis 1.1 Targeted exome capture Exome sequencing is considered a cheap alternative to whole genome sequencing. Exome sequence analysis Analyse sequencing data from experiments aiming to selectively sequence the coding regions of the genome. Read depth analysis 1.1 Analyse mapping density (read depth) of (typically) short reads from sequencing platforms, for example, to detect deletions and duplications. Gene expression QTL analysis expression quantitative trait loci profiling 1.1 eQTL profiling Combine classical quantitative trait loci (QTL) analysis with gene expression profiling, for example, to describe cis- and trans-controlling elements for the expression of phenotype associated genes. expression QTL profiling Copy number estimation Methods typically implement some statistical model for hypothesis testing, and methods estimate total copy number, i.e. do not distinguish between the quantities of the two inherited chromosomes (specific copy number). Transcript copy number estimation 1.1 Estimate the number of copies of loci of particular gene(s) in DNA sequences typically from gene-expression profiling technology based on microarray hybridization-based experiments. For example, estimate copy number (or marker dosage) of a dominant marker in samples from polyploid plant cells or tissues, or chromosomal gains and losses in tumors. Primer removal 1.2 Remove forward and/or reverse primers from nucleic acid sequences (typically PCR products). Transcriptome assembly Infer a transcriptome sequence by analysis of short sequence reads.
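The read depth analysis operation above can be sketched as a per-position coverage count over aligned read intervals. This is a toy illustration under assumed inputs (0-based, end-exclusive intervals); real tools operate on BAM alignments:

```python
def read_depth(alignments, ref_length):
    """Compute per-position read depth from (start, end) alignment
    intervals (0-based, end-exclusive). Toy sketch only."""
    depth = [0] * ref_length
    for start, end in alignments:
        for pos in range(max(0, start), min(ref_length, end)):
            depth[pos] += 1
    return depth

# Three reads over a 10-base reference; positions 2-4 are
# covered by all three reads.
coverage = read_depth([(0, 5), (2, 8), (2, 6)], 10)
```

Deletions would show up as runs of unexpectedly low depth, duplications as unexpectedly high depth, matching the definition above.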
1.2 Transcriptome assembly (de novo) de novo transcriptome assembly true 1.6 1.2 Infer a transcriptome sequence without the aid of a reference genome, i.e. by comparing short sequences (reads) to each other. Transcriptome assembly (mapping) Infer a transcriptome sequence by mapping short reads to a reference genome. 1.6 1.2 true Sequence coordinate conversion 1.3 Convert one set of sequence coordinates to another, e.g. convert coordinates of one assembly to another, cDNA to genomic, CDS to genomic, protein translation to genomic etc. Document similarity calculation Calculate similarity between 2 or more documents. 1.3 Document clustering Cluster (group) documents on the basis of their calculated similarity. 1.3 Named entity recognition Entity identification Entity chunking Entity extraction Recognise named entities (text tokens) within documents. 1.3 ID mapping Identifier mapping The mapping can be achieved by comparing identifier values or some other means, e.g. exact matches to a provided sequence. 1.3 Accession mapping Map data identifiers to one another for example to establish a link between two biological databases for the purposes of data integration. Anonymisation Process data in such a way that makes it hard to trace to the person whom the data concerns. 1.3 Data anonymisation ID retrieval id retrieval Data retrieval (accession) Data retrieval (ID) Identifier retrieval Data retrieval (id) Accession retrieval Search for and retrieve a data identifier of some kind, e.g. a database entry accession. 1.3 Sequence checksum generation Generate a checksum of a molecular sequence. 1.4 Bibliography generation Bibliography construction Construct a bibliography from the scientific literature. 1.4 Protein quaternary structure prediction 1.4 Predict the structure of a multi-subunit protein and particularly how the subunits fit together. Protein surface analysis 1.4 Analyse the surface properties of proteins. Ontology comparison 1.4 Compare two or more ontologies, e.g.
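The sequence checksum generation operation above can be sketched in a few lines. SHA-256 is used here purely as an illustrative digest choice; sequence databases commonly use other checksum algorithms (such as CRC64), which this sketch does not implement:

```python
import hashlib

def sequence_checksum(sequence):
    """Return a hex checksum of an upper-cased molecular sequence.
    SHA-256 is an illustrative choice, not a database standard here."""
    normalized = sequence.upper().encode("ascii")
    return hashlib.sha256(normalized).hexdigest()

# Case-normalisation means the same sequence in different
# case yields the same checksum.
same = sequence_checksum("acgt") == sequence_checksum("ACGT")
```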
identify differences. Ontology comparison 1.4 Compare two or more ontologies, e.g. identify differences. 1.9 Format detection Recognition of which format the given data is in. 1.4 Format identification Format recognition 'Format recognition' is not a bioinformatics-specific operation, but of great relevance in bioinformatics. Should be removed from EDAM if/when captured satisfactorily in a suitable domain-generic ontology. Format inference The has_input "Data" (data_0006) may cause visualisation or other problems although ontologically correct. But on the other hand it may be useful to distinguish from nullary operations without inputs. Splitting File splitting Split a file containing multiple data items into many files, each containing one item 1.4 Generation Construction beta12orEarlier For non-analytical operations, see the 'Processing' branch. Construct some data entity. Nucleic acid sequence feature detection Nucleic acid site prediction Predict, recognise and identify functional or other key sites within nucleic acid sequences, typically by scanning for known motifs, patterns and regular expressions. Nucleic acid site recognition 1.6 Nucleic acid site detection Deposition Deposit some data in a database or some other type of repository or software system. 1.6 Database submission Submission Data submission Data deposition Database deposition For non-analytical operations, see the 'Processing' branch. Clustering 1.6 Group together some data entities on the basis of similarities such that entities in the same group (cluster) are more similar to each other than to those in other groups (clusters). Assembly 1.6 Construct some entity (typically a molecule sequence) from component pieces. Conversion 1.6 Non-analytical data conversion. Standardization and normalization 1.6 Standardize or normalize data. Aggregation Combine multiple files or data items into a single file or object. 1.6 Article comparison Compare two or more scientific articles. 
1.6 Calculation Mathematical determination of the value of something, typically a property of a molecule. 1.6 Pathway or network prediction 1.6 Predict a molecular pathway or network. Genome assembly 1.6 The process of assembling many short DNA sequences together such that they represent the original chromosomes from which the DNA originated. Plotting Generate a graph, or other visual representation, of data, showing the relationship between two or more variables. 1.6 Image analysis 1.7 The analysis of an image (typically a digital image) of some type in order to extract information from it. Image processing Diffraction data analysis 1.7 Analysis of data from a diffraction experiment. Cell migration analysis 1.7 Analysis of cell migration images in order to study cell migration, typically in order to study the processes that play a role in disease progression. Diffraction data reduction 1.7 Processing of diffraction data into a corrected, ordered, and simplified form. Neurite measurement Measurement of neurites; projections (axons or dendrites) from the cell body of a neuron, from analysis of neuron images. 1.7 Diffraction data integration 1.7 Diffraction summation integration Diffraction profile fitting The evaluation of diffraction intensities and integration of diffraction maxima from a diffraction experiment. Phasing Phase a macromolecular crystal structure, for example by using molecular replacement or experimental phasing methods. 1.7 Molecular replacement 1.7 A technique used to construct an atomic model of an unknown structure from diffraction data, based upon an atomic model of a known structure, either a related protein or the same protein from a different crystal form. The technique solves the phase problem, i.e. retrieves information concerning the phases of the structure. Rigid body refinement 1.7 Rigid body refinement usually follows molecular replacement in the assignment of a structure from diffraction data.
A method used to refine a structure by moving the whole molecule or parts of it as a rigid unit, rather than moving individual atoms. Single particle analysis An image processing technique that combines and analyzes multiple images of a particulate sample, in order to produce an image with clearer features that are more easily interpreted. 1.7 Single particle analysis is used to improve the information that can be obtained by relatively low resolution techniques, e.g. an image of a protein or virus from transmission electron microscopy (TEM). Single particle alignment and classification Compare (align and classify) multiple particle images from a micrograph in order to produce a representative image of the particle. 1.7 A micrograph can include particles in multiple different orientations and/or conformations. Particles are compared and organised into sets based on their similarity. Typically iterations of classification and alignment are performed to optimise the final image; average images produced by classification are used as a reference image for subsequent alignment of the whole image set. Functional clustering 1.7 Clustering of molecular sequences on the basis of their function, typically using information from an ontology of gene function, or some other measure of functional phenotype. Functional sequence clustering Taxonomic classification 1.7 Classification of molecular sequences by assignment to some taxonomic hierarchy. Virulence prediction Pathogenicity prediction The prediction of the degree of pathogenicity of a microorganism from analysis of molecular sequences. 1.7 Gene expression correlation analysis 1.7 Gene co-expression network analysis Analyse the correlation patterns among genes across a variety of experiments, microarray samples etc. Correlation 1.7 Identify a correlation, i.e. a statistical relationship between two random variables or two sets of data.
RNA structure covariance model generation Compute the covariance model for (a family of) RNA secondary structures. 1.7 RNA secondary structure prediction (shape-based) RNA shape prediction Predict RNA secondary structure by analysis, e.g. probabilistic analysis, of the shape of RNA folds. 1.7 Nucleic acid alignment folding prediction (alignment-based) 1.7 Prediction of nucleic-acid folding using sequence alignments as a source of data. k-mer counting Count k-mers (substrings of length k) in DNA sequence data. 1.7 k-mer counting is used in genome and transcriptome assembly, metagenomic sequencing, and for error correction of sequence reads. Phylogenetic tree reconstruction Reconstructing the inner node labels of a phylogenetic tree from its leaves. Note that this is somewhat different from simply analysing an existing tree or constructing a completely new one. 1.7 Probabilistic data generation Generate some data from a chosen probabilistic model, possibly to evaluate algorithms. 1.7 Probabilistic sequence generation 1.7 Generate sequences from some probabilistic model, e.g. a model that simulates evolution. Antimicrobial resistance prediction 1.7 Identify or predict causes for antibiotic resistance from molecular sequence analysis. Enrichment A relevant ontology will be used. The input is typically a set of identifiers or other data, and the output of the analysis is typically a ranked list of ontology terms, each associated with a p-value. Term enrichment 1.8 Analyse a dataset with respect to concepts from an ontology. Chemical class enrichment 1.8 Analyse a dataset with respect to concepts from an ontology of chemical structure. Incident curve plotting 1.8 Plot an incident curve such as a survival curve, death curve, mortality curve. Variant pattern analysis Methods often utilise a database of aligned reads. Identify and map patterns of genomic variations.
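The k-mer counting operation defined above can be sketched directly as a naive in-memory counter; real assemblers and error-correction tools use far more memory-efficient structures than this illustration:

```python
from collections import Counter

def count_kmers(sequence, k):
    """Count all substrings of length k in a DNA sequence (naive sketch)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

# "ACG" and "CGT" each occur twice; "GTA" and "TAC" once.
counts = count_kmers("ACGTACGT", 3)
```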
1.8 Mathematical modelling Model some biological system using mathematical techniques including dynamical systems, statistical models, differential equations, and game theoretic models. beta12orEarlier Microscope image visualisation Visualise images resulting from various types of microscopy. 1.9 Microscopy image visualisation Image annotation 1.9 Annotate an image of some sort, typically with terms from a controlled vocabulary. Imputation Data imputation Replace missing data with substituted values, usually by using some statistical or other mathematical approach. true 1.9 Ontology visualisation 1.9 Visualise, format or render data from an ontology, typically a tree of terms. Ontology browsing Maximum occurrence analysis A method for making numerical assessments about the maximum percent of time that a conformer of a flexible macromolecule can exist and still be compatible with the experimental data. beta12orEarlier Database comparison 1.9 Data model comparison Compare the models or schemas used by two or more databases, or any other general comparison of databases rather than a detailed comparison of the entries themselves. Schema comparison Network simulation Simulate the behaviour of a biological pathway or network. Pathway simulation Network topology simulation 1.9 RNA-seq read count analysis Analyze read counts from RNA-seq experiments. 1.9 Chemical redundancy removal 1.9 Identify and remove redundancy from a set of small molecule structures. RNA-seq time series data analysis 1.9 Analyze time series data from an RNA-seq experiment. Simulated gene expression data generation 1.9 Simulate gene expression data, e.g. for purposes of benchmarking.
Topic http://purl.org/biotop/biotop.owl#Quality http://bioontology.org/ontologies/ResearchArea.owl#Area_of_Research http://www.onto-med.de/ontologies/gfo.owl#Category http://www.ifomis.org/bfo/1.1/snap#Quality http://www.onto-med.de/ontologies/gfo.owl#Perpetuant A category denoting a rather broad domain or field of interest, of study, application, work, data, or technology. Topics have no clearly defined borders between each other. http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#quality beta12orEarlier http://www.ifomis.org/bfo/1.1/snap#Continuant sumo:FieldOfStudy http://onto.eva.mpg.de/ontologies/gfo-bio.owl#Method Nucleic acids The processing and analysis of nucleic acid sequence, structural and other data. Nucleic acid bioinformatics Nucleic acid analysis Nucleic acid informatics http://purl.bioontology.org/ontology/MSH/D017423 Nucleic acid properties Nucleic acid physicochemistry http://purl.bioontology.org/ontology/MSH/D017422 beta12orEarlier Proteins Protein bioinformatics Protein informatics Protein databases Protein analysis http://purl.bioontology.org/ontology/MSH/D020539 Archival, processing and analysis of protein data, typically molecular sequence and structural data. beta12orEarlier Metabolites Metabolite structures This concept excludes macromolecules such as proteins and nucleic acids. The structures of reactants or products of metabolism, for example small molecules such as vitamins, polyols, nucleotides and amino acids. beta12orEarlier Sequence analysis beta12orEarlier Sequence databases Sequences http://purl.bioontology.org/ontology/MSH/D017421 The archival, processing and analysis of molecular sequences (monomer composition of polymers) including molecular sequence data resources, sequence sites, alignments, motifs and profiles. Structure analysis Computational structural biology The curation, processing and analysis of the structure of biological molecules, typically proteins and nucleic acids and other macromolecules.
http://purl.bioontology.org/ontology/MSH/D015394 Structure analysis Structural bioinformatics Structure databases This includes related concepts such as structural properties, alignments and structural motifs. Structure data resources beta12orEarlier Structure prediction beta12orEarlier The prediction of molecular (secondary or tertiary) structure. Alignment beta12orEarlier true The alignment (equivalence between sites) of molecular sequences, structures or profiles (representing a sequence or structure alignment). beta12orEarlier Phylogeny Phylogeny reconstruction Phylogenetic stratigraphy beta12orEarlier Phylogenetic dating Phylogenetic clocks http://purl.bioontology.org/ontology/MSH/D010802 The study of evolutionary relationships amongst organisms. Phylogenetic simulation This includes diverse phylogenetic methods, including phylogenetic tree construction, typically from molecular sequence or morphological data, methods that simulate DNA sequence evolution, a phylogenetic tree or the underlying data, or which estimate or use molecular clock and stratigraphic (age) data, methods for studying gene evolution etc. Functional genomics beta12orEarlier The study of gene or protein functions and their interactions in totality in a given organism, tissue, cell etc. Ontology and terminology Terminology beta12orEarlier http://purl.bioontology.org/ontology/MSH/D002965 Applied ontology Ontology The conceptualisation, categorisation and nomenclature (naming) of entities or phenomena within biology or bioinformatics. This includes formal ontologies, controlled vocabularies, structured glossaries, symbols and terminology or other related resources. Ontologies Information retrieval beta12orEarlier Data retrieval The search and query of data sources (typically databases or ontologies) in order to retrieve entries or other information. This includes, for example, search, query and retrieval of molecular sequences and associated data.
Data search VT 1.3.3 Information retrieval Data query Bioinformatics This includes data processing in general, including basic handling of files and databases, datatypes, workflows and annotation. VT 1.5.6 Bioinformatics The archival, curation, processing and analysis of complex biological data. http://purl.bioontology.org/ontology/MSH/D016247 beta12orEarlier Data visualisation Data rendering Rendering (drawing on a computer screen) or visualisation of molecular sequences, structures or other biomolecular data. VT 1.2.5 Computer graphics beta12orEarlier Computer graphics Nucleic acid thermodynamics true The study of the thermodynamic properties of a nucleic acid. 1.3 Nucleic acid structure analysis Includes secondary and tertiary nucleic acid structural data, nucleic acid thermodynamic, thermal and conformational properties including DNA or DNA/RNA denaturation (melting) etc. DNA melting Nucleic acid denaturation RNA alignment The archival, curation, processing and analysis of nucleic acid structural information, such as whole structures, structural features and alignments, and associated annotation. RNA structure alignment beta12orEarlier Nucleic acid structure Nucleic acid thermodynamics RNA structure RNA beta12orEarlier RNA sequences and structures. Nucleic acid restriction 1.3 beta12orEarlier Topic for the study of restriction enzymes, their cleavage sites and the restriction of nucleic acids. true Mapping Genetic linkage Linkage Linkage mapping Synteny DNA mapping beta12orEarlier The mapping of complete (typically nucleotide) sequences. This includes resources that aim to identify, map or analyse genetic markers in DNA sequences, for example to produce a genetic (linkage) map of a chromosome or genome or to analyse genetic linkage and synteny. 
It also includes resources for physical (sequence) maps of a DNA sequence showing the physical distance (base pairs) between features or landmarks such as restriction sites, cloned DNA fragments, genes and other genetic markers. Genetic codes and codon usage beta12orEarlier true 1.3 Codon usage analysis The study of codon usage in nucleotide sequence(s), genetic codes and so on. Protein expression Translation The translation of mRNA into protein and subsequent protein processing in the cell. beta12orEarlier Gene finding 1.3 This includes the study of promoters, coding regions, splice sites, etc. Methods for gene prediction might be ab initio, based on phylogenetic comparisons, use motifs, sequence features, support vector machine, alignment etc. Gene discovery Methods that aim to identify, predict, model or analyse genes or gene structure in DNA sequences. beta12orEarlier Gene prediction true Transcription 1.3 The transcription of DNA into mRNA. beta12orEarlier true Promoters true beta12orEarlier Promoters in DNA sequences (region of DNA that facilitates the transcription of a particular gene by binding RNA polymerase and transcription factor proteins). beta13 Nucleic acid folding beta12orEarlier The folding (in 3D space) of nucleic acid molecules. true beta12orEarlier Gene structure This includes the study of promoters, coding regions etc. beta12orEarlier Gene features Gene structure, regions which make an RNA product and features such as promoters, coding regions, gene fusion, splice sites etc. Proteomics beta12orEarlier Protein and peptide identification, especially in the study of whole proteomes of organisms.
Protein and peptide identification Peptide identification Proteomics includes any methods (especially high-throughput) that separate, characterize and identify expressed proteins such as mass spectrometry, two-dimensional gel electrophoresis and protein microarrays, as well as in-silico methods that perform proteolytic or mass calculations on a protein sequence and other analyses of protein expression data, for example in different cells or tissues. http://purl.bioontology.org/ontology/MSH/D040901 Protein expression Structural genomics beta12orEarlier The elucidation of the three dimensional structure for all (available) proteins in a given organism. Protein properties The study of the physical and biochemical properties of peptides and proteins, for example the hydrophobic, hydrophilic and charge properties of a protein. Protein hydropathy Protein physicochemistry beta12orEarlier Protein interactions Protein-protein, protein-DNA/RNA and protein-ligand interactions, including analysis of known interactions and prediction of putative interactions. Protein-nucleic acid interactions Protein-RNA interaction This includes experimental (e.g. yeast two-hybrid) and computational analysis techniques. Protein-protein interactions Protein-ligand interactions beta12orEarlier Protein-DNA interaction Protein folding, stability and design Protein folding Protein stability beta12orEarlier Protein stability, folding (in 3D space) and protein sequence-structure-function relationships. This includes for example study of inter-atomic or inter-residue interactions in protein (3D) structures, the effect of mutation, and the design of proteins with specific properties, typically by designing changes (via site-directed mutagenesis) to an existing protein. Protein residue interactions Protein design Rational protein design Two-dimensional gel electrophoresis Two-dimensional gel electrophoresis image and related data. 
beta13 beta12orEarlier true Mass spectrometry beta12orEarlier An analytical chemistry technique that measures the mass-to-charge ratio and abundance of ions in the gas phase. Protein microarrays Protein microarray data. true beta12orEarlier beta13 Protein hydropathy beta12orEarlier true The study of the hydrophobic, hydrophilic and charge properties of a protein. 1.3 Protein targeting and localization Protein targeting Protein sorting The study of how proteins are transported within and without the cell, including signal peptides, protein subcellular localization and export. Protein localization beta12orEarlier Protein cleavage sites and proteolysis true beta12orEarlier 1.3 Enzyme or chemical cleavage sites and proteolytic or mass calculations on a protein sequence. Protein structure comparison The comparison of two or more protein structures. beta12orEarlier true Use this concept for methods that are exclusively for protein structure. beta12orEarlier Protein residue interactions The processing and analysis of inter-atomic or inter-residue interactions in protein (3D) structures. Protein residue interactions true 1.3 beta12orEarlier Protein-protein interactions Protein interaction networks true Protein-protein interactions, individual interactions and networks, protein complexes, protein functional coupling etc. beta12orEarlier 1.3 Protein-ligand interactions beta12orEarlier true 1.3 Protein-ligand (small molecule) interactions. Protein-nucleic acid interactions beta12orEarlier 1.3 Protein-DNA/RNA interactions. true Protein design 1.3 beta12orEarlier The design of proteins with specific properties, typically by designing changes (via site-directed mutagenesis) to an existing protein. true G protein-coupled receptors (GPCR) G-protein coupled receptors (GPCRs). true beta12orEarlier beta12orEarlier Carbohydrates beta12orEarlier Carbohydrates, typically including structural information. Lipids beta12orEarlier Lipids and their structures.
Small molecules Small molecules of biological significance, typically archival, curation, processing and analysis of structural information. Small molecules include organic molecules, metal-organic compounds, small polypeptides, small polysaccharides and oligonucleotides. Structural data is usually included. CHEBI:23367 beta12orEarlier Sequence editing beta12orEarlier true beta12orEarlier Edit, convert or otherwise change a molecular sequence, either randomly or specifically. Sequence composition, complexity and repeats Sequence complexity Repeat sequences The archival, processing and analysis of the basic character composition of molecular sequences, for example character or word frequency, ambiguity, complexity, particularly regions of low complexity, and repeats or the repetitive nature of molecular sequences. beta12orEarlier Sequence repeats Low complexity sequences Sequence composition Sequence motifs beta12orEarlier Motifs true 1.3 Conserved patterns (motifs) in molecular sequences, that (typically) describe functional or other key sites. Sequence comparison The comparison might be on the basis of sequence, physico-chemical or some other properties of the sequences. beta12orEarlier The comparison of two or more molecular sequences, for example sequence alignment and clustering. Sequence sites, features and motifs Sequence features The archival, detection, prediction and analysis of positional features such as functional and other key sites, in molecular sequences and the conserved patterns (motifs, profiles etc.) that may be used to describe them. Functional sites Sequence motifs Sequence profiles Sequence sites HMMs beta12orEarlier Sequence database search beta12orEarlier Search and retrieve molecular sequences that are similar to a sequence-based query (typically a simple sequence). beta12orEarlier true The query is a sequence-based entity such as another sequence, a motif or profile. 
Sequence clustering This includes systems that generate, process and analyse sequence clusters. beta12orEarlier true 1.7 The comparison and grouping together of molecular sequences on the basis of their similarities. Sequence clusters Protein structural motifs and surfaces This includes conformation of conserved substructures, conserved geometry (spatial arrangement) of secondary structure or protein backbone, solvent-exposed surfaces, internal cavities, the analysis of shape, hydropathy, electrostatic patches, role and functions etc. Protein structural features Structural motifs Protein 3D motifs beta12orEarlier Protein structural motifs Structural features or common 3D motifs within protein structures, including the surface of a protein structure, such as biological interfaces with other molecules. Protein surfaces Structural (3D) profiles The processing, analysis or use of some type of structural (3D) profile or template; a computational entity (typically a numerical matrix) that is derived from and represents a structure or structure alignment. true beta12orEarlier 1.3 Structural profiles Protein structure prediction beta12orEarlier The prediction, modelling, recognition or design of protein secondary or tertiary structure or other structural features. Nucleic acid structure prediction The folding of nucleic acid molecules and the prediction or design of nucleic acid (typically RNA) sequences with specific conformations. DNA structure prediction Nucleic acid design RNA structure prediction beta12orEarlier Nucleic acid folding Ab initio structure prediction 1.7 The prediction of three-dimensional structure of a (typically protein) sequence from first principles, using a physics-based or empirical scoring function and without using explicit structural templates. true beta12orEarlier Homology modelling 1.4 The modelling of the three-dimensional structure of a protein using known sequence and structural data. 
true beta12orEarlier Molecular dynamics This includes resources concerning flexibility and motion in protein and other molecular structures. Protein dynamics Molecular flexibility Molecular motions beta12orEarlier The study and simulation of molecular (typically protein) conformation using a computational model of physical forces and computer simulation. Molecular docking beta12orEarlier The modelling of the structure of proteins in complex with small molecules or other macromolecules. Protein secondary structure prediction beta12orEarlier 1.3 The prediction of secondary or supersecondary structure of protein sequences. true Protein tertiary structure prediction 1.3 true The prediction of tertiary structure of protein sequences. beta12orEarlier Protein fold recognition For example threading, or the alignment of molecular sequences to structures, structural (3D) profiles or templates (representing a structure or structure alignment). The recognition (prediction and assignment) of known protein structural domains or folds in protein sequence(s). beta12orEarlier Sequence alignment This includes the generation of alignments (the identification of equivalent sites), the analysis of alignments, editing, visualisation, alignment databases, the alignment (equivalence between sites) of sequence profiles (representing sequence alignments) and so on. beta12orEarlier 1.7 The alignment of molecular sequences or sequence profiles (representing sequence alignments). true Structure alignment The superimposition of molecular tertiary structures or structural (3D) profiles (representing a structure or structure alignment). This includes the generation, storage, analysis, rendering etc. of structure alignments. true 1.7 beta12orEarlier Threading Sequence-structure alignment 1.3 beta12orEarlier The alignment of molecular sequences to structures, structural (3D) profiles or templates (representing a structure or structure alignment).
true Sequence profiles and HMMs true Sequence profiles; typically a positional, numerical matrix representing a sequence alignment. beta12orEarlier 1.3 Sequence profiles include position-specific scoring matrix (position weight matrix), hidden Markov models etc. Phylogeny reconstruction The reconstruction of a phylogeny (evolutionary relatedness amongst organisms), for example, by building a phylogenetic tree. 1.3 true Currently too specific for the topic sub-ontology (but might be unobsoleted). beta12orEarlier Phylogenomics beta12orEarlier The integrated study of evolutionary relationships and whole genome data, for example, in the analysis of species trees, horizontal gene transfer and evolutionary reconstruction. Virtual PCR beta13 Polymerase chain reaction beta12orEarlier Simulated polymerase chain reaction (PCR). PCR true Sequence assembly Assembly The assembly of fragments of a DNA sequence to reconstruct the original sequence. beta12orEarlier This covers for example the alignment of sequences of (typically millions) of short reads to a reference genome. Genetic variation http://purl.bioontology.org/ontology/MSH/D014644 Stable, naturally occurring mutations in a nucleotide sequence including alleles, naturally occurring mutations such as single base nucleotide substitutions, deletions and insertions, RFLPs and other polymorphisms. DNA variation Mutation Polymorphism beta12orEarlier Microarrays true http://purl.bioontology.org/ontology/MSH/D046228 Microarrays, for example, to process microarray data or design probes and experiments. 1.3 DNA microarrays beta12orEarlier Pharmacology Computational pharmacology beta12orEarlier Pharmacoinformatics The study of drugs and their effects or responses in living systems. VT 3.1.7 Pharmacology and pharmacy Gene expression This includes the study of codon usage in nucleotide sequence(s), genetic codes and so on.
Gene expression profiling Expression profiling beta12orEarlier http://edamontology.org/topic_0197 Gene expression levels are analysed by identifying, quantifying or comparing mRNA transcripts, for example using microarrays, RNA-seq, northern blots, gene-indexed expression profiles etc. http://purl.bioontology.org/ontology/MSH/D015870 Gene expression analysis DNA microarrays The analysis of levels and patterns of synthesis of gene products (proteins and functional RNA) including interpretation in functional terms of gene expression data. Codon usage Gene regulation beta12orEarlier The regulation of gene expression. Pharmacogenomics beta12orEarlier The influence of genotype on drug response, for example by correlating gene expression or single-nucleotide polymorphisms with drug efficacy or toxicity. Medicinal chemistry VT 3.1.4 Medicinal chemistry The design and chemical synthesis of bioactive molecules, for example drugs or potential drug compounds, for medicinal purposes. This includes methods that search compound collections, generate or analyse drug 3D conformations, identify drug targets with structural docking etc. Drug design beta12orEarlier Fish beta12orEarlier true 1.3 Information on a specific fish genome including molecular sequences, genes and annotation. Flies 1.3 true beta12orEarlier Information on a specific fly genome including molecular sequences, genes and annotation. Mice or rats Information on a specific mouse or rat genome including molecular sequences, genes and annotation. The resource may be specific to a group of mice / rats or all mice / rats. beta12orEarlier Worms true 1.3 beta12orEarlier Information on a specific worm genome including molecular sequences, genes and annotation. Literature analysis beta12orEarlier 1.3 The processing and analysis of the bioinformatics literature and bibliographic data, such as literature search and query. 
true Data mining beta12orEarlier Text data mining The analysis of the biomedical and informatics literature. Literature analysis Text mining Literature mining Data deposition, annotation and curation Deposition and curation of database accessions, including annotation, typically with terms from a controlled vocabulary. Database curation beta12orEarlier Document, record and content management Document management File management This includes editing, reformatting, conversion, transformation, validation, debugging, indexing and so on. Content management The management and manipulation of digital documents, including database records, files and reports. VT 1.3.6 Multimedia, hypermedia Record management beta12orEarlier Sequence annotation beta12orEarlier beta12orEarlier true Annotation of a molecular sequence. Genome annotation Annotation of a genome. beta12orEarlier true beta12orEarlier NMR ROESY NOESY Nuclear Overhauser Effect Spectroscopy An analytical technique that exploits the magnetic properties of certain atomic nuclei to provide information on the structure, dynamics, reaction state and chemical environment of molecules. HOESY beta12orEarlier Heteronuclear Overhauser Effect Spectroscopy Nuclear magnetic resonance spectroscopy Spectroscopy NMR spectroscopy Rotational Frame Nuclear Overhauser Effect Spectroscopy Sequence classification beta12orEarlier The classification of molecular sequences based on some measure of their similarity. Methods including sequence motifs, profile and other diagnostic elements which (typically) represent conserved patterns (of residues or properties) in molecular sequences. Protein classification 1.3 true beta12orEarlier Primarily the classification of proteins (from sequence or structural data) into clusters, groups, families etc. Sequence motif or profile beta12orEarlier true Sequence motifs, or sequence profiles derived from an alignment of molecular sequences of a particular type. 
This includes comparison, discovery, recognition etc. of sequence motifs. beta12orEarlier Protein modifications GO:0006464 Protein chemical modifications, e.g. post-translational modifications. Protein post-translational modification MOD:00000 EDAM does not describe all possible protein modifications. For fine-grained annotation of protein modification use the Gene Ontology (children of concept GO:0006464) and/or the Protein Modifications ontology (children of concept MOD:00000) beta12orEarlier Molecular interactions, pathways and networks Biological networks Network or pathway analysis beta13 Molecular interactions Biological models Molecular interactions, biological pathways, networks and other models. Biological pathways http://edamontology.org/topic_3076 Informatics The study and practice of information processing and use of computer information systems. VT 1.3.99 Other Knowledge management VT 1.3.4 Information management beta12orEarlier Information management VT 1.3.5 Knowledge management VT 1.3.3 Information retrieval VT 1.3 Information sciences Information science Literature data resources Data resources for the biological or biomedical literature, either a primary source of literature or some derivative. true 1.3 beta12orEarlier Laboratory information management Laboratory management and resources, for example, catalogues of biological resources for use in the lab including cell lines, viruses, plasmids, phages, DNA probes and primers and so on. beta12orEarlier Laboratory resources Cell and tissue culture Tissue culture 1.3 true General cell culture or data on a specific cell lines. Cell culture beta12orEarlier Ecology The ecological and environmental sciences and especially the application of information technology (ecoinformatics). 
http://purl.bioontology.org/ontology/MSH/D004777 Ecological informatics VT 1.5.15 Ecology Computational ecology beta12orEarlier Ecoinformatics Environmental science Electron microscopy SEM Scanning electron microscopy TEM The study of matter by studying the interference pattern from firing electrons at a sample, to analyse structures at resolutions higher than can be achieved using light. Transmission electron microscopy beta12orEarlier Electron crystallography Electron diffraction experiment Single particle electron microscopy Cell cycle beta13 beta12orEarlier true The cell cycle including key genes and proteins. Peptides and amino acids beta12orEarlier The physicochemical, biochemical or structural properties of amino acids or peptides. Amino acids Peptides Organelles Cell membrane Cytoplasm Organelle genes and proteins Smooth endoplasmic reticulum beta12orEarlier Lysosome Centriole Ribosome Nucleus true A specific organelle, or organelles in general, typically the genes and proteins (or genome and proteome). Mitochondria Golgi apparatus Rough endoplasmic reticulum 1.3 Ribosomes beta12orEarlier Ribosomes, typically of ribosome-related genes and proteins. Ribosome genes and proteins 1.3 true Scents A database about scents. beta12orEarlier beta13 true Drugs and target structures Drug structures beta12orEarlier The structures of drugs, drug targets, their interactions and binding affinities. Target structures Model organisms This may include information on the genome (including molecular sequences and map, genes and annotation), proteome, as well as more general information about an organism. beta12orEarlier A specific organism, or group of organisms, used to study a particular aspect of biology. Organisms Genomics http://purl.bioontology.org/ontology/MSH/D023281 beta12orEarlier Whole genomes of one or more organisms, or genomes in general, such as meta-information on genomes, genome projects, gene names etc. 
Gene families Particular gene(s), gene family or other gene group or system and their encoded proteins. beta12orEarlier Gene family Gene system Genes, gene family or system Gene and protein families Chromosomes beta12orEarlier Study of chromosomes. Genotype and phenotype Genotype and phenotype resources The study of genetic constitution of a living entity, such as an individual, an organism, a cell and so on, typically with respect to particular observable phenotypic traits, or resources concerning such traits, which might be an aspect of biochemistry, physiology, morphology, anatomy, development and so on. Genotyping Phenotyping beta12orEarlier Gene expression and microarray true beta12orEarlier beta12orEarlier Gene expression e.g. microarray data, northern blots, gene-indexed expression profiles etc. Sequence design Probes This includes the design of primers for PCR and DNA amplification or the design of molecular probes. http://purl.bioontology.org/ontology/MSH/D015335 Gene design Molecular probes (e.g. a peptide probe or DNA microarray probe) or primers (e.g. for PCR). Probe design in silico cloning Primer design Primers beta12orEarlier Pathology Diseases, including diseases in general and the genes, gene variations and proteins involved in one or more specific diseases. beta12orEarlier Diseases VT 3.1.6 Pathology Specific protein resources 1.3 A particular protein, protein family or other group of proteins. true Specific protein beta12orEarlier Taxonomy beta12orEarlier VT 1.5.25 Taxonomy Organism classification, identification and naming. Protein sequence analysis beta12orEarlier Archival, processing and analysis of protein sequences and sequence-based entities such as alignments, motifs and profiles. 1.8 true Nucleic acid sequence analysis beta12orEarlier 1.8 true The archival, processing and analysis of nucleotide sequences and sequence-based entities such as alignments, motifs and profiles. 
Repeat sequences true The repetitive nature of molecular sequences. beta12orEarlier 1.3 Low complexity sequences true The (character) complexity of molecular sequences, particularly regions of low complexity. 1.3 beta12orEarlier Proteome A specific proteome including protein sequences and annotation. beta12orEarlier beta13 true DNA DNA analysis beta12orEarlier DNA sequences and structure, including processes such as methylation and replication. The DNA sequences might be coding or non-coding sequences. Coding RNA EST cDNA mRNA This includes expressed sequence tag (EST) or complementary DNA (cDNA) sequences. Protein-coding regions including coding sequences (CDS), exons, translation initiation sites and open reading frames beta12orEarlier Functional, regulatory and non-coding RNA ncRNA Non-coding RNA Functional RNA Non-coding or functional RNA sequences, including regulatory RNA sequences, ribosomal RNA (rRNA) and transfer RNA (tRNA). Regulatory RNA Non-coding RNA includes piwi-interacting RNA (piRNA), small nuclear RNA (snRNA) and small nucleolar RNA (snoRNA). Regulatory RNA includes microRNA (miRNA) - short single stranded RNA molecules that regulate gene expression, and small interfering RNA (siRNA). beta12orEarlier rRNA 1.3 One or more ribosomal RNA (rRNA) sequences. true tRNA 1.3 true One or more transfer RNA (tRNA) sequences. Protein secondary structure true beta12orEarlier 1.8 Protein secondary structure or secondary structure alignments. This includes assignment, analysis, comparison, prediction, rendering etc. of secondary structure data. RNA structure 1.3 RNA secondary or tertiary structure and alignments. beta12orEarlier true Protein tertiary structure 1.8 true Protein tertiary structures. beta12orEarlier Nucleic acid classification Classification of nucleic acid sequences and structures. 
1.3 true beta12orEarlier Protein families beta12orEarlier Protein sequence classification Protein secondary databases A protein families database might include the classifier (e.g. a sequence profile) used to build the classification. Primarily the classification of proteins (from sequence or structural data) into clusters, groups, families etc., curation of a particular protein or protein family, or any other proteins that have been classified as members of a common group. Protein domains and folds beta12orEarlier Protein folds Protein tertiary structural domains and folds. Protein domains Nucleic acid sequence alignment beta12orEarlier true 1.3 Nucleotide sequence alignments. Protein sequence alignment 1.3 Protein sequence alignments. beta12orEarlier true A sequence profile typically represents a sequence alignment. Nucleic acid sites and features beta12orEarlier 1.3 true The archival, detection, prediction and analysis of positional features such as functional sites in nucleotide sequences. Protein sites and features beta12orEarlier The detection, identification and analysis of positional features in proteins, such as functional sites. 1.3 true Transcription factors and regulatory sites Transcription factor proteins either promote (as an activator) or block (as a repressor) the binding to DNA of RNA polymerase. Regulatory sites including transcription factor binding site as well as promoters, enhancers, silencers and boundary elements / insulators. Proteins that bind to DNA and control transcription of DNA to mRNA (transcription factors) and also transcriptional regulatory sites, elements and regions (such as promoters, enhancers, silencers and boundary elements / insulators) in nucleotide sequences. Transcriptional regulatory sites TFBS Transcription factors beta12orEarlier Transcription factor binding sites Phosphorylation sites 1.0 Protein phosphorylation and phosphorylation sites in protein sequences. 
true beta12orEarlier Metabolic pathways beta12orEarlier Metabolic pathways. Signaling pathways Signaling pathways. Signal transduction pathways beta12orEarlier Protein and peptide identification 1.3 beta12orEarlier true Workflows Biological or biomedical analytical workflows or pipelines. beta12orEarlier true 1.0 Data types and objects Structuring data into basic types and (computational) objects. beta12orEarlier 1.0 true Theoretical biology 1.3 true Mitochondria beta12orEarlier true Mitochondria, typically of mitochondrial genes and proteins. 1.3 Plants The resource may be specific to a plant, a group of plants or all plants. Plant science Plants, e.g. information on a specific plant genome including molecular sequences, genes and annotation. Plant biology Botany VT 1.5.22 Plant science Plant VT 1.5.10 Botany beta12orEarlier Viruses Virology VT 1.5.28 Virology beta12orEarlier Viruses, e.g. sequence and structural data, interactions of viral proteins, or a viral genome including molecular sequences, genes and annotation. The resource may be specific to a virus, a group of viruses or all viruses. Fungi Mycology beta12orEarlier The resource may be specific to a fungus, a group of fungi or all fungi. Yeast VT 1.5.21 Mycology Fungi and molds, e.g. information on a specific fungal genome including molecular sequences, genes and annotation. Pathogens Pathogens, e.g. information on a specific vertebrate genome including molecular sequences, genes and annotation. beta12orEarlier The resource may be specific to a pathogen, a group of pathogens or all pathogens. Arabidopsis beta12orEarlier Arabidopsis-specific data. 1.3 true Rice Rice-specific data. true 1.3 beta12orEarlier Genetic mapping and linkage Linkage mapping beta12orEarlier 1.3 true Genetic linkage Informatics resources that aim to identify, map or analyse genetic markers in DNA sequences, for example to produce a genetic (linkage) map of a chromosome or genome or to analyse genetic linkage and synteny. 
Comparative genomics The study (typically comparison) of the sequence, structure or function of multiple genomes. beta12orEarlier Mobile genetic elements Transposons beta12orEarlier Mobile genetic elements, such as transposons, plasmids, bacteriophage elements and Group II introns. Human disease Human diseases, typically describing the genes, mutations and proteins implicated in disease. beta13 true beta12orEarlier Immunology VT 3.1.3 Immunology Immunoinformatics http://purl.bioontology.org/ontology/MSH/D007120 http://purl.bioontology.org/ontology/MSH/D007125 beta12orEarlier Computational immunology The application of information technology to immunology such as immunological processes, immunological genes, proteins and peptide ligands, antigens and so on. Membrane and lipoproteins Lipoproteins (protein-lipid assemblies), and proteins or regions of a protein that span or are associated with a membrane. beta12orEarlier Membrane proteins Lipoproteins Transmembrane proteins Enzymes Proteins that catalyze chemical reactions, the kinetics of enzyme-catalysed reactions, enzyme nomenclature etc. beta12orEarlier Enzymology Primers PCR primers and hybridization oligos in a nucleic acid sequence. Nucleic acid features (primers) beta12orEarlier Primer binding sites PolyA signal or sites beta12orEarlier Nucleic acid features (PolyA signal or site) PolyA signal A polyA signal is required for endonuclease cleavage of an RNA transcript that is followed by polyadenylation. A polyA site is a site on an RNA transcript to which adenine residues will be added during post-transcriptional polyadenylation. PolyA site Regions or sites in a eukaryotic and eukaryotic viral RNA sequence which direct endonuclease cleavage or polyadenylation of an RNA transcript. CpG island and isochores beta12orEarlier Nucleic acid features (CpG island and isochore) CpG rich regions (isochores) in a nucleotide sequence. 
Restriction sites Restriction enzyme recognition sites (restriction sites) in a nucleic acid sequence. Nucleic acid features (restriction sites) beta12orEarlier Nucleic acid restriction sites (report) Splice sites Nucleic acid features (splice sites) Nucleic acid report (RNA splicing) beta12orEarlier Splice sites in a nucleotide sequence or alternative RNA splicing events. Nucleic acid report (RNA splice model) Matrix/scaffold attachment sites Nucleic acid features (matrix/scaffold attachment sites) beta12orEarlier Matrix/scaffold attachment regions (MARs/SARs) in a DNA sequence. Operon Gene features (operon) beta12orEarlier Nucleic acid features (operon) The report for a query sequence or gene might include the predicted operon leader and trailer gene, gene composition of the operon and associated information, as well as information on the query. Operons (operators, promoters and genes) from a bacterial genome. Promoters Whole promoters or promoter elements (transcription start sites, RNA polymerase binding site, transcription factor binding sites, promoter enhancers etc) in a DNA sequence. beta12orEarlier Nucleic acid features (promoters) Structural biology Structural assignment Structure determination This includes experimental methods for biomolecular structure determination, such as X-ray crystallography, nuclear magnetic resonance (NMR), circular dichroism (CD) spectroscopy, microscopy etc., including the assignment or modelling of molecular structure from such data. 1.3 This includes Informatics concerning data generated from the use of microscopes, including optical, electron and scanning probe microscopy. Includes methods for digitizing microscope images and viewing the produced virtual slides and associated data on a computer screen. The molecular structure of biological molecules, particularly macromolecules such as proteins and nucleic acids. 
VT 1.5.24 Structural biology Structural determination Protein membrane regions 1.8 Protein features (membrane regions) This might include the location and size of the membrane spanning segments and intervening loop regions, transmembrane region IN/OUT orientation relative to the membrane, plus the following data for each amino acid: A Z-coordinate (the distance to the membrane center), the free energy of membrane insertion (calculated in a sliding window over the sequence) and a reliability score. The z-coordinate implies information about re-entrant helices, interfacial helices, the tilt of a transmembrane helix and loop lengths. Intramembrane regions Trans- or intra-membrane regions of a protein, typically describing physicochemical properties of the secondary structure elements. Protein transmembrane regions Transmembrane regions Structure comparison This might involve comparison of secondary or tertiary (3D) structural information. The comparison of two or more molecular structures, for example structure alignment and clustering. beta12orEarlier Function analysis Protein function prediction The study of gene and protein function including the prediction of functional properties of a protein. Protein function analysis beta12orEarlier Prokaryotes and archaea The resource may be specific to a prokaryote, a group of prokaryotes or all prokaryotes. VT 1.5.2 Bacteriology Bacteriology beta12orEarlier Specific bacteria or archaea, e.g. information on a specific prokaryote genome including molecular sequences, genes and annotation. Protein databases true 1.3 Protein data resources. beta12orEarlier Protein data resources Structure determination Experimental methods for biomolecular structure determination, such as X-ray crystallography, nuclear magnetic resonance (NMR), circular dichroism (CD) spectroscopy, microscopy etc., including the assignment or modelling of molecular structure from such data. 
beta12orEarlier true 1.3 Cell biology beta12orEarlier VT 1.5.11 Cell biology Cells, such as key genes and proteins involved in the cell cycle. Classification beta13 beta12orEarlier Topic focused on identifying, grouping, or naming things in a structured way according to some schema based on observable relationships. true Lipoproteins true 1.3 beta12orEarlier Lipoproteins (protein-lipid assemblies). Phylogeny visualisation true Visualise a phylogeny, for example, render a phylogenetic tree. beta12orEarlier beta12orEarlier Cheminformatics The application of information technology to chemistry in a biological research environment. Chemical informatics beta12orEarlier Chemoinformatics Systems biology http://en.wikipedia.org/wiki/Systems_biology This includes databases of models and methods to construct or analyse a model. Biological models http://purl.bioontology.org/ontology/MSH/D049490 beta12orEarlier Biological modelling Biological system modelling The holistic modelling and analysis of complex biological systems and the interactions therein. Statistics and probability Biostatistics The application of statistical methods to biological problems. http://en.wikipedia.org/wiki/Biostatistics beta12orEarlier http://purl.bioontology.org/ontology/MSH/D056808 Structure database search The query is a structure-based entity such as another structure, a 3D (structural) motif, 3D profile or template. beta12orEarlier Search for and retrieve molecular structures that are similar to a structure-based query (typically another structure or part of a structure). beta12orEarlier true Molecular modelling Homology modeling Comparative modeling Comparative modelling beta12orEarlier Homology modelling Molecular modeling The construction, analysis, evaluation, refinement etc. of models of a molecule's properties or behaviour. Protein function prediction 1.2 beta12orEarlier true The prediction of functional properties of a protein. 
SNP Single nucleotide polymorphisms (SNP) and associated data, for example, the discovery and annotation of SNPs. beta12orEarlier Single nucleotide polymorphism A SNP is a DNA sequence variation where a single nucleotide differs between members of a species or paired chromosomes in an individual. Transmembrane protein prediction Predict transmembrane domains and topology in protein sequences. beta12orEarlier beta12orEarlier true Nucleic acid structure comparison The comparison of two or more nucleic acid (typically RNA) secondary or tertiary structures. beta12orEarlier true beta12orEarlier Use this concept for methods that are exclusively for nucleic acid structures. Exons Gene features (exon) beta12orEarlier Exons in nucleotide sequences. Gene transcription features GC signals (report) CAAT signals (report) -35 signals (report) Gene transcriptional features This includes promoters, CAAT signals, TATA signals, -35 signals, -10 signals, GC signals, primer binding sites for initiation of transcription or reverse transcription, enhancers, attenuators, terminators and ribosome binding sites. Enhancers (report) Terminators (report) Transcription of DNA into RNA including the regulation of transcription. Ribosome binding sites (report) -10 signals (report) beta12orEarlier TATA signals (report) Attenuators (report) DNA mutation Mutation annotation beta12orEarlier DNA mutation. Nucleic acid features (mutation) Oncology beta12orEarlier VT 3.2.16 Oncology Cancer The study of cancer, for example, genes and proteins implicated in cancer. Cancer biology Toxins and targets Toxins Targets beta12orEarlier Structural and associated data for toxic chemical substances. Introns Gene features (intron) Nucleic acid features (intron) Introns in nucleotide sequences. beta12orEarlier Tool topic beta12orEarlier A topic concerning primarily bioinformatics software tools, typically the broad function or purpose of a tool. 
true beta12orEarlier Study topic A general area of bioinformatics study, typically the broad scope or category of content of a bioinformatics journal or conference proceeding. beta12orEarlier true beta12orEarlier Nomenclature true 1.3 beta12orEarlier Biological nomenclature (naming), symbols and terminology. Disease genes and proteins 1.3 true beta12orEarlier The genes, gene variations and proteins involved in one or more specific diseases. Protein structure analysis Protein structure Protein secondary or tertiary structural data and/or associated annotation. http://edamontology.org/topic_3040 beta12orEarlier Humans beta12orEarlier true The human genome, including molecular sequences, genes, annotation, maps and viewers, the human proteome or human beings in general. Gene resources Gene resource beta12orEarlier 1.3 Informatics resource (typically a database) primarily focussed on genes. Gene database true Yeast beta12orEarlier Yeast, e.g. information on a specific yeast genome including molecular sequences, genes and annotation. true 1.3 Eukaryotes Eukaryote Eukaryotes or data concerning eukaryotes, e.g. information on a specific eukaryote genome including molecular sequences, genes and annotation. The resource may be specific to a eukaryote, a group of eukaryotes or all eukaryotes. beta12orEarlier Invertebrates The resource may be specific to an invertebrate, a group of invertebrates or all invertebrates. beta12orEarlier Invertebrates, e.g. information on a specific invertebrate genome including molecular sequences, genes and annotation. Vertebrates The resource may be specific to a vertebrate, a group of vertebrates or all vertebrates. Vertebrates, e.g. information on a specific vertebrate genome including molecular sequences, genes and annotation. beta12orEarlier Unicellular eukaryotes Unicellular eukaryotes, e.g. information on a unicellular eukaryote genome including molecular sequences, genes and annotation. 
beta12orEarlier The resource may be specific to a unicellular eukaryote, a group of unicellular eukaryotes or all unicellular eukaryotes. Protein structure alignment Protein secondary or tertiary structure alignments. beta12orEarlier true 1.3 X-ray diffraction The study of matter and its structure by means of the diffraction of X-rays, typically the diffraction pattern caused by the regularly spaced atoms of a crystalline sample. beta12orEarlier X-ray microscopy Crystallography X-ray crystallography Ontologies, nomenclature and classification true Conceptualisation, categorisation and naming of entities or phenomena within biology or bioinformatics. 1.3 http://purl.bioontology.org/ontology/MSH/D002965 beta12orEarlier Immunoproteins, genes and antigens Immunopeptides Immunity-related genes, proteins and their ligands. Antigens This includes T cell receptors (TR), major histocompatibility complex (MHC), immunoglobulin superfamily (IgSF) / antibodies, major histocompatibility complex superfamily (MhcSF), etc. beta12orEarlier Immunoproteins Immunogenes Molecules CHEBI:23367 beta12orEarlier beta12orEarlier Specific molecules, including large molecules built from repeating subunits (macromolecules) and small molecules of biological significance. true Toxicology Toxins and the adverse effects of these chemical substances on living organisms. VT 3.1.9 Toxicology Toxicoinformatics Toxicology beta12orEarlier Computational toxicology High-throughput sequencing Next-generation sequencing beta13 true beta12orEarlier Parallelized sequencing processes that are capable of sequencing many thousands of sequences simultaneously. Structural clustering The comparison and grouping together of molecular structures on the basis of similarity; generate, process or analyse structural clusters. 1.7 Structure classification true beta12orEarlier Gene regulatory networks Gene regulatory networks. 
beta12orEarlier Disease (specific) Informatics resources dedicated to one or more specific diseases (not diseases in general). beta12orEarlier true beta12orEarlier VNTR Nucleic acid features (VNTR) Variable number of tandem repeat polymorphism Variable number of tandem repeat (VNTR) polymorphism in a DNA sequence. beta12orEarlier VNTR annotation VNTRs occur in non-coding regions of DNA and consist of a sub-sequence that is repeated a multiple (and varied) number of times. Microsatellites beta12orEarlier Nucleic acid features (microsatellite) A microsatellite polymorphism is a very short subsequence that is repeated a variable number of times between individuals. These repeats consist of the nucleotides cytosine and adenosine. Microsatellite annotation Microsatellite polymorphism in a DNA sequence. RFLP Restriction fragment length polymorphisms (RFLP) in a DNA sequence. An RFLP is defined by the presence or absence of a specific restriction site of a bacterial restriction enzyme. RFLP annotation beta12orEarlier Nucleic acid features (RFLP) DNA polymorphism Nucleic acid features (polymorphism) DNA polymorphism. Polymorphism annotation beta12orEarlier Nucleic acid design Topic for the design of nucleic acid sequences with specific conformations. 1.3 beta12orEarlier true Primer or probe design 1.3 true beta13 The design of primers for PCR and DNA amplification or the design of molecular probes. Structure databases beta13 true 1.2 Structure data resources Molecular secondary or tertiary (3D) structural data resources, typically of proteins and nucleic acids. Nucleic acid structure true beta13 Nucleic acid (secondary or tertiary) structure, such as whole structures, structural features and associated annotation. 1.2 Sequence databases Molecular sequence data resources, including sequence sites, alignments, motifs and profiles. 
true beta13 Sequence data resources Sequence data Sequence data resource 1.3 Nucleic acid sequences Nucleotide sequences and associated concepts such as sequence sites, alignments, motifs and profiles. beta13 1.3 true Nucleotide sequences Protein sequences Protein sequences and associated concepts such as sequence sites, alignments, motifs and profiles. beta13 1.3 true Protein interaction networks 1.3 true Molecular biology VT 1.5.4 Biochemistry and molecular biology beta13 The molecular basis of biological activity, particularly the macromolecules (e.g. proteins and nucleic acids) that are essential to life. Mammals true beta13 1.3 Mammals, e.g. information on a specific mammal genome including molecular sequences, genes and annotation. Biodiversity The degree of variation of life forms within a given ecosystem, biome or an entire planet. beta13 VT 1.5.5 Biodiversity conservation http://purl.bioontology.org/ontology/MSH/D044822 Sequence clusters and classification This includes the results of sequence clustering, ortholog identification, assignment to families, annotation etc. The comparison, grouping together and classification of macromolecules on the basis of sequence similarity. Sequence families 1.3 true Sequence clusters beta13 Genetics http://purl.bioontology.org/ontology/MSH/D005823 The study of genes, genetic variation and heredity in living organisms. beta13 Heredity Quantitative genetics beta13 The genes and genetic mechanisms such as Mendelian inheritance that underlie continuous phenotypic traits (such as height or weight). Population genetics The distribution of allele frequencies in a population of organisms and its change subject to evolutionary processes including natural selection, genetic drift, mutation and gene flow. beta13 Regulatory RNA 1.3 Regulatory RNA sequences including microRNA (miRNA) and small interfering RNA (siRNA). 
true beta13 Documentation and help The documentation of resources such as tools, services and databases and how to get help. Help beta13 Documentation Genetic organisation The structural and functional organisation of genes and other genetic elements. 1.3 beta13 true Medical informatics Health informatics Clinical informatics Biomedical informatics Translational medicine The application of information technology to health, disease and biomedicine. Healthcare informatics beta13 Health and disease Molecular medicine Developmental biology VT 1.5.14 Developmental biology beta13 How organisms grow and develop. Embryology beta13 The development of organisms between the one-cell stage (typically the zygote) and the end of the embryonic stage. Anatomy VT 3.1.1 Anatomy and morphology beta13 The form and function of the structures of living organisms. Literature and reference Literature search beta13 The scientific literature, reference information and documentation. Literature sources http://purl.bioontology.org/ontology/MSH/D011642 Biology VT 1.5.8 Biology beta13 VT 1.5 Biological sciences VT 1.5.23 Reproductive biology Cryobiology Biological rhythms A particular biological science, especially observable traits such as aspects of biochemistry, physiology, morphology, anatomy, development and so on. VT 1.5.7 Biological rhythm Biological science Aerobiology VT 1.5.99 Other Chronobiology VT 1.5.13 Cryobiology VT 1.5.1 Aerobiology VT 1.5.3 Behavioural biology Reproductive biology Behavioural biology Data management The development and use of architectures, policies, practices and procedures for management of data. beta13 Data handling http://purl.bioontology.org/ontology/MSH/D030541 VT 1.3.1 Data management Sequence feature detection 1.3 true beta13 The detection of the positional features, such as functional and other key sites, in molecular sequences. 
http://purl.bioontology.org/ontology/MSH/D058977 Nucleic acid feature detection The detection of positional features such as functional sites in nucleotide sequences. true beta13 1.3 Protein feature detection The detection, identification and analysis of positional protein sequence features, such as functional sites. beta13 1.3 true Biological system modelling 1.2 true beta13 Topic for modelling biological systems in mathematical terms. Data acquisition The acquisition of data, typically measurements of physical systems using any type of sampling system, or by other means. beta13 Genes and proteins resources 1.3 Gene family beta13 Gene and protein families Specific genes and/or their encoded proteins or a family or other grouping of related genes and proteins. true Protein topological domains Topological domains such as cytoplasmic regions in a protein. Protein features (topological domains) 1.8 Protein variants Protein sequence variants produced e.g. from alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting. beta13 Expression signals beta13 Nucleic acid features (expression signal) Regions within a nucleic acid sequence containing a signal that alters a biological function. DNA binding sites This includes ribosome binding sites (Shine-Dalgarno sequence in prokaryotes). beta13 Nucleic acid features (binding) Nucleic acids binding to some other molecule. Nucleic acid repeats beta13 This includes long terminal repeats (LTRs); sequences (typically retroviral) directly repeated at both ends of a defined sequence and other types of repeating unit. Repetitive elements within a nucleic acid sequence. DNA replication and recombination DNA replication or recombination. 
This includes binding sites for initiation of replication (origin of replication), regions where transfer is initiated during the conjugation or mobilization (origin of transfer), starting sites for DNA duplication (origin of replication) and regions which are eliminated through any kind of recombination. Nucleosome exclusion sequences Nucleic acid features (replication and recombination) beta13 Signal or transit peptide beta13 Nucleic acid features (signal or transit peptide) A signal peptide coding sequence encodes an N-terminal domain of a secreted protein, which is involved in attaching the polypeptide to a membrane leader sequence. A transit peptide coding sequence encodes an N-terminal domain of a nuclear-encoded organellar protein, which is involved in import of the protein into the organelle. Coding sequences for a signal or transit peptide. Sequence tagged sites Nucleic acid features (STS) beta13 Sequence tagged sites are short DNA sequences that are unique within a genome and serve as a mapping landmark; detectable by PCR, they allow a genome to be mapped via an ordering of STSs. Sequence tagged sites (STS) in nucleic acid sequences. Sequencing http://purl.bioontology.org/ontology/MSH/D059014 1.1 NGS Next generation sequencing The determination of complete (typically nucleotide) sequences, including those of genomes (full genome sequencing, de novo sequencing and resequencing), amplicons and transcriptomes. Next gen sequencing ChIP-seq true 1.3 Chip sequencing Chip seq 1.1 The analysis of protein-DNA interactions where chromatin immunoprecipitation (ChIP) is used in combination with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. 
Chip-sequencing RNA-Seq Small RNA-seq Whole transcriptome shotgun sequencing RNA-seq 1.1 1.3 A topic concerning high-throughput sequencing of cDNA to measure the RNA content (transcriptome) of a sample, for example, to investigate how different alleles of a gene are expressed, detect post-transcriptional mutations or identify gene fusions. Small RNA-Seq WTSS This includes small RNA profiling (small RNA-Seq), for example to find novel small RNAs, characterize mutations and analyze expression of small RNAs. true DNA methylation true DNA methylation including bisulfite sequencing, methylation sites and analysis, for example of patterns and profiles of DNA methylation in a population, tissue etc. 1.3 http://purl.bioontology.org/ontology/MSH/D019175 1.1 Metabolomics The systematic study of metabolites, the chemical processes they are involved, and the chemical fingerprints of specific cellular processes in a whole cell, tissue, organ or organism. http://purl.bioontology.org/ontology/MSH/D055432 1.1 Epigenomics Epigenetics concerns the heritable changes in gene expression owing to mechanisms other than DNA sequence variation. 1.1 http://purl.bioontology.org/ontology/MSH/D057890 The study of the epigenetic modifications of a whole cell, tissue, organism etc. Metagenomics http://purl.bioontology.org/ontology/MSH/D056186 Ecogenomics Community genomics Environmental genomics 1.1 The study of genetic material recovered from environmental samples, and associated environmental data. Structural variation 1.1 Variation in chromosome structure including microscopic and submicroscopic types of variation such as deletions, duplications, copy-number variants, insertions, inversions and translocations. Genomic structural variation DNA packaging beta12orEarlier DNA-histone complexes (chromatin), organisation of chromatin into nucleosomes and packaging into higher-order structures. 
http://purl.bioontology.org/ontology/MSH/D042003 DNA-Seq 1.1 A topic concerning high-throughput sequencing of randomly fragmented genomic DNA, for example, to investigate whole-genome sequencing and resequencing, SNP discovery, identification of copy number variations and chromosomal rearrangements. 1.3 DNA-seq true RNA-Seq alignment true 1.3 RNA-seq alignment The alignment of sequences of (typically millions) of short reads to a reference genome. This is a specialised topic within sequence alignment, especially because of complications arising from RNA splicing. beta12orEarlier ChIP-on-chip true 1.3 1.1 Experimental techniques that combine chromatin immunoprecipitation ('ChIP') with microarray ('chip'). ChIP-on-chip is used for high-throughput study of protein-DNA interactions. ChIP-chip Data security 1.3 Data privacy The protection of data, such as patient health data, from damage or unwanted access from unauthorized users. Sample collections samples biobanking 1.3 biosamples Biological samples and specimens. Specimen collections Biochemistry VT 1.5.4 Biochemistry and molecular biology Chemical biology 1.3 Biological chemistry Chemical substances and physico-chemical processes that occur within living organisms. Phylogenetics The study of evolutionary relationships amongst organisms from analysis of genetic information (typically gene or protein sequences). 1.3 http://purl.bioontology.org/ontology/MSH/D010802 Epigenetics Topic concerning the study of heritable changes, for example in gene expression or phenotype, caused by mechanisms other than changes in the DNA sequence. DNA methylation This includes sub-topics such as histone modification and DNA methylation. http://purl.bioontology.org/ontology/MSH/D019175 Histone modification 1.3 Biotechnology 1.3 The exploitation of biological process, structure and function for industrial purposes, for example the genetic manipulation of microorganisms for antibody production. 
Phenomics Phenomes, or the study of the change in phenotype (the physical and biochemical traits of organisms) in response to genetic and environmental factors. 1.3 Evolutionary biology VT 1.5.16 Evolutionary biology 1.3 The evolutionary processes, from the genetic to environmental scale, that produced life in all its diversity. Physiology The functions of living organisms and their constituent parts. 1.3 VT 3.1.8 Physiology Microbiology The biology of microorganisms. 1.3 VT 1.5.20 Microbiology Parasitology 1.3 The biology of parasites. Medicine General medicine Research in support of healing by diagnosis, treatment, and prevention of disease. 1.3 VT 3.1 Basic medicine VT 3.2.9 General and internal medicine Experimental medicine Biomedical research Clinical medicine VT 3.2 Clinical medicine Internal medicine Neurobiology Neuroscience 1.3 The study of the nervous system and brain; its anatomy, physiology and function. VT 3.1.5 Neuroscience Public health and epidemiology VT 3.3.1 Epidemiology Topic concerning the patterns, cause, and effect of disease within populations. 1.3 Public health Epidemiology Biophysics 1.3 VT 1.5.9 Biophysics The use of physics to study biological systems. Computational biology VT 1.5.19 Mathematical biology VT 1.5.12 Computational biology This includes the modeling and treatment of biological processes and systems in mathematical terms (theoretical biology). Mathematical biology VT 1.5.26 Theoretical biology Theoretical biology 1.3 The development and application of theory, analytical methods, mathematical models and computational simulation of biological systems. Biomathematics Transcriptomics The analysis of transcriptomes, or a set of all the RNA molecules in a specific cell, tissue etc. 
Transcriptome 1.3 Chemistry VT 1.7.10 Polymer science VT 1.7.7 Mathematical chemistry VT 1.7.3 Colloid chemistry 1.3 Mathematical chemistry Physical chemistry VT 1.7.9 Physical chemistry Polymer science Chemical science Organic chemistry VT 1.7.6 Inorganic and nuclear chemistry VT 1.7 Chemical sciences VT 1.7.5 Electrochemistry Inorganic chemistry VT 1.7.2 Chemistry Nuclear chemistry VT 1.7.8 Organic chemistry The composition and properties of matter, reactions, and the use of reactions to create new substances. Mathematics The study of numbers (quantity) and other topics including structure, space, and change. VT:1.1 Mathematics Maths VT 1.1.99 Other 1.3 Computer science 1.3 VT 1.2 Computer sciences VT 1.2.99 Other The theory and practical use of computer systems. Physics The study of matter, space and time, and related concepts such as energy and force. 1.3 RNA splicing RNA splicing; post-transcriptional RNA modification involving the removal of introns and joining of exons. This includes the study of splice sites, splicing patterns, splice alternatives or variants, isoforms, etc. 1.3 Molecular genetics 1.3 The structure and function of genes at a molecular level. Respiratory medicine VT 3.2.25 Respiratory systems Pulmonology The study of the respiratory system. Pulmonary medicine Respiratory disease 1.3 Pulmonary disorders Metabolic disease The study of metabolic diseases. 1.4 1.3 true Infectious disease Transmissible disease VT 3.3.4 Infectious diseases Communicable disease The branch of medicine that deals with the prevention, diagnosis and management of transmissible disease with clinically evident illness resulting from infection with pathogenic biological agents (viruses, bacteria, fungi, protozoa, parasites and prions). 1.3 Rare diseases 1.3 The study of rare diseases. 
Computational chemistry 1.3 VT 1.7.4 Computational chemistry Topic concerning the development and application of theory, analytical methods, mathematical models and computational simulation of chemical systems. Neurology Neurological disorders 1.3 The branch of medicine that deals with the anatomy, functions and disorders of the nervous system. Cardiology Cardiovascular disease VT 3.2.4 Cardiac and Cardiovascular systems 1.3 Cardiovascular medicine Heart disease VT 3.2.22 Peripheral vascular disease The diseases and abnormalities of the heart and circulatory system. Drug discovery The discovery and design of drugs or potential drug compounds. This includes methods that search compound collections, generate or analyse drug 3D conformations, identify drug targets with structural docking etc. 1.3 Biobank biobanking 1.3 Repositories of biological samples, typically human, for basic biological and clinical research. Tissue collection Mouse clinic 1.3 Laboratory study of mice, for example, phenotyping, and mutagenesis of mouse cell lines. Microbial collection Collections of microbial cells including bacteria, yeasts and moulds. 1.3 Cell culture collection 1.3 Collections of cells grown under laboratory conditions, specifically, cells from multi-cellular eukaryotes and especially animal cells. Clone library 1.3 Collections of DNA, including both collections of cloned molecules, and populations of micro-organisms that store and propagate cloned DNA. Translational medicine 'translating' the output of basic and biomedical research into better diagnostic tools, medicines, medical procedures, policies and advice. 1.3 Compound libraries and screening Translational medicine Chemical library Collections of chemicals, typically for use in high-throughput screening experiments. Compound library Chemical screening 1.3 Biomedical science Topic concerning biological science that is (typically) performed in the context of medicine. 
VT 3.3 Health sciences Health science 1.3 Data identity and mapping Topic concerning the identity of biological entities, or reports on such entities, and the mapping of entities and records in different databases. 1.3 Sequence search 1.3 Sequence database search The search and retrieval from a database on the basis of molecular sequence similarity. Biomarkers Diagnostic markers 1.4 Objective indicators of biological state often used to assess health, and determine treatment. Laboratory techniques The procedures used to conduct an experiment. Lab techniques 1.4 Data architecture, analysis and design The development of policies, models and standards that cover data acquisition, storage and integration, such that it can be put to use, typically through a process of systematically applying statistical and/or logical techniques to describe, illustrate, summarise or evaluate data. Data analysis Data design 1.4 Data architecture Data integration and warehousing The combination and integration of data from different sources, for example into a central repository or warehouse, to provide users with a unified view of these data. Data integration 1.4 Data warehousing Biomaterials Any matter, surface or construct that interacts with a biological system. Diagnostic markers 1.4 Chemical biology 1.4 The use of synthetic chemistry to study and manipulate biological systems. Analytical chemistry 1.4 The study of the separation, identification, and quantification of the chemical components of natural and artificial materials. VT 1.7.1 Analytical chemistry Synthetic chemistry Synthetic organic chemistry The use of chemistry to create new compounds. 1.4 Software engineering VT 1.2.1 Algorithms Programming languages VT 1.2.7 Data structures Software development Software engineering Computer programming 1.4 1.2.12 Programming languages The process that leads from an original formulation of a computing problem to executable programs. 
Data structures Algorithms VT 1.2.14 Software engineering Drug development 1.4 Medicine development The process of bringing a new drug to market once a lead compound has been identified through drug discovery. Drug development science Medicines development Drug formulation and delivery The process of formulating and administering a pharmaceutical compound to achieve a therapeutic effect. Drug delivery Drug formulation 1.4 Pharmacokinetics and pharmacodynamics Pharmacodynamics Pharmacokinetics Drug distribution 1.4 Drug excretion The study of how a drug interacts with the body. Drug absorption ADME Drug metabolism Drug metabolism Medicines research and development Medicine research and development The discovery, development and approval of medicines. Health care research Drug discovery and development 1.4 Health care science Safety sciences 1.4 Drug safety The safety (or lack) of drugs and other medical interventions. Pharmacovigilance 1.4 Pharmacovigilance concerns safety once a drug has gone to market. The detection, assessment, understanding and prevention of adverse effects of medicines. Preclinical and clinical studies The testing of new medicines, vaccines or procedures on animals (preclinical) and humans (clinical) prior to their approval by regulatory authorities. Preclinical studies 1.4 Clinical studies Imaging This includes diffraction experiments that are based upon the interference of waves, typically electromagnetic waves such as X-rays or visible light, by some object being studied, typically in order to produce an image of the object or determine its structure. Microscopy imaging 1.4 Microscopy Diffraction experiment The visual representation of an object. Biological imaging The use of imaging techniques to understand biology. 1.4 Medical imaging VT 3.2.24 Radiology The use of imaging techniques for clinical purposes and for medical research. 
1.4 Radiology VT 3.2.14 Nuclear medicine Nuclear medicine VT 3.2.13 Medical imaging Light microscopy The use of optical instruments to magnify the image of an object. 1.4 Laboratory animal science 1.4 The use of animals and alternatives in experimental research. Marine biology 1.4 VT 1.5.18 Marine and Freshwater biology The study of organisms in the ocean or brackish waters. Molecular medicine The identification of molecular and genetic causes of disease and the development of interventions to correct them. 1.4 Nutritional science 1.4 VT 3.3.7 Nutrition and Dietetics Dietetics The study of the effects of food components on the metabolism, health, performance and disease resistance of humans and animals. It also includes the study of human behaviours related to food choices. Nutrition science Omics The collective characterisation and quantification of pools of biological molecules that translate into the structure, function, and dynamics of an organism or organisms. 1.4 Quality affairs The processes that need to be in place to ensure the quality of products for human or animal use. Good clinical practice Good manufacturing practice Quality assurance Good laboratory practice 1.4 Regulatory affairs The protection of public health by controlling the safety and efficacy of products in areas including pharmaceuticals, veterinary medicine, medical devices, pesticides, agrochemicals, cosmetics, and complementary medicines. 1.4 Regenerative medicine Stem cell research Biomedical approaches to clinical interventions that involve the use of stem cells. 1.4 Systems medicine 1.4 An interdisciplinary field of study that looks at the dynamic systems of the human body as part of an integrated whole, incorporating biochemical, physiological, and environmental interactions that sustain life. Veterinary medicine 1.4 Topic concerning the branch of medicine that deals with the prevention, diagnosis, and treatment of disease, disorder and injury in animals. 
Bioengineering 1.4 The application of biological concepts and methods to the analytical and synthetic methodologies of engineering. Diagnostic markers Geriatric medicine The branch of medicine dealing with the diagnosis, treatment and prevention of disease in older people, and the problems specific to aging. VT 3.2.10 Geriatrics and gerontology Ageing Aging Gerontology 1.4 Geriatrics Allergy, clinical immunology and immunotherapeutics. VT 3.2.1 Allergy Health issues related to the immune system and their prevention, diagnosis and management. 1.4 Immune disorders Clinical immunology Immunomodulators Allergy Immunotherapeutics Pain medicine Ageing 1.4 Algiatry The prevention of pain and the evaluation, treatment and rehabilitation of persons in pain. Anaesthesiology Anaesthetics Anaesthesia and anaesthetics. 1.4 VT 3.2.2 Anaesthesiology Critical care medicine Acute medicine Geriatrics VT 3.2.5 Critical care/Emergency medicine Emergency medicine 1.4 The multidisciplinary field that cares for patients with acute, life-threatening illness or injury. Dermatology The branch of medicine that deals with prevention, diagnosis and treatment of disorders of the skin, scalp, hair and nails. Dermatological disorders 1.4 VT 3.2.7 Dermatology and venereal diseases Dentistry 1.4 The study, diagnosis, prevention and treatment of disorders of the oral cavity, maxillofacial area and adjacent structures. Ear, nose and throat medicine Otolaryngology 1.4 The branch of medicine that deals with the prevention, diagnosis, and treatment of disorders of the ear, nose and throat. Otorhinolaryngology Head and neck disorders VT 3.2.20 Otorhinolaryngology Audiovestibular medicine Endocrinology and metabolism 1.4 Metabolic disorders Metabolism Endocrinology The branch of medicine dealing with diseases of endocrine organs, hormone systems, their target organs, and disorders of the pathways of glucose and lipid metabolism. 
Endocrine disorders Haematology VT 3.2.11 Hematology The branch of medicine that deals with the blood, blood-forming organs and blood diseases. Haematological disorders 1.4 Blood disorders Gastroenterology The branch of medicine that deals with disorders of the oesophagus, stomach, duodenum, jejunum, ileum, large intestine, sigmoid colon and rectum. Gastrointestinal disorders VT 3.2.8 Gastroenterology and hepatology 1.4 Gender medicine The study of the biological and physiological differences between males and females and how they affect differences in disease presentation and management. 1.4 Gynaecology and obstetrics VT 3.2.15 Obstetrics and gynaecology 1.4 Gynaecology The branch of medicine that deals with the health of the female reproductive system, pregnancy and birth. Gynaecological disorders Obstetrics Hepatic and biliary medicine Hepatobiliary medicine Liver disorders 1.4 The branch of medicine that deals with the liver, gallbladder, bile ducts and bile. Infectious tropical disease The branch of medicine that deals with the infectious diseases of the tropics. 1.4 Trauma medicine 1.4 The branch of medicine that treats body wounds or shock produced by sudden physical injury, as from violence or accident. Medical toxicology The branch of medicine that deals with the diagnosis, management and prevention of poisoning and other adverse health effects caused by medications, occupational and environmental toxins, and biological agents. 1.4 Musculoskeletal medicine The branch of medicine that deals with the prevention, diagnosis, and treatment of disorders of the muscle, bone and connective tissue. It incorporates aspects of orthopaedics, rheumatology, rehabilitation medicine and pain medicine. 
VT 3.2.26 Rheumatology VT 3.2.19 Orthopaedics Musculoskeletal disorders Orthopaedics Rheumatology 1.4 Ophthalmology Eye disorders VT 3.2.18 Optometry 1.4 Optometry VT 3.2.17 Ophthalmology Audiovestibular medicine The branch of medicine that deals with disorders of the eye, including eyelid, optic nerve/visual pathways and ocular muscles. Paediatrics 1.4 The branch of medicine that deals with the medical care of infants, children and adolescents. VT 3.2.21 Paediatrics Child health Psychiatry The branch of medicine that deals with the management of mental illness, emotional disturbance and abnormal behaviour. 1.4 Psychiatric disorders VT 3.2.23 Psychiatry Mental health Reproductive health Reproductive disorders Audiovestibular medicine VT 3.2.3 Andrology Andrology 1.4 Family planning The health of the reproductive processes, functions and systems at all stages of life. Fertility medicine Surgery Transplantation VT 3.2.28 Transplantation The use of operative, manual and instrumental techniques on a patient to investigate and/or treat a pathological condition or help improve bodily function or appearance. 1.4 Urology and nephrology The branches of medicine and physiology focussing on the function and disorders of the urinary system in males and females, the reproductive system in males, and the kidney. VT 3.2.29 Urology and nephrology 1.4 Urology Kidney disease Urological disorders Nephrology Complementary medicine Medical therapies that fall beyond the scope of conventional medicine but may be used alongside it in the treatment of disease and ill health. VT 3.2.12 Integrative and Complementary medicine Holistic medicine 1.4 Alternative medicine Integrative medicine MRI Nuclear magnetic resonance imaging 1.7 MRT Magnetic resonance tomography Techniques that use magnetic fields and radiowaves to form images, typically to investigate the anatomy and physiology of the human body. 
NMRI Magnetic resonance imaging Neutron diffraction The study of matter by studying the diffraction pattern from firing neutrons at a sample, typically to determine atomic and/or magnetic structure. Neutron microscopy Elastic neutron scattering 1.7 Neutron diffraction experiment Tomography X-ray tomography Imaging in sections (sectioning), through the use of a wave-generating device (tomograph) that generates an image (a tomogram). Electron tomography 1.7 Data mining 1.7 VT 1.3.2 Data mining The discovery of patterns in large data sets and the extraction and transformation of those patterns into a useful format. KDD Knowledge discovery in databases Machine learning A topic concerning the application of artificial intelligence methods to algorithms, in order to create methods that can learn from data in order to generate an output, rather than relying on explicitly encoded information only. Artificial Intelligence 1.7 VT 1.2.2 Artificial Intelligence (expert systems, machine learning, robotics) Database management 1.8 Data maintenance Databases Database administration The general handling of data stored in digital archives such as databanks, databases proper, web portals and other data resources. This includes databases for the results of scientific experiments, the application of high-throughput technology, computational analysis and the scientific literature. Biological databases Animals 1.8 Animal biology Animals, e.g. information on a specific animal genome including molecular sequences, genes and annotation. Zoology Animal VT 1.5.29 Zoology The resource may be specific to an animal, a group of animals or all animals. Metazoa Protein sites, features and motifs Protein sequence features Protein functional sites 1.8 The biology, archival, detection, prediction and analysis of positional features such as functional and other key sites, in protein sequences and the conserved patterns (motifs, profiles etc.) that may be used to describe them. 
Nucleic acid sites, features and motifs Nucleic acid sequence features 1.8 Nucleic acid functional sites The biology, archival, detection, prediction and analysis of positional features such as functional and other key sites, in nucleic acid sequences and the conserved patterns (motifs, profiles etc.) that may be used to describe them. Gene transcript features Nucleic acid features (mRNA features) Features of a messenger RNA (mRNA) molecules including precursor RNA, primary (unprocessed) transcript and fully processed molecules. mRNA features This includes 5'untranslated region (5'UTR), coding sequences (CDS), exons, intervening sequences (intron) and 3'untranslated regions (3'UTR). 1.8 Protein-ligand interactions 1.8 Protein-ligand (small molecule) interaction(s). Protein-drug interactions 1.8 Protein-drug interaction(s). Genotyping experiment 1.8 Genotype experiment including case control, population, and family studies. These might use array based methods and re-sequencing methods. GWAS study 1.8 Genome-wide association study experiments. Genome-wide association study Microarray experiment 1.8 This might specify which raw data file relates to which sample and information on hybridisations, e.g. which are technical and which are biological replicates. Microarray experiments including conditions, protocol, sample:data relationships etc. PCR experiment 1.8 PCR experiments, e.g. quantitative real-time PCR. Proteomics experiment Proteomics experiments. 1.8 2D PAGE experiment Two-dimensional gel electrophoresis experiments, gels or spots in a gel. 1.8 Northern blot experiment Northern Blot experiments. 1.8 RNAi experiment 1.8 RNAi experiments. Simulation experiment 1.8 Biological computational model experiments (simulation), for example the minimum information required in order to permit its correct interpretation and reproduction. Protein-nucleic acid interactions 1.8 Protein-DNA/RNA interaction(s). 
Protein-protein interactions Domain-domain interactions Protein-protein interaction(s), including interactions between protein domains. 1.8 Protein interaction networks Cellular process pathways 1.8 Cellular process pathways. Disease pathways Disease pathways, typically of human disease. Pathway or network (disease) 1.8 Environmental information processing pathways Environmental information processing pathways. 1.8 Pathway or network (environmental information processing) Genetic information processing pathways Pathway or network (genetic information processing) 1.8 Genetic information processing pathways. Protein super-secondary structure Super-secondary structure of protein sequence(s). Protein features (super-secondary) 1.8 Super-secondary structures include leucine zippers, coiled coils, Helix-Turn-Helix etc. Protein active sites Enzyme active site 1.8 Protein features (active sites) Catalytic residues (active site) of an enzyme. Protein binding sites Ligand-binding (non-catalytic) residues of a protein, such as sites that bind metal, prosthetic groups or lipids. 1.8 Protein features (binding sites) Protein-nucleic acid binding sites RNA and DNA-binding proteins and binding sites in protein sequences. 1.8 Protein features (nucleic acid binding sites) Protein cleavage sites Cleavage sites (for a proteolytic enzyme or agent) in a protein sequence. Protein features (cleavage sites) 1.8 Protein chemical modifications Chemical modification of a protein. Protein features (chemical modifications) MOD:00000 1.8 GO:0006464 Protein disordered structure Disordered structure in a protein. 1.8 Protein features (disordered structure) Protein domains The report will typically include a graphic of the location of domains in a sequence, with associated data such as lists of related sequences, literature references, etc. Structural domains or 3D folds in a protein or polypeptide chain. 
1.8 Protein structural domains Protein features (domains) Protein key folding sites Protein features (key folding sites) 1.8 Key residues involved in protein folding. Protein post-translational modifications Protein features (post-translation modifications) Post-translation modifications Post-translation modifications in a protein sequence, typically describing the specific sites involved. 1.8 Protein secondary structure The location and size of the secondary structure elements and intervening loop regions are typically given. The report can include disulphide bonds and post-translationally formed peptide bonds (crosslinks). Secondary structure (predicted or real) of a protein. Protein features (secondary structure) 1.8 Protein sequence repeats 1.8 Protein features (repeats) Short repetitive subsequences (repeat sequences) in a protein sequence. Protein repeats Protein signal peptides Protein features (signal peptides) Signal peptides or signal peptide cleavage sites in protein sequences. 1.8 Applied mathematics VT 1.1.1 Applied mathematics The application of mathematics to specific problems in science, typically by the formulation and analysis of mathematical models. 1.10 Pure mathematics VT 1.1.1 Pure mathematics The study of abstract mathematical concepts. 1.10 Data governance Data handling http://purl.bioontology.org/ontology/MSH/D030541 The control of data entry and maintenance to ensure the data meets defined standards, qualities or constraints. 1.10 Data stewardship Data quality management http://purl.bioontology.org/ontology/MSH/D030541 1.10 Data quality Data integrity Data clean-up Data enrichment The quality, integrity, cleaning up and enrichment of data. Freshwater biology 1.10 VT 1.5.18 Marine and Freshwater biology The study of organisms in freshwater ecosystems. Human genetics The study of inheritance in human beings. VT 3.1.2 Human genetics 1.10 Tropical medicine 1.10 Health problems that are prevalent in tropical and subtropical regions. 
VT 3.3.14 Tropical medicine Medical biotechnology 1.10 VT 3.4.1 Biomedical devices VT 3.4.2 Health-related biotechnology VT 3.4 Medical biotechnology VT 3.3.14 Tropical medicine Pharmaceutical biotechnology Biotechnology applied to the medical sciences and the development of medicines. Personalized medicine 1.10 Health problems that are prevalent in tropical and subtropical regions. Molecular diagnostics VT 3.4.5 Molecular diagnostics Obsolete concept (EDAM) 1.2 Needed for conversion to the OBO format. An obsolete concept (redefined in EDAM). true schema-salad-2.6.20171201034858/schema_salad/tests/test_print_oneline.py0000644000175100017510000001365013203345013025600 0ustar peterpeter00000000000000from .util import get_data import unittest from schema_salad.main import to_one_line_messages, reformat_yaml_exception_message from schema_salad.schema import load_schema, load_and_validate from schema_salad.sourceline import strip_dup_lineno from schema_salad.validate import ValidationException from os.path import normpath import re import six class TestPrintOneline(unittest.TestCase): def test_print_oneline(self): # Issue #135 document_loader, avsc_names, schema_metadata, metaschema_loader = load_schema( get_data(u"tests/test_schema/CommonWorkflowLanguage.yml")) src = "test15.cwl" with self.assertRaises(ValidationException): try: load_and_validate(document_loader, avsc_names, six.text_type(get_data("tests/test_schema/"+src)), True) except ValidationException as e: msgs = to_one_line_messages(str(e)).splitlines() self.assertEqual(len(msgs), 2) m = re.match(r'^(.+:\d+:\d+:)(.+)$', msgs[0]) self.assertTrue(msgs[0].endswith(src+":11:7: invalid field `invalid_field`, expected one of: 'loadContents', 'position', 'prefix', 'separate', 'itemSeparator', 'valueFrom', 'shellQuote'")) self.assertTrue(msgs[1].endswith(src+":12:7: invalid field `another_invalid_field`, expected one of: 'loadContents', 'position', 'prefix', 'separate', 'itemSeparator', 'valueFrom', 'shellQuote'")) 
print("\n", e) raise def test_print_oneline_for_invalid_yaml(self): # Issue #137 document_loader, avsc_names, schema_metadata, metaschema_loader = load_schema( get_data(u"tests/test_schema/CommonWorkflowLanguage.yml")) src = "test16.cwl" with self.assertRaises(RuntimeError): try: load_and_validate(document_loader, avsc_names, six.text_type(get_data("tests/test_schema/"+src)), True) except RuntimeError as e: msg = reformat_yaml_exception_message(strip_dup_lineno(six.text_type(e))) msg = to_one_line_messages(msg) self.assertTrue(msg.endswith(src+":10:1: could not find expected \':\'")) print("\n", e) raise def test_print_oneline_for_errors_in_the_same_line(self): # Issue #136 document_loader, avsc_names, schema_metadata, metaschema_loader = load_schema( get_data(u"tests/test_schema/CommonWorkflowLanguage.yml")) src = "test17.cwl" with self.assertRaises(ValidationException): try: load_and_validate(document_loader, avsc_names, six.text_type(get_data("tests/test_schema/"+src)), True) except ValidationException as e: msgs = to_one_line_messages(str(e)).splitlines() self.assertEqual(len(msgs), 2) self.assertTrue(msgs[0].endswith(src+":13:5: missing required field `id`")) self.assertTrue(msgs[1].endswith(src+":13:5: invalid field `aa`, expected one of: 'label', 'secondaryFiles', 'format', 'streamable', 'doc', 'id', 'outputBinding', 'type'")) print("\n", e) raise def test_print_oneline_for_errors_in_resolve_ref(self): # Issue #141 document_loader, avsc_names, schema_metadata, metaschema_loader = load_schema( get_data(u"tests/test_schema/CommonWorkflowLanguage.yml")) src = "test18.cwl" fullpath = normpath(get_data("tests/test_schema/"+src)) with self.assertRaises(ValidationException): try: load_and_validate(document_loader, avsc_names, six.text_type(fullpath), True) except ValidationException as e: msgs = to_one_line_messages(str(strip_dup_lineno(six.text_type(e)))).splitlines() # convert Windows path to Posix path if '\\' in fullpath: fullpath = '/'+fullpath.replace('\\', 
'/') self.assertEqual(len(msgs), 1) self.assertTrue(msgs[0].endswith(src+':13:5: Field `type` references unknown identifier `Filea`, tried file://%s#Filea' % (fullpath))) print("\n", e) raise def test_for_invalid_yaml1(self): # Issue 143 document_loader, avsc_names, schema_metadata, metaschema_loader = load_schema( get_data(u"tests/test_schema/CommonWorkflowLanguage.yml")) src = "test16.cwl" with self.assertRaises(RuntimeError): try: load_and_validate(document_loader, avsc_names, six.text_type(get_data("tests/test_schema/"+src)), True) except RuntimeError as e: msg = reformat_yaml_exception_message(strip_dup_lineno(six.text_type(e))) msgs = msg.splitlines() self.assertEqual(len(msgs), 2) self.assertTrue(msgs[0].endswith(src+":9:7: while scanning a simple key")) self.assertTrue(msgs[1].endswith(src+":10:1: could not find expected ':'")) print("\n", e) raise def test_for_invalid_yaml2(self): # Issue 143 document_loader, avsc_names, schema_metadata, metaschema_loader = load_schema( get_data(u"tests/test_schema/CommonWorkflowLanguage.yml")) src = "test19.cwl" with self.assertRaises(RuntimeError): try: load_and_validate(document_loader, avsc_names, six.text_type(get_data("tests/test_schema/"+src)), True) except RuntimeError as e: msg = reformat_yaml_exception_message(strip_dup_lineno(six.text_type(e))) self.assertTrue(msg.endswith(src+":1:1: expected , but found ':'")) print("\n", e) raise schema-salad-2.6.20171201034858/schema_salad/tests/test_examples.py0000644000175100017510000003351613203345013024554 0ustar peterpeter00000000000000from __future__ import absolute_import from __future__ import print_function from .util import get_data import unittest import schema_salad.ref_resolver import schema_salad.main import schema_salad.schema from schema_salad.jsonld_context import makerdf import rdflib import ruamel.yaml import json import os from schema_salad.sourceline import cmap, SourceLine try: from ruamel.yaml import CSafeLoader as SafeLoader except ImportError: from 
ruamel.yaml import SafeLoader # type: ignore from ruamel.yaml.comments import CommentedSeq, CommentedMap class TestSchemas(unittest.TestCase): def test_schemas(self): loader = schema_salad.ref_resolver.Loader({}) ra, _ = loader.resolve_all(cmap({ u"$schemas": [schema_salad.ref_resolver.file_uri(get_data("tests/EDAM.owl"))], u"$namespaces": {u"edam": u"http://edamontology.org/"}, u"edam:has_format": u"edam:format_1915" }), "") self.assertEqual({ u"$schemas": [schema_salad.ref_resolver.file_uri(get_data("tests/EDAM.owl"))], u"$namespaces": {u"edam": u"http://edamontology.org/"}, u'http://edamontology.org/has_format': u'http://edamontology.org/format_1915' }, ra) # def test_domain(self): # l = schema_salad.ref_resolver.Loader({}) # l.idx["http://example.com/stuff"] = { # "$schemas": ["tests/EDAM.owl"], # "$namespaces": {"edam": "http://edamontology.org/"}, # } # ra, _ = l.resolve_all({ # "$import": "http://example.com/stuff", # "edam:has_format": "edam:format_1915" # }, "") # self.assertEqual(ra, { # "$schemas": ["tests/EDAM.owl"], # "$namespaces": {"edam": "http://edamontology.org/"}, # 'http://edamontology.org/has_format': 'http://edamontology.org/format_1915' # }) def test_self_validate(self): self.assertEqual(0, schema_salad.main.main( argsl=[get_data("metaschema/metaschema.yml")])) self.assertEqual(0, schema_salad.main.main( argsl=[get_data("metaschema/metaschema.yml"), get_data("metaschema/metaschema.yml")])) def test_avro_regression(self): self.assertEqual(0, schema_salad.main.main( argsl=[get_data("tests/Process.yml")])) def test_jsonld_ctx(self): ldr, _, _, _ = schema_salad.schema.load_schema(cmap({ "$base": "Y", "name": "X", "$namespaces": { "foo": "http://example.com/foo#" }, "$graph": [{ "name": "ExampleType", "type": "enum", "symbols": ["asym", "bsym"]}] })) ra, _ = ldr.resolve_all(cmap({"foo:bar": "asym"}), "X") self.assertEqual(ra, { 'http://example.com/foo#bar': 'asym' }) def test_idmap(self): ldr = schema_salad.ref_resolver.Loader({}) 
ldr.add_context({ "inputs": { "@id": "http://example.com/inputs", "mapSubject": "id", "mapPredicate": "a" }, "outputs": { "@type": "@id", "identity": True, }, "id": "@id"}) ra, _ = ldr.resolve_all(cmap({ "id": "stuff", "inputs": { "zip": 1, "zing": 2 }, "outputs": ["out"], "other": { 'n': 9 } }), "http://example2.com/") self.assertEqual("http://example2.com/#stuff", ra["id"]) for item in ra["inputs"]: if item["a"] == 2: self.assertEqual( 'http://example2.com/#stuff/zing', item["id"]) else: self.assertEqual('http://example2.com/#stuff/zip', item["id"]) self.assertEqual(['http://example2.com/#stuff/out'], ra['outputs']) self.assertEqual({'n': 9}, ra['other']) def test_scoped_ref(self): ldr = schema_salad.ref_resolver.Loader({}) ldr.add_context({ "scatter": { "@type": "@id", "refScope": 0, }, "source": { "@type": "@id", "refScope": 2, }, "in": { "mapSubject": "id", "mapPredicate": "source" }, "out": { "@type": "@id", "identity": True }, "inputs": { "mapSubject": "id", "mapPredicate": "type" }, "outputs": { "mapSubject": "id", }, "steps": { "mapSubject": "id" }, "id": "@id"}) ra, _ = ldr.resolve_all(cmap({ "inputs": { "inp": "string", "inp2": "string" }, "outputs": { "out": { "type": "string", "source": "step2/out" } }, "steps": { "step1": { "in": { "inp": "inp", "inp2": "#inp2", "inp3": ["inp", "inp2"] }, "out": ["out"], "scatter": "inp" }, "step2": { "in": { "inp": "step1/out" }, "scatter": "inp", "out": ["out"] } } }), "http://example2.com/") self.assertEqual( {'inputs': [{ 'id': 'http://example2.com/#inp', 'type': 'string' }, { 'id': 'http://example2.com/#inp2', 'type': 'string' }], 'outputs': [{ 'id': 'http://example2.com/#out', 'type': 'string', 'source': 'http://example2.com/#step2/out' }], 'steps': [{ 'id': 'http://example2.com/#step1', 'scatter': 'http://example2.com/#step1/inp', 'in': [{ 'id': 'http://example2.com/#step1/inp', 'source': 'http://example2.com/#inp' }, { 'id': 'http://example2.com/#step1/inp2', 'source': 'http://example2.com/#inp2' }, { 'id': 
'http://example2.com/#step1/inp3', 'source': ['http://example2.com/#inp', 'http://example2.com/#inp2'] }], "out": ["http://example2.com/#step1/out"], }, { 'id': 'http://example2.com/#step2', 'scatter': 'http://example2.com/#step2/inp', 'in': [{ 'id': 'http://example2.com/#step2/inp', 'source': 'http://example2.com/#step1/out' }], "out": ["http://example2.com/#step2/out"], }] }, ra) def test_examples(self): for a in ["field_name", "ident_res", "link_res", "vocab_res"]: ldr, _, _, _ = schema_salad.schema.load_schema( get_data("metaschema/%s_schema.yml" % a)) with open(get_data("metaschema/%s_src.yml" % a)) as src_fp: src = ldr.resolve_all( ruamel.yaml.round_trip_load(src_fp), "", checklinks=False)[0] with open(get_data("metaschema/%s_proc.yml" % a)) as src_proc: proc = ruamel.yaml.safe_load(src_proc) self.assertEqual(proc, src) def test_yaml_float_test(self): self.assertEqual(ruamel.yaml.safe_load("float-test: 2e-10")["float-test"], 2e-10) def test_typedsl_ref(self): ldr = schema_salad.ref_resolver.Loader({}) ldr.add_context({ "File": "http://example.com/File", "null": "http://example.com/null", "array": "http://example.com/array", "type": { "@type": "@vocab", "typeDSL": True } }) ra, _ = ldr.resolve_all(cmap({"type": "File"}), "") self.assertEqual({'type': 'File'}, ra) ra, _ = ldr.resolve_all(cmap({"type": "File?"}), "") self.assertEqual({'type': ['null', 'File']}, ra) ra, _ = ldr.resolve_all(cmap({"type": "File[]"}), "") self.assertEqual({'type': {'items': 'File', 'type': 'array'}}, ra) ra, _ = ldr.resolve_all(cmap({"type": "File[]?"}), "") self.assertEqual( {'type': ['null', {'items': 'File', 'type': 'array'}]}, ra) def test_scoped_id(self): ldr = schema_salad.ref_resolver.Loader({}) ctx = { "id": "@id", "location": { "@id": "@id", "@type": "@id" }, "bar": "http://example.com/bar", "ex": "http://example.com/" } ldr.add_context(ctx) ra, _ = ldr.resolve_all(cmap({ "id": "foo", "bar": { "id": "baz" } }), "http://example.com") self.assertEqual({'id': 
'http://example.com/#foo', 'bar': { 'id': 'http://example.com/#foo/baz'}, }, ra) g = makerdf(None, ra, ctx) print(g.serialize(format="n3")) ra, _ = ldr.resolve_all(cmap({ "location": "foo", "bar": { "location": "baz" } }), "http://example.com", checklinks=False) self.assertEqual({'location': 'http://example.com/foo', 'bar': { 'location': 'http://example.com/baz'}, }, ra) g = makerdf(None, ra, ctx) print(g.serialize(format="n3")) ra, _ = ldr.resolve_all(cmap({ "id": "foo", "bar": { "location": "baz" } }), "http://example.com", checklinks=False) self.assertEqual({'id': 'http://example.com/#foo', 'bar': { 'location': 'http://example.com/baz'}, }, ra) g = makerdf(None, ra, ctx) print(g.serialize(format="n3")) ra, _ = ldr.resolve_all(cmap({ "location": "foo", "bar": { "id": "baz" } }), "http://example.com", checklinks=False) self.assertEqual({'location': 'http://example.com/foo', 'bar': { 'id': 'http://example.com/#baz'}, }, ra) g = makerdf(None, ra, ctx) print(g.serialize(format="n3")) def test_mixin(self): base_url = schema_salad.ref_resolver.file_uri(os.path.join(os.getcwd(), "tests")) ldr = schema_salad.ref_resolver.Loader({}) ra = ldr.resolve_ref(cmap({"$mixin": get_data("tests/mixin.yml"), "one": "five"}), base_url=base_url) self.assertEqual({'id': 'four', 'one': 'five'}, ra[0]) ldr = schema_salad.ref_resolver.Loader({"id": "@id"}) ra = ldr.resolve_all(cmap([{ "id": "a", "m": {"$mixin": get_data("tests/mixin.yml")} }, { "id": "b", "m": {"$mixin": get_data("tests/mixin.yml")} }]), base_url=base_url) self.assertEqual([{ 'id': base_url + '#a', 'm': { 'id': base_url + u'#a/four', 'one': 'two' }, }, { 'id': base_url + '#b', 'm': { 'id': base_url + u'#b/four', 'one': 'two'} }], ra[0]) def test_fragment(self): ldr = schema_salad.ref_resolver.Loader({"id": "@id"}) b, _ = ldr.resolve_ref(get_data("tests/frag.yml#foo2")) self.assertEqual({"id": b["id"], "bar":"b2"}, b) def test_file_uri(self): # Note: this test probably won't pass on Windows. 
Someone with a # windows box should add an alternate test. self.assertEquals("file:///foo/bar%20baz/quux", schema_salad.ref_resolver.file_uri("/foo/bar baz/quux")) self.assertEquals(os.path.normpath("/foo/bar baz/quux"), schema_salad.ref_resolver.uri_file_path("file:///foo/bar%20baz/quux")) self.assertEquals("file:///foo/bar%20baz/quux%23zing%20zong", schema_salad.ref_resolver.file_uri("/foo/bar baz/quux#zing zong")) self.assertEquals("file:///foo/bar%20baz/quux#zing%20zong", schema_salad.ref_resolver.file_uri("/foo/bar baz/quux#zing zong", split_frag=True)) self.assertEquals(os.path.normpath("/foo/bar baz/quux#zing zong"), schema_salad.ref_resolver.uri_file_path("file:///foo/bar%20baz/quux#zing%20zong")) class SourceLineTest(unittest.TestCase): def test_sourceline(self): ldr = schema_salad.ref_resolver.Loader({"id": "@id"}) b, _ = ldr.resolve_ref(get_data("tests/frag.yml")) class TestExp(Exception): pass try: with SourceLine(b, 1, TestExp, False): raise Exception("Whoops") except TestExp as e: self.assertTrue(str(e).endswith("frag.yml:3:3: Whoops")) except Exception: self.fail() try: with SourceLine(b, 1, TestExp, True): raise Exception("Whoops") except TestExp as e: self.assertTrue(str(e).splitlines()[0].endswith("frag.yml:3:3: Traceback (most recent call last):")) except Exception: self.fail() if __name__ == '__main__': unittest.main() schema-salad-2.6.20171201034858/schema_salad/tests/test_cli_args.py0000644000175100017510000000225713130233260024516 0ustar peterpeter00000000000000from __future__ import absolute_import import unittest import sys import schema_salad.main as cli_parser # for capturing print() output from contextlib import contextmanager from six import StringIO @contextmanager def captured_output(): new_out, new_err = StringIO(), StringIO() old_out, old_err = sys.stdout, sys.stderr try: sys.stdout, sys.stderr = new_out, new_err yield sys.stdout, sys.stderr finally: sys.stdout, sys.stderr = old_out, old_err """ test different sets of 
command line arguments""" class ParseCliArgs(unittest.TestCase): def test_version(self): args = [["--version"], ["-v"]] for arg in args: with captured_output() as (out, err): cli_parser.main(arg) response = out.getvalue().strip() # capture output and strip newline self.assertTrue("Current version" in response) def test_empty_input(self): # running schema_salad tool without any args args = [] with captured_output() as (out, err): cli_parser.main(args) response = out.getvalue().strip() self.assertTrue("error: too few arguments" in response) schema-salad-2.6.20171201034858/schema_salad/tests/mixin.yml0000644000175100017510000000002212752677740023205 0ustar peterpeter00000000000000id: four one: two schema-salad-2.6.20171201034858/schema_salad/tests/__init__.py0000644000175100017510000000000012752677740023443 0ustar peterpeter00000000000000schema-salad-2.6.20171201034858/schema_salad/tests/cwl-pre.yml0000644000175100017510000036211113203345013023415 0ustar peterpeter00000000000000[ { "name": "https://w3id.org/cwl/cwl#Common Workflow Language, v1.0", "type": "documentation", "doc": "\n" }, { "name": "https://w3id.org/cwl/salad#PrimitiveType", "type": "enum", "symbols": [ "https://w3id.org/cwl/salad#null", "http://www.w3.org/2001/XMLSchema#boolean", "http://www.w3.org/2001/XMLSchema#int", "http://www.w3.org/2001/XMLSchema#long", "http://www.w3.org/2001/XMLSchema#float", "http://www.w3.org/2001/XMLSchema#double", "http://www.w3.org/2001/XMLSchema#string" ], "doc": [ "Salad data types are based on Avro schema declarations. 
Refer to the\n[Avro schema declaration documentation](https://avro.apache.org/docs/current/spec.html#schemas) for\ndetailed information.\n", "null: no value", "boolean: a binary value", "int: 32-bit signed integer", "long: 64-bit signed integer", "float: single precision (32-bit) IEEE 754 floating-point number", "double: double precision (64-bit) IEEE 754 floating-point number", "string: Unicode character sequence" ] }, { "name": "https://w3id.org/cwl/salad#Any", "type": "enum", "symbols": [ "https://w3id.org/cwl/salad#Any" ], "doc": "The **Any** type validates for any non-null value.\n" }, { "name": "https://w3id.org/cwl/salad#RecordField", "type": "record", "doc": "A field of a record.", "fields": [ { "name": "https://w3id.org/cwl/salad#RecordField/name", "type": "string", "jsonldPredicate": "@id", "doc": "The name of the field\n" }, { "name": "https://w3id.org/cwl/salad#RecordField/doc", "type": [ "null", "string" ], "doc": "A documentation string for this field\n", "jsonldPredicate": "rdfs:comment" }, { "name": "https://w3id.org/cwl/salad#RecordField/type", "type": [ "PrimitiveType", "RecordSchema", "EnumSchema", "ArraySchema", "string", { "type": "array", "items": [ "PrimitiveType", "RecordSchema", "EnumSchema", "ArraySchema", "string" ] } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#type", "_type": "@vocab", "typeDSL": true, "refScope": 2 }, "doc": "The field type\n" } ] }, { "name": "https://w3id.org/cwl/salad#RecordSchema", "type": "record", "fields": [ { "type": [ "null", { "type": "array", "items": "RecordField" } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#fields", "mapSubject": "name", "mapPredicate": "type" }, "doc": "Defines the fields of the record.", "name": "https://w3id.org/cwl/salad#RecordSchema/fields" }, { "doc": "Must be `record`", "type": { "type": "enum", "symbols": [ "https://w3id.org/cwl/salad#record" ] }, "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#type", "_type": "@vocab", "typeDSL": true, 
"refScope": 2 }, "name": "https://w3id.org/cwl/salad#RecordSchema/type" } ] }, { "name": "https://w3id.org/cwl/salad#EnumSchema", "type": "record", "doc": "Define an enumerated type.\n", "fields": [ { "type": { "type": "array", "items": "string" }, "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#symbols", "_type": "@id", "identity": true }, "doc": "Defines the set of valid symbols.", "name": "https://w3id.org/cwl/salad#EnumSchema/symbols" }, { "doc": "Must be `enum`", "type": { "type": "enum", "symbols": [ "https://w3id.org/cwl/salad#enum" ] }, "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#type", "_type": "@vocab", "typeDSL": true, "refScope": 2 }, "name": "https://w3id.org/cwl/salad#EnumSchema/type" } ] }, { "name": "https://w3id.org/cwl/salad#ArraySchema", "type": "record", "fields": [ { "type": [ "PrimitiveType", "RecordSchema", "EnumSchema", "ArraySchema", "string", { "type": "array", "items": [ "PrimitiveType", "RecordSchema", "EnumSchema", "ArraySchema", "string" ] } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#items", "_type": "@vocab", "refScope": 2 }, "doc": "Defines the type of the array elements.", "name": "https://w3id.org/cwl/salad#ArraySchema/items" }, { "doc": "Must be `array`", "type": { "type": "enum", "symbols": [ "https://w3id.org/cwl/salad#array" ] }, "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#type", "_type": "@vocab", "typeDSL": true, "refScope": 2 }, "name": "https://w3id.org/cwl/salad#ArraySchema/type" } ] }, { "name": "https://w3id.org/cwl/cwl#BaseTypesDoc", "type": "documentation", "doc": "## Base types\n", "docChild": [ "https://w3id.org/cwl/cwl#CWLType", "https://w3id.org/cwl/cwl#Process" ] }, { "type": "enum", "name": "https://w3id.org/cwl/cwl#CWLVersion", "doc": "Version symbols for published CWL document versions.", "symbols": [ "https://w3id.org/cwl/cwl#draft-2", "https://w3id.org/cwl/cwl#draft-3.dev1", "https://w3id.org/cwl/cwl#draft-3.dev2", "https://w3id.org/cwl/cwl#draft-3.dev3", 
"https://w3id.org/cwl/cwl#draft-3.dev4", "https://w3id.org/cwl/cwl#draft-3.dev5", "https://w3id.org/cwl/cwl#draft-3", "https://w3id.org/cwl/cwl#draft-4.dev1", "https://w3id.org/cwl/cwl#draft-4.dev2", "https://w3id.org/cwl/cwl#draft-4.dev3", "https://w3id.org/cwl/cwl#v1.0.dev4", "https://w3id.org/cwl/cwl#v1.0" ] }, { "name": "https://w3id.org/cwl/cwl#CWLType", "type": "enum", "extends": "https://w3id.org/cwl/salad#PrimitiveType", "symbols": [ "https://w3id.org/cwl/cwl#File", "https://w3id.org/cwl/cwl#Directory" ], "doc": [ "Extends primitive types with the concept of a file and directory as a builtin type.", "File: A File object", "Directory: A Directory object" ] }, { "name": "https://w3id.org/cwl/cwl#File", "type": "record", "docParent": "https://w3id.org/cwl/cwl#CWLType", "doc": "Represents a file (or group of files if `secondaryFiles` is specified) that\nmust be accessible by tools using standard POSIX file system call API such as\nopen(2) and read(2).\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#File/class", "type": { "type": "enum", "symbols": [ "https://w3id.org/cwl/cwl#File" ] }, "jsonldPredicate": { "_id": "@type", "_type": "@vocab" }, "doc": "Must be `File` to indicate this object describes a file." }, { "name": "https://w3id.org/cwl/cwl#File/location", "type": [ "null", "string" ], "doc": "An IRI that identifies the file resource. This may be a relative\nreference, in which case it must be resolved using the base IRI of the\ndocument. The location may refer to a local or remote resource; the\nimplementation must use the IRI to retrieve file content. If an\nimplementation is unable to retrieve the file content stored at a\nremote resource (due to unsupported protocol, access denied, or other\nissue) it must signal an error.\n\nIf the `location` field is not provided, the `contents` field must be\nprovided. 
The implementation must assign a unique identifier for\nthe `location` field.\n\nIf the `path` field is provided but the `location` field is not, an\nimplementation may assign the value of the `path` field to `location`,\nthen follow the rules above.\n", "jsonldPredicate": { "_id": "@id", "_type": "@id" } }, { "name": "https://w3id.org/cwl/cwl#File/path", "type": [ "null", "string" ], "doc": "The local host path where the File is available when a CommandLineTool is\nexecuted. This field must be set by the implementation. The final\npath component must match the value of `basename`. This field\nmust not be used in any other context. The command line tool being\nexecuted must be able to access the file at `path` using the POSIX\n`open(2)` syscall.\n\nAs a special case, if the `path` field is provided but the `location`\nfield is not, an implementation may assign the value of the `path`\nfield to `location`, and remove the `path` field.\n\nIf the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02)\n(`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\\`, `\"`, `'`,\n``, ``, and ``) or characters\n[not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml)\nfor [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452)\nthen implementations may terminate the process with a\n`permanentFailure`.\n", "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#path", "_type": "@id" } }, { "name": "https://w3id.org/cwl/cwl#File/basename", "type": [ "null", "string" ], "doc": "The base name of the file, that is, the name of the file without any\nleading directory path. The base name must not contain a slash `/`.\n\nIf not provided, the implementation must set this field based on the\n`location` field by taking the final path component after parsing\n`location` as an IRI. 
If `basename` is provided, it is not required to\nmatch the value from `location`.\n\nWhen this file is made available to a CommandLineTool, it must be named\nwith `basename`, i.e. the final component of the `path` field must match\n`basename`.\n", "jsonldPredicate": "cwl:basename" }, { "name": "https://w3id.org/cwl/cwl#File/dirname", "type": [ "null", "string" ], "doc": "The name of the directory containing file, that is, the path leading up\nto the final slash in the path such that `dirname + '/' + basename ==\npath`.\n\nThe implementation must set this field based on the value of `path`\nprior to evaluating parameter references or expressions in a\nCommandLineTool document. This field must not be used in any other\ncontext.\n" }, { "name": "https://w3id.org/cwl/cwl#File/nameroot", "type": [ "null", "string" ], "doc": "The basename root such that `nameroot + nameext == basename`, and\n`nameext` is empty or begins with a period and contains at most one\nperiod. For the purposes of path splitting leading periods on the\nbasename are ignored; a basename of `.cshrc` will have a nameroot of\n`.cshrc`.\n\nThe implementation must set this field automatically based on the value\nof `basename` prior to evaluating parameter references or expressions.\n" }, { "name": "https://w3id.org/cwl/cwl#File/nameext", "type": [ "null", "string" ], "doc": "The basename extension such that `nameroot + nameext == basename`, and\n`nameext` is empty or begins with a period and contains at most one\nperiod. Leading periods on the basename are ignored; a basename of\n`.cshrc` will have an empty `nameext`.\n\nThe implementation must set this field automatically based on the value\nof `basename` prior to evaluating parameter references or expressions.\n" }, { "name": "https://w3id.org/cwl/cwl#File/checksum", "type": [ "null", "string" ], "doc": "Optional hash code for validating file integrity. 
Currently must be in the form\n\"sha1$ + hexadecimal string\" using the SHA-1 algorithm.\n" }, { "name": "https://w3id.org/cwl/cwl#File/size", "type": [ "null", "long" ], "doc": "Optional file size" }, { "name": "https://w3id.org/cwl/cwl#File/secondaryFiles", "type": [ "null", { "type": "array", "items": [ "https://w3id.org/cwl/cwl#File", "https://w3id.org/cwl/cwl#Directory" ] } ], "jsonldPredicate": "cwl:secondaryFiles", "doc": "A list of additional files that are associated with the primary file\nand must be transferred alongside the primary file. Examples include\nindexes of the primary file, or external references which must be\nincluded when loading primary document. A file object listed in\n`secondaryFiles` may itself include `secondaryFiles` for which the same\nrules apply.\n" }, { "name": "https://w3id.org/cwl/cwl#File/format", "type": [ "null", "string" ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#format", "_type": "@id", "identity": true }, "doc": "The format of the file: this must be an IRI of a concept node that\nrepresents the file format, preferably defined within an ontology.\nIf no ontology is available, file formats may be tested by exact match.\n\nReasoning about format compatibility must be done by checking that an\ninput file format is the same, `owl:equivalentClass` or\n`rdfs:subClassOf` the format required by the input parameter.\n`owl:equivalentClass` is transitive with `rdfs:subClassOf`, e.g. if\n` owl:equivalentClass ` and ` owl:subclassOf ` then infer\n` owl:subclassOf `.\n\nFile format ontologies may be provided in the \"$schema\" metadata at the\nroot of the document. If no ontologies are specified in `$schema`, the\nruntime may perform exact file format matches.\n" }, { "name": "https://w3id.org/cwl/cwl#File/contents", "type": [ "null", "string" ], "doc": "File contents literal. Maximum of 64 KiB.\n\nIf neither `location` nor `path` is provided, `contents` must be\nnon-null. 
The implementation must assign a unique identifier for the\n`location` field. When the file is staged as input to CommandLineTool,\nthe value of `contents` must be written to a file.\n\nIf `loadContents` of `inputBinding` or `outputBinding` is true and\n`location` is valid, the implementation must read up to the first 64\nKiB of text from the file and place it in the \"contents\" field.\n" } ] }, { "name": "https://w3id.org/cwl/cwl#Directory", "type": "record", "docAfter": "https://w3id.org/cwl/cwl#File", "doc": "Represents a directory to present to a command line tool.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#Directory/class", "type": { "type": "enum", "symbols": [ "https://w3id.org/cwl/cwl#Directory" ] }, "jsonldPredicate": { "_id": "@type", "_type": "@vocab" }, "doc": "Must be `Directory` to indicate this object describes a Directory." }, { "name": "https://w3id.org/cwl/cwl#Directory/location", "type": [ "null", "string" ], "doc": "An IRI that identifies the directory resource. This may be a relative\nreference, in which case it must be resolved using the base IRI of the\ndocument. The location may refer to a local or remote resource. If\nthe `listing` field is not set, the implementation must use the\nlocation IRI to retrieve directory listing. If an implementation is\nunable to retrieve the directory listing stored at a remote resource (due to\nunsupported protocol, access denied, or other issue) it must signal an\nerror.\n\nIf the `location` field is not provided, the `listing` field must be\nprovided. 
The implementation must assign a unique identifier for\nthe `location` field.\n\nIf the `path` field is provided but the `location` field is not, an\nimplementation may assign the value of the `path` field to `location`,\nthen follow the rules above.\n", "jsonldPredicate": { "_id": "@id", "_type": "@id" } }, { "name": "https://w3id.org/cwl/cwl#Directory/path", "type": [ "null", "string" ], "doc": "The local path where the Directory is made available prior to executing a\nCommandLineTool. This must be set by the implementation. This field\nmust not be used in any other context. The command line tool being\nexecuted must be able to access the directory at `path` using the POSIX\n`opendir(2)` syscall.\n\nIf the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02)\n(`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\\`, `\"`, `'`,\n``, ``, and ``) or characters\n[not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml)\nfor [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452)\nthen implementations may terminate the process with a\n`permanentFailure`.\n", "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#path", "_type": "@id" } }, { "name": "https://w3id.org/cwl/cwl#Directory/basename", "type": [ "null", "string" ], "doc": "The base name of the directory, that is, the name of the file without any\nleading directory path. The base name must not contain a slash `/`.\n\nIf not provided, the implementation must set this field based on the\n`location` field by taking the final path component after parsing\n`location` as an IRI. If `basename` is provided, it is not required to\nmatch the value from `location`.\n\nWhen this file is made available to a CommandLineTool, it must be named\nwith `basename`, i.e. 
the final component of the `path` field must match\n`basename`.\n", "jsonldPredicate": "cwl:basename" }, { "name": "https://w3id.org/cwl/cwl#Directory/listing", "type": [ "null", { "type": "array", "items": [ "https://w3id.org/cwl/cwl#File", "https://w3id.org/cwl/cwl#Directory" ] } ], "doc": "List of files or subdirectories contained in this directory. The name\nof each file or subdirectory is determined by the `basename` field of\neach `File` or `Directory` object. It is an error if a `File` shares a\n`basename` with any other entry in `listing`. If two or more\n`Directory` objects share the same `basename`, this must be treated as\nequivalent to a single subdirectory with the listings recursively\nmerged.\n", "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#listing" } } ] }, { "name": "https://w3id.org/cwl/cwl#SchemaBase", "type": "record", "abstract": true, "fields": [ { "name": "https://w3id.org/cwl/cwl#SchemaBase/label", "type": [ "null", "string" ], "jsonldPredicate": "rdfs:label", "doc": "A short, human-readable label of this object." } ] }, { "name": "https://w3id.org/cwl/cwl#Parameter", "type": "record", "extends": "https://w3id.org/cwl/cwl#SchemaBase", "abstract": true, "doc": "Define an input or output parameter to a process.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#Parameter/secondaryFiles", "type": [ "null", "string", "https://w3id.org/cwl/cwl#Expression", { "type": "array", "items": [ "string", "https://w3id.org/cwl/cwl#Expression" ] } ], "jsonldPredicate": "cwl:secondaryFiles", "doc": "Only valid when `type: File` or is an array of `items: File`.\n\nDescribes files that must be included alongside the primary file(s).\n\nIf the value is an expression, the value of `self` in the expression\nmust be the primary input or output File to which this binding applies.\n\nIf the value is a string, it specifies that the following pattern\nshould be applied to the primary file:\n\n 1. 
If string begins with one or more caret `^` characters, for each\n caret, remove the last file extension from the path (the last\n period `.` and all following characters). If there are no file\n extensions, the path is unchanged.\n 2. Append the remainder of the string to the end of the file path.\n" }, { "name": "https://w3id.org/cwl/cwl#Parameter/format", "type": [ "null", "string", { "type": "array", "items": "string" }, "https://w3id.org/cwl/cwl#Expression" ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#format", "_type": "@id", "identity": true }, "doc": "Only valid when `type: File` or is an array of `items: File`.\n\nFor input parameters, this must be one or more IRIs of concept nodes\nthat represent file formats which are allowed as input to this\nparameter, preferably defined within an ontology. If no ontology is\navailable, file formats may be tested by exact match.\n\nFor output parameters, this is the file format that will be assigned to\nthe output parameter.\n" }, { "name": "https://w3id.org/cwl/cwl#Parameter/streamable", "type": [ "null", "boolean" ], "doc": "Only valid when `type: File` or is an array of `items: File`.\n\nA value of `true` indicates that the file is read or written\nsequentially without seeking. An implementation may use this flag to\nindicate whether it is valid to stream file contents using a named\npipe. Default: `false`.\n" }, { "name": "https://w3id.org/cwl/cwl#Parameter/doc", "type": [ "null", "string", { "type": "array", "items": "string" } ], "doc": "A documentation string for this type, or an array of strings which should be concatenated.", "jsonldPredicate": "rdfs:comment" } ] }, { "type": "enum", "name": "https://w3id.org/cwl/cwl#Expression", "doc": "'Expression' is not a real type. It indicates that a field must allow\nruntime parameter references. 
If [InlineJavascriptRequirement](#InlineJavascriptRequirement)\nis declared and supported by the platform, the field must also allow\nJavascript expressions.\n", "symbols": [ "https://w3id.org/cwl/cwl#ExpressionPlaceholder" ] }, { "name": "https://w3id.org/cwl/cwl#InputBinding", "type": "record", "abstract": true, "fields": [ { "name": "https://w3id.org/cwl/cwl#InputBinding/loadContents", "type": [ "null", "boolean" ], "jsonldPredicate": "cwl:loadContents", "doc": "Only valid when `type: File` or is an array of `items: File`.\n\nRead up to the first 64 KiB of text from the file and place it in the\n\"contents\" field of the file object for use by expressions.\n" } ] }, { "name": "https://w3id.org/cwl/cwl#OutputBinding", "type": "record", "abstract": true }, { "name": "https://w3id.org/cwl/cwl#InputSchema", "extends": "https://w3id.org/cwl/cwl#SchemaBase", "type": "record", "abstract": true }, { "name": "https://w3id.org/cwl/cwl#OutputSchema", "extends": "https://w3id.org/cwl/cwl#SchemaBase", "type": "record", "abstract": true }, { "name": "https://w3id.org/cwl/cwl#InputRecordField", "type": "record", "extends": "https://w3id.org/cwl/salad#RecordField", "specialize": [ { "specializeFrom": "https://w3id.org/cwl/salad#RecordSchema", "specializeTo": "https://w3id.org/cwl/cwl#InputRecordSchema" }, { "specializeFrom": "https://w3id.org/cwl/salad#EnumSchema", "specializeTo": "https://w3id.org/cwl/cwl#InputEnumSchema" }, { "specializeFrom": "https://w3id.org/cwl/salad#ArraySchema", "specializeTo": "https://w3id.org/cwl/cwl#InputArraySchema" }, { "specializeFrom": "https://w3id.org/cwl/salad#PrimitiveType", "specializeTo": "https://w3id.org/cwl/cwl#CWLType" } ], "fields": [ { "name": "https://w3id.org/cwl/cwl#InputRecordField/inputBinding", "type": [ "null", "https://w3id.org/cwl/cwl#InputBinding" ], "jsonldPredicate": "cwl:inputBinding" }, { "name": "https://w3id.org/cwl/cwl#InputRecordField/label", "type": [ "null", "string" ], "jsonldPredicate": "rdfs:label", "doc": "A 
short, human-readable label of this process object." } ] }, { "name": "https://w3id.org/cwl/cwl#InputRecordSchema", "type": "record", "extends": [ "https://w3id.org/cwl/salad#RecordSchema", "https://w3id.org/cwl/cwl#InputSchema" ], "specialize": [ { "specializeFrom": "https://w3id.org/cwl/salad#RecordField", "specializeTo": "https://w3id.org/cwl/cwl#InputRecordField" } ] }, { "name": "https://w3id.org/cwl/cwl#InputEnumSchema", "type": "record", "extends": [ "https://w3id.org/cwl/salad#EnumSchema", "https://w3id.org/cwl/cwl#InputSchema" ], "fields": [ { "name": "https://w3id.org/cwl/cwl#InputEnumSchema/inputBinding", "type": [ "null", "https://w3id.org/cwl/cwl#InputBinding" ], "jsonldPredicate": "cwl:inputBinding" } ] }, { "name": "https://w3id.org/cwl/cwl#InputArraySchema", "type": "record", "extends": [ "https://w3id.org/cwl/salad#ArraySchema", "https://w3id.org/cwl/cwl#InputSchema" ], "specialize": [ { "specializeFrom": "https://w3id.org/cwl/salad#RecordSchema", "specializeTo": "https://w3id.org/cwl/cwl#InputRecordSchema" }, { "specializeFrom": "https://w3id.org/cwl/salad#EnumSchema", "specializeTo": "https://w3id.org/cwl/cwl#InputEnumSchema" }, { "specializeFrom": "https://w3id.org/cwl/salad#ArraySchema", "specializeTo": "https://w3id.org/cwl/cwl#InputArraySchema" }, { "specializeFrom": "https://w3id.org/cwl/salad#PrimitiveType", "specializeTo": "https://w3id.org/cwl/cwl#CWLType" } ], "fields": [ { "name": "https://w3id.org/cwl/cwl#InputArraySchema/inputBinding", "type": [ "null", "https://w3id.org/cwl/cwl#InputBinding" ], "jsonldPredicate": "cwl:inputBinding" } ] }, { "name": "https://w3id.org/cwl/cwl#OutputRecordField", "type": "record", "extends": "https://w3id.org/cwl/salad#RecordField", "specialize": [ { "specializeFrom": "https://w3id.org/cwl/salad#RecordSchema", "specializeTo": "https://w3id.org/cwl/cwl#OutputRecordSchema" }, { "specializeFrom": "https://w3id.org/cwl/salad#EnumSchema", "specializeTo": "https://w3id.org/cwl/cwl#OutputEnumSchema" }, { 
"specializeFrom": "https://w3id.org/cwl/salad#ArraySchema", "specializeTo": "https://w3id.org/cwl/cwl#OutputArraySchema" }, { "specializeFrom": "https://w3id.org/cwl/salad#PrimitiveType", "specializeTo": "https://w3id.org/cwl/cwl#CWLType" } ], "fields": [ { "name": "https://w3id.org/cwl/cwl#OutputRecordField/outputBinding", "type": [ "null", "https://w3id.org/cwl/cwl#OutputBinding" ], "jsonldPredicate": "cwl:outputBinding" } ] }, { "name": "https://w3id.org/cwl/cwl#OutputRecordSchema", "type": "record", "extends": [ "https://w3id.org/cwl/salad#RecordSchema", "https://w3id.org/cwl/cwl#OutputSchema" ], "docParent": "https://w3id.org/cwl/cwl#OutputParameter", "specialize": [ { "specializeFrom": "https://w3id.org/cwl/salad#RecordField", "specializeTo": "https://w3id.org/cwl/cwl#OutputRecordField" } ] }, { "name": "https://w3id.org/cwl/cwl#OutputEnumSchema", "type": "record", "extends": [ "https://w3id.org/cwl/salad#EnumSchema", "https://w3id.org/cwl/cwl#OutputSchema" ], "docParent": "https://w3id.org/cwl/cwl#OutputParameter", "fields": [ { "name": "https://w3id.org/cwl/cwl#OutputEnumSchema/outputBinding", "type": [ "null", "https://w3id.org/cwl/cwl#OutputBinding" ], "jsonldPredicate": "cwl:outputBinding" } ] }, { "name": "https://w3id.org/cwl/cwl#OutputArraySchema", "type": "record", "extends": [ "https://w3id.org/cwl/salad#ArraySchema", "https://w3id.org/cwl/cwl#OutputSchema" ], "docParent": "https://w3id.org/cwl/cwl#OutputParameter", "specialize": [ { "specializeFrom": "https://w3id.org/cwl/salad#RecordSchema", "specializeTo": "https://w3id.org/cwl/cwl#OutputRecordSchema" }, { "specializeFrom": "https://w3id.org/cwl/salad#EnumSchema", "specializeTo": "https://w3id.org/cwl/cwl#OutputEnumSchema" }, { "specializeFrom": "https://w3id.org/cwl/salad#ArraySchema", "specializeTo": "https://w3id.org/cwl/cwl#OutputArraySchema" }, { "specializeFrom": "https://w3id.org/cwl/salad#PrimitiveType", "specializeTo": "https://w3id.org/cwl/cwl#CWLType" } ], "fields": [ { "name": 
"https://w3id.org/cwl/cwl#OutputArraySchema/outputBinding", "type": [ "null", "https://w3id.org/cwl/cwl#OutputBinding" ], "jsonldPredicate": "cwl:outputBinding" } ] }, { "name": "https://w3id.org/cwl/cwl#InputParameter", "type": "record", "extends": "https://w3id.org/cwl/cwl#Parameter", "fields": [ { "name": "https://w3id.org/cwl/cwl#InputParameter/id", "type": "string", "jsonldPredicate": "@id", "doc": "The unique identifier for this parameter object." }, { "name": "https://w3id.org/cwl/cwl#InputParameter/inputBinding", "type": [ "null", "https://w3id.org/cwl/cwl#InputBinding" ], "jsonldPredicate": "cwl:inputBinding", "doc": "Describes how to handle the inputs of a process and convert them\ninto a concrete form for execution, such as command line parameters.\n" }, { "name": "https://w3id.org/cwl/cwl#InputParameter/default", "type": [ "null", "Any" ], "jsonldPredicate": "cwl:default", "doc": "The default value for this parameter if not provided in the input\nobject.\n" }, { "name": "https://w3id.org/cwl/cwl#InputParameter/type", "type": [ "null", "https://w3id.org/cwl/cwl#CWLType", "https://w3id.org/cwl/cwl#InputRecordSchema", "https://w3id.org/cwl/cwl#InputEnumSchema", "https://w3id.org/cwl/cwl#InputArraySchema", "string", { "type": "array", "items": [ "https://w3id.org/cwl/cwl#CWLType", "https://w3id.org/cwl/cwl#InputRecordSchema", "https://w3id.org/cwl/cwl#InputEnumSchema", "https://w3id.org/cwl/cwl#InputArraySchema", "string" ] } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#type", "_type": "@vocab", "refScope": 2, "typeDSL": true }, "doc": "Specify valid types of data that may be assigned to this parameter.\n" } ] }, { "name": "https://w3id.org/cwl/cwl#OutputParameter", "type": "record", "extends": "https://w3id.org/cwl/cwl#Parameter", "fields": [ { "name": "https://w3id.org/cwl/cwl#OutputParameter/id", "type": "string", "jsonldPredicate": "@id", "doc": "The unique identifier for this parameter object." 
}, { "name": "https://w3id.org/cwl/cwl#OutputParameter/outputBinding", "type": [ "null", "https://w3id.org/cwl/cwl#OutputBinding" ], "jsonldPredicate": "cwl:outputBinding", "doc": "Describes how to handle the outputs of a process.\n" } ] }, { "type": "record", "name": "https://w3id.org/cwl/cwl#ProcessRequirement", "abstract": true, "doc": "A process requirement declares a prerequisite that may or must be fulfilled\nbefore executing a process. See [`Process.hints`](#process) and\n[`Process.requirements`](#process).\n\nProcess requirements are the primary mechanism for specifying extensions to\nthe CWL core specification.\n" }, { "type": "record", "name": "https://w3id.org/cwl/cwl#Process", "abstract": true, "doc": "\nThe base executable type in CWL is the `Process` object defined by the\ndocument. Note that the `Process` object is abstract and cannot be\ndirectly executed.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#Process/id", "type": [ "null", "string" ], "jsonldPredicate": "@id", "doc": "The unique identifier for this process object." }, { "name": "https://w3id.org/cwl/cwl#Process/inputs", "type": { "type": "array", "items": "https://w3id.org/cwl/cwl#InputParameter" }, "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#inputs", "mapSubject": "id", "mapPredicate": "type" }, "doc": "Defines the input parameters of the process. The process is ready to\nrun when all required input parameters are associated with concrete\nvalues. Input parameters include a schema for each parameter which is\nused to validate the input object. It may also be used to build a user\ninterface for constructing the input object.\n" }, { "name": "https://w3id.org/cwl/cwl#Process/outputs", "type": { "type": "array", "items": "https://w3id.org/cwl/cwl#OutputParameter" }, "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#outputs", "mapSubject": "id", "mapPredicate": "type" }, "doc": "Defines the parameters representing the output of the process. 
May be\nused to generate and/or validate the output object.\n" }, { "name": "https://w3id.org/cwl/cwl#Process/requirements", "type": [ "null", { "type": "array", "items": "https://w3id.org/cwl/cwl#ProcessRequirement" } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#requirements", "mapSubject": "class" }, "doc": "Declares requirements that apply to either the runtime environment or the\nworkflow engine that must be met in order to execute this process. If\nan implementation cannot satisfy all requirements, or a requirement is\nlisted which is not recognized by the implementation, it is a fatal\nerror and the implementation must not attempt to run the process,\nunless overridden at user option.\n" }, { "name": "https://w3id.org/cwl/cwl#Process/hints", "type": [ "null", { "type": "array", "items": "Any" } ], "doc": "Declares hints applying to either the runtime environment or the\nworkflow engine that may be helpful in executing this process. It is\nnot an error if an implementation cannot satisfy all hints, however\nthe implementation may report a warning.\n", "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#hints", "noLinkCheck": true, "mapSubject": "class" } }, { "name": "https://w3id.org/cwl/cwl#Process/label", "type": [ "null", "string" ], "jsonldPredicate": "rdfs:label", "doc": "A short, human-readable label of this process object." }, { "name": "https://w3id.org/cwl/cwl#Process/doc", "type": [ "null", "string" ], "jsonldPredicate": "rdfs:comment", "doc": "A long, human-readable description of this process object." }, { "name": "https://w3id.org/cwl/cwl#Process/cwlVersion", "type": [ "null", "https://w3id.org/cwl/cwl#CWLVersion" ], "doc": "CWL document version. Always required at the document root. 
Not\nrequired for a Process embedded inside another Process.\n", "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#cwlVersion", "_type": "@vocab" } } ] }, { "name": "https://w3id.org/cwl/cwl#InlineJavascriptRequirement", "type": "record", "extends": "https://w3id.org/cwl/cwl#ProcessRequirement", "doc": "Indicates that the workflow platform must support inline Javascript expressions.\nIf this requirement is not present, the workflow platform must not perform expression\ninterpolation.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#InlineJavascriptRequirement/class", "type": "string", "doc": "Always 'InlineJavascriptRequirement'", "jsonldPredicate": { "_id": "@type", "_type": "@vocab" } }, { "name": "https://w3id.org/cwl/cwl#InlineJavascriptRequirement/expressionLib", "type": [ "null", { "type": "array", "items": "string" } ], "doc": "Additional code fragments that will also be inserted\nbefore executing the expression code. Allows for function definitions that may\nbe called from CWL expressions.\n" } ] }, { "name": "https://w3id.org/cwl/cwl#SchemaDefRequirement", "type": "record", "extends": "https://w3id.org/cwl/cwl#ProcessRequirement", "doc": "This field consists of an array of type definitions which must be used when\ninterpreting the `inputs` and `outputs` fields. When a `type` field\ncontains an IRI, the implementation must check if the type is defined in\n`schemaDefs` and use that definition. If the type is not found in\n`schemaDefs`, it is an error. 
The entries in `schemaDefs` must be\nprocessed in the order listed such that later schema definitions may refer\nto earlier schema definitions.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#SchemaDefRequirement/class", "type": "string", "doc": "Always 'SchemaDefRequirement'", "jsonldPredicate": { "_id": "@type", "_type": "@vocab" } }, { "name": "https://w3id.org/cwl/cwl#SchemaDefRequirement/types", "type": { "type": "array", "items": "https://w3id.org/cwl/cwl#InputSchema" }, "doc": "The list of type definitions." } ] }, { "name": "https://w3id.org/cwl/cwl#CommandLineToolDoc", "type": "documentation", "doc": [ "# Common Workflow Language (CWL) Command Line Tool Description, v1.0\n\nThis version:\n * https://w3id.org/cwl/v1.0/\n\nCurrent version:\n * https://w3id.org/cwl/\n", "\n\n", "\n", "\n\n", "# Abstract\n\nA Command Line Tool is a non-interactive executable program that reads\nsome input, performs a computation, and terminates after producing some\noutput. Command line programs are a flexible unit of code sharing and\nreuse; unfortunately, the syntax and input/output semantics among command\nline programs are extremely heterogeneous. A common layer for describing\nthe syntax and semantics of programs can reduce this incidental\ncomplexity by providing a consistent way to connect programs together.\nThis specification defines the Common Workflow Language (CWL) Command\nLine Tool Description, a vendor-neutral standard for describing the\nsyntax and input/output semantics of command line programs.\n", "\n", "## Introduction to v1.0\n\nThis specification represents the first full release from the CWL group.\nSince draft-3, version 1.0 introduces the following changes and additions:\n\n * The [Directory](#Directory) type.\n * Syntax simplifications: denoted by the `map<>` syntax. Example: inputs\n contains a list of items, each with an id. 
Now one can specify\n a mapping of that identifier to the corresponding\n `CommandInputParameter`.\n ```\n inputs:\n - id: one\n type: string\n doc: First input parameter\n - id: two\n type: int\n doc: Second input parameter\n ```\n can be\n ```\n inputs:\n one:\n type: string\n doc: First input parameter\n two:\n type: int\n doc: Second input parameter\n ```\n * [InitialWorkDirRequirement](#InitialWorkDirRequirement): list of\n files and subdirectories to be present in the output directory prior\n to execution.\n * Shortcuts for specifying the standard [output](#stdout) and/or\n [error](#stderr) streams as a (streamable) File output.\n * [SoftwareRequirement](#SoftwareRequirement) for describing software\n dependencies of a tool.\n * The common `description` field has been renamed to `doc`.\n\n## Errata\n\nPost v1.0 release changes to the spec.\n\n * 13 July 2016: Mark `baseCommand` as optional and update descriptive text.\n\n## Purpose\n\nStandalone programs are a flexible and interoperable form of code reuse.\nUnlike monolithic applications, applications and analysis workflows which\nare composed of multiple separate programs can be written in multiple\nlanguages and execute concurrently on multiple hosts. However, POSIX\ndoes not dictate computer-readable grammar or semantics for program input\nand output, resulting in extremely heterogeneous command line grammar and\ninput/output semantics among programs. This is a particular problem in\ndistributed computing (multi-node compute clusters) and virtualized\nenvironments (such as Docker containers) where it is often necessary to\nprovision resources such as input files before executing the program.\n\nOften this gap is filled by hard coding program invocation and\nimplicitly assuming requirements will be met, or abstracting program\ninvocation with wrapper scripts or descriptor documents. 
Unfortunately,\nwhere these approaches are application or platform specific, it creates a\nsignificant barrier to reproducibility and portability, as methods\ndeveloped for one platform must be manually ported to be used on new\nplatforms. Similarly, it creates redundant work, as wrappers for popular\ntools must be rewritten for each application or platform in use.\n\nThe Common Workflow Language Command Line Tool Description is designed to\nprovide a common standard description of grammar and semantics for\ninvoking programs used in data-intensive fields such as Bioinformatics,\nChemistry, Physics, Astronomy, and Statistics. This specification\ndefines a precise data and execution model for Command Line Tools that\ncan be implemented on a variety of computing platforms, ranging from a\nsingle workstation to cluster, grid, cloud, and high performance\ncomputing platforms.\n", "\n", "\n" ] }, { "type": "record", "name": "https://w3id.org/cwl/cwl#EnvironmentDef", "doc": "Define an environment variable that will be set in the runtime environment\nby the workflow platform when executing the command line tool. May be the\nresult of executing an expression, such as getting a parameter from input.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#EnvironmentDef/envName", "type": "string", "doc": "The environment variable name" }, { "name": "https://w3id.org/cwl/cwl#EnvironmentDef/envValue", "type": [ "string", "https://w3id.org/cwl/cwl#Expression" ], "doc": "The environment variable value" } ] }, { "type": "record", "name": "https://w3id.org/cwl/cwl#CommandLineBinding", "extends": "https://w3id.org/cwl/cwl#InputBinding", "doc": "\nWhen listed under `inputBinding` in the input schema, the term\n\"value\" refers to the corresponding value in the input object. 
For\nbinding objects listed in `CommandLineTool.arguments`, the term \"value\"\nrefers to the effective value after evaluating `valueFrom`.\n\nThe binding behavior when building the command line depends on the data\ntype of the value. If there is a mismatch between the type described by\nthe input schema and the effective value, such as resulting from an\nexpression evaluation, an implementation must use the data type of the\neffective value.\n\n - **string**: Add `prefix` and the string to the command line.\n\n - **number**: Add `prefix` and decimal representation to command line.\n\n - **boolean**: If true, add `prefix` to the command line. If false, add\n nothing.\n\n - **File**: Add `prefix` and the value of\n [`File.path`](#File) to the command line.\n\n - **array**: If `itemSeparator` is specified, add `prefix` and join\n the array into a single string with `itemSeparator` separating the\n items. Otherwise first add `prefix`, then recursively process\n individual elements.\n\n - **object**: Add `prefix` only, and recursively add object fields for\n which `inputBinding` is specified.\n\n - **null**: Add nothing.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#CommandLineBinding/position", "type": [ "null", "int" ], "doc": "The sorting key. Default position is 0." }, { "name": "https://w3id.org/cwl/cwl#CommandLineBinding/prefix", "type": [ "null", "string" ], "doc": "Command line prefix to add before the value." 
}, { "name": "https://w3id.org/cwl/cwl#CommandLineBinding/separate", "type": [ "null", "boolean" ], "doc": "If true (default), then the prefix and value must be added as separate\ncommand line arguments; if false, prefix and value must be concatenated\ninto a single command line argument.\n" }, { "name": "https://w3id.org/cwl/cwl#CommandLineBinding/itemSeparator", "type": [ "null", "string" ], "doc": "Join the array elements into a single string with the elements\nseparated by by `itemSeparator`.\n" }, { "name": "https://w3id.org/cwl/cwl#CommandLineBinding/valueFrom", "type": [ "null", "string", "https://w3id.org/cwl/cwl#Expression" ], "jsonldPredicate": "cwl:valueFrom", "doc": "If `valueFrom` is a constant string value, use this as the value and\napply the binding rules above.\n\nIf `valueFrom` is an expression, evaluate the expression to yield the\nactual value to use to build the command line and apply the binding\nrules above. If the inputBinding is associated with an input\nparameter, the value of `self` in the expression will be the value of the\ninput parameter.\n\nWhen a binding is part of the `CommandLineTool.arguments` field,\nthe `valueFrom` field is required.\n" }, { "name": "https://w3id.org/cwl/cwl#CommandLineBinding/shellQuote", "type": [ "null", "boolean" ], "doc": "If `ShellCommandRequirement` is in the requirements for the current command,\nthis controls whether the value is quoted on the command line (default is true).\nUse `shellQuote: false` to inject metacharacters for operations such as pipes.\n" } ] }, { "type": "record", "name": "https://w3id.org/cwl/cwl#CommandOutputBinding", "extends": "https://w3id.org/cwl/cwl#OutputBinding", "doc": "Describes how to generate an output parameter based on the files produced\nby a CommandLineTool.\n\nThe output parameter is generated by applying these operations in\nthe following order:\n\n - glob\n - loadContents\n - outputEval\n", "fields": [ { "name": 
"https://w3id.org/cwl/cwl#CommandOutputBinding/glob", "type": [ "null", "string", "https://w3id.org/cwl/cwl#Expression", { "type": "array", "items": "string" } ], "doc": "Find files relative to the output directory, using POSIX glob(3)\npathname matching. If an array is provided, find files that match any\npattern in the array. If an expression is provided, the expression must\nreturn a string or an array of strings, which will then be evaluated as\none or more glob patterns. Must only match and return files which\nactually exist.\n" }, { "name": "https://w3id.org/cwl/cwl#CommandOutputBinding/loadContents", "type": [ "null", "boolean" ], "jsonldPredicate": "cwl:loadContents", "doc": "For each file matched in `glob`, read up to\nthe first 64 KiB of text from the file and place it in the `contents`\nfield of the file object for manipulation by `outputEval`.\n" }, { "name": "https://w3id.org/cwl/cwl#CommandOutputBinding/outputEval", "type": [ "null", "string", "https://w3id.org/cwl/cwl#Expression" ], "doc": "Evaluate an expression to generate the output value. If `glob` was\nspecified, the value of `self` must be an array containing file objects\nthat were matched. If no files were matched, `self` must be a zero\nlength array; if a single file was matched, the value of `self` is an\narray of a single element. 
Additionally, if `loadContents` is `true`,\nthe File objects must include up to the first 64 KiB of file contents\nin the `contents` field.\n" } ] }, { "name": "https://w3id.org/cwl/cwl#CommandInputRecordField", "type": "record", "extends": "https://w3id.org/cwl/cwl#InputRecordField", "specialize": [ { "specializeFrom": "https://w3id.org/cwl/cwl#InputRecordSchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandInputRecordSchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#InputEnumSchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandInputEnumSchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#InputArraySchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandInputArraySchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#InputBinding", "specializeTo": "https://w3id.org/cwl/cwl#CommandLineBinding" } ] }, { "name": "https://w3id.org/cwl/cwl#CommandInputRecordSchema", "type": "record", "extends": "https://w3id.org/cwl/cwl#InputRecordSchema", "specialize": [ { "specializeFrom": "https://w3id.org/cwl/cwl#InputRecordField", "specializeTo": "https://w3id.org/cwl/cwl#CommandInputRecordField" } ] }, { "name": "https://w3id.org/cwl/cwl#CommandInputEnumSchema", "type": "record", "extends": "https://w3id.org/cwl/cwl#InputEnumSchema", "specialize": [ { "specializeFrom": "https://w3id.org/cwl/cwl#InputBinding", "specializeTo": "https://w3id.org/cwl/cwl#CommandLineBinding" } ] }, { "name": "https://w3id.org/cwl/cwl#CommandInputArraySchema", "type": "record", "extends": "https://w3id.org/cwl/cwl#InputArraySchema", "specialize": [ { "specializeFrom": "https://w3id.org/cwl/cwl#InputRecordSchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandInputRecordSchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#InputEnumSchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandInputEnumSchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#InputArraySchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandInputArraySchema" }, { "specializeFrom": 
"https://w3id.org/cwl/cwl#InputBinding", "specializeTo": "https://w3id.org/cwl/cwl#CommandLineBinding" } ] }, { "name": "https://w3id.org/cwl/cwl#CommandOutputRecordField", "type": "record", "extends": "https://w3id.org/cwl/cwl#OutputRecordField", "specialize": [ { "specializeFrom": "https://w3id.org/cwl/cwl#OutputRecordSchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandOutputRecordSchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#OutputEnumSchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandOutputEnumSchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#OutputArraySchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandOutputArraySchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#OutputBinding", "specializeTo": "https://w3id.org/cwl/cwl#CommandOutputBinding" } ] }, { "name": "https://w3id.org/cwl/cwl#CommandOutputRecordSchema", "type": "record", "extends": "https://w3id.org/cwl/cwl#OutputRecordSchema", "specialize": [ { "specializeFrom": "https://w3id.org/cwl/cwl#OutputRecordField", "specializeTo": "https://w3id.org/cwl/cwl#CommandOutputRecordField" } ] }, { "name": "https://w3id.org/cwl/cwl#CommandOutputEnumSchema", "type": "record", "extends": "https://w3id.org/cwl/cwl#OutputEnumSchema", "specialize": [ { "specializeFrom": "https://w3id.org/cwl/cwl#OutputRecordSchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandOutputRecordSchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#OutputEnumSchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandOutputEnumSchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#OutputArraySchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandOutputArraySchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#OutputBinding", "specializeTo": "https://w3id.org/cwl/cwl#CommandOutputBinding" } ] }, { "name": "https://w3id.org/cwl/cwl#CommandOutputArraySchema", "type": "record", "extends": "https://w3id.org/cwl/cwl#OutputArraySchema", "specialize": [ { "specializeFrom": 
"https://w3id.org/cwl/cwl#OutputRecordSchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandOutputRecordSchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#OutputEnumSchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandOutputEnumSchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#OutputArraySchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandOutputArraySchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#OutputBinding", "specializeTo": "https://w3id.org/cwl/cwl#CommandOutputBinding" } ] }, { "type": "record", "name": "https://w3id.org/cwl/cwl#CommandInputParameter", "extends": "https://w3id.org/cwl/cwl#InputParameter", "doc": "An input parameter for a CommandLineTool.", "specialize": [ { "specializeFrom": "https://w3id.org/cwl/cwl#InputRecordSchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandInputRecordSchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#InputEnumSchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandInputEnumSchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#InputArraySchema", "specializeTo": "https://w3id.org/cwl/cwl#CommandInputArraySchema" }, { "specializeFrom": "https://w3id.org/cwl/cwl#InputBinding", "specializeTo": "https://w3id.org/cwl/cwl#CommandLineBinding" } ] }, { "type": "record", "name": "https://w3id.org/cwl/cwl#CommandOutputParameter", "extends": "https://w3id.org/cwl/cwl#OutputParameter", "doc": "An output parameter for a CommandLineTool.", "specialize": [ { "specializeFrom": "https://w3id.org/cwl/cwl#OutputBinding", "specializeTo": "https://w3id.org/cwl/cwl#CommandOutputBinding" } ], "fields": [ { "name": "https://w3id.org/cwl/cwl#CommandOutputParameter/type", "type": [ "null", "https://w3id.org/cwl/cwl#CWLType", "https://w3id.org/cwl/cwl#stdout", "https://w3id.org/cwl/cwl#stderr", "https://w3id.org/cwl/cwl#CommandOutputRecordSchema", "https://w3id.org/cwl/cwl#CommandOutputEnumSchema", "https://w3id.org/cwl/cwl#CommandOutputArraySchema", "string", { "type": "array", "items": [ 
"https://w3id.org/cwl/cwl#CWLType", "https://w3id.org/cwl/cwl#CommandOutputRecordSchema", "https://w3id.org/cwl/cwl#CommandOutputEnumSchema", "https://w3id.org/cwl/cwl#CommandOutputArraySchema", "string" ] } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#type", "_type": "@vocab", "refScope": 2, "typeDSL": true }, "doc": "Specify valid types of data that may be assigned to this parameter.\n" } ] }, { "name": "https://w3id.org/cwl/cwl#stdout", "type": "enum", "symbols": [ "https://w3id.org/cwl/cwl#stdout" ], "docParent": "https://w3id.org/cwl/cwl#CommandOutputParameter", "doc": "Only valid as a `type` for a `CommandLineTool` output with no\n`outputBinding` set.\n\nThe following\n```\noutputs:\n an_output_name:\n type: stdout\n\nstdout: a_stdout_file\n```\nis equivalent to\n```\noutputs:\n an_output_name:\n type: File\n streamable: true\n outputBinding:\n glob: a_stdout_file\n\nstdout: a_stdout_file\n```\n\nIf there is no `stdout` name provided, a random filename will be created.\nFor example, the following\n```\noutputs:\n an_output_name:\n type: stdout\n```\nis equivalent to\n```\noutputs:\n an_output_name:\n type: File\n streamable: true\n outputBinding:\n glob: random_stdout_filenameABCDEFG\n\nstdout: random_stdout_filenameABCDEFG\n```\n" }, { "name": "https://w3id.org/cwl/cwl#stderr", "type": "enum", "symbols": [ "https://w3id.org/cwl/cwl#stderr" ], "docParent": "https://w3id.org/cwl/cwl#CommandOutputParameter", "doc": "Only valid as a `type` for a `CommandLineTool` output with no\n`outputBinding` set.\n\nThe following\n```\noutputs:\n an_output_name:\n type: stderr\n\nstderr: a_stderr_file\n```\nis equivalent to\n```\noutputs:\n an_output_name:\n type: File\n streamable: true\n outputBinding:\n glob: a_stderr_file\n\nstderr: a_stderr_file\n```\n\nIf there is no `stderr` name provided, a random filename will be created.\nFor example, the following\n```\noutputs:\n an_output_name:\n type: stderr\n```\nis equivalent to\n```\noutputs:\n an_output_name:\n 
type: File\n streamable: true\n outputBinding:\n glob: random_stderr_filenameABCDEFG\n\nstderr: random_stderr_filenameABCDEFG\n```\n" }, { "type": "record", "name": "https://w3id.org/cwl/cwl#CommandLineTool", "extends": "https://w3id.org/cwl/cwl#Process", "documentRoot": true, "specialize": [ { "specializeFrom": "https://w3id.org/cwl/cwl#InputParameter", "specializeTo": "https://w3id.org/cwl/cwl#CommandInputParameter" }, { "specializeFrom": "https://w3id.org/cwl/cwl#OutputParameter", "specializeTo": "https://w3id.org/cwl/cwl#CommandOutputParameter" } ], "doc": "This defines the schema of the CWL Command Line Tool Description document.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#CommandLineTool/class", "jsonldPredicate": { "_id": "@type", "_type": "@vocab" }, "type": "string" }, { "name": "https://w3id.org/cwl/cwl#CommandLineTool/baseCommand", "doc": "Specifies the program to execute. If an array, the first element of\nthe array is the command to execute, and subsequent elements are\nmandatory command line arguments. The elements in `baseCommand` must\nappear before any command line bindings from `inputBinding` or\n`arguments`.\n\nIf `baseCommand` is not provided or is an empty array, the first\nelement of the command line produced after processing `inputBinding` or\n`arguments` must be used as the program to execute.\n\nIf the program includes a path separator character it must\nbe an absolute path, otherwise it is an error. 
If the program does not\ninclude a path separator, search the `$PATH` variable in the runtime\nenvironment of the workflow runner to find the absolute path of the\nexecutable.\n", "type": [ "null", "string", { "type": "array", "items": "string" } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#baseCommand", "_container": "@list" } }, { "name": "https://w3id.org/cwl/cwl#CommandLineTool/arguments", "doc": "Command line bindings which are not directly associated with input parameters.\n", "type": [ "null", { "type": "array", "items": [ "string", "https://w3id.org/cwl/cwl#Expression", "https://w3id.org/cwl/cwl#CommandLineBinding" ] } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#arguments", "_container": "@list" } }, { "name": "https://w3id.org/cwl/cwl#CommandLineTool/stdin", "type": [ "null", "string", "https://w3id.org/cwl/cwl#Expression" ], "doc": "A path to a file whose contents must be piped into the command's\nstandard input stream.\n" }, { "name": "https://w3id.org/cwl/cwl#CommandLineTool/stderr", "type": [ "null", "string", "https://w3id.org/cwl/cwl#Expression" ], "jsonldPredicate": "https://w3id.org/cwl/cwl#stderr", "doc": "Capture the command's standard error stream to a file written to\nthe designated output directory.\n\nIf `stderr` is a string, it specifies the file name to use.\n\nIf `stderr` is an expression, the expression is evaluated and must\nreturn a string with the file name to use to capture stderr. 
If the\nreturn value is not a string, or the resulting path contains illegal\ncharacters (such as the path separator `/`) it is an error.\n" }, { "name": "https://w3id.org/cwl/cwl#CommandLineTool/stdout", "type": [ "null", "string", "https://w3id.org/cwl/cwl#Expression" ], "jsonldPredicate": "https://w3id.org/cwl/cwl#stdout", "doc": "Capture the command's standard output stream to a file written to\nthe designated output directory.\n\nIf `stdout` is a string, it specifies the file name to use.\n\nIf `stdout` is an expression, the expression is evaluated and must\nreturn a string with the file name to use to capture stdout. If the\nreturn value is not a string, or the resulting path contains illegal\ncharacters (such as the path separator `/`) it is an error.\n" }, { "name": "https://w3id.org/cwl/cwl#CommandLineTool/successCodes", "type": [ "null", { "type": "array", "items": "int" } ], "doc": "Exit codes that indicate the process completed successfully.\n" }, { "name": "https://w3id.org/cwl/cwl#CommandLineTool/temporaryFailCodes", "type": [ "null", { "type": "array", "items": "int" } ], "doc": "Exit codes that indicate the process failed due to a possibly\ntemporary condition, where executing the process with the same\nruntime environment and inputs may produce different results.\n" }, { "name": "https://w3id.org/cwl/cwl#CommandLineTool/permanentFailCodes", "type": [ "null", { "type": "array", "items": "int" } ], "doc": "Exit codes that indicate the process failed due to a permanent logic error, where executing the process with the same runtime environment and same inputs is expected to always fail." 
} ] }, { "type": "record", "name": "https://w3id.org/cwl/cwl#DockerRequirement", "extends": "https://w3id.org/cwl/cwl#ProcessRequirement", "doc": "Indicates that a workflow component should be run in a\n[Docker](http://docker.com) container, and specifies how to fetch or build\nthe image.\n\nIf a CommandLineTool lists `DockerRequirement` under\n`hints` (or `requirements`), it may (or must) be run in the specified Docker\ncontainer.\n\nThe platform must first acquire or install the correct Docker image as\nspecified by `dockerPull`, `dockerImport`, `dockerLoad` or `dockerFile`.\n\nThe platform must execute the tool in the container using `docker run` with\nthe appropriate Docker image and tool command line.\n\nThe workflow platform may provide input files and the designated output\ndirectory through the use of volume bind mounts. The platform may rewrite\nfile paths in the input object to correspond to the Docker bind mounted\nlocations.\n\nWhen running a tool contained in Docker, the workflow platform must not\nassume anything about the contents of the Docker container, such as the\npresence or absence of specific software, except to assume that the\ngenerated command line represents a valid command within the runtime\nenvironment of the container.\n\n## Interaction with other requirements\n\nIf [EnvVarRequirement](#EnvVarRequirement) is specified alongside a\nDockerRequirement, the environment variables must be provided to Docker\nusing `--env` or `--env-file` and interact with the container's preexisting\nenvironment as defined by Docker.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#DockerRequirement/class", "type": "string", "doc": "Always 'DockerRequirement'", "jsonldPredicate": { "_id": "@type", "_type": "@vocab" } }, { "name": "https://w3id.org/cwl/cwl#DockerRequirement/dockerPull", "type": [ "null", "string" ], "doc": "Specify a Docker image to retrieve using `docker pull`." 
}, { "name": "https://w3id.org/cwl/cwl#DockerRequirement/dockerLoad", "type": [ "null", "string" ], "doc": "Specify a HTTP URL from which to download a Docker image using `docker load`." }, { "name": "https://w3id.org/cwl/cwl#DockerRequirement/dockerFile", "type": [ "null", "string" ], "doc": "Supply the contents of a Dockerfile which will be built using `docker build`." }, { "name": "https://w3id.org/cwl/cwl#DockerRequirement/dockerImport", "type": [ "null", "string" ], "doc": "Provide HTTP URL to download and gunzip a Docker images using `docker import." }, { "name": "https://w3id.org/cwl/cwl#DockerRequirement/dockerImageId", "type": [ "null", "string" ], "doc": "The image id that will be used for `docker run`. May be a\nhuman-readable image name or the image identifier hash. May be skipped\nif `dockerPull` is specified, in which case the `dockerPull` image id\nmust be used.\n" }, { "name": "https://w3id.org/cwl/cwl#DockerRequirement/dockerOutputDirectory", "type": [ "null", "string" ], "doc": "Set the designated output directory to a specific location inside the\nDocker container.\n" } ] }, { "type": "record", "name": "https://w3id.org/cwl/cwl#SoftwareRequirement", "extends": "https://w3id.org/cwl/cwl#ProcessRequirement", "doc": "A list of software packages that should be configured in the environment of\nthe defined process.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#SoftwareRequirement/class", "type": "string", "doc": "Always 'SoftwareRequirement'", "jsonldPredicate": { "_id": "@type", "_type": "@vocab" } }, { "name": "https://w3id.org/cwl/cwl#SoftwareRequirement/packages", "type": { "type": "array", "items": "https://w3id.org/cwl/cwl#SoftwarePackage" }, "doc": "The list of software to be configured.", "jsonldPredicate": { "mapSubject": "package", "mapPredicate": "specs" } } ] }, { "name": "https://w3id.org/cwl/cwl#SoftwarePackage", "type": "record", "fields": [ { "name": "https://w3id.org/cwl/cwl#SoftwarePackage/package", "type": "string", "doc": 
"The common name of the software to be configured." }, { "name": "https://w3id.org/cwl/cwl#SoftwarePackage/version", "type": [ "null", { "type": "array", "items": "string" } ], "doc": "The (optional) version of the software to configured." }, { "name": "https://w3id.org/cwl/cwl#SoftwarePackage/specs", "type": [ "null", { "type": "array", "items": "string" } ], "doc": "Must be one or more IRIs identifying resources for installing or\nenabling the software. Implementations may provide resolvers which map\nwell-known software spec IRIs to some configuration action.\n\nFor example, an IRI `https://packages.debian.org/jessie/bowtie` could\nbe resolved with `apt-get install bowtie`. An IRI\n`https://anaconda.org/bioconda/bowtie` could be resolved with `conda\ninstall -c bioconda bowtie`.\n\nTools may also provide IRIs to index entries such as\n[RRID](http://www.identifiers.org/rrid/), such as\n`http://identifiers.org/rrid/RRID:SCR_005476`\n" } ] }, { "name": "https://w3id.org/cwl/cwl#Dirent", "type": "record", "doc": "Define a file or subdirectory that must be placed in the designated output\ndirectory prior to executing the command line tool. May be the result of\nexecuting an expression, such as building a configuration file from a\ntemplate.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#Dirent/entryname", "type": [ "null", "string", "https://w3id.org/cwl/cwl#Expression" ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#entryname" }, "doc": "The name of the file or subdirectory to create in the output directory.\nIf `entry` is a File or Directory, this overrides `basename`. 
Optional.\n" }, { "name": "https://w3id.org/cwl/cwl#Dirent/entry", "type": [ "string", "https://w3id.org/cwl/cwl#Expression" ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#entry" }, "doc": "If the value is a string literal or an expression which evaluates to a\nstring, a new file must be created with the string as the file contents.\n\nIf the value is an expression that evaluates to a `File` object, this\nindicates the referenced file should be added to the designated output\ndirectory prior to executing the tool.\n\nIf the value is an expression that evaluates to a `Dirent` object, this\nindicates that the File or Directory in `entry` should be added to the\ndesignated output directory with the name in `entryname`.\n\nIf `writable` is false, the file may be made available using a bind\nmount or file system link to avoid unnecessary copying of the input\nfile.\n" }, { "name": "https://w3id.org/cwl/cwl#Dirent/writable", "type": [ "null", "boolean" ], "doc": "If true, the file or directory must be writable by the tool. Changes\nto the file or directory must be isolated and not visible by any other\nCommandLineTool process. This may be implemented by making a copy of\nthe original file or directory. 
Default false (files and directories\nread-only by default).\n" } ] }, { "name": "https://w3id.org/cwl/cwl#InitialWorkDirRequirement", "type": "record", "extends": "https://w3id.org/cwl/cwl#ProcessRequirement", "doc": "Define a list of files and subdirectories that must be created by the workflow platform in the designated output directory prior to executing the command line tool.", "fields": [ { "name": "https://w3id.org/cwl/cwl#InitialWorkDirRequirement/class", "type": "string", "doc": "InitialWorkDirRequirement", "jsonldPredicate": { "_id": "@type", "_type": "@vocab" } }, { "name": "https://w3id.org/cwl/cwl#InitialWorkDirRequirement/listing", "type": [ { "type": "array", "items": [ "https://w3id.org/cwl/cwl#File", "https://w3id.org/cwl/cwl#Directory", "https://w3id.org/cwl/cwl#Dirent", "string", "https://w3id.org/cwl/cwl#Expression" ] }, "string", "https://w3id.org/cwl/cwl#Expression" ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#listing" }, "doc": "The list of files or subdirectories that must be placed in the\ndesignated output directory prior to executing the command line tool.\n\nMay be an expression. If so, the expression return value must validate\nas `{type: array, items: [File, Directory]}`.\n" } ] }, { "name": "https://w3id.org/cwl/cwl#EnvVarRequirement", "type": "record", "extends": "https://w3id.org/cwl/cwl#ProcessRequirement", "doc": "Define a list of environment variables which will be set in the\nexecution environment of the tool. 
See `EnvironmentDef` for details.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#EnvVarRequirement/class", "type": "string", "doc": "Always 'EnvVarRequirement'", "jsonldPredicate": { "_id": "@type", "_type": "@vocab" } }, { "name": "https://w3id.org/cwl/cwl#EnvVarRequirement/envDef", "type": { "type": "array", "items": "https://w3id.org/cwl/cwl#EnvironmentDef" }, "doc": "The list of environment variables.", "jsonldPredicate": { "mapSubject": "envName", "mapPredicate": "envValue" } } ] }, { "type": "record", "name": "https://w3id.org/cwl/cwl#ShellCommandRequirement", "extends": "https://w3id.org/cwl/cwl#ProcessRequirement", "doc": "Modify the behavior of CommandLineTool to generate a single string\ncontaining a shell command line. Each item in the argument list must be\njoined into a string separated by single spaces and quoted to prevent\ninterpretation by the shell, unless `CommandLineBinding` for that argument\ncontains `shellQuote: false`. If `shellQuote: false` is specified, the\nargument is joined into the command string without quoting, which allows\nthe use of shell metacharacters such as `|` for pipes.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#ShellCommandRequirement/class", "type": "string", "doc": "Always 'ShellCommandRequirement'", "jsonldPredicate": { "_id": "@type", "_type": "@vocab" } } ] }, { "type": "record", "name": "https://w3id.org/cwl/cwl#ResourceRequirement", "extends": "https://w3id.org/cwl/cwl#ProcessRequirement", "doc": "Specify basic hardware resource requirements.\n\n\"min\" is the minimum amount of a resource that must be reserved to schedule\na job. If \"min\" cannot be satisfied, the job should not be run.\n\n\"max\" is the maximum amount of a resource that the job shall be permitted\nto use. If a node has sufficient resources, multiple jobs may be scheduled\non a single node provided each job's \"max\" resource requirements are\nmet. 
If a job attempts to exceed its \"max\" resource allocation, an\nimplementation may deny additional resources, which may result in job\nfailure.\n\nIf \"min\" is specified but \"max\" is not, then \"max\" == \"min\".\nIf \"max\" is specified but \"min\" is not, then \"min\" == \"max\".\n\nIt is an error if max < min.\n\nIt is an error if the value of any of these fields is negative.\n\nIf neither \"min\" nor \"max\" is specified for a resource, an implementation may provide a default.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#ResourceRequirement/class", "type": "string", "doc": "Always 'ResourceRequirement'", "jsonldPredicate": { "_id": "@type", "_type": "@vocab" } }, { "name": "https://w3id.org/cwl/cwl#ResourceRequirement/coresMin", "type": [ "null", "long", "string", "https://w3id.org/cwl/cwl#Expression" ], "doc": "Minimum reserved number of CPU cores" }, { "name": "https://w3id.org/cwl/cwl#ResourceRequirement/coresMax", "type": [ "null", "int", "string", "https://w3id.org/cwl/cwl#Expression" ], "doc": "Maximum reserved number of CPU cores" }, { "name": "https://w3id.org/cwl/cwl#ResourceRequirement/ramMin", "type": [ "null", "long", "string", "https://w3id.org/cwl/cwl#Expression" ], "doc": "Minimum reserved RAM in mebibytes (2**20)" }, { "name": "https://w3id.org/cwl/cwl#ResourceRequirement/ramMax", "type": [ "null", "long", "string", "https://w3id.org/cwl/cwl#Expression" ], "doc": "Maximum reserved RAM in mebibytes (2**20)" }, { "name": "https://w3id.org/cwl/cwl#ResourceRequirement/tmpdirMin", "type": [ "null", "long", "string", "https://w3id.org/cwl/cwl#Expression" ], "doc": "Minimum reserved filesystem based storage for the designated temporary directory, in mebibytes (2**20)" }, { "name": "https://w3id.org/cwl/cwl#ResourceRequirement/tmpdirMax", "type": [ "null", "long", "string", "https://w3id.org/cwl/cwl#Expression" ], "doc": "Maximum reserved filesystem based storage for the designated temporary directory, in mebibytes (2**20)" }, { "name": 
"https://w3id.org/cwl/cwl#ResourceRequirement/outdirMin", "type": [ "null", "long", "string", "https://w3id.org/cwl/cwl#Expression" ], "doc": "Minimum reserved filesystem based storage for the designated output directory, in mebibytes (2**20)" }, { "name": "https://w3id.org/cwl/cwl#ResourceRequirement/outdirMax", "type": [ "null", "long", "string", "https://w3id.org/cwl/cwl#Expression" ], "doc": "Maximum reserved filesystem based storage for the designated output directory, in mebibytes (2**20)" } ] }, { "name": "https://w3id.org/cwl/cwl#WorkflowDoc", "type": "documentation", "doc": [ "# Common Workflow Language (CWL) Workflow Description, v1.0\n\nThis version:\n * https://w3id.org/cwl/v1.0/\n\nCurrent version:\n * https://w3id.org/cwl/\n", "\n\n", "\n", "\n\n", "# Abstract\n\nOne way to define a workflow is: an analysis task represented by a\ndirected graph describing a sequence of operations that transform an\ninput data set to output. This specification defines the Common Workflow\nLanguage (CWL) Workflow description, a vendor-neutral standard for\nrepresenting workflows intended to be portable across a variety of\ncomputing platforms.\n", "\n", "\n## Introduction to v1.0\n\nThis specification represents the first full release from the CWL group.\nSince draft-3, this draft introduces the following changes and additions:\n\n * The `inputs` and `outputs` fields have been renamed `in` and `out`.\n * Syntax simplifcations: denoted by the `map<>` syntax. Example: `in`\n contains a list of items, each with an id. 
Now one can specify\n a mapping of that identifier to the corresponding\n `InputParameter`.\n ```\n in:\n - id: one\n type: string\n doc: First input parameter\n - id: two\n type: int\n doc: Second input parameter\n ```\n can be\n ```\n in:\n one:\n type: string\n doc: First input parameter\n two:\n type: int\n doc: Second input parameter\n ```\n * The common field `description` has been renamed to `doc`.\n\n## Purpose\n\nThe Common Workflow Language Workflow Description expresses\nworkflows for data-intensive science, such as Bioinformatics, Chemistry,\nPhysics, and Astronomy. This specification is intended to define a data\nand execution model for Workflows that can be implemented on top of a\nvariety of computing platforms, ranging from an individual workstation to\ncluster, grid, cloud, and high performance computing systems.\n", "\n" ] }, { "name": "https://w3id.org/cwl/cwl#ExpressionToolOutputParameter", "type": "record", "extends": "https://w3id.org/cwl/cwl#OutputParameter", "fields": [ { "name": "https://w3id.org/cwl/cwl#ExpressionToolOutputParameter/type", "type": [ "null", "https://w3id.org/cwl/cwl#CWLType", "https://w3id.org/cwl/cwl#OutputRecordSchema", "https://w3id.org/cwl/cwl#OutputEnumSchema", "https://w3id.org/cwl/cwl#OutputArraySchema", "string", { "type": "array", "items": [ "https://w3id.org/cwl/cwl#CWLType", "https://w3id.org/cwl/cwl#OutputRecordSchema", "https://w3id.org/cwl/cwl#OutputEnumSchema", "https://w3id.org/cwl/cwl#OutputArraySchema", "string" ] } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#type", "_type": "@vocab", "refScope": 2, "typeDSL": true }, "doc": "Specify valid types of data that may be assigned to this parameter.\n" } ] }, { "type": "record", "name": "https://w3id.org/cwl/cwl#ExpressionTool", "extends": "https://w3id.org/cwl/cwl#Process", "specialize": [ { "specializeFrom": "https://w3id.org/cwl/cwl#OutputParameter", "specializeTo": "https://w3id.org/cwl/cwl#ExpressionToolOutputParameter" } ], 
"documentRoot": true, "doc": "Execute an expression as a Workflow step.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#ExpressionTool/class", "jsonldPredicate": { "_id": "@type", "_type": "@vocab" }, "type": "string" }, { "name": "https://w3id.org/cwl/cwl#ExpressionTool/expression", "type": [ "string", "https://w3id.org/cwl/cwl#Expression" ], "doc": "The expression to execute. The expression must return a JSON object which\nmatches the output parameters of the ExpressionTool.\n" } ] }, { "name": "https://w3id.org/cwl/cwl#LinkMergeMethod", "type": "enum", "docParent": "https://w3id.org/cwl/cwl#WorkflowStepInput", "doc": "The input link merge method, described in [WorkflowStepInput](#WorkflowStepInput).", "symbols": [ "https://w3id.org/cwl/cwl#LinkMergeMethod/merge_nested", "https://w3id.org/cwl/cwl#LinkMergeMethod/merge_flattened" ] }, { "name": "https://w3id.org/cwl/cwl#WorkflowOutputParameter", "type": "record", "extends": "https://w3id.org/cwl/cwl#OutputParameter", "docParent": "https://w3id.org/cwl/cwl#Workflow", "doc": "Describe an output parameter of a workflow. 
The parameter must be\nconnected to one or more parameters defined in the workflow that will\nprovide the value of the output parameter.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#WorkflowOutputParameter/outputSource", "doc": "Specifies one or more workflow parameters that supply the value of\nthe output parameter.\n", "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#outputSource", "_type": "@id", "refScope": 0 }, "type": [ "null", "string", { "type": "array", "items": "string" } ] }, { "name": "https://w3id.org/cwl/cwl#WorkflowOutputParameter/linkMerge", "type": [ "null", "https://w3id.org/cwl/cwl#LinkMergeMethod" ], "jsonldPredicate": "cwl:linkMerge", "doc": "The method to use to merge multiple sources into a single array.\nIf not specified, the default method is \"merge_nested\".\n" }, { "name": "https://w3id.org/cwl/cwl#WorkflowOutputParameter/type", "type": [ "null", "https://w3id.org/cwl/cwl#CWLType", "https://w3id.org/cwl/cwl#OutputRecordSchema", "https://w3id.org/cwl/cwl#OutputEnumSchema", "https://w3id.org/cwl/cwl#OutputArraySchema", "string", { "type": "array", "items": [ "https://w3id.org/cwl/cwl#CWLType", "https://w3id.org/cwl/cwl#OutputRecordSchema", "https://w3id.org/cwl/cwl#OutputEnumSchema", "https://w3id.org/cwl/cwl#OutputArraySchema", "string" ] } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#type", "_type": "@vocab", "refScope": 2, "typeDSL": true }, "doc": "Specify valid types of data that may be assigned to this parameter.\n" } ] }, { "name": "https://w3id.org/cwl/cwl#Sink", "type": "record", "abstract": true, "fields": [ { "name": "https://w3id.org/cwl/cwl#Sink/source", "doc": "Specifies one or more workflow parameters that will provide input to\nthe underlying step parameter.\n", "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#source", "_type": "@id", "refScope": 2 }, "type": [ "null", "string", { "type": "array", "items": "string" } ] }, { "name": "https://w3id.org/cwl/cwl#Sink/linkMerge", "type": [ 
"null", "https://w3id.org/cwl/cwl#LinkMergeMethod" ], "jsonldPredicate": "cwl:linkMerge", "doc": "The method to use to merge multiple inbound links into a single array.\nIf not specified, the default method is \"merge_nested\".\n" } ] }, { "type": "record", "name": "https://w3id.org/cwl/cwl#WorkflowStepInput", "extends": "https://w3id.org/cwl/cwl#Sink", "docParent": "https://w3id.org/cwl/cwl#WorkflowStep", "doc": "The input of a workflow step connects an upstream parameter (from the\nworkflow inputs, or the outputs of other workflows steps) with the input\nparameters of the underlying step.\n\n## Input object\n\nA WorkflowStepInput object must contain an `id` field in the form\n`#fieldname` or `#stepname.fieldname`. When the `id` field contains a\nperiod `.` the field name consists of the characters following the final\nperiod. This defines a field of the workflow step input object with the\nvalue of the `source` parameter(s).\n\n## Merging\n\nTo merge multiple inbound data links,\n[MultipleInputFeatureRequirement](#MultipleInputFeatureRequirement) must be specified\nin the workflow or workflow step requirements.\n\nIf the sink parameter is an array, or named in a [workflow\nscatter](#WorkflowStep) operation, there may be multiple inbound data links\nlisted in the `source` field. The values from the input links are merged\ndepending on the method specified in the `linkMerge` field. If not\nspecified, the default method is \"merge_nested\".\n\n* **merge_nested**\n\n The input must be an array consisting of exactly one entry for each\n input link. If \"merge_nested\" is specified with a single link, the value\n from the link must be wrapped in a single-item list.\n\n* **merge_flattened**\n\n 1. The source and sink parameters must be compatible types, or the source\n type must be compatible with single element from the \"items\" type of\n the destination array parameter.\n 2. 
Source parameters which are arrays are concatenated.\n Source parameters which are single element types are appended as\n single elements.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#WorkflowStepInput/id", "type": "string", "jsonldPredicate": "@id", "doc": "A unique identifier for this workflow input parameter." }, { "name": "https://w3id.org/cwl/cwl#WorkflowStepInput/default", "type": [ "null", "Any" ], "doc": "The default value for this parameter if there is no `source`\nfield.\n", "jsonldPredicate": "cwl:default" }, { "name": "https://w3id.org/cwl/cwl#WorkflowStepInput/valueFrom", "type": [ "null", "string", "https://w3id.org/cwl/cwl#Expression" ], "jsonldPredicate": "cwl:valueFrom", "doc": "To use valueFrom, [StepInputExpressionRequirement](#StepInputExpressionRequirement) must\nbe specified in the workflow or workflow step requirements.\n\nIf `valueFrom` is a constant string value, use this as the value for\nthis input parameter.\n\nIf `valueFrom` is a parameter reference or expression, it must be\nevaluated to yield the actual value to be assigned to the input field.\n\nThe `self` value in the parameter reference or expression must be\nthe value of the parameter(s) specified in the `source` field, or\nnull if there is no `source` field.\n\nThe value of `inputs` in the parameter reference or expression must be\nthe input object to the workflow step after assigning the `source`\nvalues and then scattering. The order of evaluating `valueFrom` among\nstep input parameters is undefined and the result of evaluating\n`valueFrom` on a parameter must not be visible to evaluation of\n`valueFrom` on other parameters.\n" } ] }, { "type": "record", "name": "https://w3id.org/cwl/cwl#WorkflowStepOutput", "docParent": "https://w3id.org/cwl/cwl#WorkflowStep", "doc": "Associate an output parameter of the underlying process with a workflow\nparameter. 
The workflow parameter (given in the `id` field) may be used\nas a `source` to connect with input parameters of other workflow steps, or\nwith an output parameter of the process.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#WorkflowStepOutput/id", "type": "string", "jsonldPredicate": "@id", "doc": "A unique identifier for this workflow output parameter. This is the\nidentifier to use in the `source` field of `WorkflowStepInput` to\nconnect the output value to downstream parameters.\n" } ] }, { "name": "https://w3id.org/cwl/cwl#ScatterMethod", "type": "enum", "docParent": "https://w3id.org/cwl/cwl#WorkflowStep", "doc": "The scatter method, as described in [workflow step scatter](#WorkflowStep).", "symbols": [ "https://w3id.org/cwl/cwl#ScatterMethod/dotproduct", "https://w3id.org/cwl/cwl#ScatterMethod/nested_crossproduct", "https://w3id.org/cwl/cwl#ScatterMethod/flat_crossproduct" ] }, { "name": "https://w3id.org/cwl/cwl#WorkflowStep", "type": "record", "docParent": "https://w3id.org/cwl/cwl#Workflow", "doc": "A workflow step is an executable element of a workflow. It specifies the\nunderlying process implementation (such as `CommandLineTool` or another\n`Workflow`) in the `run` field and connects the input and output parameters\nof the underlying process to workflow parameters.\n\n# Scatter/gather\n\nTo use scatter/gather,\n[ScatterFeatureRequirement](#ScatterFeatureRequirement) must be specified\nin the workflow or workflow step requirements.\n\nA \"scatter\" operation specifies that the associated workflow step or\nsubworkflow should execute separately over a list of input elements. Each\njob making up a scatter operation is independent and may be executed\nconcurrently.\n\nThe `scatter` field specifies one or more input parameters which will be\nscattered. An input parameter may be listed more than once. The declared\ntype of each input parameter is implicitly wrapped in an array for each\ntime it appears in the `scatter` field. 
As a result, upstream parameters\nwhich are connected to scattered parameters may be arrays.\n\nAll output parameter types are also implicitly wrapped in arrays. Each job\nin the scatter results in an entry in the output array.\n\nIf `scatter` declares more than one input parameter, `scatterMethod`\ndescribes how to decompose the input into a discrete set of jobs.\n\n * **dotproduct** specifies that each of the input arrays are aligned and one\n element taken from each array to construct each job. It is an error\n if all input arrays are not the same length.\n\n * **nested_crossproduct** specifies the Cartesian product of the inputs,\n producing a job for every combination of the scattered inputs. The\n output must be nested arrays for each level of scattering, in the\n order that the input arrays are listed in the `scatter` field.\n\n * **flat_crossproduct** specifies the Cartesian product of the inputs,\n producing a job for every combination of the scattered inputs. The\n output arrays must be flattened to a single level, but otherwise listed in the\n order that the input arrays are listed in the `scatter` field.\n\n# Subworkflows\n\nTo specify a nested workflow as part of a workflow step,\n[SubworkflowFeatureRequirement](#SubworkflowFeatureRequirement) must be\nspecified in the workflow or workflow step requirements.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#WorkflowStep/id", "type": "string", "jsonldPredicate": "@id", "doc": "The unique identifier for this workflow step." }, { "name": "https://w3id.org/cwl/cwl#WorkflowStep/in", "type": { "type": "array", "items": "https://w3id.org/cwl/cwl#WorkflowStepInput" }, "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#in", "mapSubject": "id", "mapPredicate": "source" }, "doc": "Defines the input parameters of the workflow step. The process is ready to\nrun when all required input parameters are associated with concrete\nvalues. 
Input parameters include a schema for each parameter which is\nused to validate the input object. It may also be used to build a user\ninterface for constructing the input object.\n" }, { "name": "https://w3id.org/cwl/cwl#WorkflowStep/out", "type": [ { "type": "array", "items": [ "string", "https://w3id.org/cwl/cwl#WorkflowStepOutput" ] } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#out", "_type": "@id", "identity": true }, "doc": "Defines the parameters representing the output of the process. May be\nused to generate and/or validate the output object.\n" }, { "name": "https://w3id.org/cwl/cwl#WorkflowStep/requirements", "type": [ "null", { "type": "array", "items": "https://w3id.org/cwl/cwl#ProcessRequirement" } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#requirements", "mapSubject": "class" }, "doc": "Declares requirements that apply to either the runtime environment or the\nworkflow engine that must be met in order to execute this workflow step. If\nan implementation cannot satisfy all requirements, or a requirement is\nlisted which is not recognized by the implementation, it is a fatal\nerror and the implementation must not attempt to run the process,\nunless overridden at user option.\n" }, { "name": "https://w3id.org/cwl/cwl#WorkflowStep/hints", "type": [ "null", { "type": "array", "items": "Any" } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#hints", "noLinkCheck": true, "mapSubject": "class" }, "doc": "Declares hints applying to either the runtime environment or the\nworkflow engine that may be helpful in executing this workflow step. It is\nnot an error if an implementation cannot satisfy all hints, however\nthe implementation may report a warning.\n" }, { "name": "https://w3id.org/cwl/cwl#WorkflowStep/label", "type": [ "null", "string" ], "jsonldPredicate": "rdfs:label", "doc": "A short, human-readable label of this process object." 
}, { "name": "https://w3id.org/cwl/cwl#WorkflowStep/doc", "type": [ "null", "string" ], "jsonldPredicate": "rdfs:comment", "doc": "A long, human-readable description of this process object." }, { "name": "https://w3id.org/cwl/cwl#WorkflowStep/run", "type": [ "string", "https://w3id.org/cwl/cwl#Process" ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#run", "_type": "@id" }, "doc": "Specifies the process to run.\n" }, { "name": "https://w3id.org/cwl/cwl#WorkflowStep/scatter", "type": [ "null", "string", { "type": "array", "items": "string" } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#scatter", "_type": "@id", "_container": "@list", "refScope": 0 } }, { "name": "https://w3id.org/cwl/cwl#WorkflowStep/scatterMethod", "doc": "Required if `scatter` is an array of more than one element.\n", "type": [ "null", "https://w3id.org/cwl/cwl#ScatterMethod" ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/cwl#scatterMethod", "_type": "@vocab" } } ] }, { "name": "https://w3id.org/cwl/cwl#Workflow", "type": "record", "extends": "https://w3id.org/cwl/cwl#Process", "documentRoot": true, "specialize": [ { "specializeFrom": "https://w3id.org/cwl/cwl#OutputParameter", "specializeTo": "https://w3id.org/cwl/cwl#WorkflowOutputParameter" } ], "doc": "A workflow describes a set of **steps** and the **dependencies** between\nthose steps. When a step produces output that will be consumed by a\nsecond step, the first step is a dependency of the second step.\n\nWhen there is a dependency, the workflow engine must execute the preceding\nstep and wait for it to successfully produce output before executing the\ndependent step. If two steps are defined in the workflow graph that\nare not directly or indirectly dependent, these steps are **independent**,\nand may execute in any order or execute concurrently. 
A workflow is\ncomplete when all steps have been executed.\n\nDependencies between parameters are expressed using the `source` field on\n[workflow step input parameters](#WorkflowStepInput) and [workflow output\nparameters](#WorkflowOutputParameter).\n\nThe `source` field expresses the dependency of one parameter on another\nsuch that when a value is associated with the parameter specified by\n`source`, that value is propagated to the destination parameter. When all\ndata links inbound to a given step are fulfilled, the step is ready to\nexecute.\n\n## Workflow success and failure\n\nA completed step must result in one of `success`, `temporaryFailure` or\n`permanentFailure` states. An implementation may choose to retry a step\nexecution which resulted in `temporaryFailure`. An implementation may\nchoose to either continue running other steps of a workflow, or terminate\nimmediately upon `permanentFailure`.\n\n* If any step of a workflow execution results in `permanentFailure`, then\nthe workflow status is `permanentFailure`.\n\n* If one or more steps result in `temporaryFailure` and all other steps\ncomplete `success` or are not executed, then the workflow status is\n`temporaryFailure`.\n\n* If all workflow steps are executed and complete with `success`, then the\nworkflow status is `success`.\n\n# Extensions\n\n[ScatterFeatureRequirement](#ScatterFeatureRequirement) and\n[SubworkflowFeatureRequirement](#SubworkflowFeatureRequirement) are\navailable as standard [extensions](#Extensions_and_Metadata) to core\nworkflow semantics.\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#Workflow/class", "jsonldPredicate": { "_id": "@type", "_type": "@vocab" }, "type": "string" }, { "name": "https://w3id.org/cwl/cwl#Workflow/steps", "doc": "The individual steps that make up the workflow. Each step is executed when all of its\ninput data links are fulfilled. 
An implementation may choose to execute\nthe steps in a different order than listed and/or execute steps\nconcurrently, provided that dependencies between steps are met.\n", "type": [ { "type": "array", "items": "https://w3id.org/cwl/cwl#WorkflowStep" } ], "jsonldPredicate": { "mapSubject": "id" } } ] }, { "type": "record", "name": "https://w3id.org/cwl/cwl#SubworkflowFeatureRequirement", "extends": "https://w3id.org/cwl/cwl#ProcessRequirement", "doc": "Indicates that the workflow platform must support nested workflows in\nthe `run` field of [WorkflowStep](#WorkflowStep).\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#SubworkflowFeatureRequirement/class", "type": "string", "doc": "Always 'SubworkflowFeatureRequirement'", "jsonldPredicate": { "_id": "@type", "_type": "@vocab" } } ] }, { "name": "https://w3id.org/cwl/cwl#ScatterFeatureRequirement", "type": "record", "extends": "https://w3id.org/cwl/cwl#ProcessRequirement", "doc": "Indicates that the workflow platform must support the `scatter` and\n`scatterMethod` fields of [WorkflowStep](#WorkflowStep).\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#ScatterFeatureRequirement/class", "type": "string", "doc": "Always 'ScatterFeatureRequirement'", "jsonldPredicate": { "_id": "@type", "_type": "@vocab" } } ] }, { "name": "https://w3id.org/cwl/cwl#MultipleInputFeatureRequirement", "type": "record", "extends": "https://w3id.org/cwl/cwl#ProcessRequirement", "doc": "Indicates that the workflow platform must support multiple inbound data links\nlisted in the `source` field of [WorkflowStepInput](#WorkflowStepInput).\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#MultipleInputFeatureRequirement/class", "type": "string", "doc": "Always 'MultipleInputFeatureRequirement'", "jsonldPredicate": { "_id": "@type", "_type": "@vocab" } } ] }, { "type": "record", "name": "https://w3id.org/cwl/cwl#StepInputExpressionRequirement", "extends": "https://w3id.org/cwl/cwl#ProcessRequirement", "doc": "Indicates that the workflow 
platform must support the `valueFrom` field\nof [WorkflowStepInput](#WorkflowStepInput).\n", "fields": [ { "name": "https://w3id.org/cwl/cwl#StepInputExpressionRequirement/class", "type": "string", "doc": "Always 'StepInputExpressionRequirement'", "jsonldPredicate": { "_id": "@type", "_type": "@vocab" } } ] } ] schema-salad-2.6.20171201034858/schema_salad/tests/hello.txt hello world! schema-salad-2.6.20171201034858/schema_salad/tests/metaschema-pre.yml [ { "name": "https://w3id.org/cwl/salad#Semantic_Annotations_for_Linked_Avro_Data", "type": "documentation", "doc": [ "# Semantic Annotations for Linked Avro Data (SALAD)\n\nAuthor:\n\n* Peter Amstutz , Curoverse\n\nContributors:\n\n* The developers of Apache Avro\n* The developers of JSON-LD\n* Neboj\u0161a Tijani\u0107 , Seven Bridges Genomics\n\n# Abstract\n\nSalad is a schema language for describing structured linked data documents\nin JSON or YAML documents. A Salad schema provides rules for\npreprocessing, structural validation, and link checking for documents\ndescribed by a Salad schema. Salad builds on JSON-LD and the Apache Avro\ndata serialization system, and extends Avro with features for rich data\nmodeling such as inheritance, template specialization, object identifiers,\nand object references. Salad was developed to provide a bridge between the\nrecord oriented data modeling supported by Apache Avro and the Semantic\nWeb.\n\n# Status of This Document\n\nThis document is the product of the [Common Workflow Language working\ngroup](https://groups.google.com/forum/#!forum/common-workflow-language). 
The\nlatest version of this document is available in the \"schema_salad\" repository at\n\nhttps://github.com/common-workflow-language/schema_salad\n\nThe products of the CWL working group (including this document) are made available\nunder the terms of the Apache License, version 2.0.\n\n\n\n# Introduction\n\nThe JSON data model is an extremely popular way to represent structured\ndata. It is attractive because of its relative simplicity and is a\nnatural fit with the standard types of many programming languages.\nHowever, this simplicity means that basic JSON lacks expressive features\nuseful for working with complex data structures and document formats, such\nas schemas, object references, and namespaces.\n\nJSON-LD is a W3C standard providing a way to describe how to interpret a\nJSON document as Linked Data by means of a \"context\". JSON-LD provides a\npowerful solution for representing object references and namespaces in JSON\nbased on standard web URIs, but is not itself a schema language. Without a\nschema providing a well defined structure, it is difficult to process an\narbitrary JSON-LD document as idiomatic JSON because there are many ways to\nexpress the same data that are logically equivalent but structurally\ndistinct.\n\nSeveral schema languages exist for describing and validating JSON data,\nsuch as the Apache Avro data serialization system; however, none understand\nlinked data. As a result, to fully take advantage of JSON-LD to build the\nnext generation of linked data applications, one must maintain separate\nJSON schema, JSON-LD context, RDF schema, and human documentation, despite\nsignificant overlap of content and obvious need for these documents to stay\nsynchronized.\n\nSchema Salad is designed to address this gap. It provides a schema\nlanguage and processing rules for describing structured JSON content\npermitting URI resolution and strict document validation. 
The schema\nlanguage supports linked data through annotations that describe the linked\ndata interpretation of the content, enables generation of JSON-LD context\nand RDF schema, and production of RDF triples by applying the JSON-LD\ncontext. The schema language also provides for robust support of inline\ndocumentation.\n\n## Introduction to v1.0\n\nThis is the second version of the Schema Salad specification. It is\ndeveloped concurrently with v1.0 of the Common Workflow Language for use in\nspecifying the Common Workflow Language, however Schema Salad is intended to be\nuseful to a broader audience. Compared to the draft-1 Schema Salad\nspecification, the following changes have been made:\n\n* Use of [mapSubject and mapPredicate](#Identifier_maps) to transform maps to lists of records.\n* Resolution of the [domain Specific Language for types](#Domain_Specific_Language_for_types)\n* Consolidation of the formal [schema into section 5](#Schema).\n\n## References to Other Specifications\n\n**Javascript Object Notation (JSON)**: http://json.org\n\n**JSON Linked Data (JSON-LD)**: http://json-ld.org\n\n**YAML**: http://yaml.org\n\n**Avro**: https://avro.apache.org/docs/current/spec.html\n\n**Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986\n\n**Resource Description Framework (RDF)**: http://www.w3.org/RDF/\n\n**UTF-8**: https://www.ietf.org/rfc/rfc2279.txt\n\n## Scope\n\nThis document describes the syntax, data model, algorithms, and schema\nlanguage for working with Salad documents. It is not intended to document\na specific implementation of Salad, however it may serve as a reference for\nthe behavior of conforming implementations.\n\n## Terminology\n\nThe terminology used to describe Salad documents is defined in the Concepts\nsection of the specification. 
The terms defined in the following list are\nused in building those definitions and in describing the actions of a\nSalad implementation:\n\n**may**: Conforming Salad documents and Salad implementations are permitted but\nnot required to be interpreted as described.\n\n**must**: Conforming Salad documents and Salad implementations are required\nto be interpreted as described; otherwise they are in error.\n\n**error**: A violation of the rules of this specification; results are\nundefined. Conforming implementations may detect and report an error and may\nrecover from it.\n\n**fatal error**: A violation of the rules of this specification; results\nare undefined. Conforming implementations must not continue to process the\ndocument and may report an error.\n\n**at user option**: Conforming software may or must (depending on the modal verb in\nthe sentence) behave as described; if it does, it must provide users a means to\nenable or disable the behavior described.\n\n# Document model\n\n## Data concepts\n\nAn **object** is a data structure equivalent to the \"object\" type in JSON,\nconsisting of an unordered set of name/value pairs (referred to here as\n**fields**) and where the name is a string and the value is a string, number,\nboolean, array, or object.\n\nA **document** is a file containing a serialized object, or an array of\nobjects.\n\nA **document type** is a class of files that share a common structure and\nsemantics.\n\nA **document schema** is a formal description of the grammar of a document type.\n\nA **base URI** is a context-dependent URI used to resolve relative references.\n\nAn **identifier** is a URI that designates a single document or single\nobject within a document.\n\nA **vocabulary** is the set of symbolic field names and enumerated symbols defined\nby a document schema, where each term maps to an absolute URI.\n\n## Syntax\n\nConforming Salad documents are serialized and loaded using YAML syntax and\nUTF-8 text encoding. 
Salad documents are written using the JSON-compatible\nsubset of YAML. Features of YAML such as headers and type tags that are\nnot found in the standard JSON data model must not be used in conforming\nSalad documents. It is a fatal error if the document is not valid YAML.\n\nA Salad document must consist only of either a single root object or an\narray of objects.\n\n## Document context\n\n### Implied context\n\nThe implicit context consists of the vocabulary defined by the schema and\nthe base URI. By default, the base URI must be the URI that was used to\nload the document. It may be overridden by an explicit context.\n\n### Explicit context\n\nIf a document consists of a root object, this object may contain the\nfields `$base`, `$namespaces`, `$schemas`, and `$graph`:\n\n * `$base`: Must be a string. Set the base URI for the document used to\n resolve relative references.\n\n * `$namespaces`: Must be an object with strings as values. The keys of\n the object are namespace prefixes used in the document; the values of\n the object are the prefix expansions.\n\n * `$schemas`: Must be an array of strings. This field may list URI\n references to documents in RDF-XML format which will be queried for RDF\n schema data. The subjects and predicates described by the RDF schema\n may provide additional semantic context for the document, and may be\n used for validation of prefixed extension fields found in the document.\n\nOther directives beginning with `$` must be ignored.\n\n## Document graph\n\nIf a document consists of a single root object, this object may contain the\nfield `$graph`. This field must be an array of objects. If present, this\nfield holds the primary content of the document. 
A document that consists\nof an array of objects at the root is an implicit graph.\n\n## Document metadata\n\nIf a document consists of a single root object, metadata about the\ndocument, such as authorship, may be declared in the root object.\n\n## Document schema\n\nDocument preprocessing, link validation and schema validation require a\ndocument schema. A schema may consist of:\n\n * At least one record definition object which defines valid fields that\n make up a record type. Record field definitions include the valid types\n that may be assigned to each field and annotations to indicate fields\n that represent identifiers and links, described below in \"Semantic\n Annotations\".\n\n * Any number of enumerated type objects which define a finite set of symbols that are\n valid values of the type.\n\n * Any number of documentation objects which allow in-line documentation of the schema.\n\nThe schema for defining a Salad schema (the metaschema) is described in\ndetail in \"Schema validation\".\n\n### Record field annotations\n\nIn a document schema, record field definitions may include the field\n`jsonldPredicate`, which may be either a string or object. 
Implementations\nmust preprocess fields according to the following\nrules:\n\n * If the value of `jsonldPredicate` is `@id`, the field is an identifier\n field.\n\n * If the value of `jsonldPredicate` is an object, and that\n object contains the field `_type` with the value `@id`, the field is a\n link field.\n\n * If the value of `jsonldPredicate` is an object, and that\n object contains the field `_type` with the value `@vocab`, the field is a\n vocabulary field, which is a subtype of link field.\n\n## Document traversal\n\nTo perform document preprocessing, link validation and schema\nvalidation, the document must be traversed starting from the fields or\narray items of the root object or array and recursively visiting each child\nitem which contains an object or array.\n\n# Document preprocessing\n\nAfter processing the explicit context (if any), document preprocessing\nbegins. Starting from the document root, object field values or array\nitems which contain objects or arrays are recursively traversed\ndepth-first. For each visited object, field names, identifier fields, link\nfields, vocabulary fields, and `$import` and `$include` directives must be\nprocessed as described in this section. The order of traversal of child\nnodes within a parent node is undefined.\n", "## Field name resolution\n\nThe document schema declares the vocabulary of known field names. During\npreprocessing traversal, field names in the document which are not part of\nthe schema vocabulary must be resolved to absolute URIs. Under \"strict\"\nvalidation, it is an error for a document to include fields which are not\npart of the vocabulary and not resolvable to absolute URIs. 
Field names\nwhich are not part of the vocabulary are resolved using the following\nrules:\n\n* If a field name URI begins with a namespace prefix declared in the\ndocument context (`@context`) followed by a colon `:`, the prefix and\ncolon must be replaced by the namespace declared in `@context`.\n\n* If there is a vocabulary term which maps to the URI of a resolved\nfield, the field name must be replaced with the vocabulary term.\n\n* If a field name URI is an absolute URI consisting of a scheme and path\nand is not part of the vocabulary, no processing occurs.\n\nField name resolution is not relative. It must not be affected by the\nbase URI.\n\n### Field name resolution example\n\nGiven the following schema:\n\n```\n", "{\n \"$namespaces\": {\n \"acid\": \"http://example.com/acid#\"\n },\n \"$graph\": [{\n \"name\": \"ExampleType\",\n \"type\": \"record\",\n \"fields\": [{\n \"name\": \"base\",\n \"type\": \"string\",\n \"jsonldPredicate\": \"http://example.com/base\"\n }]\n }]\n}\n", "```\n\nProcess the following example:\n\n```\n", " {\n \"base\": \"one\",\n \"form\": {\n \"http://example.com/base\": \"two\",\n \"http://example.com/three\": \"three\",\n },\n \"acid:four\": \"four\"\n }\n", "```\n\nThis becomes:\n\n```\n", " {\n \"base\": \"one\",\n \"form\": {\n \"base\": \"two\",\n \"http://example.com/three\": \"three\",\n },\n \"http://example.com/acid#four\": \"four\"\n }\n", "```\n", "## Identifier resolution\n\nThe schema may designate one or more fields as identifier fields to identify\nspecific objects. Processing must resolve relative identifiers to absolute\nidentifiers using the following rules:\n\n * If an identifier URI is prefixed with `#` it is a URI relative\n fragment identifier. It is resolved relative to the base URI by setting\n or replacing the fragment portion of the base URI.\n\n * If an identifier URI does not contain a scheme and is not prefixed `#` it\n is a parent relative fragment identifier. 
It is resolved relative to the\n base URI by the following rule: if the base URI does not contain a\n document fragment, set the fragment portion of the base URI. If the base\n URI does contain a document fragment, append a slash `/` followed by the\n identifier field to the fragment portion of the base URI.\n\n * If an identifier URI begins with a namespace prefix declared in\n `$namespaces` followed by a colon `:`, the prefix and colon must be\n replaced by the namespace declared in `$namespaces`.\n\n * If an identifier URI is an absolute URI consisting of a scheme and path,\n no processing occurs.\n\nWhen preprocessing visits a node containing an identifier, that identifier\nmust be used as the base URI to process child nodes.\n\nIt is an error for more than one object in a document to have the same\nabsolute URI.\n\n### Identifier resolution example\n\nGiven the following schema:\n\n```\n", "{\n \"$namespaces\": {\n \"acid\": \"http://example.com/acid#\"\n },\n \"$graph\": [{\n \"name\": \"ExampleType\",\n \"type\": \"record\",\n \"fields\": [{\n \"name\": \"id\",\n \"type\": \"string\",\n \"jsonldPredicate\": \"@id\"\n }]\n }]\n}\n", "```\n\nProcess the following example:\n\n```\n", " {\n \"id\": \"http://example.com/base\",\n \"form\": {\n \"id\": \"one\",\n \"things\": [\n {\n \"id\": \"two\"\n },\n {\n \"id\": \"#three\",\n },\n {\n \"id\": \"four#five\",\n },\n {\n \"id\": \"acid:six\",\n }\n ]\n }\n }\n", "```\n\nThis becomes:\n\n```\n", "{\n \"id\": \"http://example.com/base\",\n \"form\": {\n \"id\": \"http://example.com/base#one\",\n \"things\": [\n {\n \"id\": \"http://example.com/base#one/two\"\n },\n {\n \"id\": \"http://example.com/base#three\"\n },\n {\n \"id\": \"http://example.com/four#five\",\n },\n {\n \"id\": \"http://example.com/acid#six\",\n }\n ]\n }\n}\n", "```\n", "## Link resolution\n\nThe schema may designate one or more fields as link fields that reference other\nobjects. 
Processing must resolve links to absolute URIs using the\nfollowing rules:\n\n* If a reference URI is prefixed with `#` it is a relative\nfragment identifier. It is resolved relative to the base URI by setting\nor replacing the fragment portion of the base URI.\n\n* If a reference URI does not contain a scheme and is not prefixed with `#`\nit is a path relative reference. If the reference URI contains `#` in any\nposition other than the first character, the reference URI must be divided\ninto a path portion and a fragment portion split on the first instance of\n`#`. The path portion is resolved relative to the base URI by the following\nrule: if the path portion of the base URI ends in a slash `/`, append the\npath portion of the reference URI to the path portion of the base URI. If\nthe path portion of the base URI does not end in a slash, replace the final\npath segment with the path portion of the reference URI. Replace the\nfragment portion of the base URI with the fragment portion of the reference\nURI.\n\n* If a reference URI begins with a namespace prefix declared in `$namespaces`\nfollowed by a colon `:`, the prefix and colon must be replaced by the\nnamespace declared in `$namespaces`.\n\n* If a reference URI is an absolute URI consisting of a scheme and path,\nno processing occurs.\n\nLink resolution must not affect the base URI used to resolve identifiers\nand other links.\n\n### Link resolution example\n\nGiven the following schema:\n\n```\n", "{\n \"$namespaces\": {\n \"acid\": \"http://example.com/acid#\"\n },\n \"$graph\": [{\n \"name\": \"ExampleType\",\n \"type\": \"record\",\n \"fields\": [{\n \"name\": \"link\",\n \"type\": \"string\",\n \"jsonldPredicate\": {\n \"_type\": \"@id\"\n }\n }]\n }]\n}\n", "```\n\nProcess the following example:\n\n```\n", "{\n \"$base\": \"http://example.com/base\",\n \"link\": \"http://example.com/base/zero\",\n \"form\": {\n \"link\": \"one\",\n \"things\": [\n {\n \"link\": \"two\"\n },\n {\n \"link\": 
\"#three\",\n },\n {\n \"link\": \"four#five\",\n },\n {\n \"link\": \"acid:six\",\n }\n ]\n }\n}\n", "```\n\nThis becomes:\n\n```\n", "{\n \"$base\": \"http://example.com/base\",\n \"link\": \"http://example.com/base/zero\",\n \"form\": {\n \"link\": \"http://example.com/one\",\n \"things\": [\n {\n \"link\": \"http://example.com/two\"\n },\n {\n \"link\": \"http://example.com/base#three\"\n },\n {\n \"link\": \"http://example.com/four#five\",\n },\n {\n \"link\": \"http://example.com/acid#six\",\n }\n ]\n }\n}\n", "```\n", "## Vocabulary resolution\n\n The schema may designate one or more vocabulary fields which use terms\n defined in the vocabulary. Processing must resolve vocabulary fields to\n either vocabulary terms or absolute URIs by first applying the link\n resolution rules defined above, then applying the following additional\n rule:\n\n * If a reference URI is a vocabulary field, and there is a vocabulary\n term which maps to the resolved URI, the reference must be replaced with\n the vocabulary term.\n\n### Vocabulary resolution example\n\nGiven the following schema:\n\n```\n", "{\n \"$namespaces\": {\n \"acid\": \"http://example.com/acid#\"\n },\n \"$graph\": [{\n \"name\": \"Colors\",\n \"type\": \"enum\",\n \"symbols\": [\"acid:red\"]\n },\n {\n \"name\": \"ExampleType\",\n \"type\": \"record\",\n \"fields\": [{\n \"name\": \"voc\",\n \"type\": \"string\",\n \"jsonldPredicate\": {\n \"_type\": \"@vocab\"\n }\n }]\n }]\n}\n", "```\n\nProcess the following example:\n\n```\n", " {\n \"form\": {\n \"things\": [\n {\n \"voc\": \"red\",\n },\n {\n \"voc\": \"http://example.com/acid#red\",\n },\n {\n \"voc\": \"http://example.com/acid#blue\",\n }\n ]\n }\n }\n", "```\n\nThis becomes:\n\n```\n", " {\n \"form\": {\n \"things\": [\n {\n \"voc\": \"red\",\n },\n {\n \"voc\": \"red\",\n },\n {\n \"voc\": \"http://example.com/acid#blue\",\n }\n ]\n }\n }\n", "```\n", "## Import\n\nDuring preprocessing traversal, an implementation must resolve 
`$import`\ndirectives. An `$import` directive is an object consisting of exactly one\nfield `$import` specifying a resource by URI string. It is an error if there\nare additional fields in the `$import` object; such additional fields must\nbe ignored.\n\nThe URI string must be resolved to an absolute URI using the link\nresolution rules described previously. Implementations must support\nloading from `file`, `http` and `https` resources. The URI referenced by\n`$import` must be loaded and recursively preprocessed as a Salad document.\nThe external imported document does not inherit the context of the\nimporting document, and the default base URI for processing the imported\ndocument must be the URI used to retrieve the imported document. If the\n`$import` URI includes a document fragment, the fragment must be excluded\nfrom the base URI used to preprocess the imported document.\n\nOnce loaded and processed, the `$import` node is replaced in the document\nstructure by the object or array yielded from the import operation.\n\nURIs may reference document fragments which refer to a specific object in\nthe target document. This indicates that the `$import` node must be\nreplaced by only the object with the appropriate fragment identifier.\n\nIt is a fatal error if an import directive refers to an external resource\nor resource fragment which does not exist or is not accessible.\n\n### Import example\n\nimport.yml:\n```\n{\n \"hello\": \"world\"\n}\n\n```\n\nparent.yml:\n```\n{\n \"form\": {\n \"bar\": {\n \"$import\": \"import.yml\"\n }\n }\n}\n\n```\n\nThis becomes:\n\n```\n{\n \"form\": {\n \"bar\": {\n \"hello\": \"world\"\n }\n }\n}\n```\n\n## Include\n\nDuring preprocessing traversal, an implementation must resolve `$include`\ndirectives. An `$include` directive is an object consisting of exactly one\nfield `$include` specifying a URI string. 
It is an error if there are\nadditional fields in the `$include` object; such additional fields must be\nignored.\n\nThe URI string must be resolved to an absolute URI using the link\nresolution rules described previously. The URI referenced by `$include` must\nbe loaded as text data. Implementations must support loading from\n`file`, `http` and `https` resources. Implementations may transcode the\ncharacter encoding of the text data to match that of the parent document,\nbut must not interpret or parse the text document in any other way.\n\nOnce loaded, the `$include` node is replaced in the document structure by a\nstring containing the text data loaded from the resource.\n\nIt is a fatal error if an include directive refers to an external resource\nwhich does not exist or is not accessible.\n\n### Include example\n\nparent.yml:\n```\n{\n \"form\": {\n \"bar\": {\n \"$include\": \"include.txt\"\n }\n }\n}\n\n```\n\ninclude.txt:\n```\nhello world\n\n```\n\nThis becomes:\n\n```\n{\n \"form\": {\n \"bar\": \"hello world\"\n }\n}\n```\n\n\n## Mixin\n\nDuring preprocessing traversal, an implementation must resolve `$mixin`\ndirectives. A `$mixin` directive is an object consisting of the field\n`$mixin` specifying a resource by URI string. If there are additional fields in\nthe `$mixin` object, these fields override fields in the object which is loaded\nfrom the `$mixin` URI.\n\nThe URI string must be resolved to an absolute URI using the link resolution\nrules described previously. Implementations must support loading from `file`,\n`http` and `https` resources. The URI referenced by `$mixin` must be loaded\nand recursively preprocessed as a Salad document. The external imported\ndocument must inherit the context of the importing document, however the file\nURI for processing the imported document must be the URI used to retrieve the\nimported document. 
The `$mixin` URI must not include a document fragment.\n\nOnce loaded and processed, the `$mixin` node is replaced in the document\nstructure by the object or array yielded from the import operation.\n\nURIs may reference document fragments which refer to a specific object in\nthe target document. This indicates that the `$mixin` node must be\nreplaced by only the object with the appropriate fragment identifier.\n\nIt is a fatal error if a mixin directive refers to an external resource\nor resource fragment which does not exist or is not accessible.\n\n### Mixin example\n\nmixin.yml:\n```\n{\n \"hello\": \"world\",\n \"carrot\": \"orange\"\n}\n\n```\n\nparent.yml:\n```\n{\n \"form\": {\n \"bar\": {\n \"$mixin\": \"mixin.yml\",\n \"carrot\": \"cake\"\n }\n }\n}\n\n```\n\nThis becomes:\n\n```\n{\n \"form\": {\n \"bar\": {\n \"hello\": \"world\",\n \"carrot\": \"cake\"\n }\n }\n}\n```\n", "## Identifier maps\n\nThe schema may designate certain fields as having a `mapSubject`. If the\nvalue of the field is a JSON object, it must be transformed into an array of\nJSON objects. 
Each key-value pair from the source JSON object is a list\nitem, each list item must be a JSON object, and the key is\nassigned to the field specified by `mapSubject`.\n\nFields which have `mapSubject` specified may also supply a `mapPredicate`.\nIf the value of a map item is not a JSON object, the item is transformed to a\nJSON object with the key assigned to the field specified by `mapSubject` and\nthe value assigned to the field specified by `mapPredicate`.\n\n### Identifier map example\n\nGiven the following schema:\n\n```\n", "{\n  \"$graph\": [{\n    \"name\": \"MappedType\",\n    \"type\": \"record\",\n    \"documentRoot\": true,\n    \"fields\": [{\n      \"name\": \"mapped\",\n      \"type\": {\n        \"type\": \"array\",\n        \"items\": \"ExampleRecord\"\n      },\n      \"jsonldPredicate\": {\n        \"mapSubject\": \"key\",\n        \"mapPredicate\": \"value\"\n      }\n    }]\n  },\n  {\n    \"name\": \"ExampleRecord\",\n    \"type\": \"record\",\n    \"fields\": [{\n      \"name\": \"key\",\n      \"type\": \"string\"\n      }, {\n      \"name\": \"value\",\n      \"type\": \"string\"\n      }\n    ]\n  }]\n}\n", "```\n\nProcess the following example:\n\n```\n", "{\n  \"mapped\": {\n    \"shaggy\": {\n      \"value\": \"scooby\"\n    },\n    \"fred\": \"daphne\"\n  }\n}", "```\n\nThis becomes:\n\n```\n", "{\n    \"mapped\": [\n        {\n            \"value\": \"daphne\",\n            \"key\": \"fred\"\n        },\n        {\n            \"value\": \"scooby\",\n            \"key\": \"shaggy\"\n        }\n    ]\n}", "```\n", "## Domain Specific Language for types\n\nFields may be tagged `typeDSL: true`. 
If so, the field is expanded using the\nfollowing micro-DSL for Schema Salad types:\n\n* If the type ends with a question mark `?`, it is expanded to a union with `null`\n* If the type ends with square brackets `[]`, it is expanded to an array with items of the preceding type symbol\n* The type may end with both `[]?` to indicate it is an optional array.\n* Identifier resolution is applied after type DSL expansion.\n\n### Type DSL example\n\nGiven the following schema:\n\n```\n", "{\n  \"$graph\": [\n  {\"$import\": \"metaschema_base.yml\"},\n  {\n    \"name\": \"TypeDSLExample\",\n    \"type\": \"record\",\n    \"documentRoot\": true,\n    \"fields\": [{\n      \"name\": \"extype\",\n      \"type\": \"string\",\n      \"jsonldPredicate\": {\n        \"_type\": \"@vocab\",\n        \"typeDSL\": true\n      }\n    }]\n  }]\n}\n", "```\n\nProcess the following example:\n\n```\n", "[{\n  \"extype\": \"string\"\n}, {\n  \"extype\": \"string?\"\n}, {\n  \"extype\": \"string[]\"\n}, {\n  \"extype\": \"string[]?\"\n}]\n", "```\n\nThis becomes:\n\n```\n", "[\n    {\n        \"extype\": \"string\"\n    }, \n    {\n        \"extype\": [\n            \"null\", \n            \"string\"\n        ]\n    }, \n    {\n        \"extype\": {\n            \"type\": \"array\", \n            \"items\": \"string\"\n        }\n    }, \n    {\n        \"extype\": [\n            \"null\", \n            {\n                \"type\": \"array\", \n                \"items\": \"string\"\n            }\n        ]\n    }\n]\n", "```\n" ] }, { "name": "https://w3id.org/cwl/salad#Link_Validation", "type": "documentation", "doc": "# Link validation\n\nOnce a document has been preprocessed, an implementation may validate\nlinks.  The link validation traversal may visit fields which the schema\ndesignates as link fields and check that each URI references an existing\nobject in the current document, an imported document, file system, or\nnetwork resource.  Failure to validate links may be a fatal error. 
Link\nvalidation behavior for individual fields may be modified by `identity` and\n`noLinkCheck` in the `jsonldPredicate` section of the field schema.\n" }, { "name": "https://w3id.org/cwl/salad#Schema_validation", "type": "documentation", "doc": "" }, { "name": "https://w3id.org/cwl/salad#Schema", "type": "documentation", "doc": "# Schema\n" }, { "name": "https://w3id.org/cwl/salad#PrimitiveType", "type": "enum", "symbols": [ "https://w3id.org/cwl/salad#null", "http://www.w3.org/2001/XMLSchema#boolean", "http://www.w3.org/2001/XMLSchema#int", "http://www.w3.org/2001/XMLSchema#long", "http://www.w3.org/2001/XMLSchema#float", "http://www.w3.org/2001/XMLSchema#double", "http://www.w3.org/2001/XMLSchema#string" ], "doc": [ "Salad data types are based on Avro schema declarations. Refer to the\n[Avro schema declaration documentation](https://avro.apache.org/docs/current/spec.html#schemas) for\ndetailed information.\n", "null: no value", "boolean: a binary value", "int: 32-bit signed integer", "long: 64-bit signed integer", "float: single precision (32-bit) IEEE 754 floating-point number", "double: double precision (64-bit) IEEE 754 floating-point number", "string: Unicode character sequence" ] }, { "name": "https://w3id.org/cwl/salad#Any", "type": "enum", "symbols": [ "https://w3id.org/cwl/salad#Any" ], "docAfter": "https://w3id.org/cwl/salad#PrimitiveType", "doc": "The **Any** type validates for any non-null value.\n" }, { "name": "https://w3id.org/cwl/salad#RecordField", "type": "record", "doc": "A field of a record.", "fields": [ { "name": "https://w3id.org/cwl/salad#RecordField/name", "type": "string", "jsonldPredicate": "@id", "doc": "The name of the field\n" }, { "name": "https://w3id.org/cwl/salad#RecordField/doc", "type": [ "null", "string" ], "doc": "A documentation string for this field\n", "jsonldPredicate": "rdfs:comment" }, { "name": "https://w3id.org/cwl/salad#RecordField/type", "type": [ "PrimitiveType", "RecordSchema", "EnumSchema", "ArraySchema", 
"string", { "type": "array", "items": [ "PrimitiveType", "RecordSchema", "EnumSchema", "ArraySchema", "string" ] } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#type", "_type": "@vocab", "typeDSL": true, "refScope": 2 }, "doc": "The field type\n" } ] }, { "name": "https://w3id.org/cwl/salad#RecordSchema", "type": "record", "fields": [ { "type": [ "null", { "type": "array", "items": "RecordField" } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#fields", "mapSubject": "name", "mapPredicate": "type" }, "doc": "Defines the fields of the record.", "name": "https://w3id.org/cwl/salad#RecordSchema/fields" }, { "doc": "Must be `record`", "type": { "type": "enum", "symbols": [ "https://w3id.org/cwl/salad#record" ] }, "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#type", "_type": "@vocab", "typeDSL": true, "refScope": 2 }, "name": "https://w3id.org/cwl/salad#RecordSchema/type" } ] }, { "name": "https://w3id.org/cwl/salad#EnumSchema", "type": "record", "doc": "Define an enumerated type.\n", "fields": [ { "type": { "type": "array", "items": "string" }, "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#symbols", "_type": "@id", "identity": true }, "doc": "Defines the set of valid symbols.", "name": "https://w3id.org/cwl/salad#EnumSchema/symbols" }, { "doc": "Must be `enum`", "type": { "type": "enum", "symbols": [ "https://w3id.org/cwl/salad#enum" ] }, "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#type", "_type": "@vocab", "typeDSL": true, "refScope": 2 }, "name": "https://w3id.org/cwl/salad#EnumSchema/type" } ] }, { "name": "https://w3id.org/cwl/salad#ArraySchema", "type": "record", "fields": [ { "type": [ "PrimitiveType", "RecordSchema", "EnumSchema", "ArraySchema", "string", { "type": "array", "items": [ "PrimitiveType", "RecordSchema", "EnumSchema", "ArraySchema", "string" ] } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#items", "_type": "@vocab", "refScope": 2 }, "doc": "Defines the type of the array 
elements.", "name": "https://w3id.org/cwl/salad#ArraySchema/items" }, { "doc": "Must be `array`", "type": { "type": "enum", "symbols": [ "https://w3id.org/cwl/salad#array" ] }, "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#type", "_type": "@vocab", "typeDSL": true, "refScope": 2 }, "name": "https://w3id.org/cwl/salad#ArraySchema/type" } ] }, { "name": "https://w3id.org/cwl/salad#JsonldPredicate", "type": "record", "doc": "Attached to a record field to define how the parent record field is handled for\nURI resolution and JSON-LD context generation.\n", "fields": [ { "name": "https://w3id.org/cwl/salad#JsonldPredicate/_id", "type": [ "null", "string" ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#_id", "_type": "@id", "identity": true }, "doc": "The predicate URI that this field corresponds to.\nCorresponds to JSON-LD `@id` directive.\n" }, { "name": "https://w3id.org/cwl/salad#JsonldPredicate/_type", "type": [ "null", "string" ], "doc": "The context type hint, corresponds to JSON-LD `@type` directive.\n\n* If the value of this field is `@id` and `identity` is false or\nunspecified, the parent field must be resolved using the link\nresolution rules. If `identity` is true, the parent field must be\nresolved using the identifier expansion rules.\n\n* If the value of this field is `@vocab`, the parent field must be\n resolved using the vocabulary resolution rules.\n" }, { "name": "https://w3id.org/cwl/salad#JsonldPredicate/_container", "type": [ "null", "string" ], "doc": "Structure hint, corresponds to JSON-LD `@container` directive.\n" }, { "name": "https://w3id.org/cwl/salad#JsonldPredicate/identity", "type": [ "null", "boolean" ], "doc": "If true and `_type` is `@id` this indicates that the parent field must\nbe resolved according to identity resolution rules instead of link\nresolution rules. 
In addition, the field value is considered an\nassertion that the linked value exists; absence of an object in the loaded document\nwith the URI is not an error.\n" }, { "name": "https://w3id.org/cwl/salad#JsonldPredicate/noLinkCheck", "type": [ "null", "boolean" ], "doc": "If true, this indicates that link validation traversal must stop at\nthis field.  This field (if it is a URI) or any fields under it (if it\nis an object or array) are not subject to link checking.\n" }, { "name": "https://w3id.org/cwl/salad#JsonldPredicate/mapSubject", "type": [ "null", "string" ], "doc": "If the value of the field is a JSON object, it must be transformed\ninto an array of JSON objects, where each key-value pair from the\nsource JSON object is a list item, the list items must be JSON objects,\nand the key is assigned to the field specified by `mapSubject`.\n" }, { "name": "https://w3id.org/cwl/salad#JsonldPredicate/mapPredicate", "type": [ "null", "string" ], "doc": "Only applies if `mapSubject` is also provided.  If the value of the\nfield is a JSON object, it is transformed as described in `mapSubject`,\nwith the addition that when the value of a map item is not an object,\nthe item is transformed to a JSON object with the key assigned to the\nfield specified by `mapSubject` and the value assigned to the field\nspecified by `mapPredicate`.\n" }, { "name": "https://w3id.org/cwl/salad#JsonldPredicate/refScope", "type": [ "null", "int" ], "doc": "If the field contains a relative reference, it must be resolved by\nsearching for valid document references in each successive parent scope\nin the document fragment.  For example, a reference of `foo` in the\ncontext `#foo/bar/baz` will first check for the existence of\n`#foo/bar/baz/foo`, followed by `#foo/bar/foo`, then `#foo/foo` and\nthen finally `#foo`.  The first valid URI in the search order shall be\nused as the fully resolved value of the identifier. 
The value of the\nrefScope field is the specified number of levels from the containing\nidentifier scope before starting the search, so if `refScope: 2` then\n\"baz\" and \"bar\" must be stripped to get the base `#foo` and search\n`#foo/foo` and then `#foo`.  The last scope searched must be the top\nlevel scope before determining if the identifier cannot be resolved.\n" }, { "name": "https://w3id.org/cwl/salad#JsonldPredicate/typeDSL", "type": [ "null", "boolean" ], "doc": "Field must be expanded based on the Schema Salad type DSL.\n" } ] }, { "name": "https://w3id.org/cwl/salad#SpecializeDef", "type": "record", "fields": [ { "name": "https://w3id.org/cwl/salad#SpecializeDef/specializeFrom", "type": "string", "doc": "The data type to be replaced", "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#specializeFrom", "_type": "@id", "refScope": 1 } }, { "name": "https://w3id.org/cwl/salad#SpecializeDef/specializeTo", "type": "string", "doc": "The new data type to replace with", "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#specializeTo", "_type": "@id", "refScope": 1 } } ] }, { "name": "https://w3id.org/cwl/salad#NamedType", "type": "record", "abstract": true, "docParent": "https://w3id.org/cwl/salad#Schema", "fields": [ { "name": "https://w3id.org/cwl/salad#NamedType/name", "type": "string", "jsonldPredicate": "@id", "doc": "The identifier for this type" }, { "name": "https://w3id.org/cwl/salad#NamedType/inVocab", "type": [ "null", "boolean" ], "doc": "By default or if \"true\", include the short name of this type in the\nvocabulary (the keys of the JSON-LD context). 
If false, do not include\nthe short name in the vocabulary.\n" } ] }, { "name": "https://w3id.org/cwl/salad#DocType", "type": "record", "abstract": true, "docParent": "https://w3id.org/cwl/salad#Schema", "fields": [ { "name": "https://w3id.org/cwl/salad#DocType/doc", "type": [ "null", "string", { "type": "array", "items": "string" } ], "doc": "A documentation string for this type, or an array of strings which should be concatenated.", "jsonldPredicate": "rdfs:comment" }, { "name": "https://w3id.org/cwl/salad#DocType/docParent", "type": [ "null", "string" ], "doc": "Hint to indicate that during documentation generation, documentation\nfor this type should appear in a subsection under `docParent`.\n", "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#docParent", "_type": "@id" } }, { "name": "https://w3id.org/cwl/salad#DocType/docChild", "type": [ "null", "string", { "type": "array", "items": "string" } ], "doc": "Hint to indicate that during documentation generation, documentation\nfor `docChild` should appear in a subsection under this type.\n", "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#docChild", "_type": "@id" } }, { "name": "https://w3id.org/cwl/salad#DocType/docAfter", "type": [ "null", "string" ], "doc": "Hint to indicate that during documentation generation, documentation\nfor this type should appear after the `docAfter` section at the same\nlevel.\n", "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#docAfter", "_type": "@id" } } ] }, { "name": "https://w3id.org/cwl/salad#SchemaDefinedType", "type": "record", "extends": "https://w3id.org/cwl/salad#DocType", "doc": "Abstract base for schema-defined types.\n", "abstract": true, "fields": [ { "name": "https://w3id.org/cwl/salad#SchemaDefinedType/jsonldPredicate", "type": [ "null", "string", "JsonldPredicate" ], "doc": "Annotate this type with linked data context.\n", "jsonldPredicate": "sld:jsonldPredicate" }, { "name": "https://w3id.org/cwl/salad#SchemaDefinedType/documentRoot", 
"type": [ "null", "boolean" ], "doc": "If true, indicates that the type is valid at the document root.  At\nleast one type in a schema must be tagged with `documentRoot: true`.\n" } ] }, { "name": "https://w3id.org/cwl/salad#SaladRecordField", "type": "record", "extends": "https://w3id.org/cwl/salad#RecordField", "doc": "A field of a record.", "fields": [ { "name": "https://w3id.org/cwl/salad#SaladRecordField/jsonldPredicate", "type": [ "null", "string", "JsonldPredicate" ], "doc": "Annotate this type with linked data context.\n", "jsonldPredicate": "sld:jsonldPredicate" } ] }, { "name": "https://w3id.org/cwl/salad#SaladRecordSchema", "docParent": "https://w3id.org/cwl/salad#Schema", "type": "record", "extends": [ "https://w3id.org/cwl/salad#NamedType", "https://w3id.org/cwl/salad#RecordSchema", "https://w3id.org/cwl/salad#SchemaDefinedType" ], "documentRoot": true, "specialize": [ { "specializeTo": "https://w3id.org/cwl/salad#SaladRecordField", "specializeFrom": "https://w3id.org/cwl/salad#RecordField" } ], "fields": [ { "name": "https://w3id.org/cwl/salad#SaladRecordSchema/abstract", "type": [ "null", "boolean" ], "doc": "If true, this record is abstract and may be used as a base for other\nrecords, but is not valid on its own.\n" }, { "name": "https://w3id.org/cwl/salad#SaladRecordSchema/extends", "type": [ "null", "string", { "type": "array", "items": "string" } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#extends", "_type": "@id", "refScope": 1 }, "doc": "Indicates that this record inherits fields from one or more base records.\n" }, { "name": "https://w3id.org/cwl/salad#SaladRecordSchema/specialize", "type": [ "null", { "type": "array", "items": "SpecializeDef" } ], "doc": "Only applies if `extends` is declared.  Apply type specialization using the\nbase record as a template. 
For each field inherited from the base\nrecord, replace any instance of the type `specializeFrom` with\n`specializeTo`.\n", "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#specialize", "mapSubject": "specializeFrom", "mapPredicate": "specializeTo" } } ] }, { "name": "https://w3id.org/cwl/salad#SaladEnumSchema", "docParent": "https://w3id.org/cwl/salad#Schema", "type": "record", "extends": [ "https://w3id.org/cwl/salad#NamedType", "https://w3id.org/cwl/salad#EnumSchema", "https://w3id.org/cwl/salad#SchemaDefinedType" ], "documentRoot": true, "doc": "Define an enumerated type.\n", "fields": [ { "name": "https://w3id.org/cwl/salad#SaladEnumSchema/extends", "type": [ "null", "string", { "type": "array", "items": "string" } ], "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#extends", "_type": "@id", "refScope": 1 }, "doc": "Indicates that this enum inherits symbols from a base enum.\n" } ] }, { "name": "https://w3id.org/cwl/salad#Documentation", "type": "record", "docParent": "https://w3id.org/cwl/salad#Schema", "extends": [ "https://w3id.org/cwl/salad#NamedType", "https://w3id.org/cwl/salad#DocType" ], "documentRoot": true, "doc": "A documentation section. 
This type exists to facilitate self-documenting\nschemas but has no role in formal validation.\n", "fields": [ { "name": "https://w3id.org/cwl/salad#Documentation/type", "doc": "Must be `documentation`", "type": { "type": "enum", "symbols": [ "https://w3id.org/cwl/salad#documentation" ] }, "jsonldPredicate": { "_id": "https://w3id.org/cwl/salad#type", "_type": "@vocab", "typeDSL": true, "refScope": 2 } } ] } ] schema-salad-2.6.20171201034858/schema_salad/tests/df20000644000175100017510000000001113165562750021723 0ustar peterpeter00000000000000say what schema-salad-2.6.20171201034858/schema_salad/tests/pt.yml0000644000175100017510000000160413203345013022464 0ustar peterpeter00000000000000$namespaces: sld: "https://w3id.org/cwl/salad#" dct: "http://purl.org/dc/terms/" rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" xsd: "http://www.w3.org/2001/XMLSchema#" name: PrimitiveType type: enum symbols: - "sld:null" - "xsd:boolean" - "xsd:int" - "xsd:long" - "xsd:float" - "xsd:double" - "xsd:string" doc: - | Salad data types are based on Avro schema declarations. Refer to the [Avro schema declaration documentation](https://avro.apache.org/docs/current/spec.html#schemas) for detailed information. 
- "null: no value" - "boolean: a binary value" - "int: 32-bit signed integer" - "long: 64-bit signed integer" - "float: single precision (32-bit) IEEE 754 floating-point number" - "double: double precision (64-bit) IEEE 754 floating-point number" - "string: Unicode character sequence" schema-salad-2.6.20171201034858/schema_salad/jsonld_context.py0000755000175100017510000001734513130233260023576 0ustar peterpeter00000000000000from __future__ import absolute_import import collections import shutil import json import six from six.moves import urllib import ruamel.yaml as yaml try: from ruamel.yaml import CSafeLoader as SafeLoader except ImportError: from ruamel.yaml import SafeLoader # type: ignore import os import subprocess import copy import pprint import re import sys import rdflib from rdflib import Graph, URIRef import rdflib.namespace from rdflib.namespace import RDF, RDFS import logging from schema_salad.utils import aslist from typing import (cast, Any, Dict, Iterable, List, Optional, Text, Tuple, Union) from .ref_resolver import Loader, ContextType _logger = logging.getLogger("salad") def pred(datatype, # type: Dict[str, Union[Dict, str]] field, # type: Optional[Dict] name, # type: str context, # type: ContextType defaultBase, # type: str namespaces # type: Dict[str, rdflib.namespace.Namespace] ): # type: (...) 
-> Union[Dict, Text] split = urllib.parse.urlsplit(name) vee = None # type: Optional[Text] if split.scheme != '': vee = name (ns, ln) = rdflib.namespace.split_uri(six.text_type(vee)) name = ln if ns[0:-1] in namespaces: vee = six.text_type(namespaces[ns[0:-1]][ln]) _logger.debug("name, v %s %s", name, vee) v = None # type: Optional[Dict] if field is not None and "jsonldPredicate" in field: if isinstance(field["jsonldPredicate"], dict): v = {} for k, val in field["jsonldPredicate"].items(): v[("@" + k[1:] if k.startswith("_") else k)] = val if "@id" not in v: v["@id"] = vee else: v = field["jsonldPredicate"] elif "jsonldPredicate" in datatype: if isinstance(datatype["jsonldPredicate"], collections.Iterable): for d in datatype["jsonldPredicate"]: if isinstance(d, dict): if d["symbol"] == name: v = d["predicate"] else: raise Exception( "entries in the jsonldPredicate List must be " "Dictionaries") else: raise Exception("jsonldPredicate must be a List of Dictionaries.") ret = v or vee if not ret: ret = defaultBase + name if name in context: if context[name] != ret: raise Exception("Predicate collision on %s, '%s' != '%s'" % (name, context[name], ret)) else: _logger.debug("Adding to context '%s' %s (%s)", name, ret, type(ret)) context[name] = ret return ret def process_type(t, # type: Dict[str, Any] g, # type: Graph context, # type: ContextType defaultBase, # type: str namespaces, # type: Dict[str, rdflib.namespace.Namespace] defaultPrefix # type: str ): # type: (...) 
-> None if t["type"] == "record": recordname = t["name"] _logger.debug("Processing record %s\n", t) classnode = URIRef(recordname) g.add((classnode, RDF.type, RDFS.Class)) split = urllib.parse.urlsplit(recordname) predicate = recordname if t.get("inVocab", True): if split.scheme: (ns, ln) = rdflib.namespace.split_uri(six.text_type(recordname)) predicate = recordname recordname = ln else: predicate = "%s:%s" % (defaultPrefix, recordname) if context.get(recordname, predicate) != predicate: raise Exception("Predicate collision on '%s', '%s' != '%s'" % ( recordname, context[recordname], predicate)) if not recordname: raise Exception() _logger.debug("Adding to context '%s' %s (%s)", recordname, predicate, type(predicate)) context[recordname] = predicate for i in t.get("fields", []): fieldname = i["name"] _logger.debug("Processing field %s", i) v = pred(t, i, fieldname, context, defaultPrefix, namespaces) # type: Union[Dict[Any, Any], Text, None] if isinstance(v, six.string_types): v = v if v[0] != "@" else None elif v is not None: v = v["_@id"] if v.get("_@id", "@")[0] != "@" else None if bool(v): (ns, ln) = rdflib.namespace.split_uri(six.text_type(v)) if ns[0:-1] in namespaces: propnode = namespaces[ns[0:-1]][ln] else: propnode = URIRef(v) g.add((propnode, RDF.type, RDF.Property)) g.add((propnode, RDFS.domain, classnode)) # TODO generate range from datatype. 
if isinstance(i["type"], dict) and "name" in i["type"]: process_type(i["type"], g, context, defaultBase, namespaces, defaultPrefix) if "extends" in t: for e in aslist(t["extends"]): g.add((classnode, RDFS.subClassOf, URIRef(e))) elif t["type"] == "enum": _logger.debug("Processing enum %s", t["name"]) for i in t["symbols"]: pred(t, None, i, context, defaultBase, namespaces) def salad_to_jsonld_context(j, schema_ctx): # type: (Iterable, Dict[str, Any]) -> Tuple[ContextType, Graph] context = {} # type: ContextType namespaces = {} g = Graph() defaultPrefix = "" for k, v in schema_ctx.items(): context[k] = v namespaces[k] = rdflib.namespace.Namespace(v) if "@base" in context: defaultBase = cast(str, context["@base"]) del context["@base"] else: defaultBase = "" for k, v in namespaces.items(): g.bind(k, v) for t in j: process_type(t, g, context, defaultBase, namespaces, defaultPrefix) return (context, g) def fix_jsonld_ids(obj, # type: Union[Dict[Text, Any], List[Dict[Text, Any]]] ids # type: List[Text] ): # type: (...) -> None if isinstance(obj, dict): for i in ids: if i in obj: obj["@id"] = obj[i] for v in obj.values(): fix_jsonld_ids(v, ids) if isinstance(obj, list): for entry in obj: fix_jsonld_ids(entry, ids) def makerdf(workflow, # type: Text wf, # type: Union[List[Dict[Text, Any]], Dict[Text, Any]] ctx, # type: ContextType graph=None # type: Graph ): # type: (...) 
-> Graph prefixes = {} idfields = [] for k, v in six.iteritems(ctx): if isinstance(v, dict): url = v["@id"] else: url = v if url == "@id": idfields.append(k) doc_url, frg = urllib.parse.urldefrag(url) if "/" in frg: p = frg.split("/")[0] prefixes[p] = u"%s#%s/" % (doc_url, p) fix_jsonld_ids(wf, idfields) if graph is None: g = Graph() else: g = graph if isinstance(wf, list): for w in wf: w["@context"] = ctx g.parse(data=json.dumps(w), format='json-ld', publicID=str(workflow)) else: wf["@context"] = ctx g.parse(data=json.dumps(wf), format='json-ld', publicID=str(workflow)) # Bug in json-ld loader causes @id fields to be added to the graph for sub, pred, obj in g.triples((None, URIRef("@id"), None)): g.remove((sub, pred, obj)) for k2, v2 in six.iteritems(prefixes): g.namespace_manager.bind(k2, v2) return g schema-salad-2.6.20171201034858/schema_salad/__main__.py0000644000175100017510000000015213130233260022242 0ustar peterpeter00000000000000from __future__ import absolute_import from . import main import sys import typing sys.exit(main.main()) schema-salad-2.6.20171201034858/Makefile0000644000175100017510000001416213165713053017225 0ustar peterpeter00000000000000# This file is part of schema-salad, # https://github.com/common-workflow-language/schema-salad/, and is # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
# # Contact: common-workflow-language@googlegroups.com # make pep8 to check for basic Python code compliance # make autopep8 to fix most pep8 errors # make pylint to check Python code for enhanced compliance including naming # and documentation # make coverage-report to check coverage of the python scripts by the tests MODULE=schema_salad # `SHELL=bash` Will break Titus's laptop, so don't use BASH-isms like # `[[` conditional expressions. PYSOURCES=$(wildcard ${MODULE}/**.py tests/*.py) setup.py DEVPKGS=pep8 diff_cover autopep8 pylint coverage pep257 pytest flake8 COVBASE=coverage run --branch --append --source=${MODULE} \ --omit=schema_salad/tests/* VERSION=$(shell git describe --tags --dirty | sed s/v//) ## all : default task all: ./setup.py develop ## help : print this help message and exit help: Makefile @sed -n 's/^##//p' $< ## install-dep : install most of the development dependencies via pip install-dep: install-dependencies install-dependencies: pip install --upgrade $(DEVPKGS) pip install -r requirements.txt ## install : install the ${MODULE} module and schema-salad-tool install: FORCE pip install . 
## dist        : create a module package for distribution dist: dist/${MODULE}-$(VERSION).tar.gz dist/${MODULE}-$(VERSION).tar.gz: $(SOURCES) ./setup.py sdist ## clean       : clean up all temporary / machine-generated files clean: FORCE rm -f ${MODULE}/*.pyc tests/*.pyc ./setup.py clean --all || true rm -Rf .coverage rm -f diff-cover.html ## pep8        : check Python code style pep8: $(PYSOURCES) pep8 --exclude=_version.py --show-source --show-pep8 $^ || true pep8_report.txt: $(PYSOURCES) pep8 --exclude=_version.py $^ > $@ || true diff_pep8_report: pep8_report.txt diff-quality --violations=pep8 pep8_report.txt ## pep257      : check Python docstring style pep257: $(PYSOURCES) pep257 --ignore=D100,D101,D102,D103 $^ || true pep257_report.txt: $(PYSOURCES) pep257 setup.py $^ > $@ 2>&1 || true diff_pep257_report: pep257_report.txt diff-quality --violations=pep8 pep257_report.txt ## autopep8    : fix most Python code indentation and formatting autopep8: $(PYSOURCES) autopep8 --recursive --in-place --ignore E309 $^ # A command to automatically run astyle and autopep8 on appropriate files ## format      : check/fix all code indentation and formatting (runs autopep8) format: autopep8 # Do nothing ## pylint      : run static code analysis on Python code pylint: $(PYSOURCES) pylint --msg-template="{path}:{line}: [{msg_id}({symbol}), {obj}] {msg}" \ $^ || true pylint_report.txt: ${PYSOURCES} pylint --msg-template="{path}:{line}: [{msg_id}({symbol}), {obj}] {msg}" \ $^ > $@ || true diff_pylint_report: pylint_report.txt diff-quality --violations=pylint pylint_report.txt .coverage: $(PYSOURCES) rm -f .coverage $(COVBASE) setup.py test $(COVBASE) -m schema_salad.main \ --print-jsonld-context schema_salad/metaschema/metaschema.yml \ > /dev/null $(COVBASE) -m schema_salad.main \ --print-rdfs schema_salad/metaschema/metaschema.yml \ > /dev/null $(COVBASE) -m schema_salad.main \ --print-avro schema_salad/metaschema/metaschema.yml \ > /dev/null $(COVBASE) -m schema_salad.main \ --print-rdf 
schema_salad/metaschema/metaschema.yml \ > /dev/null $(COVBASE) -m schema_salad.main \ --print-pre schema_salad/metaschema/metaschema.yml \ > /dev/null $(COVBASE) -m schema_salad.main \ --print-index schema_salad/metaschema/metaschema.yml \ > /dev/null $(COVBASE) -m schema_salad.main \ --print-metadata schema_salad/metaschema/metaschema.yml \ > /dev/null $(COVBASE) -m schema_salad.makedoc \ schema_salad/metaschema/metaschema.yml \ > /dev/null coverage.xml: .coverage coverage xml coverage.html: htmlcov/index.html htmlcov/index.html: .coverage coverage html @echo Test coverage of the Python code is now in htmlcov/index.html coverage-report: .coverage coverage report diff-cover: coverage.xml diff-cover $^ diff-cover.html: coverage.xml diff-cover $^ --html-report $@ ## test : run the ${MODULE} test suite test: FORCE python setup.py test sloccount.sc: ${PYSOURCES} Makefile sloccount --duplicates --wide --details $^ > $@ ## sloccount : count lines of code sloccount: ${PYSOURCES} Makefile sloccount $^ list-author-emails: @echo 'name, E-Mail Address' @git log --format='%aN,%aE' | sort -u | grep -v 'root' mypy2: ${PYSOURCES} rm -Rf typeshed/2and3/ruamel/yaml ln -s $(shell python -c 'from __future__ import print_function; import ruamel.yaml; import os.path; print(os.path.dirname(ruamel.yaml.__file__))') \ typeshed/2and3/ruamel/ MYPYPATH=$MYPYPATH:typeshed/2.7:typeshed/2and3 mypy --py2 --disallow-untyped-calls \ --warn-redundant-casts \ schema_salad mypy3: ${PYSOURCES} rm -Rf typeshed/2and3/ruamel/yaml ln -s $(shell python -c 'from __future__ import print_function; import ruamel.yaml; import os.path; print(os.path.dirname(ruamel.yaml.__file__))') \ typeshed/2and3/ruamel/ MYPYPATH=$MYPYPATH:typeshed/3:typeshed/2and3 mypy --disallow-untyped-calls \ --warn-redundant-casts \ schema_salad jenkins: rm -Rf env && virtualenv env . 
env/bin/activate ; \ pip install -U setuptools pip wheel ; \ ${MAKE} install-dep coverage.html coverage.xml pep257_report.txt \ sloccount.sc pep8_report.txt pylint_report.txt if ! test -d env3 ; then virtualenv -p python3 env3 ; fi . env3/bin/activate ; \ pip install -U setuptools pip wheel ; \ ${MAKE} install-dep ; \ pip install -U -r mypy_requirements.txt ; ${MAKE} mypy2 # pip install -U -r mypy_requirements.txt ; ${MAKE} mypy3 FORCE: schema-salad-2.6.20171201034858/PKG-INFO0000644000175100017510000001334513211573301016654 0ustar  peterpeter00000000000000Metadata-Version: 1.1 Name: schema-salad Version: 2.6.20171201034858 Summary: Schema Annotations for Linked Avro Data (SALAD) Home-page: https://github.com/common-workflow-language/common-workflow-language Author: Common workflow language working group Author-email: common-workflow-language@googlegroups.com License: Apache 2.0 Download-URL: https://github.com/common-workflow-language/common-workflow-language Description-Content-Type: UNKNOWN Description: |Build Status| |Build status| .. |Build Status| image:: https://img.shields.io/travis/common-workflow-language/schema_salad/master.svg?label=unix%20build :target: https://travis-ci.org/common-workflow-language/schema_salad .. |Build status| image:: https://img.shields.io/appveyor/ci/mr-c/schema-salad/master.svg?label=windows%20build :target: https://ci.appveyor.com/project/mr-c/schema-salad/branch/master Schema Salad ------------ Salad is a schema language for describing JSON or YAML structured linked data documents. Salad is originally based on JSON-LD_ and the Apache Avro_ data serialization system. A Salad schema describes rules for preprocessing, structural validation, and link checking for documents described by that schema. Salad provides features for rich data modeling such as inheritance, template specialization, object identifiers, object references, documentation generation, and transformation to RDF_. 
Salad provides a bridge between document- and record-oriented data modeling and the Semantic Web. Usage ----- :: $ pip install schema_salad $ schema-salad-tool usage: schema-salad-tool [-h] [--rdf-serializer RDF_SERIALIZER] [--print-jsonld-context | --print-doc | --print-rdfs | --print-avro | --print-rdf | --print-pre | --print-index | --print-metadata | --version] [--strict | --non-strict] [--verbose | --quiet | --debug] schema [document] $ python >>> import schema_salad To install from source:: git clone https://github.com/common-workflow-language/schema_salad cd schema_salad python setup.py install Documentation ------------- See the specification_ and the metaschema_ (the Salad schema for itself). For an example application of Schema Salad see the Common Workflow Language_. Rationale --------- The JSON data model is a popular way to represent structured data. It is attractive because of its relative simplicity and is a natural fit with the standard types of many programming languages. However, this simplicity comes at a cost: basic JSON lacks expressive features useful for working with complex data structures and document formats, such as schemas, object references, and namespaces. JSON-LD is a W3C standard providing a way to describe how to interpret a JSON document as Linked Data by means of a "context". JSON-LD provides a powerful solution for representing object references and namespaces in JSON based on standard web URIs, but is not itself a schema language. Without a schema providing a well-defined structure, it is difficult to process an arbitrary JSON-LD document as idiomatic JSON because there are many ways to express the same data that are logically equivalent but structurally distinct. Several schema languages exist for describing and validating JSON data, such as JSON Schema and the Apache Avro data serialization system; however, none of them understand linked data.
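The Rationale's point that the same linked data can take logically equivalent but structurally distinct JSON shapes can be sketched in plain Python. This is a hypothetical illustration: the `creator` field and the example URI are invented for the sketch, not taken from the Salad specification.

```python
import json

# Two JSON-LD-style documents that carry the same information:
# a link to a "creator" resource, expressed in two different shapes.
compact = json.loads('{"creator": "http://example.com/alice"}')
expanded = json.loads('{"creator": {"@id": "http://example.com/alice"}}')

def creator_id(doc):
    """Normalize both shapes down to the bare identifier."""
    value = doc["creator"]
    if isinstance(value, dict):
        return value["@id"]
    return value

# Without a schema pinning down one shape, every consumer must
# carry this kind of normalization logic for every linked field.
assert creator_id(compact) == creator_id(expanded)
print(creator_id(compact))
```

A schema-constrained structure, as Salad provides, lets a consumer rely on one canonical shape instead of normalizing at every access.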
As a result, to fully take advantage of JSON-LD to build the next generation of linked data applications, one must maintain a separate JSON schema, JSON-LD context, RDF schema, and human documentation, despite the significant overlap of content and the obvious need for these documents to stay synchronized. Schema Salad is designed to address this gap. It provides a schema language and processing rules for describing structured JSON content, permitting URI resolution and strict document validation. The schema language supports linked data through annotations that describe the linked data interpretation of the content, enables generation of a JSON-LD context and RDF schema, and supports production of RDF triples by applying the JSON-LD context. The schema language also provides robust support for inline documentation. .. _JSON-LD: http://json-ld.org .. _Avro: http://avro.apache.org .. _metaschema: https://github.com/common-workflow-language/schema_salad/blob/master/schema_salad/metaschema/metaschema.yml .. _specification: http://www.commonwl.org/v1.0/SchemaSalad.html .. _Language: https://github.com/common-workflow-language/common-workflow-language/blob/master/v1.0/CommandLineTool.yml .. 
_RDF: https://www.w3.org/RDF/ Platform: UNKNOWN Classifier: Environment :: Console Classifier: Intended Audience :: Science/Research Classifier: Operating System :: POSIX :: Linux Classifier: Operating System :: MacOS :: MacOS X Classifier: Development Status :: 4 - Beta Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3.3 Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 schema-salad-2.6.20171201034858/setup.cfg0000644000175100017510000000042013211573301017366 0ustar peterpeter00000000000000[flake8] ignore = E124,E128,E129,E201,E202,E225,E226,E231,E265,E271,E302,E303,F401,E402,E501,W503,E731,F811,F821,F841 [bdist_wheel] universal = 1 [aliases] test = pytest [tool:pytest] addopts = --pyarg schema_salad [egg_info] tag_build = .20171201034858 tag_date = 0 schema-salad-2.6.20171201034858/schema_salad.egg-info/0000755000175100017510000000000013211573301021647 5ustar peterpeter00000000000000schema-salad-2.6.20171201034858/schema_salad.egg-info/requires.txt0000644000175100017510000000041413211573301024246 0ustar peterpeter00000000000000setuptools requests>=1.0 ruamel.yaml<0.15,>=0.12.4 rdflib<4.3.0,>=4.2.2 rdflib-jsonld<0.5.0,>=0.3.0 mistune<0.8,>=0.7.3 typing>=3.5.3 CacheControl<0.12,>=0.11.7 lockfile>=0.9 six>=1.8.0 [:python_version<"3"] avro==1.8.1 [:python_version>="3"] future avro-cwl==1.8.4 schema-salad-2.6.20171201034858/schema_salad.egg-info/pbr.json0000644000175100017510000000005713162735300023333 0ustar peterpeter00000000000000{"is_release": false, "git_version": "c7f3140"}schema-salad-2.6.20171201034858/schema_salad.egg-info/top_level.txt0000644000175100017510000000001513211573301024375 0ustar peterpeter00000000000000schema_salad schema-salad-2.6.20171201034858/schema_salad.egg-info/dependency_links.txt0000644000175100017510000000000113211573301025715 0ustar peterpeter00000000000000 
schema-salad-2.6.20171201034858/schema_salad.egg-info/PKG-INFO0000644000175100017510000001334513211573301022752 0ustar peterpeter00000000000000Metadata-Version: 1.1 Name: schema-salad Version: 2.6.20171201034858 Summary: Schema Annotations for Linked Avro Data (SALAD) Home-page: https://github.com/common-workflow-language/common-workflow-language Author: Common workflow language working group Author-email: common-workflow-language@googlegroups.com License: Apache 2.0 Download-URL: https://github.com/common-workflow-language/common-workflow-language Description-Content-Type: UNKNOWN Description: |Build Status| |Build status| .. |Build Status| image:: https://img.shields.io/travis/common-workflow-language/schema_salad/master.svg?label=unix%20build :target: https://travis-ci.org/common-workflow-language/schema_salad .. |Build status| image:: https://img.shields.io/appveyor/ci/mr-c/schema-salad/master.svg?label=windows%20build :target: https://ci.appveyor.com/project/mr-c/schema-salad/branch/master Schema Salad ------------ Salad is a schema language for describing JSON or YAML structured linked data documents. Salad was originally based on JSON-LD_ and the Apache Avro_ data serialization system. A Salad schema describes rules for preprocessing, structural validation, and link checking for documents described by that schema. Salad features include rich data modeling constructs such as inheritance, template specialization, object identifiers, object references, documentation generation, and transformation to RDF_. Salad provides a bridge between document- and record-oriented data modeling and the Semantic Web.
Usage ----- :: $ pip install schema_salad $ schema-salad-tool usage: schema-salad-tool [-h] [--rdf-serializer RDF_SERIALIZER] [--print-jsonld-context | --print-doc | --print-rdfs | --print-avro | --print-rdf | --print-pre | --print-index | --print-metadata | --version] [--strict | --non-strict] [--verbose | --quiet | --debug] schema [document] $ python >>> import schema_salad To install from source:: git clone https://github.com/common-workflow-language/schema_salad cd schema_salad python setup.py install Documentation ------------- See the specification_ and the metaschema_ (the Salad schema for itself). For an example application of Schema Salad see the Common Workflow Language_. Rationale --------- The JSON data model is a popular way to represent structured data. It is attractive because of its relative simplicity and is a natural fit with the standard types of many programming languages. However, this simplicity comes at a cost: basic JSON lacks expressive features useful for working with complex data structures and document formats, such as schemas, object references, and namespaces. JSON-LD is a W3C standard providing a way to describe how to interpret a JSON document as Linked Data by means of a "context". JSON-LD provides a powerful solution for representing object references and namespaces in JSON based on standard web URIs, but is not itself a schema language. Without a schema providing a well-defined structure, it is difficult to process an arbitrary JSON-LD document as idiomatic JSON because there are many ways to express the same data that are logically equivalent but structurally distinct. Several schema languages exist for describing and validating JSON data, such as JSON Schema and the Apache Avro data serialization system; however, none of them understand linked data.
As a result, to fully take advantage of JSON-LD to build the next generation of linked data applications, one must maintain a separate JSON schema, JSON-LD context, RDF schema, and human documentation, despite the significant overlap of content and the obvious need for these documents to stay synchronized. Schema Salad is designed to address this gap. It provides a schema language and processing rules for describing structured JSON content, permitting URI resolution and strict document validation. The schema language supports linked data through annotations that describe the linked data interpretation of the content, enables generation of a JSON-LD context and RDF schema, and supports production of RDF triples by applying the JSON-LD context. The schema language also provides robust support for inline documentation. .. _JSON-LD: http://json-ld.org .. _Avro: http://avro.apache.org .. _metaschema: https://github.com/common-workflow-language/schema_salad/blob/master/schema_salad/metaschema/metaschema.yml .. _specification: http://www.commonwl.org/v1.0/SchemaSalad.html .. _Language: https://github.com/common-workflow-language/common-workflow-language/blob/master/v1.0/CommandLineTool.yml .. 
_RDF: https://www.w3.org/RDF/ Platform: UNKNOWN Classifier: Environment :: Console Classifier: Intended Audience :: Science/Research Classifier: Operating System :: POSIX :: Linux Classifier: Operating System :: MacOS :: MacOS X Classifier: Development Status :: 4 - Beta Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3.3 Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 schema-salad-2.6.20171201034858/schema_salad.egg-info/entry_points.txt0000644000175100017510000000015313211573301025144 0ustar peterpeter00000000000000[console_scripts] schema-salad-doc = schema_salad.makedoc:main schema-salad-tool = schema_salad.main:main schema-salad-2.6.20171201034858/schema_salad.egg-info/zip-safe0000644000175100017510000000000112555523506023313 0ustar peterpeter00000000000000 schema-salad-2.6.20171201034858/schema_salad.egg-info/SOURCES.txt0000644000175100017510000001013313211573301023531 0ustar peterpeter00000000000000MANIFEST.in Makefile README.rst gittaggers.py setup.cfg setup.py schema_salad/__init__.py schema_salad/__main__.py schema_salad/codegen.py schema_salad/codegen_base.py schema_salad/java_codegen.py schema_salad/jsonld_context.py schema_salad/main.py schema_salad/makedoc.py schema_salad/metaschema.py schema_salad/python_codegen.py schema_salad/python_codegen_support.py schema_salad/ref_resolver.py schema_salad/schema.py schema_salad/sourceline.py schema_salad/utils.py schema_salad/validate.py schema_salad.egg-info/PKG-INFO schema_salad.egg-info/SOURCES.txt schema_salad.egg-info/dependency_links.txt schema_salad.egg-info/entry_points.txt schema_salad.egg-info/pbr.json schema_salad.egg-info/requires.txt schema_salad.egg-info/top_level.txt schema_salad.egg-info/zip-safe schema_salad/metaschema/field_name.yml schema_salad/metaschema/field_name_proc.yml schema_salad/metaschema/field_name_schema.yml 
schema_salad/metaschema/field_name_src.yml schema_salad/metaschema/ident_res.yml schema_salad/metaschema/ident_res_proc.yml schema_salad/metaschema/ident_res_schema.yml schema_salad/metaschema/ident_res_src.yml schema_salad/metaschema/import_include.md schema_salad/metaschema/link_res.yml schema_salad/metaschema/link_res_proc.yml schema_salad/metaschema/link_res_schema.yml schema_salad/metaschema/link_res_src.yml schema_salad/metaschema/map_res.yml schema_salad/metaschema/map_res_proc.yml schema_salad/metaschema/map_res_schema.yml schema_salad/metaschema/map_res_src.yml schema_salad/metaschema/metaschema.html schema_salad/metaschema/metaschema.yml schema_salad/metaschema/metaschema2.yml schema_salad/metaschema/metaschema_base.yml schema_salad/metaschema/salad.md schema_salad/metaschema/typedsl_res.yml schema_salad/metaschema/typedsl_res_proc.yml schema_salad/metaschema/typedsl_res_schema.yml schema_salad/metaschema/typedsl_res_src.yml schema_salad/metaschema/vocab_res.yml schema_salad/metaschema/vocab_res_proc.yml schema_salad/metaschema/vocab_res_schema.yml schema_salad/metaschema/vocab_res_src.yml schema_salad/tests/#cg_metaschema.py# schema_salad/tests/.coverage schema_salad/tests/EDAM.owl schema_salad/tests/Process.yml schema_salad/tests/__init__.py schema_salad/tests/cwl-pre.yml schema_salad/tests/df schema_salad/tests/df2 schema_salad/tests/frag.yml schema_salad/tests/hello.txt schema_salad/tests/hellofield.yml schema_salad/tests/matcher.py schema_salad/tests/metaschema-pre.yml schema_salad/tests/mixin.yml schema_salad/tests/pt.yml schema_salad/tests/test_cg.py schema_salad/tests/test_cli_args.py schema_salad/tests/test_errors.py schema_salad/tests/test_examples.py schema_salad/tests/test_fetch.py schema_salad/tests/test_print_oneline.py schema_salad/tests/test_ref_resolver.py schema_salad/tests/test_validate.pyx schema_salad/tests/util.py schema_salad/tests/docimp/d1.yml schema_salad/tests/docimp/d2.md schema_salad/tests/docimp/d3.yml 
schema_salad/tests/docimp/d4.yml schema_salad/tests/docimp/d5.md schema_salad/tests/docimp/dpre.json schema_salad/tests/test_schema/CommandLineTool.yml schema_salad/tests/test_schema/CommonWorkflowLanguage.yml schema_salad/tests/test_schema/Process.yml schema_salad/tests/test_schema/Workflow.yml schema_salad/tests/test_schema/concepts.md schema_salad/tests/test_schema/contrib.md schema_salad/tests/test_schema/intro.md schema_salad/tests/test_schema/invocation.md schema_salad/tests/test_schema/metaschema_base.yml schema_salad/tests/test_schema/test1.cwl schema_salad/tests/test_schema/test10.cwl schema_salad/tests/test_schema/test11.cwl schema_salad/tests/test_schema/test12.cwl schema_salad/tests/test_schema/test13.cwl schema_salad/tests/test_schema/test14.cwl schema_salad/tests/test_schema/test15.cwl schema_salad/tests/test_schema/test16.cwl schema_salad/tests/test_schema/test17.cwl schema_salad/tests/test_schema/test18.cwl schema_salad/tests/test_schema/test19.cwl schema_salad/tests/test_schema/test2.cwl schema_salad/tests/test_schema/test3.cwl schema_salad/tests/test_schema/test4.cwl schema_salad/tests/test_schema/test5.cwl schema_salad/tests/test_schema/test6.cwl schema_salad/tests/test_schema/test7.cwl schema_salad/tests/test_schema/test8.cwl schema_salad/tests/test_schema/test9.cwlschema-salad-2.6.20171201034858/setup.py0000755000175100017510000000517313162250036017276 0ustar peterpeter00000000000000#!/usr/bin/env python import os import sys import setuptools.command.egg_info as egg_info_cmd from setuptools import setup, find_packages SETUP_DIR = os.path.dirname(__file__) README = os.path.join(SETUP_DIR, 'README.rst') try: import gittaggers tagger = gittaggers.EggInfoFromGit except ImportError: tagger = egg_info_cmd.egg_info needs_pytest = {'pytest', 'test', 'ptr'}.intersection(sys.argv) pytest_runner = ['pytest-runner'] if needs_pytest else [] if os.path.exists("requirements.txt"): requirements = [ r for r in open("requirements.txt").read().split("\n") if ";" 
not in r] else: # In tox, it will cover them anyway. requirements = [] install_requires = [ 'setuptools', 'requests >= 1.0', 'ruamel.yaml >= 0.12.4, < 0.15', 'rdflib >= 4.2.2, < 4.3.0', 'rdflib-jsonld >= 0.3.0, < 0.5.0', 'mistune >= 0.7.3, < 0.8', 'typing >= 3.5.3', 'CacheControl >= 0.11.7, < 0.12', 'lockfile >= 0.9', 'six >= 1.8.0'] extras_require={ ':python_version<"3"': ['avro == 1.8.1'], ':python_version>="3"': ['future', 'avro-cwl == 1.8.4'] # fork of avro for working with python3 } setup(name='schema-salad', version='2.6', description='Schema Annotations for Linked Avro Data (SALAD)', long_description=open(README).read(), author='Common workflow language working group', author_email='common-workflow-language@googlegroups.com', url="https://github.com/common-workflow-language/common-workflow-language", download_url="https://github.com/common-workflow-language/common-workflow-language", license='Apache 2.0', setup_requires=[] + pytest_runner, packages=["schema_salad", "schema_salad.tests"], package_data={'schema_salad': ['metaschema/*']}, include_package_data=True, install_requires=install_requires, extras_require=extras_require, test_suite='tests', tests_require=['pytest'], entry_points={ 'console_scripts': ["schema-salad-tool=schema_salad.main:main", "schema-salad-doc=schema_salad.makedoc:main"] }, zip_safe=True, cmdclass={'egg_info': tagger}, classifiers=[ "Environment :: Console", "Intended Audience :: Science/Research", "Operating System :: POSIX :: Linux", "Operating System :: MacOS :: MacOS X", "Development Status :: 4 - Beta", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6" ] )
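The `extras_require` block in setup.py uses PEP 508-style environment markers (mirrored by `requires.txt` in the egg-info) to select an Avro implementation per Python major version: the original `avro` on Python 2, the `avro-cwl` fork plus `future` on Python 3. The selection logic can be sketched in plain Python; this is a simplified illustration of what the markers mean, not how pip actually evaluates them, and the helper name is invented.

```python
import sys

# Mirror of the markers in setup.py / requires.txt:
#   :python_version<"3"   -> avro == 1.8.1
#   :python_version>="3"  -> future, avro-cwl == 1.8.4
def avro_requirement(version_info=sys.version_info):
    """Return the Avro dependency set selected for this interpreter."""
    if version_info[0] < 3:
        return ["avro == 1.8.1"]
    return ["future", "avro-cwl == 1.8.4"]

print(avro_requirement((2, 7)))  # Python 2 interpreters
print(avro_requirement((3, 6)))  # Python 3 interpreters
```

Keeping the markers in `extras_require` (rather than branching in setup.py at build time) means a single wheel works for both interpreters, with the resolver choosing dependencies at install time.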