pax_global_header00006660000000000000000000000064141247423040014513gustar00rootroot0000000000000052 comment=5d44cdfa3e70e1587ff45e8a02d9de3e87785b41 goavro-2.10.1/000077500000000000000000000000001412474230400130715ustar00rootroot00000000000000goavro-2.10.1/.gitignore000066400000000000000000000000071412474230400150560ustar00rootroot00000000000000*.test goavro-2.10.1/AUTHORS000066400000000000000000000026541412474230400141500ustar00rootroot00000000000000Goavro was originally created during the Fall of 2014 at LinkedIn, Corp., in New York City, New York, USA. The following persons, listed in alphabetical order, have participated with goavro development by contributing code and test cases. Alan Gardner Billy Hand Christian Blades Corey Scott Darshan Shaligram Dylan Wen Enrico Candino Fellyn Silliman James Crasta Jeff Haynie Joe Roth Karrick S. McDermott Kasey Klipsch Michael Johnson Murray Resinski Nicolas Kaiser Sebastien Launay Thomas Desrosiers kklipsch seborama A big thank you to these persons who provided testing and amazing feedback to goavro during its initial implementation: Dennis Ordanov Thomas Desrosiers Also a big thank you is extended to our supervisors who supported our efforts to bring goavro to the open source community: Greg Leffler Nick Berry goavro-2.10.1/LICENSE000066400000000000000000000261351412474230400141050ustar00rootroot00000000000000 Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. 
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. 
You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. 
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. goavro-2.10.1/README.md000066400000000000000000000364371412474230400143650ustar00rootroot00000000000000# goavro Goavro is a library that encodes and decodes Avro data. ## Description * Encodes to and decodes from both binary and textual JSON Avro data. * `Codec` is stateless and is safe to use by multiple goroutines. With the exception of features not yet supported, goavro attempts to be fully compliant with the most recent version of the [Avro specification](http://avro.apache.org/docs/1.8.2/spec.html). ## Dependency Notice All usage of `gopkg.in` has been removed in favor of Go modules. Please update your import paths to `github.com/linkedin/goavro/v2`. v1 users can still use old versions of goavro by adding a constraint to your `go.mod` or `Gopkg.toml` file. ``` require ( github.com/linkedin/goavro v1.0.5 ) ``` ```toml [[constraint]] name = "github.com/linkedin/goavro" version = "=1.0.5" ``` ## Major Improvements in v2 over v1 ### Avro namespaces The original version of this library was written prior to my really understanding how Avro namespaces ought to work. 
After using Avro for a long time now, and after a lot of research, I think I grok Avro namespaces properly, and the library now correctly handles every test case the Apache Avro distribution has for namespaces, including being able to refer to a previously defined data type later on in the same schema. ### Getting Data into and out of Records The original version of this library required creating `goavro.Record` instances, and use of getters and setters to access a record's fields. When schemas were complex, this required a lot of work to debug and get right. The original version also required users to break schemas in chunks, and have a different schema for each record type. This was cumbersome, annoying, and error prone. The new version of this library eliminates the `goavro.Record` type, and accepts a native Go map for all records to be encoded. Keys are the field names, and values are the field values. Nothing could be more easy. Conversely, decoding Avro data yields a native Go map for the upstream client to pull data back out of. Furthermore, there is never a reason to ever have to break your schema down into record schemas. Merely feed the entire schema into the `NewCodec` function once when you create the `Codec`, then use it. This library knows how to parse the data provided to it and ensure data values for records and their fields are properly encoded and decoded. ### 3x--4x Performance Improvement The original version of this library was truly written with Go's idea of `io.Reader` and `io.Writer` composition in mind. Although composition is a powerful tool, the original library had to pull bytes off the `io.Reader`--often one byte at a time--check for read errors, decode the bytes, and repeat. This version, by using a native Go byte slice, both decoding and encoding complex Avro data here at LinkedIn is between three and four times faster than before. ### Avro JSON Support The original version of this library did not support JSON encoding or decoding, because it wasn't deemed useful for our internal use at the time. When writing the new version of the library I decided to tackle this issue once and for all, because so many engineers needed this functionality for their work. ### Better Handling of Record Field Default Values The original version of this library did not well handle default values for record fields. This version of the library uses a default value of a record field when encoding from native Go data to Avro data and the record field is not specified. Additionally, when decoding from Avro JSON data to native Go data, and a field is not specified, the default value will be used to populate the field. ## Contrast With Code Generation Tools If you have the ability to rebuild and redeploy your software whenever data schemas change, code generation tools might be the best solution for your application. There are numerous excellent tools for generating source code to translate data between native and Avro binary or textual data. One such tool is linked below. If a particular application is designed to work with a rarely changing schema, programs that use code generated functions can potentially be more performant than a program that uses goavro to create a `Codec` dynamically at run time. * [gogen-avro](https://github.com/alanctgardner/gogen-avro) I recommend benchmarking the resultant programs using typical data using both the code generated functions and using goavro to see which performs better. 
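A minimal sketch of the goavro side of such a benchmark might look like the following; the test package name, schema, and record below are illustrative placeholders, not part of this repository:

```Go
package bench

import (
	"testing"

	"github.com/linkedin/goavro/v2"
)

// BenchmarkGoavroBinaryFromNative measures dynamic encoding through a Codec
// built at run time; compare it against an equivalent benchmark that calls
// your code generator's encode function on the same data.
func BenchmarkGoavroBinaryFromNative(b *testing.B) {
	codec, err := goavro.NewCodec(`{"type":"record","name":"User","fields":[{"name":"name","type":"string"}]}`)
	if err != nil {
		b.Fatal(err)
	}
	record := map[string]interface{}{"name": "Alice"}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := codec.BinaryFromNative(nil, record); err != nil {
			b.Fatal(err)
		}
	}
}
```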
Not all code generated functions will out perform goavro for all data corpuses. If you don't have the ability to rebuild and redeploy software updates whenever a data schema change occurs, goavro could be a great fit for your needs. With goavro, your program can be given a new schema while running, compile it into a `Codec` on the fly, and immediately start encoding or decoding data using that `Codec`. Because Avro encoding specifies that encoded data always be accompanied by a schema this is not usually a problem. If the schema change is backwards compatible, and the portion of your program that handles the decoded data is still able to reference the decoded fields, there is nothing that needs to be done when the schema change is detected by your program when using goavro `Codec` instances to encode or decode data. ## Resources * [Avro CLI Examples](https://github.com/miguno/avro-cli-examples) * [Avro](https://avro.apache.org/) * [Google Snappy](https://google.github.io/snappy/) * [JavaScript Object Notation, JSON](https://www.json.org/) * [Kafka](https://kafka.apache.org) ## Usage Documentation is available via [![GoDoc](https://godoc.org/github.com/linkedin/goavro?status.svg)](https://godoc.org/github.com/linkedin/goavro). ```Go package main import ( "fmt" "github.com/linkedin/goavro/v2" ) func main() { codec, err := goavro.NewCodec(` { "type": "record", "name": "LongList", "fields" : [ {"name": "next", "type": ["null", "LongList"], "default": null} ] }`) if err != nil { fmt.Println(err) } // NOTE: May omit fields when using default value textual := []byte(`{"next":{"LongList":{}}}`) // Convert textual Avro data (in Avro JSON format) to native Go form native, _, err := codec.NativeFromTextual(textual) if err != nil { fmt.Println(err) } // Convert native Go form to binary Avro data binary, err := codec.BinaryFromNative(nil, native) if err != nil { fmt.Println(err) } // Convert binary Avro data back to native Go form native, _, err = codec.NativeFromBinary(binary) if err != nil { fmt.Println(err) } // Convert native Go form to textual Avro data textual, err = codec.TextualFromNative(nil, native) if err != nil { fmt.Println(err) } // NOTE: Textual encoding will show all fields, even those with values that // match their default values fmt.Println(string(textual)) // Output: {"next":{"LongList":{"next":null}}} } ``` Also please see the example programs in the `examples` directory for reference. ### ab2t The `ab2t` program is similar to the reference standard `avrocat` program and converts Avro OCF files to Avro JSON encoding. ### arw The Avro-ReWrite program, `arw`, can be used to rewrite an Avro OCF file while optionally changing the block counts, the compression algorithm. `arw` can also upgrade the schema provided the existing datum values can be encoded with the newly provided schema. ### avroheader The Avro Header program, `avroheader`, can be used to print various header information from an OCF file. ### splice The `splice` program can be used to splice together an OCF file from an Avro schema file and a raw Avro binary data file. ### Translating Data A `Codec` provides four methods for translating between a byte slice of either binary or textual Avro data and native Go data. 
The following methods convert data between native Go data and byte slices of the binary Avro representation: BinaryFromNative NativeFromBinary The following methods convert data between native Go data and byte slices of the textual Avro representation: NativeFromTextual TextualFromNative Each `Codec` also exposes the `Schema` method to return a simplified version of the JSON schema string used to create the `Codec`. #### Translating From Avro to Go Data Goavro does not use Go's structure tags to translate data between native Go types and Avro encoded data. When translating from either binary or textual Avro to native Go data, goavro returns primitive Go data values for corresponding Avro data values. The table below shows how goavro translates Avro types to Go types. | Avro | Go     | | ------------------ | ------------------------ | | `null` | `nil` | | `boolean` | `bool` | | `bytes` | `[]byte` | | `float` | `float32` | | `double` | `float64` | | `long` | `int64` | | `int` | `int32`   | | `string` | `string` | | `array` | `[]interface{}` | | `enum` | `string` | | `fixed` | `[]byte`       | | `map` and `record` | `map[string]interface{}` | | `union` | *see below*    | Because of encoding rules for Avro unions, when an union's value is `null`, a simple Go `nil` is returned. However when an union's value is non-`nil`, a Go `map[string]interface{}` with a single key is returned for the union. The map's single key is the Avro type name and its value is the datum's value. #### Translating From Go to Avro Data Goavro does not use Go's structure tags to translate data between native Go types and Avro encoded data. When translating from native Go to either binary or textual Avro data, goavro generally requires the same native Go data types as the decoder would provide, with some exceptions for programmer convenience. Goavro will accept any numerical data type provided there is no precision lost when encoding the value. For instance, providing `float64(3.0)` to an encoder expecting an Avro `int` would succeed, while sending `float64(3.5)` to the same encoder would return an error. When providing a slice of items for an encoder, the encoder will accept either `[]interface{}`, or any slice of the required type. For instance, when the Avro schema specifies: `{"type":"array","items":"string"}`, the encoder will accept either `[]interface{}`, or `[]string`. If given `[]int`, the encoder will return an error when it attempts to encode the first non-string array value using the string encoder. When providing a value for an Avro union, the encoder will accept `nil` for a `null` value. If the value is non-`nil`, it must be a `map[string]interface{}` with a single key-value pair, where the key is the Avro type name and the value is the datum's value. As a convenience, the `Union` function wraps any datum value in a map as specified above. ```Go func ExampleUnion() { codec, err := goavro.NewCodec(`["null","string","int"]`) if err != nil { fmt.Println(err) } buf, err := codec.TextualFromNative(nil, goavro.Union("string", "some string")) if err != nil { fmt.Println(err) } fmt.Println(string(buf)) // Output: {"string":"some string"} } ``` ## Limitations Goavro is a fully featured encoder and decoder of binary and textual JSON Avro data. It fully supports recursive data structures, unions, and namespacing. It does have a few limitations that have yet to be implemented. ### Aliases The Avro specification allows an implementation to optionally map a writer's schema to a reader's schema using aliases. 
Although goavro can compile schemas with aliases, it does not yet implement this feature. ### Kafka Streams [Kafka](http://kafka.apache.org) is the reason goavro was written. Similar to Avro Object Container Files being a layer of abstraction above Avro Data Serialization format, Kafka's use of Avro is a layer of abstraction that also sits above Avro Data Serialization format, but has its own schema. Like Avro Object Container Files, this has been implemented but removed until the API can be improved. ### Default Maximum Block Counts, and Block Sizes When decoding arrays, maps, and OCF files, the Avro specification states that the binary includes block counts and block sizes that specify how many items are in the next block, and how many bytes are in the next block. To prevent possible denial-of-service attacks on clients that use this library caused by attempting to decode maliciously crafted data, decoded block counts and sizes are compared against public library variables MaxBlockCount and MaxBlockSize. When the decoded values exceed these values, the decoder returns an error. Because not every upstream client is the same, we've chosen some sane defaults for these values, but left them as mutable variables, so that clients are able to override if deemed necessary for their purposes. Their initial default values are (`math.MaxInt32` or ~2.2GB). ### Schema Evolution Please see [my reasons why schema evolution is broken for Avro 1.x](https://github.com/linkedin/goavro/blob/master/SCHEMA-EVOLUTION.md). ## License ### Goavro license Copyright 2017 LinkedIn Corp. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ### Google Snappy license Copyright (c) 2011 The Snappy-Go Authors. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Google Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
## Third Party Dependencies ### Google Snappy Goavro links with [Google Snappy](http://google.github.io/snappy/) to provide Snappy compression and decompression support. goavro-2.10.1/SCHEMA-EVOLUTION.md000066400000000000000000000074101412474230400157370ustar00rootroot00000000000000From the Avro specification: default: A default value for this field, used when reading instances that lack this field (optional). Permitted values depend on the field's schema type, according to the table below. Default values for union fields correspond to the first schema in the union. Default values for bytes and fixed fields are JSON strings, where Unicode code points 0-255 are mapped to unsigned 8-bit byte values 0-255. I read the above to mean that the purpose of default values are to allow reading Avro data that was written without the fields, and not necessarily augmentation of data being serialized. So in general I agree with you in terms of purpose. One very important aspect of Avro is that the schema used to serialize the data should always remain with the data, so that a reader would always be able to read the schema and then be able to consume the data. I think most people still agree so far. However, this is where things get messy. Schema evolution is frequently cited when folks want to use a new version of the schema to read data that was once written using an older version of that schema. I do not believe the Avro specification properly handles schema evolution. Here's a simple example: ``` Record v0: name: string nickname: string, default: "" ``` ``` Record v1: name: string nickname: string, default: "" title: string, default: "" ``` Okay, now a binary stream of records is just a bunch of strings. Let's do that now. ``` 0x0A, A, l, i, c, e, 0x06, B, o, b, 0x0A, B, r, u, c, e, 0x0A, S, a, l, l, y, 0x06, A, n, n ``` How many records is that? It could be as many as 5 records, each of a single name and no nicknames. It could be as few as 2 records, one of them with a nickname and a title, and one with only a nickname, or a title. Now to drive home the nail that Avro schema evolution is broken, even if each record had a header that indicated how many bytes it would consume, we could know where one record began and ended, and how many records there are. But if we were to read a record with two strings in it, is the second string the nickname or the title? The Avro specification has no answer to that question, so neither do I. Effectively, Avro could be a great tool for serializing complex data, but it's broken in its current form, and to fix it would require it to break compatibility with itself, effectively rendering any binary data serialized in a previous version of Avro unreadable by new versions, unless it had some sort of version marker on the data so a library could branch. One great solution would be augmenting the binary encoding with a simple field number identifier. Let's imagine an Avro 2.x that had this feature, and would support schema evolution. Here's an example stream of bytes that could be unambiguously decoded using the new schema: ``` 0x02, 0x0A, A, l, i, c, e, 0x02, 0x06, B, o, B, 0x04, 0x0A, B, r, u, c, e, 0x02, 0x0C, C, h, a, r, l, i, e, 0x06, 0x04, M, r ``` In the above example of my fake Avro 2.0, this can be deterministically decoded because 0x02 indicates the following is field number 1 (name), followed by string length 5, followed by Alice. 
Then the decoder would see 0x02, marking field number 1 again, which means, "next record", followed by string length 3, followed by Bob, followed by 0x04, which means field number 2 (nickname), followed by string length 5, followed by Bruce. Followed by field number 1 (next record), followed by string length 6, followed by Charlie, followed by field number 3 (title), followed by string length 2, followed by Mr. In my hypothetical version of Avro 2, Avro can cope with schema evolution using record defaults and such. Sadly, Avro 1.x cannot and thus we should avoid using it if your use-case requires schema evolution. goavro-2.10.1/array.go000066400000000000000000000202671412474230400145450ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "fmt" "io" "math" "reflect" ) func makeArrayCodec(st map[string]*Codec, enclosingNamespace string, schemaMap map[string]interface{}, cb *codecBuilder) (*Codec, error) { // array type must have items itemSchema, ok := schemaMap["items"] if !ok { return nil, fmt.Errorf("Array ought to have items key") } itemCodec, err := buildCodec(st, enclosingNamespace, itemSchema, cb) if err != nil { return nil, fmt.Errorf("Array items ought to be valid Avro type: %s", err) } return &Codec{ typeName: &name{"array", nullNamespace}, nativeFromBinary: func(buf []byte) (interface{}, []byte, error) { var value interface{} var err error // block count and block size if value, buf, err = longNativeFromBinary(buf); err != nil { return nil, nil, fmt.Errorf("cannot decode binary array block count: %s", err) } blockCount := value.(int64) if blockCount < 0 { // NOTE: A negative block count implies there is a long encoded // block size following the negative block count. We have no use // for the block size in this decoder, so we read and discard // the value. if blockCount == math.MinInt64 { // The minimum number for any signed numerical type can never be made positive return nil, nil, fmt.Errorf("cannot decode binary array with block count: %d", blockCount) } blockCount = -blockCount // convert to its positive equivalent if _, buf, err = longNativeFromBinary(buf); err != nil { return nil, nil, fmt.Errorf("cannot decode binary array block size: %s", err) } } // Ensure block count does not exceed some sane value. if blockCount > MaxBlockCount { return nil, nil, fmt.Errorf("cannot decode binary array when block count exceeds MaxBlockCount: %d > %d", blockCount, MaxBlockCount) } // NOTE: While the attempt of a RAM optimization shown below is not // necessary, many encoders will encode all items in a single block. // We can optimize amount of RAM allocated by runtime for the array // by initializing the array for that number of items. 
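	// The loop below then decodes blocks of items until it reads a block
	// count of zero, which is how the Avro binary encoding marks the end of
	// the array.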
arrayValues := make([]interface{}, 0, blockCount) for blockCount != 0 { // Decode `blockCount` datum values from buffer for i := int64(0); i < blockCount; i++ { if value, buf, err = itemCodec.nativeFromBinary(buf); err != nil { return nil, nil, fmt.Errorf("cannot decode binary array item %d: %s", i+1, err) } arrayValues = append(arrayValues, value) } // Decode next blockCount from buffer, because there may be more blocks if value, buf, err = longNativeFromBinary(buf); err != nil { return nil, nil, fmt.Errorf("cannot decode binary array block count: %s", err) } blockCount = value.(int64) if blockCount < 0 { // NOTE: A negative block count implies there is a long // encoded block size following the negative block count. We // have no use for the block size in this decoder, so we // read and discard the value. if blockCount == math.MinInt64 { // The minimum number for any signed numerical type can // never be made positive return nil, nil, fmt.Errorf("cannot decode binary array with block count: %d", blockCount) } blockCount = -blockCount // convert to its positive equivalent if _, buf, err = longNativeFromBinary(buf); err != nil { return nil, nil, fmt.Errorf("cannot decode binary array block size: %s", err) } } // Ensure block count does not exceed some sane value. if blockCount > MaxBlockCount { return nil, nil, fmt.Errorf("cannot decode binary array when block count exceeds MaxBlockCount: %d > %d", blockCount, MaxBlockCount) } } return arrayValues, buf, nil }, binaryFromNative: func(buf []byte, datum interface{}) ([]byte, error) { arrayValues, err := convertArray(datum) if err != nil { return nil, fmt.Errorf("cannot encode binary array: %s", err) } arrayLength := int64(len(arrayValues)) var alreadyEncoded, remainingInBlock int64 for i, item := range arrayValues { if remainingInBlock == 0 { // start a new block remainingInBlock = arrayLength - alreadyEncoded if remainingInBlock > MaxBlockCount { // limit block count to MacBlockCount remainingInBlock = MaxBlockCount } buf, _ = longBinaryFromNative(buf, remainingInBlock) } if buf, err = itemCodec.binaryFromNative(buf, item); err != nil { return nil, fmt.Errorf("cannot encode binary array item %d: %v: %s", i+1, item, err) } remainingInBlock-- alreadyEncoded++ } return longBinaryFromNative(buf, 0) // append trailing 0 block count to signal end of Array }, nativeFromTextual: func(buf []byte) (interface{}, []byte, error) { var arrayValues []interface{} var value interface{} var err error var b byte if buf, err = advanceAndConsume(buf, '['); err != nil { return nil, nil, fmt.Errorf("cannot decode textual array: %s", err) } if buf, _ = advanceToNonWhitespace(buf); len(buf) == 0 { return nil, nil, fmt.Errorf("cannot decode textual array: %s", io.ErrShortBuffer) } // NOTE: Special case for empty array if buf[0] == ']' { return arrayValues, buf[1:], nil } // NOTE: Also terminates when read ']' byte. 
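	// Each pass decodes one item, then expects either a ',' before the next
	// item or a ']' that closes the array.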
for len(buf) > 0 { // decode value value, buf, err = itemCodec.nativeFromTextual(buf) if err != nil { return nil, nil, fmt.Errorf("cannot decode textual array: %s", err) } arrayValues = append(arrayValues, value) // either comma or closing curly brace if buf, _ = advanceToNonWhitespace(buf); len(buf) == 0 { return nil, nil, fmt.Errorf("cannot decode textual array: %s", io.ErrShortBuffer) } switch b = buf[0]; b { case ']': return arrayValues, buf[1:], nil case ',': // no-op default: return nil, nil, fmt.Errorf("cannot decode textual array: expected ',' or ']'; received: %q", b) } // NOTE: consume comma from above if buf, _ = advanceToNonWhitespace(buf[1:]); len(buf) == 0 { return nil, nil, fmt.Errorf("cannot decode textual array: %s", io.ErrShortBuffer) } } return nil, buf, io.ErrShortBuffer }, textualFromNative: func(buf []byte, datum interface{}) ([]byte, error) { arrayValues, err := convertArray(datum) if err != nil { return nil, fmt.Errorf("cannot encode textual array: %s", err) } var atLeastOne bool buf = append(buf, '[') for i, item := range arrayValues { atLeastOne = true // Encode value buf, err = itemCodec.textualFromNative(buf, item) if err != nil { // field was specified in datum; therefore its value was invalid return nil, fmt.Errorf("cannot encode textual array item %d; %v: %s", i+1, item, err) } buf = append(buf, ',') } if atLeastOne { return append(buf[:len(buf)-1], ']'), nil } return append(buf, ']'), nil }, }, nil } // convertArray converts interface{} to []interface{} if possible. func convertArray(datum interface{}) ([]interface{}, error) { arrayValues, ok := datum.([]interface{}) if ok { return arrayValues, nil } // NOTE: When given a slice of any other type, zip values to // items as a convenience to client. v := reflect.ValueOf(datum) if v.Kind() != reflect.Slice { return nil, fmt.Errorf("cannot create []interface{}: expected slice; received: %T", datum) } // NOTE: Two better alternatives to the current algorithm are: // (1) mutate the reflection tuple underneath to convert the // []int, for example, to []interface{}, with O(1) complexity // (2) use copy builtin to zip the data items over with O(n) complexity, // but more efficient than what's below. // Suggestions? arrayValues = make([]interface{}, v.Len()) for idx := 0; idx < v.Len(); idx++ { arrayValues[idx] = v.Index(idx).Interface() } return arrayValues, nil } goavro-2.10.1/array_test.go000066400000000000000000000160201412474230400155740ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
package goavro import ( "testing" ) func TestArraySchema(t *testing.T) { testSchemaValid(t, `{"type":"array","items":"bytes"}`) testSchemaInvalid(t, `{"type":"array","item":"int"}`, "Array ought to have items key") testSchemaInvalid(t, `{"type":"array","items":"integer"}`, "Array items ought to be valid Avro type") testSchemaInvalid(t, `{"type":"array","items":3}`, "Array items ought to be valid Avro type") testSchemaInvalid(t, `{"type":"array","items":int}`, "invalid character") // type name must be quoted } func TestArrayDecodeInitialBlockCountCannotDecode(t *testing.T) { testBinaryDecodeFail(t, `{"type":"array","items":"int"}`, nil, "block count") } func TestArrayDecodeInitialBlockCountZero(t *testing.T) { testBinaryDecodePass(t, `{"type":"array","items":"int"}`, []interface{}{}, []byte{0}) } func TestArrayDecodeInitialBlockCountNegative(t *testing.T) { testBinaryDecodePass(t, `{"type":"array","items":"int"}`, []interface{}{3}, []byte{1, 2, 6, 0}) } func TestArrayDecodeInitialBlockCountTooLarge(t *testing.T) { testBinaryDecodeFail(t, `{"type":"array","items":"int"}`, morePositiveThanMaxBlockCount, "block count") } func TestArrayDecodeInitialBlockCountNegativeTooLarge(t *testing.T) { testBinaryDecodeFail(t, `{"type":"array","items":"int"}`, append(moreNegativeThanMaxBlockCount, byte(0)), "block count") } func TestArrayDecodeInitialBlockCountTooNegative(t *testing.T) { // -(uint8(-128)) == -128 testBinaryDecodeFail(t, `{"type":"array","items":"int"}`, append(mostNegativeBlockCount, byte(0)), "block count") } func TestArrayDecodeNextBlockCountCannotDecode(t *testing.T) { testBinaryDecodeFail(t, `{"type":"array","items":"int"}`, []byte{2, 6}, "block count") } func TestArrayDecodeNextBlockCountNegative(t *testing.T) { testBinaryDecodePass(t, `{"type":"array","items":"int"}`, []interface{}{3, 3}, []byte{2, 6, 1, 2, 6, 0}) } func TestArrayDecodeNextBlockCountTooLarge(t *testing.T) { testBinaryDecodeFail(t, `{"type":"array","items":"int"}`, append([]byte{2, 6}, morePositiveThanMaxBlockCount...), "block count") } func TestArrayDecodeNextBlockCountNegativeTooLarge(t *testing.T) { testBinaryDecodeFail(t, `{"type":"array","items":"int"}`, append([]byte{2, 6}, append(moreNegativeThanMaxBlockCount, []byte{2, 6, 0}...)...), "block count") } func TestArrayDecodeNextBlockCountTooNegative(t *testing.T) { testBinaryDecodeFail(t, `{"type":"array","items":"int"}`, append([]byte{2, 6}, append(mostNegativeBlockCount, []byte{2, 6, 0}...)...), "block count") } func TestArrayNull(t *testing.T) { testBinaryCodecPass(t, `{"type":"array","items":"null"}`, []interface{}{}, []byte{0}) testBinaryCodecPass(t, `{"type":"array","items":"null"}`, []interface{}{nil}, []byte{2, 0}) testBinaryCodecPass(t, `{"type":"array","items":"null"}`, []interface{}{nil, nil}, []byte{4, 0}) } func TestArrayReceiveSliceEmptyInterface(t *testing.T) { testBinaryCodecPass(t, `{"type":"array","items":"boolean"}`, []interface{}{}, []byte{0}) testBinaryCodecPass(t, `{"type":"array","items":"boolean"}`, []interface{}{false}, []byte{2, 0, 0}) testBinaryCodecPass(t, `{"type":"array","items":"boolean"}`, []interface{}{true}, []byte{2, 1, 0}) testBinaryCodecPass(t, `{"type":"array","items":"boolean"}`, []interface{}{false, false}, []byte{4, 0, 0, 0}) testBinaryCodecPass(t, `{"type":"array","items":"boolean"}`, []interface{}{true, true}, []byte{4, 1, 1, 0}) } func TestArrayBinaryReceiveSliceInt(t *testing.T) { testBinaryCodecPass(t, `{"type":"array","items":"int"}`, []int{}, []byte{0}) testBinaryCodecPass(t, `{"type":"array","items":"int"}`, 
[]int{1}, []byte("\x02\x02\x00")) testBinaryCodecPass(t, `{"type":"array","items":"int"}`, []int{1, 2}, []byte("\x04\x02\x04\x00")) } func TestArrayTextualReceiveSliceInt(t *testing.T) { testTextCodecPass(t, `{"type":"array","items":"int"}`, []int{}, []byte(`[]`)) testTextCodecPass(t, `{"type":"array","items":"int"}`, []int{1}, []byte(`[1]`)) testTextCodecPass(t, `{"type":"array","items":"int"}`, []int{1, 2}, []byte(`[1,2]`)) } func TestArrayBytes(t *testing.T) { testBinaryCodecPass(t, `{"type":"array","items":"bytes"}`, []interface{}(nil), []byte{0}) // item count == 0 testBinaryCodecPass(t, `{"type":"array","items":"bytes"}`, []interface{}{[]byte("foo")}, []byte("\x02\x06foo\x00")) // item count == 1, item 1 size == 3, foo, no more items testBinaryCodecPass(t, `{"type":"array","items":"bytes"}`, []interface{}{[]byte("foo"), []byte("bar")}, []byte("\x04\x06foo\x06bar\x00")) testBinaryCodecPass(t, `{"type":"array","items":"bytes"}`, [][]byte(nil), []byte{0}) // item count == 0 testBinaryCodecPass(t, `{"type":"array","items":"bytes"}`, [][]byte{[]byte("foo")}, []byte("\x02\x06foo\x00")) // item count == 1, item 1 size == 3, foo, no more items testBinaryCodecPass(t, `{"type":"array","items":"bytes"}`, [][]byte{[]byte("foo"), []byte("bar")}, []byte("\x04\x06foo\x06bar\x00")) } func TestArrayEncodeError(t *testing.T) { // provided slice of primitive types that are not compatible with schema testBinaryEncodeFailBadDatumType(t, `{"type":"array","items":"int"}`, []string{"1"}) testBinaryEncodeFailBadDatumType(t, `{"type":"array","items":"int"}`, []string{"1", "2"}) } func TestArrayEncodeErrorFIXME(t *testing.T) { // NOTE: Would be better if returns error, however, because only the size is encoded, the // items encoder is never invoked to detect it is the wrong slice type if false { testBinaryEncodeFailBadDatumType(t, `{"type":"array","items":"int"}`, []string{}) } else { testBinaryCodecPass(t, `{"type":"array","items":"int"}`, []string{}, []byte{0}) } } func TestArrayTextDecodeFail(t *testing.T) { schema := `{"type":"array","items":"string"}` testTextDecodeFail(t, schema, []byte(` "v1" , "v2" ] `), "expected: '['") testTextDecodeFail(t, schema, []byte(` [ 13 , "v2" ] `), "expected initial \"") testTextDecodeFail(t, schema, []byte(` [ "v1 , "v2" ] `), "expected ',' or ']'") testTextDecodeFail(t, schema, []byte(` [ "v1" "v2" ] `), "expected ',' or ']'") testTextDecodeFail(t, schema, []byte(` [ "v1" , 13 ] `), "expected initial \"") testTextDecodeFail(t, schema, []byte(` [ "v1" , "v2 ] `), "expected final \"") testTextDecodeFail(t, schema, []byte(` [ "v1" , "v2" `), "short buffer") } func TestArrayTextCodecPass(t *testing.T) { schema := `{"type":"array","items":"string"}` datum := []interface{}{"⌘ ", "value2"} testTextEncodePass(t, schema, datum, []byte(`["\u0001\u2318 ","value2"]`)) testTextDecodePass(t, schema, datum, []byte(` [ "\u0001\u2318 " , "value2" ]`)) testTextCodecPass(t, schema, []interface{}{}, []byte(`[]`)) // empty array } goavro-2.10.1/benchmark_test.go000066400000000000000000000045101412474230400164110ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. 
You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "io/ioutil" "testing" ) func BenchmarkNewCodecUsingV2(b *testing.B) { schema, err := ioutil.ReadFile("fixtures/quickstop.avsc") if err != nil { b.Fatal(err) } b.ResetTimer() for i := 0; i < b.N; i++ { _ = newCodecUsingV2(b, string(schema)) } } func BenchmarkNativeFromAvroUsingV2(b *testing.B) { avroBlob, err := ioutil.ReadFile("fixtures/quickstop-null.avro") if err != nil { b.Fatal(err) } b.ResetTimer() for i := 0; i < b.N; i++ { _, _ = nativeFromAvroUsingV2(b, avroBlob) } } func BenchmarkBinaryFromNativeUsingV2(b *testing.B) { avroBlob, err := ioutil.ReadFile("fixtures/quickstop-null.avro") if err != nil { b.Fatal(err) } nativeData, codec := nativeFromAvroUsingV2(b, avroBlob) b.ResetTimer() for i := 0; i < b.N; i++ { _ = binaryFromNativeUsingV2(b, codec, nativeData) } } func BenchmarkNativeFromBinaryUsingV2(b *testing.B) { avroBlob, err := ioutil.ReadFile("fixtures/quickstop-null.avro") if err != nil { b.Fatal(err) } nativeData, codec := nativeFromAvroUsingV2(b, avroBlob) binaryData := binaryFromNativeUsingV2(b, codec, nativeData) b.ResetTimer() for i := 0; i < b.N; i++ { _ = nativeFromBinaryUsingV2(b, codec, binaryData) } } func BenchmarkTextualFromNativeUsingV2(b *testing.B) { avroBlob, err := ioutil.ReadFile("fixtures/quickstop-null.avro") if err != nil { b.Fatal(err) } nativeData, codec := nativeFromAvroUsingV2(b, avroBlob) b.ResetTimer() for i := 0; i < b.N; i++ { _ = textFromNativeUsingV2(b, codec, nativeData) } } func BenchmarkNativeFromTextualUsingV2(b *testing.B) { avroBlob, err := ioutil.ReadFile("fixtures/quickstop-null.avro") if err != nil { b.Fatal(err) } nativeData, codec := nativeFromAvroUsingV2(b, avroBlob) textData := textFromNativeUsingV2(b, codec, nativeData) b.ResetTimer() for i := 0; i < b.N; i++ { _ = nativeFromTextUsingV2(b, codec, textData) } } goavro-2.10.1/binaryReader.go000066400000000000000000000132111412474230400160250ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "fmt" "io" "math" ) // bytesBinaryReader reads bytes from io.Reader and returns byte slice of // specified size or the error encountered while trying to read those bytes. 
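// The size prefix is a zig-zag encoded long, decoded by longBinaryReader, and
// is checked against MaxBlockSize before the payload bytes are read.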
func bytesBinaryReader(ior io.Reader) ([]byte, error) { size, err := longBinaryReader(ior) if err != nil { return nil, fmt.Errorf("cannot read bytes: cannot read size: %s", err) } if size < 0 { return nil, fmt.Errorf("cannot read bytes: size is negative: %d", size) } if size > MaxBlockSize { return nil, fmt.Errorf("cannot read bytes: size exceeds MaxBlockSize: %d > %d", size, MaxBlockSize) } buf := make([]byte, size) _, err = io.ReadAtLeast(ior, buf, int(size)) if err != nil { return nil, fmt.Errorf("cannot read bytes: %s", err) } return buf, nil } // longBinaryReader reads bytes from io.Reader until has complete long value, or // read error. func longBinaryReader(ior io.Reader) (int64, error) { var value uint64 var shift uint var err error var b byte // NOTE: While benchmarks show it's more performant to invoke ReadByte when // available, testing whether a variable's data type implements a particular // method is quite slow too. So perform the test once, and branch to the // appropriate loop based on the results. if byteReader, ok := ior.(io.ByteReader); ok { for { if b, err = byteReader.ReadByte(); err != nil { return 0, err // NOTE: must send back unaltered error to detect io.EOF } value |= uint64(b&intMask) << shift if b&intFlag == 0 { return (int64(value>>1) ^ -int64(value&1)), nil } shift += 7 } } // NOTE: ior does not also implement io.ByteReader, so we must allocate a // byte slice with a single byte, and read each byte into the slice. buf := make([]byte, 1) for { if _, err = ior.Read(buf); err != nil { return 0, err // NOTE: must send back unaltered error to detect io.EOF } b = buf[0] value |= uint64(b&intMask) << shift if b&intFlag == 0 { return (int64(value>>1) ^ -int64(value&1)), nil } shift += 7 } } // metadataBinaryReader reads bytes from io.Reader until has entire map value, // or read error. func metadataBinaryReader(ior io.Reader) (map[string][]byte, error) { var err error var value interface{} // block count and block size if value, err = longBinaryReader(ior); err != nil { return nil, fmt.Errorf("cannot read map block count: %s", err) } blockCount := value.(int64) if blockCount < 0 { if blockCount == math.MinInt64 { // The minimum number for any signed numerical type can never be // made positive return nil, fmt.Errorf("cannot read map with block count: %d", blockCount) } // NOTE: A negative block count implies there is a long encoded block // size following the negative block count. We have no use for the block // size in this decoder, so we read and discard the value. blockCount = -blockCount // convert to its positive equivalent if _, err = longBinaryReader(ior); err != nil { return nil, fmt.Errorf("cannot read map block size: %s", err) } } // Ensure block count does not exceed some sane value. if blockCount > MaxBlockCount { return nil, fmt.Errorf("cannot read map when block count exceeds MaxBlockCount: %d > %d", blockCount, MaxBlockCount) } // NOTE: While the attempt of a RAM optimization shown below is not // necessary, many encoders will encode all items in a single block. We can // optimize amount of RAM allocated by runtime for the array by initializing // the array for that number of items. 
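	// As with arrays, the map's key/value pairs arrive in blocks, each
	// preceded by a block count; a block count of zero terminates the map.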
mapValues := make(map[string][]byte, blockCount) for blockCount != 0 { // Decode `blockCount` datum values from buffer for i := int64(0); i < blockCount; i++ { // first decode the key string keyBytes, err := bytesBinaryReader(ior) if err != nil { return nil, fmt.Errorf("cannot read map key: %s", err) } key := string(keyBytes) if _, ok := mapValues[key]; ok { return nil, fmt.Errorf("cannot read map: duplicate key: %q", key) } // metadata values are always bytes buf, err := bytesBinaryReader(ior) if err != nil { return nil, fmt.Errorf("cannot read map value for key %q: %s", key, err) } mapValues[key] = buf } // Decode next blockCount from buffer, because there may be more blocks if value, err = longBinaryReader(ior); err != nil { return nil, fmt.Errorf("cannot read map block count: %s", err) } blockCount = value.(int64) if blockCount < 0 { if blockCount == math.MinInt64 { // The minimum number for any signed numerical type can never be // made positive return nil, fmt.Errorf("cannot read map with block count: %d", blockCount) } // NOTE: A negative block count implies there is a long encoded // block size following the negative block count. We have no use for // the block size in this decoder, so we read and discard the value. blockCount = -blockCount // convert to its positive equivalent if _, err = longBinaryReader(ior); err != nil { return nil, fmt.Errorf("cannot read map block size: %s", err) } } // Ensure block count does not exceed some sane value. if blockCount > MaxBlockCount { return nil, fmt.Errorf("cannot read map when block count exceeds MaxBlockCount: %d > %d", blockCount, MaxBlockCount) } } return mapValues, nil } goavro-2.10.1/binary_test.go000066400000000000000000000074251412474230400157530ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
package goavro import ( "bytes" "fmt" "math" "testing" ) var morePositiveThanMaxBlockCount, morePositiveThanMaxBlockSize, moreNegativeThanMaxBlockCount, mostNegativeBlockCount []byte func init() { c, err := NewCodec(`"long"`) if err != nil { panic(err) } morePositiveThanMaxBlockCount, err = c.BinaryFromNative(nil, (MaxBlockCount + 1)) if err != nil { panic(err) } morePositiveThanMaxBlockSize, err = c.BinaryFromNative(nil, (MaxBlockSize + 1)) if err != nil { panic(err) } moreNegativeThanMaxBlockCount, err = c.BinaryFromNative(nil, -(MaxBlockCount + 1)) if err != nil { panic(err) } mostNegativeBlockCount, err = c.BinaryFromNative(nil, int64(math.MinInt64)) if err != nil { panic(err) } } func testBinaryDecodeFail(t *testing.T, schema string, buf []byte, errorMessage string) { t.Helper() c, err := NewCodec(schema) if err != nil { t.Fatal(err) } value, newBuffer, err := c.NativeFromBinary(buf) ensureError(t, err, errorMessage) if value != nil { t.Errorf("GOT: %v; WANT: %v", value, nil) } if !bytes.Equal(buf, newBuffer) { t.Errorf("GOT: %v; WANT: %v", newBuffer, buf) } } func testBinaryEncodeFail(t *testing.T, schema string, datum interface{}, errorMessage string) { t.Helper() c, err := NewCodec(schema) if err != nil { t.Fatal(err) } buf, err := c.BinaryFromNative(nil, datum) ensureError(t, err, errorMessage) if buf != nil { t.Errorf("GOT: %v; WANT: %v", buf, nil) } } func testBinaryEncodeFailBadDatumType(t *testing.T, schema string, datum interface{}) { t.Helper() testBinaryEncodeFail(t, schema, datum, "received: ") } func testBinaryDecodeFailShortBuffer(t *testing.T, schema string, buf []byte) { t.Helper() testBinaryDecodeFail(t, schema, buf, "short buffer") } func testBinaryDecodePass(t *testing.T, schema string, datum interface{}, encoded []byte) { t.Helper() codec, err := NewCodec(schema) if err != nil { t.Fatal(err) } value, remaining, err := codec.NativeFromBinary(encoded) if err != nil { t.Fatalf("schema: %s; %s", schema, err) } // remaining ought to be empty because there is nothing remaining to be // decoded if actual, expected := len(remaining), 0; actual != expected { t.Errorf("schema: %s; Datum: %v; Actual: %#v; Expected: %#v", schema, datum, actual, expected) } // for testing purposes, to prevent big switch statement, convert each to // string and compare. if actual, expected := fmt.Sprintf("%v", value), fmt.Sprintf("%v", datum); actual != expected { t.Errorf("schema: %s; Datum: %v; Actual: %#v; Expected: %#v", schema, datum, actual, expected) } } func testBinaryEncodePass(t *testing.T, schema string, datum interface{}, expected []byte) { t.Helper() codec, err := NewCodec(schema) if err != nil { t.Fatalf("Schma: %q %s", schema, err) } actual, err := codec.BinaryFromNative(nil, datum) if err != nil { t.Fatalf("schema: %s; Datum: %v; %s", schema, datum, err) } if !bytes.Equal(actual, expected) { t.Errorf("schema: %s; Datum: %v; Actual: %#v; Expected: %#v", schema, datum, actual, expected) } } // testBinaryCodecPass does a bi-directional codec check, by encoding datum to // bytes, then decoding bytes back to datum. func testBinaryCodecPass(t *testing.T, schema string, datum interface{}, buf []byte) { t.Helper() testBinaryDecodePass(t, schema, datum, buf) testBinaryEncodePass(t, schema, datum, buf) } goavro-2.10.1/boolean.go000066400000000000000000000037131412474230400150430ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. 
You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "bytes" "errors" "fmt" "io" ) func booleanNativeFromBinary(buf []byte) (interface{}, []byte, error) { if len(buf) < 1 { return nil, nil, io.ErrShortBuffer } var b byte b, buf = buf[0], buf[1:] switch b { case byte(0): return false, buf, nil case byte(1): return true, buf, nil default: return nil, nil, fmt.Errorf("cannot decode binary boolean: expected: Go byte(0) or byte(1); received: byte(%d)", b) } } func booleanBinaryFromNative(buf []byte, datum interface{}) ([]byte, error) { value, ok := datum.(bool) if !ok { return nil, fmt.Errorf("cannot encode binary boolean: expected: Go bool; received: %T", datum) } var b byte if value { b = 1 } return append(buf, b), nil } func booleanNativeFromTextual(buf []byte) (interface{}, []byte, error) { if len(buf) < 4 { return nil, nil, fmt.Errorf("cannot decode textual boolean: %s", io.ErrShortBuffer) } if bytes.Equal(buf[:4], []byte("true")) { return true, buf[4:], nil } if len(buf) < 5 { return nil, nil, fmt.Errorf("cannot decode textual boolean: %s", io.ErrShortBuffer) } if bytes.Equal(buf[:5], []byte("false")) { return false, buf[5:], nil } return nil, nil, errors.New("expected false or true") } func booleanTextualFromNative(buf []byte, datum interface{}) ([]byte, error) { value, ok := datum.(bool) if !ok { return nil, fmt.Errorf("boolean: expected: Go bool; received: %T", datum) } if value { return append(buf, "true"...), nil } return append(buf, "false"...), nil } goavro-2.10.1/boolean_test.go000066400000000000000000000023371412474230400161030ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import "testing" func TestSchemaPrimitiveCodecBoolean(t *testing.T) { testSchemaPrimativeCodec(t, `"boolean"`) } func TestPrimitiveBooleanBinary(t *testing.T) { testBinaryEncodeFailBadDatumType(t, `"boolean"`, 0) testBinaryEncodeFailBadDatumType(t, `"boolean"`, 1) testBinaryDecodeFailShortBuffer(t, `"boolean"`, nil) testBinaryCodecPass(t, `"boolean"`, false, []byte{0}) testBinaryCodecPass(t, `"boolean"`, true, []byte{1}) } func TestPrimitiveBooleanText(t *testing.T) { testTextEncodeFailBadDatumType(t, `"boolean"`, 0) testTextEncodeFailBadDatumType(t, `"boolean"`, 1) testTextDecodeFailShortBuffer(t, `"boolean"`, nil) testTextCodecPass(t, `"boolean"`, false, []byte("false")) testTextCodecPass(t, `"boolean"`, true, []byte("true")) } goavro-2.10.1/bytes.go000066400000000000000000000403601412474230400145510ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. 
You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "encoding/hex" "errors" "fmt" "io" "os" "unicode" "unicode/utf16" "unicode/utf8" ) //////////////////////////////////////// // Binary Decode //////////////////////////////////////// func bytesNativeFromBinary(buf []byte) (interface{}, []byte, error) { if len(buf) < 1 { return nil, nil, fmt.Errorf("cannot decode binary bytes: %s", io.ErrShortBuffer) } var decoded interface{} var err error if decoded, buf, err = longNativeFromBinary(buf); err != nil { return nil, nil, fmt.Errorf("cannot decode binary bytes: %s", err) } size := decoded.(int64) // always returns int64 if size < 0 { return nil, nil, fmt.Errorf("cannot decode binary bytes: negative size: %d", size) } if size > int64(len(buf)) { return nil, nil, fmt.Errorf("cannot decode binary bytes: %s", io.ErrShortBuffer) } return buf[:size], buf[size:], nil } func stringNativeFromBinary(buf []byte) (interface{}, []byte, error) { d, b, err := bytesNativeFromBinary(buf) if err != nil { return nil, nil, fmt.Errorf("cannot decode binary string: %s", err) } return string(d.([]byte)), b, nil } //////////////////////////////////////// // Binary Encode //////////////////////////////////////// func bytesBinaryFromNative(buf []byte, datum interface{}) ([]byte, error) { var someBytes []byte switch d := datum.(type) { case []byte: someBytes = d case string: someBytes = []byte(d) default: return nil, fmt.Errorf("cannot encode binary bytes: expected: []byte or string; received: %T", datum) } buf, _ = longBinaryFromNative(buf, len(someBytes)) // only fails when given non integer return append(buf, someBytes...), nil // append datum bytes } func stringBinaryFromNative(buf []byte, datum interface{}) ([]byte, error) { var someBytes []byte switch d := datum.(type) { case []byte: someBytes = d case string: someBytes = []byte(d) default: return nil, fmt.Errorf("cannot encode binary bytes: expected: string; received: %T", datum) } buf, _ = longBinaryFromNative(buf, len(someBytes)) // only fails when given non integer return append(buf, someBytes...), nil // append datum bytes } //////////////////////////////////////// // Text Decode //////////////////////////////////////// func bytesNativeFromTextual(buf []byte) (interface{}, []byte, error) { buflen := len(buf) if buflen < 2 { return nil, nil, fmt.Errorf("cannot decode textual bytes: %s", io.ErrShortBuffer) } if buf[0] != '"' { return nil, nil, fmt.Errorf("cannot decode textual bytes: expected initial \"; found: %#U", buf[0]) } var newBytes []byte var escaped bool // Loop through bytes following initial double quote, but note we will // return immediately when find unescaped double quote. for i := 1; i < buflen; i++ { b := buf[i] if escaped { escaped = false if b2, ok := unescapeSpecialJSON(b); ok { newBytes = append(newBytes, b2) continue } if b == 'u' { // NOTE: Need at least 4 more bytes to read uint16, but subtract // 1 because do not want to count the trailing quote and // subtract another 1 because already consumed u but have yet to // increment i. if i > buflen-6 { return nil, nil, fmt.Errorf("cannot decode textual bytes: %s", io.ErrShortBuffer) } // NOTE: Avro bytes represent binary data, and do not // necessarily represent text. Therefore, Avro bytes are not // encoded in UTF-16. 
Each \u is followed by 4 hexadecimal // digits, the first and second of which must be 0. v, err := parseUint64FromHexSlice(buf[i+3 : i+5]) if err != nil { return nil, nil, fmt.Errorf("cannot decode textual bytes: %s", err) } i += 4 // absorb 4 characters: one 'u' and three of the digits newBytes = append(newBytes, byte(v)) continue } newBytes = append(newBytes, b) continue } if b == '\\' { escaped = true continue } if b == '"' { return newBytes, buf[i+1:], nil } newBytes = append(newBytes, b) } return nil, nil, fmt.Errorf("cannot decode textual bytes: expected final \"; found: %#U", buf[buflen-1]) } func stringNativeFromTextual(buf []byte) (interface{}, []byte, error) { buflen := len(buf) if buflen < 2 { return nil, nil, fmt.Errorf("cannot decode textual string: %s", io.ErrShortBuffer) } if buf[0] != '"' { return nil, nil, fmt.Errorf("cannot decode textual string: expected initial \"; found: %#U", buf[0]) } var newBytes []byte var escaped bool // Loop through bytes following initial double quote, but note we will // return immediately when find unescaped double quote. for i := 1; i < buflen; i++ { b := buf[i] if escaped { escaped = false if b2, ok := unescapeSpecialJSON(b); ok { newBytes = append(newBytes, b2) continue } if b == 'u' { // NOTE: Need at least 4 more bytes to read uint16, but subtract // 1 because do not want to count the trailing quote and // subtract another 1 because already consumed u but have yet to // increment i. if i > buflen-6 { return nil, nil, fmt.Errorf("cannot decode textual string: %s", io.ErrShortBuffer) } v, err := parseUint64FromHexSlice(buf[i+1 : i+5]) if err != nil { return nil, nil, fmt.Errorf("cannot decode textual string: %s", err) } i += 4 // absorb 4 characters: one 'u' and three of the digits nbl := len(newBytes) newBytes = append(newBytes, []byte{0, 0, 0, 0}...) // grow to make room for UTF-8 encoded rune r := rune(v) if utf16.IsSurrogate(r) { i++ // absorb final hexadecimal digit from previous value // Expect second half of surrogate pair if i > buflen-6 || buf[i] != '\\' || buf[i+1] != 'u' { return nil, nil, errors.New("cannot decode textual string: missing second half of surrogate pair") } v, err = parseUint64FromHexSlice(buf[i+2 : i+6]) if err != nil { return nil, nil, fmt.Errorf("cannot decode textual string: %s", err) } i += 5 // absorb 5 characters: two for '\u', and 3 of the 4 digits // Get code point by combining high and low surrogate bits r = utf16.DecodeRune(r, rune(v)) } width := utf8.EncodeRune(newBytes[nbl:], r) // append UTF-8 encoded version of code point newBytes = newBytes[:nbl+width] // trim off excess bytes continue } newBytes = append(newBytes, b) continue } if b == '\\' { escaped = true continue } if b == '"' { return string(newBytes), buf[i+1:], nil } newBytes = append(newBytes, b) } if escaped { return nil, nil, fmt.Errorf("cannot decode textual string: %s", io.ErrShortBuffer) } return nil, nil, fmt.Errorf("cannot decode textual string: expected final \"; found: %x", buf[buflen-1]) } func unescapeUnicodeString(some string) (string, error) { if some == "" { return "", nil } buf := []byte(some) buflen := len(buf) var i int var newBytes []byte var escaped bool // Loop through bytes following initial double quote, but note we will // return immediately when find unescaped double quote. 
for i = 0; i < buflen; i++ { b := buf[i] if escaped { escaped = false if b == 'u' { // NOTE: Need at least 4 more bytes to read uint16, but subtract // 1 because do not want to count the trailing quote and // subtract another 1 because already consumed u but have yet to // increment i. if i > buflen-6 { return "", fmt.Errorf("cannot replace escaped characters with UTF-8 equivalent: %s", io.ErrShortBuffer) } v, err := parseUint64FromHexSlice(buf[i+1 : i+5]) if err != nil { return "", fmt.Errorf("cannot replace escaped characters with UTF-8 equivalent: %s", err) } i += 4 // absorb 4 characters: one 'u' and three of the digits nbl := len(newBytes) newBytes = append(newBytes, []byte{0, 0, 0, 0}...) // grow to make room for UTF-8 encoded rune r := rune(v) if utf16.IsSurrogate(r) { i++ // absorb final hexadecimal digit from previous value // Expect second half of surrogate pair if i > buflen-6 || buf[i] != '\\' || buf[i+1] != 'u' { return "", errors.New("cannot replace escaped characters with UTF-8 equivalent: missing second half of surrogate pair") } v, err = parseUint64FromHexSlice(buf[i+2 : i+6]) if err != nil { return "", fmt.Errorf("cannot replace escaped characters with UTF-8 equivalents: %s", err) } i += 5 // absorb 5 characters: two for '\u', and 3 of the 4 digits // Get code point by combining high and low surrogate bits r = utf16.DecodeRune(r, rune(v)) } width := utf8.EncodeRune(newBytes[nbl:], r) // append UTF-8 encoded version of code point newBytes = newBytes[:nbl+width] // trim off excess bytes continue } newBytes = append(newBytes, b) continue } if b == '\\' { escaped = true continue } newBytes = append(newBytes, b) } if escaped { return "", fmt.Errorf("cannot replace escaped characters with UTF-8 equivalents: %s", io.ErrShortBuffer) } return string(newBytes), nil } func parseUint64FromHexSlice(buf []byte) (uint64, error) { var value uint64 for _, b := range buf { diff := uint64(b - '0') if diff < 10 { value = (value << 4) | diff continue } b10 := b + 10 diff = uint64(b10 - 'A') if diff < 10 { return 0, hex.InvalidByteError(b) } if diff < 16 { value = (value << 4) | diff continue } diff = uint64(b10 - 'a') if diff < 10 { return 0, hex.InvalidByteError(b) } if diff < 16 { value = (value << 4) | diff continue } return 0, hex.InvalidByteError(b) } return value, nil } func unescapeSpecialJSON(b byte) (byte, bool) { // NOTE: The following 8 special JSON characters must be escaped: switch b { case '"', '\\', '/': return b, true case 'b': return '\b', true case 'f': return '\f', true case 'n': return '\n', true case 'r': return '\r', true case 't': return '\t', true } return b, false } //////////////////////////////////////// // Text Encode //////////////////////////////////////// func bytesTextualFromNative(buf []byte, datum interface{}) ([]byte, error) { var someBytes []byte switch d := datum.(type) { case []byte: someBytes = d case string: someBytes = []byte(d) default: return nil, fmt.Errorf("cannot encode textual bytes: expected: []byte or string; received: %T", datum) } buf = append(buf, '"') // prefix buffer with double quote for _, b := range someBytes { if escaped, ok := escapeSpecialJSON(b); ok { buf = append(buf, escaped...) 
continue } if r := rune(b); r < utf8.RuneSelf && unicode.IsPrint(r) { buf = append(buf, b) continue } // This Code Point _could_ be encoded as a single byte, however, it's // above standard ASCII range (b > 127), therefore must encode using its // four-byte hexadecimal equivalent, which will always start with the // high byte 00 buf = appendUnicodeHex(buf, uint16(b)) } return append(buf, '"'), nil // postfix buffer with double quote } func stringTextualFromNative(buf []byte, datum interface{}) ([]byte, error) { var someString string switch d := datum.(type) { case []byte: someString = string(d) case string: someString = d default: return nil, fmt.Errorf("cannot encode textual string: expected: []byte or string; received: %T", datum) } buf = append(buf, '"') // prefix buffer with double quote for _, r := range someString { if r < utf8.RuneSelf { if escaped, ok := escapeSpecialJSON(byte(r)); ok { buf = append(buf, escaped...) continue } if unicode.IsPrint(r) { buf = append(buf, byte(r)) continue } } // NOTE: Attempt to encode code point as UTF-16 surrogate pair r1, r2 := utf16.EncodeRune(r) if r1 != unicode.ReplacementChar || r2 != unicode.ReplacementChar { // code point does require surrogate pair, and thus two uint16 values buf = appendUnicodeHex(buf, uint16(r1)) buf = appendUnicodeHex(buf, uint16(r2)) continue } // Code Point does not require surrogate pair. buf = appendUnicodeHex(buf, uint16(r)) } return append(buf, '"'), nil // postfix buffer with double quote } func appendUnicodeHex(buf []byte, v uint16) []byte { // Start with '\u' prefix: buf = append(buf, sliceUnicode...) // And tack on 4 hexadecimal digits: buf = append(buf, hexDigits[(v&0xF000)>>12]) buf = append(buf, hexDigits[(v&0xF00)>>8]) buf = append(buf, hexDigits[(v&0xF0)>>4]) buf = append(buf, hexDigits[(v&0xF)]) return buf } const hexDigits = "0123456789ABCDEF" func escapeSpecialJSON(b byte) ([]byte, bool) { // NOTE: The following 8 special JSON characters must be escaped: switch b { case '"': return sliceQuote, true case '\\': return sliceBackslash, true case '/': return sliceSlash, true case '\b': return sliceBackspace, true case '\f': return sliceFormfeed, true case '\n': return sliceNewline, true case '\r': return sliceCarriageReturn, true case '\t': return sliceTab, true } return nil, false } // While slices in Go are never constants, we can initialize them once and reuse // them many times. We define these slices at library load time and reuse them // when encoding JSON. var ( sliceQuote = []byte("\\\"") sliceBackslash = []byte("\\\\") sliceSlash = []byte("\\/") sliceBackspace = []byte("\\b") sliceFormfeed = []byte("\\f") sliceNewline = []byte("\\n") sliceCarriageReturn = []byte("\\r") sliceTab = []byte("\\t") sliceUnicode = []byte("\\u") ) // DEBUG -- remove function prior to committing func decodedStringFromJSON(buf []byte) (string, []byte, error) { fmt.Fprintf(os.Stderr, "decodedStringFromJSON(%v)\n", buf) buflen := len(buf) if buflen < 2 { return "", buf, fmt.Errorf("cannot decode string: %s", io.ErrShortBuffer) } if buf[0] != '"' { return "", buf, fmt.Errorf("cannot decode string: expected initial '\"'; found: %#U", buf[0]) } var newBytes []byte var escaped, ok bool // Loop through bytes following initial double quote, but note we will // return immediately when find unescaped double quote. 
for i := 1; i < buflen; i++ { b := buf[i] if escaped { escaped = false if b, ok = unescapeSpecialJSON(b); ok { newBytes = append(newBytes, b) continue } if b == 'u' { // NOTE: Need at least 4 more bytes to read uint16, but subtract // 1 because do not want to count the trailing quote and // subtract another 1 because already consumed u but have yet to // increment i. if i > buflen-6 { return "", buf[i+1:], fmt.Errorf("cannot decode string: %s", io.ErrShortBuffer) } v, err := parseUint64FromHexSlice(buf[i+1 : i+5]) if err != nil { return "", buf[i+1:], fmt.Errorf("cannot decode string: %s", err) } i += 4 // absorb 4 characters: one 'u' and three of the digits nbl := len(newBytes) newBytes = append(newBytes, 0, 0, 0, 0) // grow to make room for UTF-8 encoded rune r := rune(v) if utf16.IsSurrogate(r) { i++ // absorb final hexidecimal digit from previous value // Expect second half of surrogate pair if i > buflen-6 || buf[i] != '\\' || buf[i+1] != 'u' { return "", buf[i+1:], errors.New("cannot decode string: missing second half of surrogate pair") } v, err = parseUint64FromHexSlice(buf[i+2 : i+6]) if err != nil { return "", buf[i+1:], fmt.Errorf("cannot decode string: cannot decode second half of surrogate pair: %s", err) } i += 5 // absorb 5 characters: two for '\u', and 3 of the 4 digits // Get code point by combining high and low surrogate bits r = utf16.DecodeRune(r, rune(v)) } width := utf8.EncodeRune(newBytes[nbl:], r) // append UTF-8 encoded version of code point newBytes = newBytes[:nbl+width] // trim off excess bytes continue } newBytes = append(newBytes, b) continue } if b == '\\' { escaped = true continue } if b == '"' { return string(newBytes), buf[i+1:], nil } newBytes = append(newBytes, b) } return "", buf, fmt.Errorf("cannot decode string: expected final '\"'; found: %#U", buf[buflen-1]) } goavro-2.10.1/bytes_test.go000066400000000000000000000203361412474230400156110ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
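// NOTE: The two textual encoders above treat non-ASCII data differently: the
// "bytes" encoder escapes each byte above 0x7F as \u00XX, while the "string"
// encoder escapes the Unicode code point, using a UTF-16 surrogate pair when
// needed. A minimal illustrative sketch, not part of the library; the expected
// escapes mirror the test cases below:
//
//	func ExampleTextualEscaping() {
//		bytesCodec, _ := goavro.NewCodec(`"bytes"`)
//		stringCodec, _ := goavro.NewCodec(`"string"`)
//
//		b, _ := bytesCodec.TextualFromNative(nil, []byte("😂"))
//		s, _ := stringCodec.TextualFromNative(nil, "😂")
//
//		fmt.Printf("%s\n%s\n", b, s)
//		// bytes:  "\u00F0\u009F\u0098\u0082"  (each UTF-8 byte escaped)
//		// string: "\uD83D\uDE02"              (surrogate pair for U+1F602)
//	}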
package goavro import ( "encoding/json" "strings" "testing" ) func TestSchemaPrimitiveCodecBytes(t *testing.T) { testSchemaPrimativeCodec(t, `"bytes"`) } func TestPrimitiveBytesBinary(t *testing.T) { testBinaryEncodeFailBadDatumType(t, `"bytes"`, 13) testBinaryDecodeFailShortBuffer(t, `"bytes"`, nil) testBinaryDecodeFailShortBuffer(t, `"bytes"`, []byte{2}) testBinaryCodecPass(t, `"bytes"`, []byte(""), []byte("\x00")) testBinaryCodecPass(t, `"bytes"`, []byte("some bytes"), []byte("\x14some bytes")) } func TestPrimitiveBytesText(t *testing.T) { testTextEncodeFailBadDatumType(t, `"bytes"`, 42) testTextDecodeFailShortBuffer(t, `"bytes"`, []byte(``)) testTextDecodeFailShortBuffer(t, `"bytes"`, []byte(`"`)) testTextDecodeFail(t, `"bytes"`, []byte(`..`), "expected initial \"") testTextDecodeFail(t, `"bytes"`, []byte(`".`), "expected final \"") testTextCodecPass(t, `"bytes"`, []byte(""), []byte("\"\"")) testTextCodecPass(t, `"bytes"`, []byte("a"), []byte("\"a\"")) testTextCodecPass(t, `"bytes"`, []byte("ab"), []byte("\"ab\"")) testTextCodecPass(t, `"bytes"`, []byte("a\"b"), []byte("\"a\\\"b\"")) testTextCodecPass(t, `"bytes"`, []byte("a\\b"), []byte("\"a\\\\b\"")) testTextCodecPass(t, `"bytes"`, []byte("a/b"), []byte("\"a\\/b\"")) testTextCodecPass(t, `"bytes"`, []byte("a\bb"), []byte(`"a\bb"`)) testTextCodecPass(t, `"bytes"`, []byte("a\fb"), []byte(`"a\fb"`)) testTextCodecPass(t, `"bytes"`, []byte("a\nb"), []byte(`"a\nb"`)) testTextCodecPass(t, `"bytes"`, []byte("a\rb"), []byte(`"a\rb"`)) testTextCodecPass(t, `"bytes"`, []byte("a\tb"), []byte(`"a\tb"`)) testTextCodecPass(t, `"bytes"`, []byte("a b"), []byte(`"a\tb"`)) // tab byte between a and b testTextDecodeFail(t, `"bytes"`, []byte("\"\\u\""), "short buffer") testTextDecodeFail(t, `"bytes"`, []byte("\"\\u.\""), "short buffer") testTextDecodeFail(t, `"bytes"`, []byte("\"\\u..\""), "short buffer") testTextDecodeFail(t, `"bytes"`, []byte("\"\\u...\""), "short buffer") testTextDecodeFail(t, `"bytes"`, []byte("\"\\u////\""), "invalid byte") // < '0' testTextDecodeFail(t, `"bytes"`, []byte("\"\\u::::\""), "invalid byte") // > '9' testTextDecodeFail(t, `"bytes"`, []byte("\"\\u@@@@\""), "invalid byte") // < 'A' testTextDecodeFail(t, `"bytes"`, []byte("\"\\uGGGG\""), "invalid byte") // > 'F' testTextDecodeFail(t, `"bytes"`, []byte("\"\\u````\""), "invalid byte") // < 'a' testTextDecodeFail(t, `"bytes"`, []byte("\"\\ugggg\""), "invalid byte") // > 'f' testTextCodecPass(t, `"bytes"`, []byte("⌘ "), []byte("\"\\u0001\\u00E2\\u008C\\u0098 \"")) testTextCodecPass(t, `"bytes"`, []byte("😂"), []byte(`"\u00F0\u009F\u0098\u0082"`)) } func TestSchemaPrimitiveStringCodec(t *testing.T) { testSchemaPrimativeCodec(t, `"string"`) } func TestPrimitiveStringBinary(t *testing.T) { testBinaryEncodeFailBadDatumType(t, `"string"`, 42) testBinaryDecodeFailShortBuffer(t, `"string"`, nil) testBinaryDecodeFailShortBuffer(t, `"string"`, []byte{2}) testBinaryCodecPass(t, `"string"`, "", []byte("\x00")) testBinaryCodecPass(t, `"string"`, "some string", []byte("\x16some string")) } func TestPrimitiveStringText(t *testing.T) { testTextEncodeFailBadDatumType(t, `"string"`, 42) testTextDecodeFailShortBuffer(t, `"string"`, []byte(``)) testTextDecodeFailShortBuffer(t, `"string"`, []byte(`"`)) testTextDecodeFail(t, `"string"`, []byte(`..`), "expected initial \"") testTextDecodeFail(t, `"string"`, []byte(`".`), "expected final \"") testTextCodecPass(t, `"string"`, "", []byte("\"\"")) testTextCodecPass(t, `"string"`, "a", []byte("\"a\"")) testTextCodecPass(t, `"string"`, "ab", 
[]byte("\"ab\"")) testTextCodecPass(t, `"string"`, "a\"b", []byte("\"a\\\"b\"")) testTextCodecPass(t, `"string"`, "a\\b", []byte("\"a\\\\b\"")) testTextCodecPass(t, `"string"`, "a/b", []byte("\"a\\/b\"")) testTextCodecPass(t, `"string"`, "a\bb", []byte(`"a\bb"`)) testTextCodecPass(t, `"string"`, "a\fb", []byte(`"a\fb"`)) testTextCodecPass(t, `"string"`, "a\nb", []byte(`"a\nb"`)) testTextCodecPass(t, `"string"`, "a\rb", []byte(`"a\rb"`)) testTextCodecPass(t, `"string"`, "a\tb", []byte(`"a\tb"`)) testTextCodecPass(t, `"string"`, "a b", []byte(`"a\tb"`)) // tab byte between a and b testTextDecodeFail(t, `"string"`, []byte("\"\\u\""), "short buffer") testTextDecodeFail(t, `"string"`, []byte("\"\\u.\""), "short buffer") testTextDecodeFail(t, `"string"`, []byte("\"\\u..\""), "short buffer") testTextDecodeFail(t, `"string"`, []byte("\"\\u...\""), "short buffer") testTextDecodeFail(t, `"string"`, []byte("\"\\u////\""), "invalid byte") // < '0' testTextDecodeFail(t, `"string"`, []byte("\"\\u::::\""), "invalid byte") // > '9' testTextDecodeFail(t, `"string"`, []byte("\"\\u@@@@\""), "invalid byte") // < 'A' testTextDecodeFail(t, `"string"`, []byte("\"\\uGGGG\""), "invalid byte") // > 'F' testTextDecodeFail(t, `"string"`, []byte("\"\\u````\""), "invalid byte") // < 'a' testTextDecodeFail(t, `"string"`, []byte("\"\\ugggg\""), "invalid byte") // > 'f' testTextCodecPass(t, `"string"`, "⌘ ", []byte("\"\\u0001\\u2318 \"")) testTextCodecPass(t, `"string"`, "™ ", []byte("\"\\u0001\\u2122 \"")) testTextCodecPass(t, `"string"`, "ℯ ", []byte("\"\\u0001\\u212F \"")) testTextCodecPass(t, `"string"`, "😂 ", []byte("\"\\u0001\\uD83D\\uDE02 \"")) testTextDecodeFail(t, `"string"`, []byte("\"\\"), "short buffer") testTextDecodeFail(t, `"string"`, []byte("\"\\uD83D\""), "surrogate pair") testTextDecodeFail(t, `"string"`, []byte("\"\\uD83D\\u\""), "surrogate pair") testTextDecodeFail(t, `"string"`, []byte("\"\\uD83D\\uD\""), "surrogate pair") testTextDecodeFail(t, `"string"`, []byte("\"\\uD83D\\uDE\""), "surrogate pair") testTextDecodeFail(t, `"string"`, []byte("\"\\uD83D\\uDE0\""), "invalid byte") } func TestUnescapeUnicode(t *testing.T) { checkGood := func(t *testing.T, argument, want string) { got, err := unescapeUnicodeString(argument) if err != nil { t.Fatal(err) } if got != want { t.Errorf("GOT: %q; WANT: %q", got, want) } } checkBad := func(t *testing.T, argument, want string) { _, got := unescapeUnicodeString(argument) if got == nil || !strings.Contains(got.Error(), want) { t.Errorf("GOT: %v; WANT: %v", got, want) } } checkBad(t, "\\u0000", "short buffer") checkBad(t, "\\uinvalid", "invalid byte") checkBad(t, "\\ud83d\\ude0", "missing second half of surrogate pair") checkBad(t, "\\ud83d\\uinvalid", "invalid byte") checkBad(t, "\\", "short buffer") checkGood(t, "", "") checkGood(t, "\\\\", "\\") checkGood(t, "\u0041\u0062\u0063", "Abc") checkGood(t, "\u0001\\uD83D\\uDE02 ", "😂 ") checkGood(t, "Hello, \u0022World!\"", "Hello, \"World!\"") checkGood(t, "\u263a\ufe0f", "☺️") checkGood(t, "\u65e5\u672c\u8a9e", "日本語") } func TestJSONUnmarshalStrings(t *testing.T) { cases := []struct { arg string want string }{ {arg: `"A1"`, want: "A1"}, {arg: `"\u0042\u0032"`, want: "B2"}, // backslashes have no meaning in back-tick string constant } for _, c := range cases { var raw interface{} if err := json.Unmarshal([]byte(c.arg), &raw); err != nil { t.Errorf("CASE: %s; ERROR: %s", c.arg, err) return } got, ok := raw.(string) if !ok { t.Errorf("CASE: %s; GOT: %T; WANT: string", c.arg, got) return } if got != c.want { t.Errorf("GOT: 
%s; WANT: %q", got, c.want) } } } func TestBytesCodecAcceptsString(t *testing.T) { schema := `{"type":"bytes"}` t.Run("binary", func(t *testing.T) { testBinaryEncodePass(t, schema, "abcd", []byte("\x08abcd")) }) t.Run("text", func(t *testing.T) { testTextEncodePass(t, schema, "abcd", []byte(`"abcd"`)) }) } func TestStringCodecAcceptsBytes(t *testing.T) { schema := `{"type":"string"}` t.Run("binary", func(t *testing.T) { testBinaryEncodePass(t, schema, []byte("abcd"), []byte("\x08abcd")) }) t.Run("text", func(t *testing.T) { testTextEncodePass(t, schema, []byte("abcd"), []byte(`"abcd"`)) }) } goavro-2.10.1/canonical.go000066400000000000000000000130531412474230400153510ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "fmt" "sort" "strconv" "strings" ) // pcfProcessor is a function type that given a parsed JSON object, returns its // Parsing Canonical Form according to the Avro specification. type pcfProcessor func(s interface{}) (string, error) // parsingCanonialForm returns the "Parsing Canonical Form" (pcf) for a parsed // JSON structure of a valid Avro schema, or an error describing the schema // error. func parsingCanonicalForm(schema interface{}, parentNamespace string, typeLookup map[string]string) (string, error) { switch val := schema.(type) { case map[string]interface{}: // JSON objects are decoded as a map of strings to empty interfaces return pcfObject(val, parentNamespace, typeLookup) case []interface{}: // JSON arrays are decoded as a slice of empty interfaces return pcfArray(val, parentNamespace, typeLookup) case string: // JSON string values are decoded as a Go string return pcfString(val, typeLookup) case float64: // JSON numerical values are decoded as Go float64 return pcfNumber(val) default: return "", fmt.Errorf("cannot parse schema with invalid schema type; ought to be map[string]interface{}, []interface{}, string, or float64; received: %T: %v", schema, schema) } } // pcfNumber returns the parsing canonical form for a numerical value. func pcfNumber(val float64) (string, error) { return strconv.FormatFloat(val, 'g', -1, 64), nil } // pcfString returns the parsing canonical form for a string value. func pcfString(val string, typeLookup map[string]string) (string, error) { if canonicalName, ok := typeLookup[val]; ok { return `"` + canonicalName + `"`, nil } return `"` + val + `"`, nil } // pcfArray returns the parsing canonical form for a JSON array. func pcfArray(val []interface{}, parentNamespace string, typeLookup map[string]string) (string, error) { items := make([]string, len(val)) for i, el := range val { p, err := parsingCanonicalForm(el, parentNamespace, typeLookup) if err != nil { return "", err } items[i] = p } return "[" + strings.Join(items, ",") + "]", nil } // pcfObject returns the parsing canonical form for a JSON object. 
func pcfObject(jsonMap map[string]interface{}, parentNamespace string, typeLookup map[string]string) (string, error) { pairs := make(stringPairs, 0, len(jsonMap)) // Remember the namespace to fully qualify names later var namespace string if namespaceJSON, ok := jsonMap["namespace"]; ok { if namespaceStr, ok := namespaceJSON.(string); ok { // and it's value is string (otherwise invalid schema) if parentNamespace == "" { namespace = namespaceStr } else { namespace = parentNamespace + "." + namespaceStr } parentNamespace = namespace } } else if objectType, ok := jsonMap["type"]; ok && objectType == "record" { namespace = parentNamespace } for k, v := range jsonMap { // Reduce primitive schemas to their simple form. if len(jsonMap) == 1 && k == "type" { if t, ok := v.(string); ok { return "\"" + t + "\"", nil } } // Only keep relevant attributes (strip 'doc', 'alias', 'namespace') if _, ok := fieldOrder[k]; !ok { continue } // Add namespace to a non-qualified name. if k == "name" && namespace != "" { // Check if the name isn't already qualified. if t, ok := v.(string); ok && !strings.ContainsRune(t, '.') { v = namespace + "." + t typeLookup[t] = v.(string) } } // Only fixed type allows size, and we must convert a string size to a // float. if k == "size" { if s, ok := v.(string); ok { s, err := strconv.ParseUint(s, 10, 0) if err != nil { // should never get here because already validated schema return "", fmt.Errorf("Fixed size ought to be number greater than zero: %v", s) } v = float64(s) } } pk, err := parsingCanonicalForm(k, parentNamespace, typeLookup) if err != nil { return "", err } pv, err := parsingCanonicalForm(v, parentNamespace, typeLookup) if err != nil { return "", err } pairs = append(pairs, stringPair{k, pk + ":" + pv}) } // Sort keys by their order in specification. sort.Sort(byAvroFieldOrder(pairs)) return "{" + strings.Join(pairs.Bs(), ",") + "}", nil } // stringPair represents a pair of string values. type stringPair struct { A string B string } // stringPairs is a sortable slice of pairs of strings. type stringPairs []stringPair // Bs returns an array of second values of an array of pairs. func (sp *stringPairs) Bs() []string { items := make([]string, len(*sp)) for i, el := range *sp { items[i] = el.B } return items } // fieldOrder defines fields that show up in canonical schema and specifies // their precedence. var fieldOrder = map[string]int{ "name": 1, "type": 2, "fields": 3, "symbols": 4, "items": 5, "values": 6, "size": 7, } // byAvroFieldOrder is equipped with a sort order of fields according to the // specification. type byAvroFieldOrder []stringPair func (s byAvroFieldOrder) Len() int { return len(s) } func (s byAvroFieldOrder) Swap(i, j int) { s[i], s[j] = s[j], s[i] } func (s byAvroFieldOrder) Less(i, j int) bool { return fieldOrder[s[i].A] < fieldOrder[s[j].A] } goavro-2.10.1/canonical_test.go000066400000000000000000000216241412474230400164130ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
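// NOTE: The canonicalization implemented above is exposed through
// Codec.CanonicalSchema, and the Rabin fingerprint of that canonical form
// through the Codec.Rabin field. A minimal illustrative sketch, not part of
// the library; the schema and its canonical form are taken from the test
// table below:
//
//	func ExampleCanonicalSchema() {
//		codec, err := goavro.NewCodec(`{"fields":[{"name":"dummy","type":"int"}], "type":"record", "name":"foo", "namespace":"x.y"}`)
//		if err != nil {
//			fmt.Println(err)
//		}
//		fmt.Println(codec.CanonicalSchema())
//		fmt.Printf("%#x\n", codec.Rabin) // 64-bit Rabin fingerprint of the canonical form
//		// CanonicalSchema prints:
//		// {"name":"x.y.foo","type":"record","fields":[{"name":"dummy","type":"int"}]}
//	}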
package goavro import ( "testing" ) func TestCanonicalSchema(t *testing.T) { // Test cases are taken from the reference implementation here: // https://github.com/apache/avro/blob/master/share/test/data/schema-tests.txt cases := []struct { Schema string Canonical string }{ { Schema: `"null"`, Canonical: `"null"`, }, { Schema: `{"type":"null"}`, Canonical: `"null"`, }, { Schema: `"boolean"`, Canonical: `"boolean"`, }, { Schema: `{"type":"boolean"}`, Canonical: `"boolean"`, }, { Schema: `"int"`, Canonical: `"int"`, }, { Schema: `{"type":"int"}`, Canonical: `"int"`, }, { Schema: `"long"`, Canonical: `"long"`, }, { Schema: `{"type":"long"}`, Canonical: `"long"`, }, { Schema: `"float"`, Canonical: `"float"`, }, { Schema: `{"type":"float"}`, Canonical: `"float"`, }, { Schema: `"double"`, Canonical: `"double"`, }, { Schema: `{"type":"double"}`, Canonical: `"double"`, }, { Schema: `"bytes"`, Canonical: `"bytes"`, }, { Schema: `{"type":"bytes"}`, Canonical: `"bytes"`, }, { Schema: `"string"`, Canonical: `"string"`, }, { Schema: `{"type":"string"}`, Canonical: `"string"`, }, /* // Supported by the reference implementation but not by goavro at this point { Schema: "[ ]", Canonical: "[]", }, */ { Schema: `[ "int" ]`, Canonical: `["int"]`, }, { Schema: `[ "int" , {"type":"boolean"} ]`, Canonical: `["int","boolean"]`, }, // The following 7 test cases differ from the reference implementation since goavro doesn't // currently support empty fields array. A field name "dummy" is added since these tests are // testing other aspects of canonicalization than empty field array. { Schema: `{"fields":[{"name":"dummy","type":"int"}], "type":"record", "name":"foo"}`, Canonical: `{"name":"foo","type":"record","fields":[{"name":"dummy","type":"int"}]}`, }, { Schema: `{"fields":[{"name":"dummy","type":"int"}], "type":"record", "name":"foo", "namespace":"x.y"}`, Canonical: `{"name":"x.y.foo","type":"record","fields":[{"name":"dummy","type":"int"}]}`, }, { Schema: `{"fields":[{"name":"dummy","type":"int"}], "type":"record", "name":"foo", "namespace":"x.y"}`, Canonical: `{"name":"x.y.foo","type":"record","fields":[{"name":"dummy","type":"int"}]}`, }, { Schema: `{"fields":[{"name":"dummy","type":"int"}], "type":"record", "name":"a.b.foo", "namespace":"x.y"}`, Canonical: `{"name":"a.b.foo","type":"record","fields":[{"name":"dummy","type":"int"}]}`, }, { Schema: `{"fields":[{"name":"dummy","type":"int"}], "type":"record", "name":"foo", "doc":"Useful info"}`, Canonical: `{"name":"foo","type":"record","fields":[{"name":"dummy","type":"int"}]}`, }, { Schema: `{"fields":[{"name":"dummy","type":"int"}], "type":"record", "name":"foo", "aliases":["foo","bar"]}`, Canonical: `{"name":"foo","type":"record","fields":[{"name":"dummy","type":"int"}]}`, }, { Schema: `{"fields":[{"name":"dummy","type":"int"}], "type":"record", "name":"foo", "doc":"foo", "aliases":["foo","bar"]}`, Canonical: `{"name":"foo","type":"record","fields":[{"name":"dummy","type":"int"}]}`, }, { Schema: `{"fields":[{"type":{"type":"boolean"}, "name":"f1"}], "type":"record", "name":"foo"}`, Canonical: `{"name":"foo","type":"record","fields":[{"name":"f1","type":"boolean"}]}`, }, { Schema: `{"fields": [ {"type": "boolean", "aliases": [], "name": "f1", "default": true}, {"order": "descending", "name": "f2", "doc": "Hello", "type": "int"} ], "type": "record", "name": "foo"}`, Canonical: `{"name":"foo","type":"record","fields":[{"name":"f1","type":"boolean"},{"name":"f2","type":"int"}]}`, }, { Schema: `{"type":"enum", "name":"foo", "symbols":["A1"]}`, Canonical: 
`{"name":"foo","type":"enum","symbols":["A1"]}`, }, { Schema: `{"namespace":"x.y.z", "type":"enum", "name":"foo", "doc":"foo bar", "symbols":["A1", "A2"]}`, Canonical: `{"name":"x.y.z.foo","type":"enum","symbols":["A1","A2"]}`, }, { Schema: `{"name":"foo","type":"fixed","size":15}`, Canonical: `{"name":"foo","type":"fixed","size":15}`, }, { Schema: `{"namespace":"x.y.z", "type":"fixed", "name":"foo", "doc":"foo bar", "size":32}`, Canonical: `{"name":"x.y.z.foo","type":"fixed","size":32}`, }, { Schema: `{ "items":{"type":"null"}, "type":"array"}`, Canonical: `{"type":"array","items":"null"}`, }, { Schema: `{ "values":"string", "type":"map"}`, Canonical: `{"type":"map","values":"string"}`, }, { Schema: ` {"name":"PigValue","type":"record", "fields":[{"name":"value", "type":["null", "int", "long", "PigValue"]}]}`, Canonical: `{"name":"PigValue","type":"record","fields":[{"name":"value","type":["null","int","long","PigValue"]}]}`, }, // [INTEGERS] Eliminate quotes around and any leading zeros in front of // JSON integer literals (which appear in the size attributes of fixed // schemas). { Schema: `{"size":"15","type":"fixed","name":"foo"}`, Canonical: `{"name":"foo","type":"fixed","size":15}`, }, // [STRINGS] For all JSON string literals in the schema text, replace // any escaped characters (e.g., \uXXXX escapes) with their UTF-8 // equivalents. { // primitive Schema: `"\u0069\u006e\u0074"`, Canonical: `"int"`, }, { // primitive wrapped in JSON object Schema: `{"type":"\u0069\u006e\u0074"}`, Canonical: `"int"`, }, { // array items Schema: `{"type":"array","items":"\u0069\u006e\u0074"}`, Canonical: `{"type":"array","items":"int"}`, }, { // enum symbols Schema: `{"type":"enum","symbols":["\u0047\u006f","\u0041\u0076\u0072\u006f"],"name":"\u0046\u006f\u006f"}`, Canonical: `{"name":"Foo","type":"enum","symbols":["Go","Avro"]}`, }, { // fixed name Schema: `{"size":16,"type":"fixed","name":"\u0046\u006f\u006f"}`, Canonical: `{"name":"Foo","type":"fixed","size":16}`, }, { // map values Schema: `{"values":"\u0069\u006e\u0074","type":"map"}`, Canonical: `{"type":"map","values":"int"}`, }, { // record name Schema: `{"fields":[{"name":"hi","type":"int"}], "type":"record", "name":"\u0046\u006f\u006f", "namespace":"x.y"}`, Canonical: `{"name":"x.y.Foo","type":"record","fields":[{"name":"hi","type":"int"}]}`, }, { // record namespace Schema: `{"fields":[{"name":"hi","type":"int"}], "type":"record", "name":"Foo", "namespace":"\u0078\u002e\u0079"}`, Canonical: `{"name":"x.y.Foo","type":"record","fields":[{"name":"hi","type":"int"}]}`, }, { // record field name Schema: `{"fields":[{"name":"\u0068\u0069","type":"int"}], "type":"record", "name":"Foo", "namespace":"x.y"}`, Canonical: `{"name":"x.y.Foo","type":"record","fields":[{"name":"hi","type":"int"}]}`, }, { // record field type Schema: `{"fields":[{"name":"hi","type":"\u0069\u006e\u0074"}], "type":"record", "name":"Foo", "namespace":"x.y"}`, Canonical: `{"name":"x.y.Foo","type":"record","fields":[{"name":"hi","type":"int"}]}`, }, { // union children Schema: `["\u006e\u0075\u006c\u006c","\u0069\u006e\u0074"]`, Canonical: `["null","int"]`, }, { // propagate namespace "bar" to subtype "baz", i.e., "bar.baz" Schema: `{"type":"record","name":"foo","namespace":"bar","fields":[{"type":"record","name":"baz","fields":[{"name":"hi","type":"int"}]}]}`, Canonical: `{"name":"bar.foo","type":"record","fields":[{"name":"bar.baz","type":"record","fields":[{"name":"hi","type":"int"}]}]}`, }, { // replace the second reference to type "baz" in bye to "bar.baz" Schema: 
`{"type":"record","name":"foo","namespace":"bar","fields":[{"type":"record","name":"baz","fields":[{"name":"hi","type":"int"}]},{"name":"bye", "type":["null","baz"]}]}`, Canonical: `{"name":"bar.foo","type":"record","fields":[{"name":"bar.baz","type":"record","fields":[{"name":"hi","type":"int"}]},{"name":"bye","type":["null","bar.baz"]}]}`, }, } for _, c := range cases { codec, err := NewCodec(c.Schema) if err != nil { t.Errorf("Unable to create codec for schema: %s\nwith error: %s", c.Schema, err) } else { if got, want := codec.CanonicalSchema(), c.Canonical; got != want { t.Errorf("Test failed for schema: %s\n\tgot canonical:\t\t%s\n\texpected canonical:\t%s", c.Schema, got, want) } } } } goavro-2.10.1/codec.go000066400000000000000000000571261412474230400145100ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "bytes" "encoding/binary" "encoding/json" "fmt" "math" "strconv" ) var ( // MaxBlockCount is the maximum number of data items allowed in a single // block that will be decoded from a binary stream, whether when reading // blocks to decode an array or a map, or when reading blocks from an OCF // stream. This check is to ensure decoding binary data will not cause the // library to over allocate RAM, potentially creating a denial of service on // the system. // // If a particular application needs to decode binary Avro data that // potentially has more data items in a single block, then this variable may // be modified at your discretion. MaxBlockCount = int64(math.MaxInt32) // MaxBlockSize is the maximum number of bytes that will be allocated for a // single block of data items when decoding from a binary stream. This check // is to ensure decoding binary data will not cause the library to over // allocate RAM, potentially creating a denial of service on the system. // // If a particular application needs to decode binary Avro data that // potentially has more bytes in a single block, then this variable may be // modified at your discretion. MaxBlockSize = int64(math.MaxInt32) ) // Codec supports decoding binary and text Avro data to Go native data types, // and conversely encoding Go native data types to binary or text Avro data. A // Codec is created as a stateless structure that can be safely used in multiple // go routines simultaneously. 
type Codec struct { soeHeader []byte // single-object-encoding header schemaOriginal string schemaCanonical string typeName *name nativeFromTextual func([]byte) (interface{}, []byte, error) binaryFromNative func([]byte, interface{}) ([]byte, error) nativeFromBinary func([]byte) (interface{}, []byte, error) textualFromNative func([]byte, interface{}) ([]byte, error) Rabin uint64 } // codecBuilder holds the 3 kinds of codec builders so they can be // replaced if needed // and so they can be passed down the call stack during codec building type codecBuilder struct { mapBuilder func(st map[string]*Codec, enclosingNamespace string, schemaMap map[string]interface{}, cb *codecBuilder) (*Codec, error) stringBuilder func(st map[string]*Codec, enclosingNamespace string, typeName string, schemaMap map[string]interface{}, cb *codecBuilder) (*Codec, error) sliceBuilder func(st map[string]*Codec, enclosingNamespace string, schemaArray []interface{}, cb *codecBuilder) (*Codec, error) } // NewCodec returns a Codec used to translate between a byte slice of either // binary or textual Avro data and native Go data. // // Creating a `Codec` is fast, but ought to be performed exactly once per Avro // schema to process. Once a `Codec` is created, it may be used multiple times // to convert data between native form and binary Avro representation, or // between native form and textual Avro representation. // // A particular `Codec` can work with only one Avro schema. However, // there is no practical limit to how many `Codec`s may be created and // used in a program. Internally a `Codec` is merely a named tuple of // four function pointers, and maintains no runtime state that is mutated // after instantiation. In other words, `Codec`s may be safely used by // many go routines simultaneously, as your program requires. 
// // codec, err := goavro.NewCodec(` // { // "type": "record", // "name": "LongList", // "fields" : [ // {"name": "next", "type": ["null", "LongList"], "default": null} // ] // }`) // if err != nil { // fmt.Println(err) // } func NewCodec(schemaSpecification string) (*Codec, error) { return NewCodecFrom(schemaSpecification, &codecBuilder{ buildCodecForTypeDescribedByMap, buildCodecForTypeDescribedByString, buildCodecForTypeDescribedBySlice, }) } func NewCodecForStandardJSON(schemaSpecification string) (*Codec, error) { return NewCodecFrom(schemaSpecification, &codecBuilder{ buildCodecForTypeDescribedByMap, buildCodecForTypeDescribedByString, buildCodecForTypeDescribedBySliceJSON, }) } func NewCodecFrom(schemaSpecification string, cb *codecBuilder) (*Codec, error) { var schema interface{} if err := json.Unmarshal([]byte(schemaSpecification), &schema); err != nil { return nil, fmt.Errorf("cannot unmarshal schema JSON: %s", err) } // bootstrap a symbol table with primitive type codecs for the new codec st := newSymbolTable() c, err := buildCodec(st, nullNamespace, schema, cb) if err != nil { return nil, err } c.schemaCanonical, err = parsingCanonicalForm(schema, "", make(map[string]string)) if err != nil { return nil, err // should not get here because schema was validated above } c.Rabin = rabin([]byte(c.schemaCanonical)) c.soeHeader = []byte{0xC3, 0x01, 0, 0, 0, 0, 0, 0, 0, 0} binary.LittleEndian.PutUint64(c.soeHeader[2:], c.Rabin) c.schemaOriginal = schemaSpecification return c, nil } func newSymbolTable() map[string]*Codec { return map[string]*Codec{ "boolean": { typeName: &name{"boolean", nullNamespace}, schemaOriginal: "boolean", schemaCanonical: "boolean", binaryFromNative: booleanBinaryFromNative, nativeFromBinary: booleanNativeFromBinary, nativeFromTextual: booleanNativeFromTextual, textualFromNative: booleanTextualFromNative, }, "bytes": { typeName: &name{"bytes", nullNamespace}, schemaOriginal: "bytes", schemaCanonical: "bytes", binaryFromNative: bytesBinaryFromNative, nativeFromBinary: bytesNativeFromBinary, nativeFromTextual: bytesNativeFromTextual, textualFromNative: bytesTextualFromNative, }, "double": { typeName: &name{"double", nullNamespace}, schemaOriginal: "double", schemaCanonical: "double", binaryFromNative: doubleBinaryFromNative, nativeFromBinary: doubleNativeFromBinary, nativeFromTextual: doubleNativeFromTextual, textualFromNative: doubleTextualFromNative, }, "float": { typeName: &name{"float", nullNamespace}, schemaOriginal: "float", schemaCanonical: "float", binaryFromNative: floatBinaryFromNative, nativeFromBinary: floatNativeFromBinary, nativeFromTextual: floatNativeFromTextual, textualFromNative: floatTextualFromNative, }, "int": { typeName: &name{"int", nullNamespace}, schemaOriginal: "int", schemaCanonical: "int", binaryFromNative: intBinaryFromNative, nativeFromBinary: intNativeFromBinary, nativeFromTextual: intNativeFromTextual, textualFromNative: intTextualFromNative, }, "long": { typeName: &name{"long", nullNamespace}, schemaOriginal: "long", schemaCanonical: "long", binaryFromNative: longBinaryFromNative, nativeFromBinary: longNativeFromBinary, nativeFromTextual: longNativeFromTextual, textualFromNative: longTextualFromNative, }, "null": { typeName: &name{"null", nullNamespace}, schemaOriginal: "null", schemaCanonical: "null", binaryFromNative: nullBinaryFromNative, nativeFromBinary: nullNativeFromBinary, nativeFromTextual: nullNativeFromTextual, textualFromNative: nullTextualFromNative, }, "string": { typeName: &name{"string", nullNamespace}, 
schemaOriginal: "string", schemaCanonical: "string", binaryFromNative: stringBinaryFromNative, nativeFromBinary: stringNativeFromBinary, nativeFromTextual: stringNativeFromTextual, textualFromNative: stringTextualFromNative, }, // Start of compiled logical types using format typeName.logicalType where there is // no dependence on schema. "long.timestamp-millis": { typeName: &name{"long.timestamp-millis", nullNamespace}, schemaOriginal: "long", schemaCanonical: "long", nativeFromTextual: nativeFromTimeStampMillis(longNativeFromTextual), binaryFromNative: timeStampMillisFromNative(longBinaryFromNative), nativeFromBinary: nativeFromTimeStampMillis(longNativeFromBinary), textualFromNative: timeStampMillisFromNative(longTextualFromNative), }, "long.timestamp-micros": { typeName: &name{"long.timestamp-micros", nullNamespace}, schemaOriginal: "long", schemaCanonical: "long", nativeFromTextual: nativeFromTimeStampMicros(longNativeFromTextual), binaryFromNative: timeStampMicrosFromNative(longBinaryFromNative), nativeFromBinary: nativeFromTimeStampMicros(longNativeFromBinary), textualFromNative: timeStampMicrosFromNative(longTextualFromNative), }, "int.time-millis": { typeName: &name{"int.time-millis", nullNamespace}, schemaOriginal: "int", schemaCanonical: "int", nativeFromTextual: nativeFromTimeMillis(intNativeFromTextual), binaryFromNative: timeMillisFromNative(intBinaryFromNative), nativeFromBinary: nativeFromTimeMillis(intNativeFromBinary), textualFromNative: timeMillisFromNative(intTextualFromNative), }, "long.time-micros": { typeName: &name{"long.time-micros", nullNamespace}, schemaOriginal: "long", schemaCanonical: "long", nativeFromTextual: nativeFromTimeMicros(longNativeFromTextual), binaryFromNative: timeMicrosFromNative(longBinaryFromNative), nativeFromBinary: nativeFromTimeMicros(longNativeFromBinary), textualFromNative: timeMicrosFromNative(longTextualFromNative), }, "int.date": { typeName: &name{"int.date", nullNamespace}, schemaOriginal: "int", schemaCanonical: "int", nativeFromTextual: nativeFromDate(intNativeFromTextual), binaryFromNative: dateFromNative(intBinaryFromNative), nativeFromBinary: nativeFromDate(intNativeFromBinary), textualFromNative: dateFromNative(intTextualFromNative), }, } } // BinaryFromNative appends the binary encoded byte slice representation of the // provided native datum value to the provided byte slice in accordance with the // Avro schema supplied when creating the Codec. It is supplied a byte slice to // which to append the binary encoded data along with the actual data to encode. // On success, it returns a new byte slice with the encoded bytes appended, and // a nil error value. On error, it returns the original byte slice, and the // error message. 
// // func ExampleBinaryFromNative() { // codec, err := goavro.NewCodec(` // { // "type": "record", // "name": "LongList", // "fields" : [ // {"name": "next", "type": ["null", "LongList"], "default": null} // ] // }`) // if err != nil { // fmt.Println(err) // } // // // Convert native Go form to binary Avro data // binary, err := codec.BinaryFromNative(nil, map[string]interface{}{ // "next": map[string]interface{}{ // "LongList": map[string]interface{}{ // "next": map[string]interface{}{ // "LongList": map[string]interface{}{ // // NOTE: May omit fields when using default value // }, // }, // }, // }, // }) // if err != nil { // fmt.Println(err) // } // // fmt.Printf("%#v", binary) // // Output: []byte{0x2, 0x2, 0x0} // } func (c *Codec) BinaryFromNative(buf []byte, datum interface{}) ([]byte, error) { newBuf, err := c.binaryFromNative(buf, datum) if err != nil { return buf, err // if error, return original byte slice } return newBuf, nil } // NativeFromBinary returns a native datum value from the binary encoded byte // slice in accordance with the Avro schema supplied when creating the Codec. On // success, it returns the decoded datum, a byte slice containing the remaining // undecoded bytes, and a nil error value. On error, it returns nil for // the datum value, the original byte slice, and the error message. // // func ExampleNativeFromBinary() { // codec, err := goavro.NewCodec(` // { // "type": "record", // "name": "LongList", // "fields" : [ // {"name": "next", "type": ["null", "LongList"], "default": null} // ] // }`) // if err != nil { // fmt.Println(err) // } // // // Convert native Go form to binary Avro data // binary := []byte{0x2, 0x2, 0x0} // // native, _, err := codec.NativeFromBinary(binary) // if err != nil { // fmt.Println(err) // } // // fmt.Printf("%v", native) // // Output: map[next:map[LongList:map[next:map[LongList:map[next:]]]]] // } func (c *Codec) NativeFromBinary(buf []byte) (interface{}, []byte, error) { value, newBuf, err := c.nativeFromBinary(buf) if err != nil { return nil, buf, err // if error, return original byte slice } return value, newBuf, nil } // NativeFromSingle converts Avro data from Single-Object-Encoded format from // the provided byte slice to Go native data types in accordance with the Avro // schema supplied when creating the Codec. On success, it returns the decoded // datum, along with a new byte slice with the decoded bytes consumed, and a nil // error value. On error, it returns nil for the datum value, the original byte // slice, and the error message. // // func decode(codec *goavro.Codec, buf []byte) error { // datum, _, err := codec.NativeFromSingle(buf) // if err != nil { // return err // } // _, err = fmt.Println(datum) // return err // } func (c *Codec) NativeFromSingle(buf []byte) (interface{}, []byte, error) { fingerprint, newBuf, err := FingerprintFromSOE(buf) if err != nil { return nil, buf, err } if !bytes.Equal(buf[:len(c.soeHeader)], c.soeHeader) { return nil, buf, ErrWrongCodec(fingerprint) } value, newBuf, err := c.nativeFromBinary(newBuf) if err != nil { return nil, buf, err // if error, return original byte slice } return value, newBuf, nil } // NativeFromTextual converts Avro data in JSON text format from the provided byte // slice to Go native data types in accordance with the Avro schema supplied // when creating the Codec. On success, it returns the decoded datum, along with // a new byte slice with the decoded bytes consumed, and a nil error value. 
On // error, it returns nil for the datum value, the original byte slice, and the // error message. // // func ExampleNativeFromTextual() { // codec, err := goavro.NewCodec(` // { // "type": "record", // "name": "LongList", // "fields" : [ // {"name": "next", "type": ["null", "LongList"], "default": null} // ] // }`) // if err != nil { // fmt.Println(err) // } // // // Convert native Go form to text Avro data // text := []byte(`{"next":{"LongList":{"next":{"LongList":{"next":null}}}}}`) // // native, _, err := codec.NativeFromTextual(text) // if err != nil { // fmt.Println(err) // } // // fmt.Printf("%v", native) // // Output: map[next:map[LongList:map[next:map[LongList:map[next:]]]]] // } func (c *Codec) NativeFromTextual(buf []byte) (interface{}, []byte, error) { value, newBuf, err := c.nativeFromTextual(buf) if err != nil { return nil, buf, err // if error, return original byte slice } return value, newBuf, nil } // SingleFromNative appends the single-object-encoding byte slice representation // of the provided native datum value to the provided byte slice in accordance // with the Avro schema supplied when creating the Codec. It is supplied a byte // slice to which to append the header and binary encoded data, along with the // actual data to encode. On success, it returns a new byte slice with the // encoded bytes appended, and a nil error value. On error, it returns the // original byte slice, and the error message. // // func ExampleSingleItemEncoding() { // codec, err := goavro.NewCodec(`"int"`) // if err != nil { // fmt.Fprintf(os.Stderr, "%s\n", err) // return // } // // buf, err := codec.SingleFromNative(nil, 3) // if err != nil { // fmt.Fprintf(os.Stderr, "%s\n", err) // return // } // // fmt.Println(buf) // // Output: [195 1 143 92 57 63 26 213 117 114 6] // } func (c *Codec) SingleFromNative(buf []byte, datum interface{}) ([]byte, error) { newBuf, err := c.binaryFromNative(append(buf, c.soeHeader...), datum) if err != nil { return buf, err } return newBuf, nil } // TextualFromNative converts Go native data types to Avro data in JSON text format in // accordance with the Avro schema supplied when creating the Codec. It is // supplied a byte slice to which to append the encoded data and the actual data // to encode. On success, it returns a new byte slice with the encoded bytes // appended, and a nil error value. On error, it returns the original byte // slice, and the error message. // // func ExampleTextualFromNative() { // codec, err := goavro.NewCodec(` // { // "type": "record", // "name": "LongList", // "fields" : [ // {"name": "next", "type": ["null", "LongList"], "default": null} // ] // }`) // if err != nil { // fmt.Println(err) // } // // // Convert native Go form to text Avro data // text, err := codec.TextualFromNative(nil, map[string]interface{}{ // "next": map[string]interface{}{ // "LongList": map[string]interface{}{ // "next": map[string]interface{}{ // "LongList": map[string]interface{}{ // // NOTE: May omit fields when using default value // }, // }, // }, // }, // }) // if err != nil { // fmt.Println(err) // } // // fmt.Printf("%s", text) // // Output: {"next":{"LongList":{"next":{"LongList":{"next":null}}}}} // } func (c *Codec) TextualFromNative(buf []byte, datum interface{}) ([]byte, error) { newBuf, err := c.textualFromNative(buf, datum) if err != nil { return buf, err // if error, return original byte slice } return newBuf, nil } // Schema returns the original schema used to create the Codec. 
func (c *Codec) Schema() string { return c.schemaOriginal } // CanonicalSchema returns the Parsing Canonical Form of the schema according to // the Avro specification. func (c *Codec) CanonicalSchema() string { return c.schemaCanonical } // SchemaCRC64Avro returns a signed 64-bit integer Rabin fingerprint for the // canonical schema. This method returns the signed 64-bit cast of the unsigned // 64-bit schema Rabin fingerprint. // // DEPRECATED: This method has been replaced by the Rabin structure Codec field // and is provided for backward compatibility only. func (c *Codec) SchemaCRC64Avro() int64 { return int64(c.Rabin) } // convert a schema data structure to a codec, prefixing with specified // namespace func buildCodec(st map[string]*Codec, enclosingNamespace string, schema interface{}, cb *codecBuilder) (*Codec, error) { switch schemaType := schema.(type) { case map[string]interface{}: return cb.mapBuilder(st, enclosingNamespace, schemaType, cb) case string: return cb.stringBuilder(st, enclosingNamespace, schemaType, nil, cb) case []interface{}: return cb.sliceBuilder(st, enclosingNamespace, schemaType, cb) default: return nil, fmt.Errorf("unknown schema type: %T", schema) } } // Reach into the map, grabbing its "type". Use that to create the codec. func buildCodecForTypeDescribedByMap(st map[string]*Codec, enclosingNamespace string, schemaMap map[string]interface{}, cb *codecBuilder) (*Codec, error) { t, ok := schemaMap["type"] if !ok { return nil, fmt.Errorf("missing type: %v", schemaMap) } switch v := t.(type) { case string: // Already defined types may be abbreviated with its string name. // EXAMPLE: "type":"array" // EXAMPLE: "type":"enum" // EXAMPLE: "type":"fixed" // EXAMPLE: "type":"int" // EXAMPLE: "type":"record" // EXAMPLE: "type":"somePreviouslyDefinedCustomTypeString" return cb.stringBuilder(st, enclosingNamespace, v, schemaMap, cb) case map[string]interface{}: return cb.mapBuilder(st, enclosingNamespace, v, cb) case []interface{}: return cb.sliceBuilder(st, enclosingNamespace, v, cb) default: return nil, fmt.Errorf("type ought to be either string, map[string]interface{}, or []interface{}; received: %T", t) } } func buildCodecForTypeDescribedByString(st map[string]*Codec, enclosingNamespace string, typeName string, schemaMap map[string]interface{}, cb *codecBuilder) (*Codec, error) { isLogicalType := false searchType := typeName // logicalType will be non-nil for those fields without a logicalType property set if lt := schemaMap["logicalType"]; lt != nil { isLogicalType = true searchType = fmt.Sprintf("%s.%s", typeName, lt) } // NOTE: When codec already exists, return it. This includes both primitive and // logicalType codecs added in NewCodec, and user-defined types, added while // building the codec. if cd, ok := st[searchType]; ok { return cd, nil } // Avro specification allows abbreviation of type name inside a namespace. if enclosingNamespace != "" { if cd, ok := st[enclosingNamespace+"."+typeName]; ok { return cd, nil } } // There are only a small handful of complex Avro data types. 
switch searchType { case "array": return makeArrayCodec(st, enclosingNamespace, schemaMap, cb) case "enum": return makeEnumCodec(st, enclosingNamespace, schemaMap) case "fixed": return makeFixedCodec(st, enclosingNamespace, schemaMap) case "map": return makeMapCodec(st, enclosingNamespace, schemaMap, cb) case "record": return makeRecordCodec(st, enclosingNamespace, schemaMap, cb) case "bytes.decimal": return makeDecimalBytesCodec(st, enclosingNamespace, schemaMap) case "fixed.decimal": return makeDecimalFixedCodec(st, enclosingNamespace, schemaMap) default: if isLogicalType { delete(schemaMap, "logicalType") return buildCodecForTypeDescribedByString(st, enclosingNamespace, typeName, schemaMap, cb) } return nil, fmt.Errorf("unknown type name: %q", searchType) } } // notion of enclosing namespace changes when record, enum, or fixed create a // new namespace, for child objects. func registerNewCodec(st map[string]*Codec, schemaMap map[string]interface{}, enclosingNamespace string) (*Codec, error) { n, err := newNameFromSchemaMap(enclosingNamespace, schemaMap) if err != nil { return nil, err } c := &Codec{typeName: n} st[n.fullName] = c return c, nil } // ErrWrongCodec is returned when an attempt is made to decode a single-object // encoded value using the wrong codec. type ErrWrongCodec uint64 func (e ErrWrongCodec) Error() string { return "wrong codec: " + strconv.FormatUint(uint64(e), 10) } // ErrNotSingleObjectEncoded is returned when an attempt is made to decode a // single-object encoded value from a buffer that does not have the correct // magic prefix. type ErrNotSingleObjectEncoded string func (e ErrNotSingleObjectEncoded) Error() string { return "cannot decode buffer as single-object encoding: " + string(e) } goavro-2.10.1/codec_test.go000066400000000000000000000173471412474230400155500ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
package goavro import ( "bytes" "fmt" "os" "testing" ) func ExampleCodecCanonicalSchema() { schema := `{"type":"map","values":{"type":"enum","name":"foo","symbols":["alpha","bravo"]}}` codec, err := NewCodec(schema) if err != nil { fmt.Println(err) } else { fmt.Println(codec.CanonicalSchema()) } // Output: {"type":"map","values":{"name":"foo","type":"enum","symbols":["alpha","bravo"]}} } func TestCodecRabin(t *testing.T) { cases := []struct { Schema string Rabin uint64 }{ { Schema: `"null"`, Rabin: 0x63dd24e7cc258f8a, }, { Schema: `"boolean"`, Rabin: 0x9f42fc78a4d4f764, }, { Schema: `"int"`, Rabin: 0x7275d51a3f395c8f, }, { Schema: `"long"`, Rabin: 0xd054e14493f41db7, }, { Schema: `"float"`, Rabin: 0x4d7c02cb3ea8d790, }, { Schema: `"double"`, Rabin: 0x8e7535c032ab957e, }, { Schema: `"bytes"`, Rabin: 0x4fc016dac3201965, }, { Schema: `"string"`, Rabin: 0x8f014872634503c7, }, { Schema: `[ "int" ]`, Rabin: 0xb763638a48b2fb03, }, { Schema: `[ "int" , {"type":"boolean"} ]`, Rabin: 0x4ad63578080c1602, }, { Schema: `{"fields":[], "type":"record", "name":"foo"}`, Rabin: 0xbd0c50c84319be7e, }, { Schema: `{"fields":[], "type":"record", "name":"foo", "namespace":"x.y"}`, Rabin: 0x521d1a6b830ec4ab, }, { Schema: `{"fields":[], "type":"record", "name":"a.b.foo", "namespace":"x.y"}`, Rabin: 0xbfefe5be5021e2b2, }, { Schema: `{"fields":[], "type":"record", "name":"foo", "doc":"Useful info"}`, Rabin: 0xbd0c50c84319be7e, }, { Schema: `{"fields":[], "type":"record", "name":"foo", "aliases":["foo","bar"]}`, Rabin: 0xbd0c50c84319be7e, }, { Schema: `{"fields":[], "type":"record", "name":"foo", "doc":"foo", "aliases":["foo","bar"]}`, Rabin: 0xbd0c50c84319be7e, }, { Schema: `{"fields":[{"type":{"type":"boolean"}, "name":"f1"}], "type":"record", "name":"foo"}`, Rabin: 0x6cd8eaf1c968a33b, }, { Schema: `{ "fields":[{"type":"boolean", "aliases":[], "name":"f1", "default":true}, {"order":"descending","name":"f2","doc":"Hello","type":"int"}], "type":"record", "name":"foo"}`, Rabin: 0xbc8d05bd57f4934a, }, { Schema: `{"type":"enum", "name":"foo", "symbols":["A1"]}`, Rabin: 0xa7fc039e15aa3169, }, { Schema: `{"namespace":"x.y.z", "type":"enum", "name":"foo", "doc":"foo bar", "symbols":["A1", "A2"]}`, Rabin: 0xc2433ae5f4999d8b, }, { Schema: `{"name":"foo","type":"fixed","size":15}`, Rabin: 0x18602ec3ed31a504, }, { Schema: `{"namespace":"x.y.z", "type":"fixed", "name":"foo", "doc":"foo bar", "size":32}`, Rabin: 0xd579d47693a6171e, }, { Schema: `{ "items":{"type":"null"}, "type":"array"}`, Rabin: 0xf7d13f2f68170a6d, }, { Schema: `{ "values":"string", "type":"map"}`, Rabin: 0x86ce965d92864572, }, { Schema: `{"name":"PigValue","type":"record", "fields":[{"name":"value", "type":["null", "int", "long", "PigValue"]}]}`, Rabin: 0xe795dc6656b7e95b, }, } for _, c := range cases { codec, err := NewCodec(c.Schema) if err != nil { t.Fatalf("CASE: %s; cannot create code: %s", c.Schema, err) } if got, want := codec.Rabin, c.Rabin; got != want { t.Errorf("CASE: %s; GOT: %#x; WANT: %#x", c.Schema, got, want) } } } func TestSingleObjectEncoding(t *testing.T) { t.Run("int", func(*testing.T) { schema := `"int"` codec, err := NewCodec(schema) if err != nil { t.Fatalf("cannot create code: %s", err) } t.Run("encoding", func(t *testing.T) { t.Run("does not modify source buf when cannot encode", func(t *testing.T) { buf := []byte{0xDE, 0xAD, 0xBE, 0xEF} buf, err = codec.SingleFromNative(buf, "strings cannot be encoded as int") ensureError(t, err, "cannot encode binary int") if got, want := buf, []byte("\xDE\xAD\xBE\xEF"); !bytes.Equal(got, want) { 
t.Errorf("GOT: %v; WANT: %v", got, want) } }) t.Run("appends header then encoded data", func(t *testing.T) { const original = "\x01\x02\x03\x04" buf := []byte(original) buf, err = codec.SingleFromNative(buf, 3) ensureError(t, err) fp := "\xC3\x01" + "\x8F\x5C\x39\x3F\x1A\xD5\x75\x72" if got, want := buf, []byte(original+fp+"\x06"); !bytes.Equal(got, want) { t.Errorf("\nGOT:\n\t%v;\nWANT:\n\t%v", got, want) } }) }) t.Run("decoding", func(t *testing.T) { const original = "" buf := []byte(original) buf, err = codec.SingleFromNative(nil, 3) ensureError(t, err) buf = append(buf, "\xDE\xAD"...) // append some junk datum, newBuf, err := codec.NativeFromSingle(buf) ensureError(t, err) if got, want := datum, int32(3); got != want { t.Errorf("GOT: %v; WANT: %v", got, want) } // ensure junk is not disturbed if got, want := newBuf, []byte("\xDE\xAD"); !bytes.Equal(got, want) { t.Errorf("\nGOT:\n\t%q;\nWANT:\n\t%q", got, want) } }) }) t.Run("record round trip", func(t *testing.T) { codec, err := NewCodec(` { "type": "record", "name": "LongList", "fields" : [ {"name": "next", "type": ["null", "LongList"], "default": null} ] } `) ensureError(t, err) // NOTE: May omit fields when using default value initial := `{"next":{"LongList":{}}}` // NOTE: Textual encoding will show all fields, even those with values that // match their default values final := `{"next":{"LongList":{"next":null}}}` // Convert textual Avro data (in Avro JSON format) to native Go form datum, _, err := codec.NativeFromTextual([]byte(initial)) ensureError(t, err) // Convert native Go form to single-object encoding form buf, err := codec.SingleFromNative(nil, datum) ensureError(t, err) // Convert single-object encoding form back to native Go form datum, _, err = codec.NativeFromSingle(buf) ensureError(t, err) // Convert native Go form to textual Avro data buf, err = codec.TextualFromNative(nil, datum) ensureError(t, err) if got, want := string(buf), final; got != want { t.Fatalf("GOT: %v; WANT: %v", got, want) } }) } func ExampleSingleItemEncoding() { codec, err := NewCodec(`"int"`) if err != nil { fmt.Fprintf(os.Stderr, "%s\n", err) return } buf, err := codec.SingleFromNative(nil, 3) if err != nil { fmt.Fprintf(os.Stderr, "%s\n", err) return } fmt.Println(buf) // Output: [195 1 143 92 57 63 26 213 117 114 6] } func ExampleSingleItemDecoding() { codec1, err := NewCodec(`"int"`) if err != nil { fmt.Fprintf(os.Stderr, "%s\n", err) return } // Create a map of fingerprint values to corresponding Codec instances. codex := make(map[uint64]*Codec) codex[codec1.Rabin] = codec1 // Later on when you want to decode such a slice of bytes as a Single-Object // Encoding, obtain the Rabin fingerprint of the schema used to encode the // data. buf := []byte{195, 1, 143, 92, 57, 63, 26, 213, 117, 114, 6} fingerprint, newBuf, err := FingerprintFromSOE(buf) if err != nil { fmt.Fprintf(os.Stderr, "%s\n", err) return } // Get a previously stored Codec from the codex map. codec2, ok := codex[fingerprint] if !ok { fmt.Fprintf(os.Stderr, "unknown codec: %d\n", fingerprint) return } // Use the fetched Codec to decode the buffer as a SOE. 
datum, _, err := codec2.NativeFromBinary(newBuf) if err != nil { fmt.Fprintf(os.Stderr, "%s\n", err) return } fmt.Println(datum) // Output: 3 } goavro-2.10.1/debug_development.go000066400000000000000000000003571412474230400171150ustar00rootroot00000000000000// +build goavro_debug package goavro import ( "fmt" "os" ) // debug formats and prints arguments to stderr for development builds func debug(f string, a ...interface{}) { os.Stderr.Write([]byte("goavro: " + fmt.Sprintf(f, a...))) } goavro-2.10.1/debug_release.go000066400000000000000000000002661412474230400162120ustar00rootroot00000000000000// +build !goavro_debug package goavro // debug is a no-op for release builds, and the function call is optimized out // by the compiler. func debug(_ string, _ ...interface{}) {} goavro-2.10.1/doc.go000066400000000000000000000037001412474230400141650ustar00rootroot00000000000000/* Package goavro is a library that encodes and decodes Avro data. Goavro provides methods to encode native Go data into both binary and textual JSON Avro data, and methods to decode both binary and textual JSON Avro data to native Go data. Goavro also provides methods to read and write Object Container File (OCF) formatted files, and the library contains example programs to read and write OCF files. Usage Example: package main import ( "fmt" "github.com/linkedin/goavro" ) func main() { codec, err := goavro.NewCodec(` { "type": "record", "name": "LongList", "fields" : [ {"name": "next", "type": ["null", "LongList", {"type": "long", "logicalType": "timestamp-millis"}], "default": null} ] }`) if err != nil { fmt.Println(err) } // NOTE: May omit fields when using default value textual := []byte(`{"next":{"LongList":{}}}`) // Convert textual Avro data (in Avro JSON format) to native Go form native, _, err := codec.NativeFromTextual(textual) if err != nil { fmt.Println(err) } // Convert native Go form to binary Avro data binary, err := codec.BinaryFromNative(nil, native) if err != nil { fmt.Println(err) } // Convert binary Avro data back to native Go form native, _, err = codec.NativeFromBinary(binary) if err != nil { fmt.Println(err) } // Convert native Go form to textual Avro data textual, err = codec.TextualFromNative(nil, native) if err != nil { fmt.Println(err) } // NOTE: Textual encoding will show all fields, even those with values that // match their default values fmt.Println(string(textual)) // Output: {"next":{"LongList":{"next":null}}} } */ package goavro goavro-2.10.1/ensure_test.go000066400000000000000000000037051412474230400157650ustar00rootroot00000000000000package goavro // NOTE: This file was copied from https://github.com/karrick/gorill import ( "fmt" "strings" "testing" ) func ensureBuffer(tb testing.TB, buf []byte, n int, want string) { tb.Helper() if got, want := n, len(want); got != want { tb.Fatalf("GOT: %v; WANT: %v", got, want) } if got, want := string(buf[:n]), want; got != want { tb.Errorf("GOT: %v; WANT: %v", got, want) } } func ensureError(tb testing.TB, err error, contains ...string) { tb.Helper() if len(contains) == 0 || (len(contains) == 1 && contains[0] == "") { if err != nil { tb.Fatalf("GOT: %v; WANT: %v", err, contains) } } else if err == nil { tb.Errorf("GOT: %v; WANT: %v", err, contains) } else { for _, stub := range contains { if stub != "" && !strings.Contains(err.Error(), stub) { tb.Errorf("GOT: %v; WANT: %q", err, stub) } } } } func ensurePanic(tb testing.TB, want string, callback func()) { tb.Helper() defer func() { r := recover() if r == nil { tb.Fatalf("GOT: %v; WANT: %v", r, want) 
return } if got := fmt.Sprintf("%v", r); got != want { tb.Fatalf("GOT: %v; WANT: %v", got, want) } }() callback() } // ensureNoPanic prettifies the output so one knows which test case caused a // panic. func ensureNoPanic(tb testing.TB, label string, callback func()) { tb.Helper() defer func() { if r := recover(); r != nil { tb.Fatalf("TEST: %s: GOT: %v", label, r) } }() callback() } func ensureStringSlicesMatch(tb testing.TB, actual, expected []string) { tb.Helper() if got, want := len(actual), len(expected); got != want { tb.Errorf("GOT: %v; WANT: %v", got, want) } la := len(actual) le := len(expected) for i := 0; i < la || i < le; i++ { if i < la { if i < le { if got, want := actual[i], expected[i]; got != want { tb.Errorf("GOT: %q; WANT: %q", got, want) } } else { tb.Errorf("GOT: %q (extra)", actual[i]) } } else if i < le { tb.Errorf("WANT: %q (missing)", expected[i]) } } } goavro-2.10.1/enum.go000066400000000000000000000074411412474230400143720ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "fmt" "io" ) // enum does not have child objects, therefore whatever namespace it defines is // just to store its name in the symbol table. func makeEnumCodec(st map[string]*Codec, enclosingNamespace string, schemaMap map[string]interface{}) (*Codec, error) { c, err := registerNewCodec(st, schemaMap, enclosingNamespace) if err != nil { return nil, fmt.Errorf("Enum ought to have valid name: %s", err) } // enum type must have symbols s1, ok := schemaMap["symbols"] if !ok { return nil, fmt.Errorf("Enum %q ought to have symbols key", c.typeName) } s2, ok := s1.([]interface{}) if !ok || len(s2) == 0 { return nil, fmt.Errorf("Enum %q symbols ought to be non-empty array of strings: %v", c.typeName, s1) } symbols := make([]string, len(s2)) for i, s := range s2 { symbol, ok := s.(string) if !ok { return nil, fmt.Errorf("Enum %q symbol %d ought to be non-empty string; received: %T", c.typeName, i+1, symbol) } if err := checkString(symbol); err != nil { return nil, fmt.Errorf("Enum %q symbol %d ought to %s", c.typeName, i+1, err) } symbols[i] = symbol } c.nativeFromBinary = func(buf []byte) (interface{}, []byte, error) { var value interface{} var err error var index int64 if value, buf, err = longNativeFromBinary(buf); err != nil { return nil, nil, fmt.Errorf("cannot decode binary enum %q index: %s", c.typeName, err) } index = value.(int64) if index < 0 || index >= int64(len(symbols)) { return nil, nil, fmt.Errorf("cannot decode binary enum %q: index ought to be between 0 and %d; read index: %d", c.typeName, len(symbols)-1, index) } return symbols[index], buf, nil } c.binaryFromNative = func(buf []byte, datum interface{}) ([]byte, error) { someString, ok := datum.(string) if !ok { return nil, fmt.Errorf("cannot encode binary enum %q: expected string; received: %T", c.typeName, datum) } for i, symbol := range symbols { if symbol == someString { return longBinaryFromNative(buf, i) } } return nil, fmt.Errorf("cannot encode binary enum %q: value ought to be member of symbols: %v; %q", c.typeName, symbols, someString) } c.nativeFromTextual = 
func(buf []byte) (interface{}, []byte, error) { if buf, _ = advanceToNonWhitespace(buf); len(buf) == 0 { return nil, nil, fmt.Errorf("cannot decode textual enum: %s", io.ErrShortBuffer) } // decode enum string var value interface{} var err error value, buf, err = stringNativeFromTextual(buf) if err != nil { return nil, nil, fmt.Errorf("cannot decode textual enum: expected key: %s", err) } someString := value.(string) for _, symbol := range symbols { if symbol == someString { return someString, buf, nil } } return nil, nil, fmt.Errorf("cannot decode textual enum %q: value ought to be member of symbols: %v; %q", c.typeName, symbols, someString) } c.textualFromNative = func(buf []byte, datum interface{}) ([]byte, error) { someString, ok := datum.(string) if !ok { return nil, fmt.Errorf("cannot encode textual enum %q: expected string; received: %T", c.typeName, datum) } for _, symbol := range symbols { if symbol == someString { return stringTextualFromNative(buf, someString) } } return nil, fmt.Errorf("cannot encode textual enum %q: value ought to be member of symbols: %v; %q", c.typeName, symbols, someString) } return c, nil } goavro-2.10.1/enum_test.go000066400000000000000000000176141412474230400154340ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "encoding/json" "fmt" "testing" ) func TestSchemaEnum(t *testing.T) { testSchemaValid(t, `{"type":"enum","name":"foo","symbols":["alpha","bravo"]}`) } func TestEnumName(t *testing.T) { testSchemaInvalid(t, `{"type":"enum","symbols":["alpha","bravo"]}`, "Enum ought to have valid name: schema ought to have name key") testSchemaInvalid(t, `{"type":"enum","name":3}`, "Enum ought to have valid name: schema name ought to be non-empty string") testSchemaInvalid(t, `{"type":"enum","name":""}`, "Enum ought to have valid name: schema name ought to be non-empty string") testSchemaInvalid(t, `{"type":"enum","name":"&foo","symbols":["alpha","bravo"]}`, "Enum ought to have valid name: schema name ought to start with") testSchemaInvalid(t, `{"type":"enum","name":"foo&","symbols":["alpha","bravo"]}`, "Enum ought to have valid name: schema name ought to have second and remaining") } func TestEnumSymbols(t *testing.T) { testSchemaInvalid(t, `{"type":"enum","name":"e1"}`, `Enum "e1" ought to have symbols key`) testSchemaInvalid(t, `{"type":"enum","name":"e1","symbols":3}`, `Enum "e1" symbols ought to be non-empty array of strings`) testSchemaInvalid(t, `{"type":"enum","name":"e1","symbols":[]}`, `Enum "e1" symbols ought to be non-empty array of strings`) } func TestEnumSymbolInvalid(t *testing.T) { testSchemaInvalid(t, `{"type":"enum","name":"e1","symbols":[3]}`, `Enum "e1" symbol 1 ought to be non-empty string`) testSchemaInvalid(t, `{"type":"enum","name":"e1","symbols":[""]}`, `Enum "e1" symbol 1 ought to be non-empty string`) testSchemaInvalid(t, `{"type":"enum","name":"e1","symbols":["string-with-invalid-characters"]}`, `Enum "e1" symbol 1 ought to have second and remaining`) } func TestEnumDecodeError(t *testing.T) { testBinaryDecodeFail(t, 
`{"type":"enum","name":"e1","symbols":["alpha","bravo"]}`, nil, "short buffer") testBinaryDecodeFail(t, `{"type":"enum","name":"e1","symbols":["alpha","bravo"]}`, []byte("\x01"), `cannot decode binary enum "e1": index ought to be between 0 and 1`) testBinaryDecodeFail(t, `{"type":"enum","name":"e1","symbols":["alpha","bravo"]}`, []byte("\x04"), `cannot decode binary enum "e1": index ought to be between 0 and 1`) } func TestEnumEncodeError(t *testing.T) { testBinaryEncodeFail(t, `{"type":"enum","name":"e1","symbols":["alpha","bravo"]}`, 13, `cannot encode binary enum "e1": expected string; received: int`) testBinaryEncodeFail(t, `{"type":"enum","name":"e1","symbols":["alpha","bravo"]}`, "charlie", `cannot encode binary enum "e1": value ought to be member of symbols`) } func TestEnumEncode(t *testing.T) { testBinaryCodecPass(t, `{"type":"enum","name":"e1","symbols":["alpha","bravo"]}`, "alpha", []byte("\x00")) testBinaryCodecPass(t, `{"type":"enum","name":"e1","symbols":["alpha","bravo"]}`, "bravo", []byte("\x02")) } func TestEnumTextCodec(t *testing.T) { testTextCodecPass(t, `{"type":"enum","name":"e1","symbols":["alpha","bravo"]}`, "alpha", []byte(`"alpha"`)) testTextCodecPass(t, `{"type":"enum","name":"e1","symbols":["alpha","bravo"]}`, "bravo", []byte(`"bravo"`)) testTextEncodeFail(t, `{"type":"enum","name":"e1","symbols":["alpha","bravo"]}`, "charlie", `cannot encode textual enum "e1": value ought to be member of symbols`) testTextDecodeFail(t, `{"type":"enum","name":"e1","symbols":["alpha","bravo"]}`, []byte(`"charlie"`), `cannot decode textual enum "e1": value ought to be member of symbols`) } func TestGH233(t *testing.T) { // here's the fail case // testTextCodecPass(t, `{"type":"record","name":"FooBar","namespace":"com.foo.bar","fields":[{"name":"event","type":["null",{"type":"enum","name":"FooBarEvent","symbols":["CREATED","UPDATED"]}]}]}`, map[string]interface{}{"event": Union("FooBarEvent", "CREATED")}, []byte(`{"event":{"FooBarEvent":"CREATED"}}`)) // remove the namespace and it passes testTextCodecPass(t, `{"type":"record","name":"FooBar","fields":[{"name":"event","type":["null",{"type":"enum","name":"FooBarEvent","symbols":["CREATED","UPDATED"]}]}]}`, map[string]interface{}{"event": Union("FooBarEvent", "CREATED")}, []byte(`{"event":{"FooBarEvent":"CREATED"}}`)) // experiments // the basic enum testTextCodecPass(t, `{"type":"enum","name":"FooBarEvent","symbols":["CREATED","UPDATED"]}`, "CREATED", []byte(`"CREATED"`)) // the basic enum with namespace testTextCodecPass(t, `{"type":"enum","name":"FooBarEvent","namespace":"com.foo.bar","symbols":["CREATED","UPDATED"]}`, "CREATED", []byte(`"CREATED"`)) // union with enum testTextCodecPass(t, `["null",{"type":"enum","name":"FooBarEvent","symbols":["CREATED","UPDATED"]}]`, Union("FooBarEvent", "CREATED"), []byte(`{"FooBarEvent":"CREATED"}`)) // FAIL: union with enum with namespace: cannot determine codec: "FooBarEvent" // testTextCodecPass(t, `["null",{"type":"enum","name":"FooBarEvent","namespace":"com.foo.bar","symbols":["CREATED","UPDATED"]}]`, Union("FooBarEvent", "CREATED"), []byte(`{"FooBarEvent":"CREATED"}`)) // conclusion, union is not handling namespaces correctly // try union with record instead of enum (records and enums both have namespaces) // get a basic record going testTextCodecPass(t, `{"type":"record","name":"LongList","fields":[{"name":"next","type":["null","LongList"],"default":null}]}`, map[string]interface{}{"next": Union("LongList", map[string]interface{}{"next": nil})}, 
[]byte(`{"next":{"LongList":{"next":null}}}`)) // add a namespace to the record // fails in the same way cannot determine codec: "LongList" for key: "next" // testTextCodecPass(t, `{"type":"record","name":"LongList","namespace":"com.foo.bar","fields":[{"name":"next","type":["null","LongList"],"default":null}]}`, map[string]interface{}{"next": Union("LongList", map[string]interface{}{"next": nil})}, []byte(`{"next":{"LongList":{"next":null}}}`)) // // experiments on syntax solutions // testTextCodecPass(t, `["null",{"type":"enum","name":"com.foo.bar.FooBarEvent","symbols":["CREATED","UPDATED"]}]`, Union("com.foo.bar.FooBarEvent", "CREATED"), []byte(`{"FooBarEvent":"CREATED"}`)) // thie TestUnionMapRecordFitsInRecord tests binary from Native, but not native from textual // that's where the error is happening // if the namespace is specified in the incoming name it works testTextCodecPass(t, `{"type":"record","name":"ns1.LongList","fields":[{"name":"next","type":["null","LongList"],"default":null}]}`, map[string]interface{}{"next": Union("ns1.LongList", map[string]interface{}{"next": nil})}, []byte(`{"next":{"ns1.LongList":{"next":null}}}`)) // try the failcase with the namespace specified on the input testTextCodecPass(t, `{"type":"record","name":"FooBar","namespace":"com.foo.bar","fields":[{"name":"event","type":["null",{"type":"enum","name":"FooBarEvent","symbols":["CREATED","UPDATED"]}]}]}`, map[string]interface{}{"event": Union("com.foo.bar.FooBarEvent", "CREATED")}, []byte(`{"event":{"com.foo.bar.FooBarEvent":"CREATED"}}`)) } func ExampleCheckSolutionGH233() { const avroSchema = ` { "type": "record", "name": "FooBar", "namespace": "com.foo.bar", "fields": [ { "name": "event", "type": [ "null", { "type": "enum", "name": "FooBarEvent", "symbols": ["CREATED", "UPDATED"] } ] } ] } ` codec, _ := NewCodec(avroSchema) const avroJson = `{"event":{"com.foo.bar.FooBarEvent":"CREATED"}}` native, _, err := codec.NativeFromTextual([]byte(avroJson)) if err != nil { panic(err) } blob, err := json.Marshal(native) if err != nil { panic(err) } fmt.Println(string(blob)) // Output: {"event":{"com.foo.bar.FooBarEvent":"CREATED"}} } goavro-2.10.1/examples/000077500000000000000000000000001412474230400147075ustar00rootroot00000000000000goavro-2.10.1/examples/165/000077500000000000000000000000001412474230400152225ustar00rootroot00000000000000goavro-2.10.1/examples/165/main.go000066400000000000000000000103651412474230400165020ustar00rootroot00000000000000// #165 // // This exemplifies three ways to encode an Avro record. Note that I did not // say "Go struct" because there is no struct in this example. `goavro` expects // data that is to be encoded as an Avro record to be given in the form of a // `map[string]interface{}`, so create the map, populate whichever key-value // pairs that the Avro record type requires, and pass it on to one of the // encoding methods. // // Note that there are three ways to encode Avro data into binary. The first // way is to use the BinaryFromNative method, which simply encodes the provided // value as a sequence of bytes, appending the new bytes to the provided byte // slice, and returning the new byte slice. This binary data is completely // unusable by any process that wants to decode the bytes unless the original // schema that was used to encode the data is known when trying to decode the // bytes. 
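//
// As a rough sketch of that first approach (added here for illustration; it
// is not part of the original example program), the raw bytes can only be
// decoded by a Codec built from the very same schema, here the
// loginEventAvroSchema constant defined below:
//
//	codec, err := goavro.NewCodec(loginEventAvroSchema)
//	if err != nil {
//		panic(err)
//	}
//	binary, err := codec.BinaryFromNative(nil, map[string]interface{}{"Username": "superman"})
//	if err != nil {
//		panic(err)
//	}
//	// Without the original schema these bytes cannot be interpreted; with it,
//	// they decode back to the native map.
//	native, _, err := codec.NativeFromBinary(binary)
//	if err != nil {
//		panic(err)
//	}
//	_ = native // native is again map[string]interface{}{"Username": "superman"}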
// // The second example uses Avro's Single-Object Encoding specification, // where a magic byte sequence and then the schema's fingerprint are first appended // to the provided byte slice, and finally the binary encoded bytes of the data // are appended. This method is useful for processes where the decoding reader // will pull off a chunk of bytes, use the fingerprint to look up the schema in // some sort of schema registry, then use that schema to decode the bytes that // follow. This method is used by Kafka producers and consumers, where shoving // the schema text on the wire for each message would be wasteful compared // to shoving a tiny schema fingerprint on the wire. This method only uses 10 // more bytes to uniquely identify the schema. // // Finally, the third example uses the Avro Object Container File format to // encode the data, where the OCF file has a copy of the schema used to encode // the file. Because the original schema prefixes the entire file, any Avro // reader can decode the contents of the entire file without having to look up // its schema in a registry. package main import ( "os" "github.com/linkedin/goavro" ) const loginEventAvroSchema = `{"type": "record", "name": "LoginEvent", "fields": [{"name": "Username", "type": "string"}]}` func main() { codec, err := goavro.NewCodec(loginEventAvroSchema) if err != nil { panic(err) } m := map[string]interface{}{ "Username": "superman", } // Let's dip our feet into just encoding a single item into binary format. // There is not much to do with the output from binary if you intend on // creating an OCF file, because OCF will do this encoding for us. The // result is an unadorned stream of binary bytes that can never be decoded // unless you happen to know the schema that was used to encode it. binary, err := codec.BinaryFromNative(nil, m) if err != nil { panic(err) } _ = binary // Next, let's try encoding the same item using Single-Object Encoding, // another format that is useful when sending a bunch of objects into a // Kafka stream. Note this method prefixes the binary bytes with a schema // fingerprint, used by the reader on the stream to look up the contents of // the schema used to encode the value. Again, unless the reader can fetch // the schema contents from a schema source-of-truth, this binary sequence // will never be decodable. single, err := codec.SingleFromNative(nil, m) if err != nil { panic(err) } _ = single // Next, let's make an OCF file from the values. The OCF format prefixes // the entire file with the required schema that was used to encode the // data, so it is readable from any Avro decoder that can read OCF files. // No other source of information is needed to decode the file created by // this process, unlike the above two examples. Also note that we do not // hand OCF the encoded blobs to write; we just append the native values and it // will encode each of them for us.
var values []map[string]interface{} values = append(values, m) values = append(values, map[string]interface{}{"Username": "batman"}) values = append(values, map[string]interface{}{"Username": "wonder woman"}) f, err := os.Create("event.avro") if err != nil { panic(err) } ocfw, err := goavro.NewOCFWriter(goavro.OCFConfig{ W: f, Codec: codec, }) if err != nil { panic(err) } if err = ocfw.Append(values); err != nil { panic(err) } } goavro-2.10.1/examples/ab2t/000077500000000000000000000000001412474230400155375ustar00rootroot00000000000000goavro-2.10.1/examples/ab2t/main.go000066400000000000000000000036751412474230400170250ustar00rootroot00000000000000package main import ( "bufio" "flag" "fmt" "io" "os" "path/filepath" "sync" "github.com/linkedin/goavro/v2" ) func usage() { executable, err := os.Executable() if err != nil { executable = os.Args[0] } base := filepath.Base(executable) fmt.Fprintf(os.Stderr, "Usage of %s:\n", base) fmt.Fprintf(os.Stderr, "\t%s [file1.avro [file2.avro [file3.avro]]]\n", base) fmt.Fprintf(os.Stderr, "\tWhen filename is hyphen, %s will read from its standard input.\n", base) flag.PrintDefaults() os.Exit(2) } func main() { args := os.Args[1:] if len(args) == 0 { usage() } for _, arg := range args { if arg == "-" { stat, err := os.Stdin.Stat() if err != nil { bail(err) } if (stat.Mode() & os.ModeCharDevice) != 0 { usage() } if err = dumpFromReader(os.Stdin); err != nil { bail(err) } if err = os.Stdin.Close(); err != nil { bail(err) } continue } fh, err := os.Open(arg) if err != nil { bail(err) } if err := dumpFromReader(bufio.NewReader(fh)); err != nil { bail(err) } if err := fh.Close(); err != nil { bail(err) } } } func dumpFromReader(ior io.Reader) error { ocf, err := goavro.NewOCFReader(ior) if err != nil { return err } codec := ocf.Codec() data := make(chan interface{}, 100) finishedOutput := new(sync.WaitGroup) finishedOutput.Add(1) go textualFromNative(codec, data, finishedOutput) for ocf.Scan() { datum, err := ocf.Read() if err != nil { fmt.Fprintf(os.Stderr, "%s\n", err) continue } data <- datum } close(data) finishedOutput.Wait() return ocf.Err() } func textualFromNative(codec *goavro.Codec, data <-chan interface{}, finishedOutput *sync.WaitGroup) { for datum := range data { buf, err := codec.TextualFromNative(nil, datum) if err != nil { fmt.Fprintf(os.Stderr, "%s\n", err) continue } fmt.Println(string(buf)) } finishedOutput.Done() } func bail(err error) { fmt.Fprintf(os.Stderr, "%s\n", err) os.Exit(1) } goavro-2.10.1/examples/arw/000077500000000000000000000000001412474230400155005ustar00rootroot00000000000000goavro-2.10.1/examples/arw/README.md000066400000000000000000000037131412474230400167630ustar00rootroot00000000000000# arw Avro ReWrite Provide command line utility to rewrite an Avro Object Container File (OCF), while changing the block count, the compression algorithm, or upgrading the schema. Note that when upgrading the schema, the new schema must be able to properly encode the data read using the old schema. Why would a person want to upgrade the schema for an existing OCF? Perhaps if one wants to append data to it using the new schema. Example use: ``` arw -summary -bc 100 -compression deflate -schema new-schema.avsc source.avro destination.avro ``` If summary option, `-summary`, is provided, `arw` will provide summary information while rewriting the OCF. If verbose option, `-v`, is provided, `arw` will provide verbose information while rewriting the OCF. Specifying verbose implies the summary option. 
If block count option, `-bc`, is provided, then each block will have no more items than specified. If omitted, then `arw` will re-encode blocks of the same length as found in `source.avro`. For instance, if the first block had 10 items, and the second has 15, then the `destination.avro` file will also have 10 items in the first block and 15 items in the second block. If compression option is omitted, then `arw` will use the same compression algorithm as found in `source.avro`. If schema option is omitted, then `arw` will write the new Avro file using the same schema as found in `source.avro`. If provided, `arw` will read the source Avro file using its provided schema, but attempt to encode and write the destination Avro file using the newly provided schema. If an item fails to encode using the new schema, the process will be aborted and an error message will be provided. If `source.avro` is a hyphen character, `-`, then `arw` will read from standard input. If `destination.avro` is a hyphen character, then `arw` will write to standard output. Invoking `arw` without any of the options simply copies the OCF file, verifying the contents of the data along the way. goavro-2.10.1/examples/arw/main.go000066400000000000000000000134131412474230400167550ustar00rootroot00000000000000package main import ( "errors" "flag" "fmt" "io" "io/ioutil" "os" "path/filepath" "github.com/linkedin/goavro/v2" ) func bail(err error) { fmt.Fprintf(os.Stderr, "%s\n", err) os.Exit(1) } func usage(err error) { if err != nil { fmt.Fprintf(os.Stderr, "%s\n", err) } executable, err := os.Executable() if err != nil { executable = os.Args[0] } base := filepath.Base(executable) fmt.Fprintf(os.Stderr, "Usage of %s:\n", base) fmt.Fprintf(os.Stderr, "\t%s [-v] [-summary] [-bc N] [-compression null|deflate|snappy] [-schema new-schema.avsc] source.avro destination.avro\n", base) fmt.Fprintf(os.Stderr, "\tWhen source.avro pathname is hyphen, %s will read from its standard input.\n", base) fmt.Fprintf(os.Stderr, "\tWhen destination.avro pathname is hyphen, %s will write to its standard output.\n", base) flag.PrintDefaults() os.Exit(2) } var ( blockCount *int compressionName, schemaPathname *string summary, verbose *bool ) func init() { compressionName = flag.String("compression", "", "compression codec ('null', 'deflate', 'snappy'; default: use source compression)") blockCount = flag.Int("bc", 0, "max count of items in each block (default: use source block boundaries)") schemaPathname = flag.String("schema", "", "pathname to new schema (default: use source schema)") summary = flag.Bool("summary", false, "print summary information to stderr") verbose = flag.Bool("v", false, "print verbose information to stderr (implies: -summary)") } func main() { flag.Parse() if count := len(flag.Args()); count != 2 { usage(fmt.Errorf("wrong number of arguments: %d", count)) } if *blockCount < 0 { usage(fmt.Errorf("count must be greater or equal to 0: %d", *blockCount)) } if *verbose { *summary = true } var err error var fromF io.ReadCloser var toF io.WriteCloser if srcPathname := flag.Arg(0); srcPathname == "-" { stat, err := os.Stdin.Stat() if err != nil { bail(err) } if (stat.Mode() & os.ModeCharDevice) != 0 { usage(errors.New("cannot read from standard input when connected to terminal")) } fromF = os.Stdin if *summary { fmt.Fprintf(os.Stderr, "reading from stdin\n") } } else { fromF, err = os.Open(srcPathname) if err != nil { bail(err) } defer func(ioc io.Closer) { if err := ioc.Close(); err != nil { bail(err) } }(fromF) if *summary { 
fmt.Fprintf(os.Stderr, "reading from %s\n", flag.Arg(0)) } } if destPathname := flag.Arg(1); destPathname == "-" { stat, err := os.Stdout.Stat() if err != nil { bail(err) } // if *verbose { // DEBUG // fmt.Fprintf(os.Stderr, "standard output mode: %v\n", stat.Mode()) // } if (stat.Mode() & os.ModeCharDevice) != 0 { usage(errors.New("cannot send to standard output when connected to terminal")) } toF = os.Stdout if *summary { fmt.Fprintf(os.Stderr, "writing to stdout\n") } } else { toF, err = os.Create(destPathname) if err != nil { bail(err) } defer func(ioc io.Closer) { if err := ioc.Close(); err != nil { bail(err) } }(toF) if *summary { fmt.Fprintf(os.Stderr, "writing to %s\n", flag.Arg(1)) } } // NOTE: Convert fromF to OCFReader ocfr, err := goavro.NewOCFReader(fromF) if err != nil { bail(err) } inputCompressionName := ocfr.CompressionName() outputCompressionName := inputCompressionName if *compressionName != "" { outputCompressionName = *compressionName } if *summary { fmt.Fprintf(os.Stderr, "input compression algorithm: %s\n", inputCompressionName) fmt.Fprintf(os.Stderr, "output compression algorithm: %s\n", outputCompressionName) } // NOTE: Either use schema from reader, or attempt to use new schema var outputSchema string if *schemaPathname == "" { outputSchema = ocfr.Codec().Schema() } else { schemaBytes, err := ioutil.ReadFile(*schemaPathname) if err != nil { bail(err) } outputSchema = string(schemaBytes) } // NOTE: Convert toF to OCFWriter ocfw, err := goavro.NewOCFWriter(goavro.OCFConfig{ W: toF, CompressionName: outputCompressionName, Schema: outputSchema, }) if err != nil { bail(err) } if err := transcode(ocfr, ocfw); err != nil { bail(err) } } func transcode(from *goavro.OCFReader, to *goavro.OCFWriter) error { var blocksRead, blocksWritten, itemsRead int var block []interface{} if *blockCount > 0 { block = make([]interface{}, 0, *blockCount) } for from.Scan() { datum, err := from.Read() if err != nil { break } itemsRead++ block = append(block, datum) endOfBlock := from.RemainingBlockItems() == 0 if endOfBlock { blocksRead++ if *verbose { fmt.Fprintf(os.Stderr, "read block with %d items\n", len(block)) } } // NOTE: When blockCount is 0, user wants each destination block to have // the same number of items as its corresponding source block. However, // when blockCount is greater than 0, user wants specified block count // sizes. 
if (*blockCount == 0 && endOfBlock) || (*blockCount > 0 && len(block) == *blockCount) { if err := writeBlock(to, block); err != nil { return err } blocksWritten++ block = block[:0] // set slice length to 0 in order to re-use allocated underlying array } } var err error // append all remaining items (condition can only be true used when *blockCount > 0) if len(block) > 0 { if err = writeBlock(to, block); err == nil { blocksWritten++ } } // if no write error, then return any read error encountered if err == nil { err = from.Err() } if *summary { fmt.Fprintf(os.Stderr, "read %d items\n", itemsRead) fmt.Fprintf(os.Stderr, "wrote %d blocks\n", blocksWritten) } return err } func writeBlock(to *goavro.OCFWriter, block []interface{}) error { if *verbose { fmt.Fprintf(os.Stderr, "writing block with %d items\n", len(block)) } return to.Append(block) } goavro-2.10.1/examples/avroheader/000077500000000000000000000000001412474230400170275ustar00rootroot00000000000000goavro-2.10.1/examples/avroheader/main.go000066400000000000000000000037651412474230400203150ustar00rootroot00000000000000package main import ( "flag" "fmt" "io" "os" "path/filepath" "github.com/linkedin/goavro/v2" ) var ( showCount = flag.Bool("count", false, "show count of data items") showSchema = flag.Bool("schema", false, "show data schema") ) func usage() { executable, err := os.Executable() if err != nil { executable = os.Args[0] } base := filepath.Base(executable) fmt.Fprintf(os.Stderr, "Usage of %s:\n", base) fmt.Fprintf(os.Stderr, "\t%s [-count] [-schema] [file1.avro...]\n", base) fmt.Fprintf(os.Stderr, "\tAs a special case, when there are no filename arguments, %s will read\n", base) fmt.Fprintf(os.Stderr, "\tfrom its standard input.\n") flag.PrintDefaults() os.Exit(2) } func main() { flag.Parse() args := flag.Args() if len(args) == 0 { stat, err := os.Stdin.Stat() if err != nil { bail(err) } if (stat.Mode() & os.ModeCharDevice) != 0 { usage() } if err := headerFromReader(os.Stdin, ""); err != nil { bail(err) } } for _, arg := range args { fh, err := os.Open(arg) if err != nil { bail(err) } if len(args) > 1 { arg += ": " } else { arg = "" } if err := headerFromReader(fh, arg); err != nil { bail(err) } if err := fh.Close(); err != nil { bail(err) } } } func headerFromReader(ior io.Reader, prefix string) error { ocfr, err := goavro.NewOCFReader(ior) if err != nil { return err } fmt.Printf("%sCompression Algorithm (avro.codec): %q\n", prefix, ocfr.CompressionName()) if *showSchema { fmt.Printf("%sSchema (avro.schema):\n%s\n", prefix, ocfr.Codec().Schema()) } if !*showCount { return nil } var decoded, errors int for ocfr.Scan() { _, err := ocfr.Read() if err != nil { fmt.Fprintf(os.Stderr, "%s\n", err) errors++ continue } decoded++ } if decoded > 0 { fmt.Printf("%sSuccessfully decoded: %d\n", prefix, decoded) } if errors > 0 { fmt.Printf("%sCannot decode: %d\n", prefix, errors) } return ocfr.Err() } func bail(err error) { fmt.Fprintf(os.Stderr, "%s\n", err) os.Exit(1) } goavro-2.10.1/examples/nested/000077500000000000000000000000001412474230400161715ustar00rootroot00000000000000goavro-2.10.1/examples/nested/main.go000066400000000000000000000070471412474230400174540ustar00rootroot00000000000000package main import ( "fmt" "io/ioutil" "os" "reflect" "github.com/linkedin/goavro/v2" ) var ( codec *goavro.Codec ) func init() { schema, err := ioutil.ReadFile("schema.avsc") if err != nil { panic(err) } //Create Schema Once codec, err = goavro.NewCodec(string(schema)) if err != nil { panic(err) } } func main() { //Sample Data user := &User{ 
FirstName: "John", LastName: "Snow", Address: &Address{ Address1: "1106 Pennsylvania Avenue", City: "Wilmington", State: "DE", Zip: 19806, }, } fmt.Printf("user in=%+v\n", user) ///Convert Binary From Native binary, err := codec.BinaryFromNative(nil, user.ToStringMap()) if err != nil { panic(err) } ///Convert Native from Binary native, _, err := codec.NativeFromBinary(binary) if err != nil { panic(err) } //Convert it back tp Native userOut := StringMapToUser(native.(map[string]interface{})) fmt.Printf("user out=%+v\n", userOut) if ok := reflect.DeepEqual(user, userOut); !ok { fmt.Fprintf(os.Stderr, "struct Compare Failed ok=%t\n", ok) os.Exit(1) } } // User holds information about a user. type User struct { FirstName string LastName string Errors []string Address *Address } // Address holds information about an address. type Address struct { Address1 string Address2 string City string State string Zip int } // ToStringMap returns a map representation of the User. func (u *User) ToStringMap() map[string]interface{} { datumIn := map[string]interface{}{ "FirstName": string(u.FirstName), "LastName": string(u.LastName), } if len(u.Errors) > 0 { datumIn["Errors"] = goavro.Union("array", u.Errors) } else { datumIn["Errors"] = goavro.Union("null", nil) } if u.Address != nil { addDatum := map[string]interface{}{ "Address1": string(u.Address.Address1), "City": string(u.Address.City), "State": string(u.Address.State), "Zip": int(u.Address.Zip), } if u.Address.Address2 != "" { addDatum["Address2"] = goavro.Union("string", u.Address.Address2) } else { addDatum["Address2"] = goavro.Union("null", nil) } //important need namespace and record name datumIn["Address"] = goavro.Union("my.namespace.com.address", addDatum) } else { datumIn["Address"] = goavro.Union("null", nil) } return datumIn } // StringMapToUser returns a User from a map representation of the User. 
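//
// Illustrative note (added for clarity; not part of the original example): for
// this schema, the native value handed to this function by
// codec.NativeFromBinary has roughly the following shape, where every union
// value is either nil (for the null branch) or a single-key map keyed by the
// branch's full name:
//
//	map[string]interface{}{
//		"FirstName": "John",
//		"LastName":  "Snow",
//		"Errors":    nil, // or a map with a single "array" key
//		"Address": map[string]interface{}{
//			"my.namespace.com.address": map[string]interface{}{
//				"Address1": "1106 Pennsylvania Avenue",
//				// ... remaining address fields ...
//			},
//		},
//	}
//
// That is why the switch below unwraps the union by looking up the
// "my.namespace.com.address" key explicitly.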
func StringMapToUser(data map[string]interface{}) *User { ind := &User{} for k, v := range data { switch k { case "FirstName": if value, ok := v.(string); ok { ind.FirstName = value } case "LastName": if value, ok := v.(string); ok { ind.LastName = value } case "Errors": if value, ok := v.(map[string]interface{}); ok { for _, item := range value["array"].([]interface{}) { ind.Errors = append(ind.Errors, item.(string)) } } case "Address": if vmap, ok := v.(map[string]interface{}); ok { //important need namespace and record name if cookieSMap, ok := vmap["my.namespace.com.address"].(map[string]interface{}); ok { add := &Address{} for k, v := range cookieSMap { switch k { case "Address1": if value, ok := v.(string); ok { add.Address1 = value } case "Address2": if value, ok := v.(string); ok { add.Address2 = value } case "City": if value, ok := v.(string); ok { add.City = value } case "Zip": if value, ok := v.(int); ok { add.Zip = value } } } ind.Address = add } } } } return ind } goavro-2.10.1/examples/nested/schema.avsc000066400000000000000000000012731412474230400203120ustar00rootroot00000000000000{ "namespace": "my.namespace.com", "type": "record", "name": "indentity", "fields": [ { "name": "FirstName", "type": "string"}, { "name": "LastName", "type": "string"}, { "name": "Errors", "type": ["null", {"type":"array", "items":"string"}], "default": null }, { "name": "Address", "type": ["null",{ "namespace": "my.namespace.com", "type": "record", "name": "address", "fields": [ { "name": "Address1", "type": "string" }, { "name": "Address2", "type": ["null", "string"], "default": null }, { "name": "City", "type": "string" }, { "name": "State", "type": "string" }, { "name": "Zip", "type": "int" } ] }],"default":null} ] }goavro-2.10.1/examples/roundtrip/000077500000000000000000000000001412474230400167355ustar00rootroot00000000000000goavro-2.10.1/examples/roundtrip/main.go000066400000000000000000000101731412474230400202120ustar00rootroot00000000000000package main import ( "bufio" bin "encoding/binary" hex "encoding/hex" "flag" "fmt" "io" "io/ioutil" "os" "github.com/linkedin/goavro/v2" ) // roundtrip is a tool for checking avro // // incoming data is assumed to be standard json // incoming json is required to be one json object per line // use `jq -c .` if you need to. get it into one line // // you can write out your avro in binary form and stop there // which is useful for cases where you might want to send it off into other tools // // you can also do a roundtrip of decode/encode // which allows you to see if your avro schema matches your expectations // // If you want to use an encoded schemaid then specify a schemid with -sid // it will be encoded per a common standard (one null byte, 16 bytes of schemaid) // Its NOT the standard SOE // SOE should be added // Probably OCF should be added too // // EXAMPLE // // kubectl get events -w -o json | jq -c . | ./roundtrip -sid aa6b1ca0e1ee2d885bfbc747f4a4011b -avsc event-schema.json ) -rt func MakeAvroHeader(schemaid string) (header []byte, err error) { dst, err := hex.DecodeString(schemaid) if err != nil { return } header = append(header, byte(0)) header = append(header, dst...) 
return } func main() { var avsc = flag.String("avsc", "", "the avro schema") var data = flag.String("data", "-", "(default stdin) the data that corresponds to the avro schema or error - ONE LINE PER DATA ITEM") var schemaid = flag.String("sid", "", "the schemaid which is normally the md5hash of rht schema itself") var roundtrip = flag.Bool("rt", false, "do full round trip to try to rebuild the original data string") var xxd = flag.String("bin", "", "write out the binary data to this file - look at it with xxd if you want to") var appendBin = flag.Bool("append", false, "append to the output binary file instead of trunc") flag.Parse() _avsc, err := ioutil.ReadFile(*avsc) if err != nil { panic(fmt.Sprintf("Failed to read avsc file:%s:error:%v:", *avsc, err)) } codec, err := goavro.NewCodecForStandardJSON(string(_avsc)) if err != nil { panic(err) } var _data io.Reader if *data == "-" { _data = os.Stdin } else { file, err := os.Open(*data) if err != nil { panic(fmt.Sprintf("Failed to open data file:%s:error:%v:", *data, err)) } _data = bufio.NewReader(file) defer file.Close() } binOut := struct { file *os.File do bool }{} if len(*xxd) > 0 { bits := os.O_WRONLY | os.O_CREATE if *appendBin { bits |= os.O_APPEND } else { bits |= os.O_TRUNC } binOut.file, err = os.OpenFile(*xxd, bits, 0600) if err != nil { panic(err) } defer binOut.file.Close() binOut.do = true } scanner := bufio.NewScanner(_data) for scanner.Scan() { dat := scanner.Text() if len(dat) == 0 { fmt.Println("skipping empty line") continue } fmt.Println("RT in") fmt.Println(dat) textual := []byte(dat) fmt.Printf("encoding for schemaid:%s:\n", *schemaid) avroNative, _, err := codec.NativeFromTextual(textual) if err != nil { fmt.Println(dat) panic(err) } header, err := MakeAvroHeader(*schemaid) if err != nil { fmt.Println(string(textual)) panic(err) } avrobin, err := codec.BinaryFromNative(nil, avroNative) if err != nil { fmt.Println(dat) panic(err) } // trying to minimize operations within the loop // so do only a quick boolean check here if binOut.do { for _, buf := range [][]byte{header, avrobin} { err = bin.Write(binOut.file, bin.LittleEndian, buf) if err != nil { fmt.Println(dat) panic(err) } } } if *roundtrip { // this will scramble the order // since it makes new go maps // when it takes the binary into native rtnativeval, _, err := codec.NativeFromBinary(avrobin) if err != nil { fmt.Println(dat) panic(err) } // Convert native Go form to textual Avro data textual, err = codec.TextualFromNative(nil, rtnativeval) if err != nil { fmt.Println(dat) panic(err) } fmt.Println("RT out") fmt.Println(string(textual)) } } if err := scanner.Err(); err != nil { fmt.Println("scanner error") panic(err) } fmt.Println("Done with loop - no more data") } goavro-2.10.1/examples/soe/000077500000000000000000000000001412474230400154755ustar00rootroot00000000000000goavro-2.10.1/examples/soe/main.go000066400000000000000000000047631412474230400167620ustar00rootroot00000000000000package main import ( "fmt" "github.com/linkedin/goavro" ) func main() { codex := initCodex() err := decode(codex, []byte("\xC3\x01"+"\x8F\x5C\x39\x3F\x1A\xD5\x75\x72"+"\x06")) if err != nil { panic(err) } err = decode(codex, []byte("\xC3\x01"+"\xC7\x03\x45\x63\x72\x48\x01\x8F"+"\x0ahello")) if err != nil { panic(err) } } // initCodex returns a codex with a small handful of example Codec instances. 
func initCodex() map[uint64]*goavro.Codec { codex := make(map[uint64]*goavro.Codec) for _, primitive := range []string{"int", "long", "boolean", "float", "double", "string"} { codec, err := goavro.NewCodec(`"` + primitive + `"`) if err != nil { panic(err) } codex[codec.Rabin] = codec } return codex } // decode attempts to decode the bytes in buf using one of the Codec instances // in codex. The buf must start with the single-object encoding prefix, // followed by the unsigned 64-bit Rabin fingerprint of the canonical schema // used to encode the datum, finally followed by the encoded bytes. This is a // simplified example of fetching the fingerprint from the SOE buffer, using // that fingerprint to select a Codec from a dictionary of Codec instances, // called codex in this case, and finally sending the buf to be decoded by that // Codec. func decode(codex map[uint64]*goavro.Codec, buf []byte) error { // Perform a sanity check on the buffer, then return the Rabin fingerprint // of the schema used to encode the data. fingerprint, newBuf, err := goavro.FingerprintFromSOE(buf) if err != nil { return err } // Get a previously stored Codec from the codex map. codec, ok := codex[fingerprint] if !ok { return fmt.Errorf("unknown codec: %#x", fingerprint) } // Use the fetched Codec to decode the buffer as a SOE. var datum interface{} // Both of the following branches work; they are provided to illustrate two // use cases. if true { // Faster because SOE magic prefix and schema fingerprint were already // checked and used to fetch the Codec. Just need to decode the binary // bytes remaining after the prefix was removed. datum, _, err = codec.NativeFromBinary(newBuf) } else { // This way re-checks the SOE magic prefix and Codec fingerprint, doing // repetitive work, but it is provided as an example for cases when there is // only a single schema, a single Codec, and you do not use // the FingerprintFromSOE function above.
datum, _, err = codec.NativeFromSingle(buf) } if err != nil { panic(err) } _, err = fmt.Println(datum) return err } goavro-2.10.1/examples/splice/000077500000000000000000000000001412474230400161665ustar00rootroot00000000000000goavro-2.10.1/examples/splice/main.go000066400000000000000000000031051412474230400174400ustar00rootroot00000000000000package main import ( "flag" "fmt" "io" "io/ioutil" "os" "path/filepath" "github.com/linkedin/goavro/v2" ) func bail(err error) { fmt.Fprintf(os.Stderr, "%s\n", err) os.Exit(1) } func usage() { executable, err := os.Executable() if err != nil { executable = os.Args[0] } base := filepath.Base(executable) fmt.Fprintf(os.Stderr, "Usage of %s:\n", base) fmt.Fprintf(os.Stderr, "\t%s [-compression null|deflate|snappy] schema.avsc input.dat output.avro\n", base) flag.PrintDefaults() os.Exit(2) } func main() { compressionName := flag.String("compression", "null", "compression codec ('null', 'deflate', 'snappy'; default: 'null')") flag.Parse() if len(flag.Args()) != 3 { usage() } schemaBytes, err := ioutil.ReadFile(flag.Arg(0)) if err != nil { bail(err) } codec, err := goavro.NewCodec(string(schemaBytes)) if err != nil { bail(err) } dataBytes, err := ioutil.ReadFile(flag.Arg(1)) if err != nil { bail(err) } fh, err := os.Create(flag.Arg(2)) if err != nil { bail(err) } defer func(ioc io.Closer) { if err := ioc.Close(); err != nil { bail(err) } }(fh) ocfw, err := goavro.NewOCFWriter(goavro.OCFConfig{ W: fh, Codec: codec, CompressionName: *compressionName, }) if err != nil { bail(err) } var datum interface{} for len(dataBytes) > 0 { datum, dataBytes, err = codec.NativeFromBinary(dataBytes) if err != nil { if err == io.EOF { err = nil break } bail(err) } if err = ocfw.Append([]interface{}{datum}); err != nil { bail(err) } } if err != nil { bail(err) } } goavro-2.10.1/fixed.go000066400000000000000000000075361412474230400145320ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "fmt" "strconv" ) // Fixed does not have child objects, therefore whatever namespace it defines is // just to store its name in the symbol table. 
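//
// As a brief illustration (added for clarity; not part of the original
// comment), a fixed value is encoded as exactly "size" bytes with no length
// prefix, so a 4-byte fixed round-trips like this:
//
//	codec, _ := NewCodec(`{"type":"fixed","name":"foo","size":4}`)
//	buf, _ := codec.BinaryFromNative(nil, []byte("abcd"))
//	// buf is []byte("abcd"): exactly four bytes, nothing more.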
func makeFixedCodec(st map[string]*Codec, enclosingNamespace string, schemaMap map[string]interface{}) (*Codec, error) { c, err := registerNewCodec(st, schemaMap, enclosingNamespace) if err != nil { return nil, fmt.Errorf("Fixed ought to have valid name: %s", err) } size, err := sizeFromSchemaMap(c.typeName, schemaMap) if err != nil { return nil, err } c.nativeFromBinary = func(buf []byte) (interface{}, []byte, error) { if buflen := uint(len(buf)); size > buflen { return nil, nil, fmt.Errorf("cannot decode binary fixed %q: schema size exceeds remaining buffer size: %d > %d (short buffer)", c.typeName, size, buflen) } return buf[:size], buf[size:], nil } c.binaryFromNative = func(buf []byte, datum interface{}) ([]byte, error) { var someBytes []byte switch d := datum.(type) { case []byte: someBytes = d case string: someBytes = []byte(d) default: return nil, fmt.Errorf("cannot encode binary fixed %q: expected []byte or string; received: %T", c.typeName, datum) } if count := uint(len(someBytes)); count != size { return nil, fmt.Errorf("cannot encode binary fixed %q: datum size ought to equal schema size: %d != %d", c.typeName, count, size) } return append(buf, someBytes...), nil } c.nativeFromTextual = func(buf []byte) (interface{}, []byte, error) { if buflen := uint(len(buf)); size > buflen { return nil, nil, fmt.Errorf("cannot decode textual fixed %q: schema size exceeds remaining buffer size: %d > %d (short buffer)", c.typeName, size, buflen) } var datum interface{} var err error datum, buf, err = bytesNativeFromTextual(buf) if err != nil { return nil, buf, err } datumBytes := datum.([]byte) if count := uint(len(datumBytes)); count != size { return nil, nil, fmt.Errorf("cannot decode textual fixed %q: datum size ought to equal schema size: %d != %d", c.typeName, count, size) } return datum, buf, err } c.textualFromNative = func(buf []byte, datum interface{}) ([]byte, error) { var someBytes []byte switch d := datum.(type) { case []byte: someBytes = d case string: someBytes = []byte(d) default: return nil, fmt.Errorf("cannot encode textual fixed %q: expected []byte or string; received: %T", c.typeName, datum) } if count := uint(len(someBytes)); count != size { return nil, fmt.Errorf("cannot encode textual fixed %q: datum size ought to equal schema size: %d != %d", c.typeName, count, size) } return bytesTextualFromNative(buf, someBytes) } return c, nil } func sizeFromSchemaMap(typeName *name, schemaMap map[string]interface{}) (uint, error) { // Fixed type must have size sizeRaw, ok := schemaMap["size"] if !ok { return 0, fmt.Errorf("Fixed %q ought to have size key", typeName) } var size uint switch val := sizeRaw.(type) { case string: s, err := strconv.ParseUint(val, 10, 0) if err != nil { return 0, fmt.Errorf("Fixed %q size ought to be number greater than zero: %v", typeName, sizeRaw) } size = uint(s) case float64: if val <= 0 { return 0, fmt.Errorf("Fixed %q size ought to be number greater than zero: %v", typeName, sizeRaw) } size = uint(val) default: return 0, fmt.Errorf("Fixed %q size ought to be number greater than zero: %v", typeName, sizeRaw) } return size, nil } goavro-2.10.1/fixed_test.go000066400000000000000000000074721412474230400155700ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. 
You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "testing" ) func TestSchemaFixed(t *testing.T) { testSchemaValid(t, `{"type": "fixed", "size": 16, "name": "md5"}`) testSchemaValid(t, `{"type":"fixed","name":"f1","size":"16"}`) } func TestFixedName(t *testing.T) { testSchemaInvalid(t, `{"type":"fixed","size":16}`, "Fixed ought to have valid name: schema ought to have name key") testSchemaInvalid(t, `{"type":"fixed","name":3}`, "Fixed ought to have valid name: schema name ought to be non-empty string") testSchemaInvalid(t, `{"type":"fixed","name":""}`, "Fixed ought to have valid name: schema name ought to be non-empty string") testSchemaInvalid(t, `{"type":"fixed","name":"&foo","size":16}`, "Fixed ought to have valid name: schema name ought to start with") testSchemaInvalid(t, `{"type":"fixed","name":"foo&","size":16}`, "Fixed ought to have valid name: schema name ought to have second and remaining") } func TestFixedSize(t *testing.T) { testSchemaInvalid(t, `{"type":"fixed","name":"f1"}`, `Fixed "f1" ought to have size key`) testSchemaInvalid(t, `{"type":"fixed","name":"f1","size":-1}`, `Fixed "f1" size ought to be number greater than zero`) testSchemaInvalid(t, `{"type":"fixed","name":"f1","size":0}`, `Fixed "f1" size ought to be number greater than zero`) } func TestFixedDecodeBufferUnderflow(t *testing.T) { testBinaryDecodeFail(t, `{"type":"fixed","name":"md5","size":16}`, nil, "short buffer") } func TestFixedDecodeWithExtra(t *testing.T) { c, err := NewCodec(`{"type":"fixed","name":"foo","size":4}`) if err != nil { t.Errorf("GOT: %#v; WANT: %#v", err, nil) } val, buf, err := c.NativeFromBinary([]byte("abcdefgh")) if actual, expected := string(val.([]byte)), "abcd"; actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } if actual, expected := string(buf), "efgh"; actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } if err != nil { t.Errorf("GOT: %#v; WANT: %#v", err, nil) } } func TestFixedEncodeUnsupportedType(t *testing.T) { testBinaryEncodeFailBadDatumType(t, `{"type":"fixed","name":"foo","size":4}`, 13) } func TestFixedEncodeWrongSize(t *testing.T) { testBinaryEncodeFail(t, `{"type":"fixed","name":"foo","size":4}`, []byte("abcde"), "datum size ought to equal schema size") testBinaryEncodeFail(t, `{"type":"fixed","name":"foo","size":4}`, []byte("abc"), "datum size ought to equal schema size") } func TestFixedEncode(t *testing.T) { testBinaryCodecPass(t, `{"type":"fixed","name":"foo","size":4}`, []byte("abcd"), []byte("abcd")) } func TestFixedTextCodec(t *testing.T) { schema := `{"type":"fixed","name":"f1","size":4}` testTextDecodeFail(t, schema, []byte(`"\u0001\u0002\u0003"`), "datum size ought to equal schema size") testTextDecodeFail(t, schema, []byte(`"\u0001\u0002\u0003\u0004\u0005"`), "datum size ought to equal schema size") testTextEncodeFail(t, schema, []byte{1, 2, 3}, "datum size ought to equal schema size") testTextEncodeFail(t, schema, []byte{1, 2, 3, 4, 5}, "datum size ought to equal schema size") testTextEncodePass(t, schema, []byte{1, 2, 3, 4}, []byte(`"\u0001\u0002\u0003\u0004"`)) } func TestFixedCodecAcceptsString(t *testing.T) { schema := `{"type":"fixed","name":"f1","size":4}` t.Run("binary", func(t *testing.T) { testBinaryEncodePass(t, schema, "abcd", 
[]byte(`abcd`)) }) t.Run("text", func(t *testing.T) { testTextEncodePass(t, schema, "abcd", []byte(`"abcd"`)) }) } goavro-2.10.1/fixtures/000077500000000000000000000000001412474230400147425ustar00rootroot00000000000000goavro-2.10.1/fixtures/bad-header.avro000066400000000000000000000000041412474230400176010ustar00rootroot00000000000000Obj goavro-2.10.1/fixtures/blockCountExceedsMaxBlockCount.avro000066400000000000000000000000671412474230400236740ustar00rootroot00000000000000Objavro.schema{"type":"long"}0123456789abcdefgoavro-2.10.1/fixtures/blockSizeExceedsMaxBlockSize.avro000066400000000000000000000000701412474230400233320ustar00rootroot00000000000000Objavro.schema{"type":"long"}0123456789abcdefgoavro-2.10.1/fixtures/blockSizeNotGreaterThanZero.avro000066400000000000000000000000641412474230400232260ustar00rootroot00000000000000Objavro.schema{"type":"long"}0123456789abcdefgoavro-2.10.1/fixtures/cannotDiscardBlockBytes.avro000066400000000000000000000000641412474230400223710ustar00rootroot00000000000000Objavro.schema{"type":"long"}0123456789abcdefgoavro-2.10.1/fixtures/cannotReadBlockSize.avro000066400000000000000000000000631412474230400215160ustar00rootroot00000000000000Objavro.schema{"type":"long"}0123456789abcdefgoavro-2.10.1/fixtures/cannotReadSyncMarker.avro000066400000000000000000000000661412474230400217120ustar00rootroot00000000000000Objavro.schema{"type":"long"}0123456789abcdefabgoavro-2.10.1/fixtures/firstBlockCountNotGreaterThanZero.avro000066400000000000000000000000631412474230400244130ustar00rootroot00000000000000Objavro.schema{"type":"long"}0123456789abcdefgoavro-2.10.1/fixtures/quickstop-deflate.avro000066400000000000000000000540661412474230400212720ustar00rootroot00000000000000Objavro.codecdeflateavro.schema{"type":"record","name":"Person","fields":[{"name":"ID","type":{"type":"long"}},{"name":"First","type":{"type":"string"}},{"name":"Last","type":{"type":"string"}},{"name":"Phone","type":{"type":"string"}},{"name":"Age","type":{"type":"int"}}]}燞՞OX7ʹ "odq7IcUU֪zu>WAGݟw|랎o:z%xg-mF|볎'_ttO>kok{]_vu׽MMͭ"/^}w]/;_ְǑ:y$ ۾c۾Km#?嶦 &eަۏlMAr[smඟضRω郭9ma նS;- _Zh^ضW%?(7ڂ폶Ogbڃ^>ڃ,~m=5}Clx= I[]C#o뭠 M8P`'Uan1c\mD&F caqfF"E0 7fqR6"yXp8Ō%iD<1734""ڈ,eqWqMumDN6 øō̸#ʮ6"+{qa-:/}A,?F?~284<ǼCY"S/J*PָΔ #bdžġ<3<ʁ P0]++oV( S vo(G|bg  b͡lʩVn*pY9f[rVsS. p.)H[6+*WƙrM(G0غp(7r)C4+]CPsSFID{R2Pf2˔9D{J eѡ,qe)+BTʺC#<JD)ўSCyPsS^ %shϵq(o)yPJP>9\”B)%3ƕSmr*gڈr-D{ښhϼ\p(r)YQJgP5L0P=~gPn9\ÔQ%@R=<$SI+%ړ1Y2Ǖy,%STJd(e+LYJhOP9\y”B)%sn(/KbkD{n5w=W>0P=OJ<+ߘr=D{ڛht;Wδvk'yD{Y4K2W0P=fڳf(:Wn0fT Ph϶q(wr)9PJP&W2#hOV)ў;,2eI(ўR=CYu(k\YgʆP=GJ+Ϙ\(ў D{. Cy͕7Ly+hϝR=|'|JE)ўWCPNvQ%3ѬD{f;ñ_rS. %ڳԬ ўeCPre)עJ„R=rá-JgG)ў]CPsS %ړTJ'e(e+LJ'hOPW"hOU)ў; 0cT Phϳ|q(_)'{JgYLece//YPJgP.9\”BĕY3 r+7rS(ў-D{ C˕{L/hρR=2P2͔D{J eޡ,pe)KBʪCY:S6燞՞OX7͞ !_D cz^XNY>Tû?}ӻ g_}7}K7|ס/GO~0?i~w_"lÛ>_? |qo>~P)߿CYt(K\YfJD?PXUʮ Q3u+I(Y)ċ|u(Gr( h+{d#WN5iaY 9 ļ\p(r)2  r͡\  Rʾ m(w]c} ȡ<< :O\̔/B*%3ĕcMr*'r)D{JgP8\9ǔB,(%ڳh(e\aUD{֔Y7&Wn1P=;J=r+P(ў#D{ CyʕgLy.hυR=ʡJ.F{ eҡLqe)3Q%@aV)ў;,2eI(ўR=CYu(k\YgD{y2 W2hsT5Ǖhx3WN4fj&iD{f5sgP8\yĔQ%@R=̡< JJ)ўkCyPrS %ړTJ'e(e+LJ'hOPW"hOU)ў;\ȔOB<+%b(_v_=D{&Jg/%W0P=sJr+rY(ўD{V CΕL)hV\ُlr+r?(;/\ʔQ%3W=~/;~KD{f5sW0eN(ўCD{ Cy•Ly&hϹR=2P^rS^ %shϭs(ﹲB<+{ў'CPp+SJ’R=eCYq(\Ycx[T+ў6clsm|lcP= D{2Pp,S %3hς\t(r)B(%ړ1W3P=YD{6 C͕;L+hϞR=q!S %shωW0eN(ўCD{ Cy•Ly&hyr0y+:(QJP9\Y`D{y2 W2eQ(ўR=eCYq(\YcxgTu+ўNct엝|dP= D{2Pp,S %3hς\t(r)B(%ړ1W3P=YD{6 C͕;L+hϞR=q!S %s\D{N CyƕLyU4W5W0P=wJʂCL$hϳR=/ա,re)BTʚCe/Jg^LvecS&I)%燞՞OX7͞ !Ge[m/kշ_a}g>>9'gi:s$I1c1#I$I$IFd$I$yzo?+|z/#w?u 1F|e仟>22~xhp(Fƾ~/so_o@cFǿFao-o~ߛ?mL?}/ 592ϕ, `r( ešrS. P(p[5k:Wn0PV2|>mr+r? 
( vh(cl(+e+r9(\QJgP9\Bl)%ڳm(w]c}D{94G1W0P=5D{ CyuJJ)ўkCp(o)WJP>:O\̔/B*%3e/_l Jg6hV=srޡ\E %ړPJ'i(Se+3LJ'׬ ў,8E,1e9((%S5K2W0P=kJ r+r[(ўD{v CϕLy(hϑR=džġ<S %shυ;\yŔB41W3P=J<g+_r#D{&;hT_v8_vR(ўyD{ CS&I5+ў8Y1e>(,(%S4%̕ %ڳhϲ\q(Wr)ׅPJgPn9\ÔB)%ڳo(C'+_U(M)cAr*Εq[98QţXǻ `(S"W.1PƃpE)A2m(3*W1P&pKc4[6W0nD (SOArP8\yĔB)O'H3+/R({J){䵡q(o)*%ړ3yE, %SVJb(e+LJhOPW>0P=OJ<+ߘr*U=Ӊn%3لKP|(I)%ڳh(e\aʴP=n'gP9\Q%@R=ۆrǡ=J@)ўCCyPs S %shϹp(/)QJP9\eʜP=yD{ P̔D{J eݡlpe)[B+Y(ўD{^ C95@D93U=Jgn+lWrQ(ўneڳl(W4Wfr5(\SJgPn8\ŔB(%ڳk(}<`CD{96')W1P=J\++oV(ў;D{ e֡qe) B)ʲCY*Sք+%0Mŕm%hϣ|r()_ySJgj/ /~9U=]^gPE\bpE)ў8\ƔBl(%ڳi(ma]D{7!W1P='J3+/R(ў+D{ Cy˕wLy/hOV)ў;,2eI(ўR=CYu(k\YgʆP=MD{ZPv)yRJP8\ƔSCQ%3=ԥ ў!crC|beT PRJgP.9\”iD{2J5r+7rS(ў-D{ C˕{L/hρR=ȡ<<JL)ўsCyP^rS^ %shϭs(2˔9D{J eѡ,qe)+BTʺC&Si+%1#W>1P=/>P9S|fpD P8;ܭD{憍rر_re)YRJgP8i0P=kJ r+r[(ўD{v CϕLy(hϑR=džġ<3<JB)ўKCyP^s S %shϽ:93eA(ўR=%CYv(+\YeʚP=uD{P͔D{hϣ|r()_J7D{Frı_r#Q%3?ҭD{ eʡ\%\JgE)ў8\ƔBl(%ڳi(ma]D{7!W1P='J3+/R(ў+D{ Cy˕wLy/hOV)ў;,2eI(ўR=CYu(k\YgʆP=ne iʶC|* y6/+W1hTLv+ўQcu엣|eP=)D{ C̕+LJ'hϪ\s(׹r)7RJgP8\ǔB(%sh(c8\ĔB(%j(ʩ$/lLFhl[%2/2ŔB,)%ڳl(W4WfrU(ў5D{ Cɕ[L-hώR=rϡ<JH)ўcCyPrS %shϥr()oSJPfW晲 hO[@{JPVʔhOP6W#h燞՞OX7͞ !Gc{rN}3avwG=Hd$H$I$I$H$c1ƾ|ޟ7x=y{Ǿ響)H2<߆=~lii ~-m$~ȷc? ߆"?;ګv~2:o##sSsSsk-?uȱ5ljc=nh?>PP N[9ID9U6tg\Cr(\9˔ pN)[DPy\`El %l ˆršs*S ek+e[0r+r[(ۂpG)ۃĮs()=#H+Ϙ\(;"l1H\++o6(SOAh(|d'JW+KLYJhOP:W62U=]q%3ŕ]r*rF(ўR=iC9Pfr)B7 "W.1P=+J CƕL!hOA)ўMCPnsS %ڳhϾ%hϓ|v(_)߄))%S6ʕ5 %PJ'c==l*ў鞸1)2͕LJgN)ў9\B,)%ڳl(W3P=JʒCY SV)%S7 2>_Ehd_\LecSUJ'c(,Wr^(ўD{ C̕+LJg5lA{ C܌* 6;.W1P=J#+OT(ў3D{ CyɕWLy-hύR=Ρ,r=S>%hϓ|v(_)߄))%S6ʕ5 %PJ'ol*ў1)2͕LJgN)ў9\B,ŕhϲ\q(\ʔkQ%@R=PnrSn %ڳhϮs()9RJP8\yƔB\(%si(ka[D{){+I(ўgD{^ Cƕ%, %SQJj(ke+L*ў0~9_ %ړRJ'm(g W1eV(ў\\نr+r9(\QJ'o(W5\g D{ Jl-r+wrW(ў=D{ CyȕGLy,hωR=̡< JJ)ўkCyPrSWJP>:O\̔/B*%f(Ke++LYJhOP6 /~9U=q%35h엃+SLJg6lG{2rΡre)JD{ C̕+LJgU)ў5CPnpe)7RJgP8\ǔB(%sh(cw=_˿G޼љeg>* d(|exwT1' 6͕Sݶr*gr6 )%ړ0I2ŕi%ړUJ'g(e+LYJhOPVW֙r^(ўR= rѡ\e\JgU)ў5CPnp&Sn %ڳhώu(r)9TJP;'\yʔgB+%sa(/+fD{n3W>2P=hϋ|u(=T9C=%@TOʙ[9˕sLJ'hOP Wf2'hO^)ў,:%,3eE(ўR=5CYw(繲 B,*%ڳd(\e5D{֕0Wn3P=J}+H(ўcD{N CyƕLy!hϥR=Wڡ[JYق<G+%(|UJ'˕r*zr7D{fzhϬs(\dʔP=iD{22P2ϔD{J e١pe)kBԕ7 r+rI(ўeD{V CƕL!hϦR=[rۡ]Jg_)ўCyPq1S%shϙ_R(ў9D{2P2͔D{J eޡ,pe)KBʪCY:S %PJgP.:K\̔+B*%ڳf( d-D{1W3P=Jc+OL(ўfe\K+&(UJP;\ȔOB<+%b(_x?/~U=SJg/%W1eB(ўR=)Cv(3\eʜP=yD{ P̔D{J eݡS.%ڳhϒ\v(Wr)ׄYWJgPn:[\͔;B*%ڳg(p#S> %hϋ|u(C|bPTL 5+ў!cr\9ǔ D{J eڡpe)sB)ʢCY2SV*%S3ur+L hϢR=Kr١\U\JgYىlMr+r'(UJgP;\yȔGB+%sb(O3 eޡ,pe)KQ%@aY)?:5c}J Cy̕'Ly*MAxAP^8\yŔB7J[CyPsSօ2 Gl bO١|WwE-A8ըl b]\9e+r(g5g-yr+rI(ۂpY)ۃ؊\u(׸r)7=7#mmr+w2!hOAIʴC,S"¼R=CYt(K\YfʊP=UD{jrϡ<JH)ўcCyPrS %shϥr()oSJP>8u|d'D{y1e*ǻr;D{&hT7WNwe9D{Y0W.3P=Jur+7rK(ўFgP: L2e*(L+%ړ1Y2Ǖy,%STJd(e+LYJgO)ў}CyPrS %shϩ3P=J{rVNPdQNDhtO1W3P=J,er+WrM(ўFeڳa(7-fʝpW)ўL:)L3eF(ўR=9Cw( \YdʒP=eD{*Pָr)9PJP9\y”B)%sn(/KbkD{n5w=W>0e](ўGD{ C•L*ўF%3k엽lJgV)ў9C9P.p"S. %ڳܨlF{V CƕLUn*%ڳe(eʄP=ID{R2Pf2˔9D{J eѡ,qe)+BT=r+P(ў#D{ CyʕgLy.hυR=ʡJN)ў{CPֹ)yVJP:>_/D{&hT_9Ke9D{!ڳ`(%\fʕpU)ў5CPnp&Sn %ڳhώu(\dʔP=iD{22P2ϔD{J e١pe)kB)%ڳo(C0e](ўGD{ C•L*ўe+31`rl* g3W.2P=JUr+יrC(ўMD{ CÕLJ'hOP Wf2'hO^)ў,:%,3eE(ўR=5CPsS %shϱ2P=ύ6P:A_r0(lT=S~9/r)WJgP.:K\̔+B*%ڳf( d-D{1e+LJ'hOPfW晲 hOQ)ў,;2eM(ў=D{ CyȕGLy,hωR=̡< JJ)ўkCyPrS %ШlG{ѡ|g|* _c/JgzQYr+rA(ўED{ C•L&hϺR=rӡmJgW)ўL:)L3eF(ўR=9Cw( \YdʒP=eD{*Pָ燞՞OX7͔+qۦm?;ݙJ/ɉr9 &&l.&I$I$I$I$Im$I$I$ɸϯw}ܾy3Tn)˕Wj1E҉TPJ&ʄR4snGOoh=7#-i" hzܨ0ڜLs;toWw7Qqin.z% UQ+Ly( 72VPL(@+Q勄WaN)P~I(qAR:daKP娅(dS@'I Ɣ3R2) 9r^B+1$@9tE\PuL(5 R%PnK(wp.c:}Nz PJ(p1 Dante Hicks(5)@@ Randal Graves(555) 123-5678 Randal Graves(555) 123-5678<>VeronicaLoughran(555) 123-09878>CaitlinBree(555) 123-23236>Bob Silent(555) 123-6422:>Jay??? 
(661)4> Dante Hicks (662)@> Randal Graves(555) 123-5678<>VeronicaLoughran(555) 123-09878>CaitlinBree(555) 123-23236>Bob Silent(555) 123-6422:>Jay??? (662)4> Dante Hicks (663)@> Randal Graves(555) 123-5678<>VeronicaLoughran(555) 123-09878>CaitlinBree(555) 123-23236>Bob Silent(555) 123-6422:>Jay??? (663)4> Dante Hicks (664)@> Randal Graves(555) 123-5678<>VeronicaLoughran(555) 123-09878>CaitlinBree(555) 123-23236>Bob Silent(555) 123-6422:>Jay??? (664)4> Dante Hicks (665)@> Randal Graves(555) 123-5678<>VeronicaLoughran(555) 123-09878>CaitlinBree(555) 123-23236>Bob Silent(555) 123-6422:>Jay??? (665)4> Dante Hicks (666)@> Randal Graves(555) 123-5678<>VeronicaLoughran(555) 123-09878>CaitlinBree(555) 123-23236>Bob Silent(555) 123-6422:>Jay??? (666)4> Dante Hicks (667)@> Randal Graves(555) 123-5678<>VeronicaLoughran(555) 123-09878>CaitlinBree(555) 123-23236>Bob Silent(555) 123-6422:>Jay??? (667)4> Dante Hicks (668)@> Randal Graves(555) 123-5678<>VeronicaLoughran(555) 123-09878>CaitlinBree(555) 123-23236>Bob Silent(555) 123-6422:>Jay??? (668)4> Dante Hicks (669)@> Randal Graves(555) 123-5678<>VeronicaLoughran(555) 123-09878>CaitlinBree(555) 123-23236>Bob Silent(555) 123-6422:>Jay??? (669)4> Dante Hicks (670)@> Randal Graves(555) 123-5678<>VeronicaLoughran(555) 123-09878>CaitlinBree(555) 123-23236>Bob Silent(555) 123-6422:>Jay??? (670)4> Dante Hicks (671)@> Randal Graves(555) 123-5678<>VeronicaLoughran(555) 123-09878>CaitlinBree(555) 123-23236>Bob Silent(555) 123-6422:?Jay??? (671)4? Dante Hicks (672)@? Randal Graves(555) 123-5678J,~goavro-2.10.1/fixtures/quickstop-snappy.avro000066400000000000000000001061141412474230400211700ustar00rootroot00000000000000Objavro.schema{"fields":[{"name":"ID","type":{"type":"long"}},{"name":"First","type":{"type":"string"}},{"name":"Last","type":{"type":"string"}},{"name":"Phone","type":{"type":"string"}},{"name":"Age","type":{"type":"int"}}],"name":"Person","type":"record"}avro.codec snappyG0@&gxH] C Dante Hicks(0)@ Randal Graves(555) 123-5678<VeronicaLoughran#H09878CaitlinBree@23236 Bob Silent46422: Jay???46 1)@vrj 1)46 2)@v r"j$ 2)4&6 3)@(v*,r.j0 3)426 4)@4v68r:j< 4)4>6 5)@@vBDrFjH 5)4J6 6)@LvNPrRjT 6)4V6 7)@XvZ\r^j` 7)4b6 8)@dvfhrjjl 8)4n6 9)@pvrtrvjx 9)4z.(10)@|v~rj4:1)@vvn. 1)4> 2)@zvn. 2)4> 3)@zvn. 3)4> 4)@zvn. 4)4> 5)@zvn. 5)4> 6)@zƊvn. 6)4> 7)@zҊvn. 7)4> 8)@zފvn. 8)4> 9)@zꊤvn. 9)4:20)@zvn20)4>1)@vhhr j  21)42 (22)@zvn. 2)4> 3)@zvn. 3)4> 4)@zvn. 4)4> 5)@zvn. 5)4> 6)@zvn. 6)4> 7)@zʊvn. 7)4> 8)@z֊vn. 8)4> 9)@z⊤vn. 9)4:30)@zvn30)4> 1)@zvnh31)46t 32)@v  r j . 2)4> 3)@zvn. 3)4> 4)@zvn. 4)4> 5)@zvn. 5)4> 6)@zvn. 6)4> 7)@zŠvn. 7)4> 8)@zΊvn. 8)4> 9)@zڊvn. 9)4:40)@z护vn40)4> 1)@zvn. 1)4> 2)@zrhjh 42)46 43)@z< vn. 3)4> 4)@zvn. 4)4> 5)@zvn. 5)4> 6)@zvn. 6)4> 7)@zvn. 7)4> 8)@zƊvn. 8)4> 9)@zҊvn. 9)4:50)@zފvn50)4> 1)@zꊤvn. 1)4> 2)@zvn. 2)4>3)@vt hr j  53)46 54)@zvn. 4)4> 5)@zvn. 5)4> 6)@zvn. 6)4> 7)@zvn. 7)4> 8)@zvn. 8)4> 9)@zʊvn. 9)4:60)@z֊vn60)4> 1)@z⊤vn. 1)4> 2)@zvn. 2)4> 3)@zvn((63)46h64)@v (r j . 4)4> 5)@zvn. 5)4> 6)@zvn. 6)4> 7)@zvn. 7)4> 8)@zvn. 8)4> 9)@zŠvn. 9)4:70)@zΊvn70)4> 1)@zڊvn. 1)4> 2)@z护vn. 2)4> 3)@zvn. 3)4> 4)@zrhjh 74)46 75)@v vn. 5)4> 6)@zvn. 6)4> 7)@zvn. 7)4> 8)@zvn. 8)4> 9)@zvn. 9)4:80)@zƊvn80)4> 1)@zҊvn. 1)4> 2)@zފvn. 2)4> 3)@zꊤvn. 3)4> 4)@zvn. 4)4>5)@vhhv&6j  85)46 86)@zvn. 6)4> 7)@zvn. 7)4> 8)@zvn. 8)4> 9)@zvn. 9)4:90)@zvn90)4> 1)@zʊvn. 1)4> 2)@z֊vn. 2)4> 3)@z⊤vn. 3)4> 4)@zvn. 4)4> 5)@zvn h95)4 6h96)@ v   r j . 6)4> 7)@zv j. 7)4> 8)@zvn. 8)4> 9)@zvn. 
9)42 (100)@zvn 4B 1)@zŠvn2 1)4B 2)@zΊvn2 2)4B 3)@zڊvn2 3)4B 4)@z抦vn2 4)4B 5)@zvn2 5)4B 6)@z runC a6)4 .7)@ v vn2 7)4B 8)@zvn2 8)4B 9)@zvn2 9)4>#@zvn.10)4B 1)@zvn2 1)4B 2)@zƊvn2 2)4B 3)@zҊvn2 3)4B 4)@zފvn2 4)4B 5)@zꊦvn2 5)4B 6)@zvn2 6)4B7)@ v| | r" j  "17)4 :"5@zvn2 8)4B 9)@zvn2 9)4>9@zvn.:@B 1)@zvn2 1)4B 2)@zvn2 2)4B 3)@zʊvn2 3)4B 4)@z֊vn2 4)4B 5)@z⊦vn2 5)4B 6)@zvn2 6)4B 7)@zvn"GQ (1H@ :|I@ v" " r" j"2 8)4B 9)@zvn2 9)4>M@zvn.N@B 1)@zvn2 1)4B 2)@zvn2 2)4B 3)@zŠvn2 3)4B 4)@zΊvn2 4)4B 5)@zڊvn2 5)4B 6)@z抦vn2 6)4B 7)@zvn2 7)4B 8)@z r| j| * ^@ :"_@ v" "vn2 9)4>a@zvn.b@B 1)@zvn2 1)4B 2)@zvn2 2)4B 3)@zvn2 3)4B 4)@zƊvn2 4)4B 5)@zҊvn2 5)4B 6)@zފvn2 6)4B 7)@zꊦvn2 7)4B 8)@zvn2 8)4B9)@v||r"j""t@2_ (150)@zvn.v@B 1)@zvn2 1)4B 2)@zvn2 2)4B 3)@zvn2 3)4B 4)@zvn2 4)4B 5)@zʊvn2 5)4B 6)@z֊vn2 6)4B 7)@z⊦vn2 7)4B 8)@zvn2 8)4B 9)@zvn|@: @ rf"r"j".@B 1)@zvn2 1)4B 2)@zvn2 2)4B 3)@zvn2 3)4B 4)@zvn2 4)4B 5)@zŠvn2 5)4B 6)@zΊvn2 6)4B 7)@zڊvn2 7)4B 8)@z抦vn2 8)4B 9)@zvn2 9)4>@zr|j|Ja)m (4:"@z/m"vn2gBgzvn2gBgzvn2gBgzvn2gBgzvn2gBgzƊvn2gBgzҊvn2gB 8)@zފvn2gBgzꊦvn2 g>8!gzvn.@B#gv |r"j""@:"@zvn2&gB'gzvn2(gB)gzvn2*gB 5)@zvn2,gB-gzvn2.gB/gzʊvn20gB1gz֊vn22gB3gz⊦vn24g>@zvn.@B7gzvn|@:|@v"zr"j"2:gB;gzvn2 3)4B=gzvn2>gB?gzvn2@gBAgzvn2BgB 7)@zŠvn2DgBEgzΊvn2FgBGgzڊvn2 9)4:2@z抦vn4BKgzvn2LgBMgzr|j|"2@6"2@v"Dvn2PgBQgzvn2RgBSgzvn2TgB 6)@zvn2VgBWgzvn2XgB 8)@zƊvn2ZgB[gzҊvn2\g>@zފvn.@B_gzꊦvn2`gBagzvn2bgBcgv||vmj""@:"@zvn2fgBggzvn2hgBigzvn2jgBkgzvn2lgBmgzvn2ngBogzʊvn2pg>@z֊vn.@Bsgz⊦vn2tgBugzvn2 2)4Bwgzvn|@:|@v""rDj"2zgB 5)@zvn2 5)4B}gzvn2~gBgzvn2gBgzvn2gBgzŠvn2g>@zΊvn.@Bgzڊvn2 1)4Bgz抦vn2gB 3)@zvn2gBgzr|j|"@:"@v""vn2gBgzvn2gBgzvn2gBgzvn2 8)4Bgzvn2g>@zƊvn.@BgzҊvn2gBgzފvn2gBgzꊦvn2gBgzvn2gBgv||r"j""@:"@zvn2 6)4B 7)@zvn2gBgzvn2gBgzvn2g>@zvn.@Bgzʊvn2gBgz֊vn2 2)4B 3)@z⊦vn2gBgzvn2gB 5)@zvn" (2@:|@v""r"j"2gBgzvn2gBgvLvn2gBgzvn2 9)4>@zvn.@BgzŠvn2gBgzΊvn2gBgzڊvn2gBgz抦vn2gBgzvn2gBgzr| Bbh"@>2@:"@v"vn2gBgzvn2gBgzvn2 9)4>@zvn.vBgzvn2gBgzƊvn2gBgzҊvn2 3)4Bgzފvn2 4)4Bgzꊦvn2gBgzvn2gBgv||r"j""@:"@zvn2gBgzvn2g>@zvn.@Bgzvn2gBgzvn2gBgzʊvn2gBgz֊vn2gBgz⊦vn2 5)4Bgzvn2 6)4Bgzvn|@:|@v""r"j"2gBgzvn2g>@zvn.@Bgzvn2 1)4Bgzvn2gBgzŠvn2gBgzΊvn2gB 5)@zڊvn2gBgz抦vn2gBgzvn2gB 8)@zr|j|"@:"@z"vn2g:3@zvn4B 1)@zvn2gBgzvn2gBgzvn2gBgzƊvn2gBgzҊvn2gBgzފvn2 6)4Bgzꊦvn2gBgzvn2 8)4Bgv |r"j""3@6"3@zvn.@Bgzvn2gBgzvn2gBgzvn2gBgzvn2gBgzʊvn2gBgz֊vn2 6)4Bgz⊦vn2gB 8)@zvn2gBgzvn|@:|@ vg2r"j".@Bgzvn2gBgzvn2gBgzvn2gBgzvn2gBgzŠvn2gBgzΊvn2 6)4Bgzڊvn2gBgz抦r0n2gB 9)@zvn2g>@zrLj|"@:"@v"Dvn2gBgzvn2gBgzvn2gBgzvn2gBgzvn2 5)4BgzƊvn2gBgzҊvn2gBgzފvn2gBgzꊦvn2g>@zvn.@Bg v| |v j" "@ :"@zvn2gBgz Lvn2 3)4Bgzvn2gBgzvn2 5)4Bgzvn2 6)4B 7)@zʊvn2gBgz֊vn2gBgz⊦vn2g>@zvn.@Bgzvn!|@!:|@!v"!!rD!j"2gBgzvn2gBgzvn2 4)4Bgzvn2gBgzvn2gBgzŠvn2gBgzΊvn2gBgzڊvn2g>@z抦vn.@B 1)@zvn2gBgz"r|np""6":"@"v"""vn2gBgzvn2 4)4Bgzvn2gBgzvn2gBgzvn2gBgzƊvn2gBgzҊvn2g>7zފvn.@Bgzꊦvn2gB 2)@zvn2 2)4Bg#v|#|#r"#j #"bE3@#:"@zvn2gBgzvn2gBgzvn2gB 7)@zvn2 7)4Bgzvn2gBgzʊvn2g>@z֊vn.@Bgz⊦vn2 1)4Bgzvn2gBgzvn" (3@$:|@$v"$"$r"$j"2gBgzvn2 5)4Bgzvn2gBgzvn2gBgzvn2gBgzŠvn2 9)4>@zΊvn.@Bgzڊvn2gBgz抦vn2gBgzvn2gBgz%r|%j|%* @%:"@%v"%"vn2 5)4Bgzr-09878%CaitlinBree(555) 123-23236%Bob Silent6422:%Jay??? 
(396)4% Dante Hicks (397)@% Randal Gravesc`5678<%VeronicaLoughran$v%j2 7)4B8)@%vvn2 8)4B 9)@zvn2 9)4:400)@zƊvn4B 1)@zҊvn2 1)4B 2)@zފvn2 2)4B 3)@zꊦvn2 3)4B 4)@zvn2 4)4B5)@&v&0&r&j0&405)4&6406)@&vvn2 6)4B 7)@zvn2 7)4B 8)@zvn2 8)4B 9)@zvn2 9)4>10)@zvn.10)4B 1)@zʊvn2 1)4B 2)@z֊vn2 2)4B 3)@z⊦vn2 3)4B 4)@zvn2 4)4B 5)@zv&j|'|15)4':|16)@'v|'"'r"'j2 6)4B 7)@zvn2 7)4B 8)@zvn2 8)4B 9)@zvn2 9)4>20)@zvn.20)4B 1)@zŠvn2 1)4B 2)@zΊvn2 2)4B 3)@zڊvn2 3)4B 4)@z抦vn2 4)4B 5)@zvn2 5)4B 6)@z(r|(j|("26)4(:"27)@(v"("vn2 7)4B 8)@zvn2 8)4B 9)@zvn2 9)4>30)@zvn.30)4B 1)@zvn2 1)4B 2)@zƊvn2 2)4B 3)@zҊvn2 3)4B 4)@zފvn2 4)4B 5)@zꊦvn2 5)4B 6)@zvn2 6)4B7)@)v|)|)r")j")"37)4):"38)@zvn2 8)4B 9)@zvn2 9)4>40)@zvn.40)4B 1)@zvn2 1)4B 2)@zvn2 2)4B 3)@zʊvn2 3)4B 4)@z֊vn2 4)4B 5)@z⊦vn2 5)4B 6)@zvn2 6)4B 7)@zvn*|47)4*:|48)@*v"*"*r"*j"2 8)4B 9)@zvn2 9)4>50)@zvn.50)4B 1)@zvn2 1)4B 2)@zvn2 2)4B 3)@zŠvn2 3)4B 4)@zΊvn2 4)4B 5)@zڊvn2 5)4B 6)@z抦vn2 6)4B 7)@zvn2 7)4B 8)@z+r|+j|+"58)4+:"59)@+v"+"vn2 9)4>60)@zvn.60)4B 1)@zvn2 1)4B 2)@zvn2 2)4B 3)@zvn2 3)4B 4)@zƊvn2 4)4B 5)@zҊv+j2 5)4B 6)@zފvn2 6)4B 7)@zꊦvn2 7)4B 8)@zvn2 8)4B9)@,v|,|,r",j,"V/469)4,:"70)@zvn.70)4B 1)@zvn2 1)4B 2)@zvn2 2)4B 3)@zvn2 3)4B 4)@zvn2 4)4B 5)@zʊvn2 5)4B 6)@z֊vn2 6)4B 7)@z⊦vn2 7)4B 8)@zvn2 8)4B 9)@zvn-|79)4-:|80)@- Rn5-"-r"-j".80)4B 1)@zvn2 1)4B 2)@zvn2 2)4B 3)@zvn2 3)4B 4)@zvn2 4)4B 5)@zŠv-j>2 5)4B 6)@zΊvn2 6)4B 7)@zڊvn2 7)4B 8)@z抦vn2 8)4B 9)@zvn2 9)4>90)@z.r|.j>."90)4.:"91)@.v"."vn2 1)4B 2)@zvn2 2)4B 3)@zvn2 3)4B 4)@zvn2 4)4B 5)@zvn2 5)4B 6)@zƊvn2 6)4B 7)@zҊvn2 7)4B 8)@zފvn2 8)4B 9)@zꊦvn2 9)4:5@zvn4B1)@/v|/|/r"/j"/"501)4/6"5@zvn2 2)4B 3)@zvn2 3)4B 4)@zvn2 4)4B 5)@zvn2 5)4B 6)@zvn2 6)4B 7)@zʊvn2 7)4B 8)@z֊vn2 8)4B 9)@z⊦vn2 9)4>@zvn.@B 1)@zvn0|@0:|@0v"0"0r"0j"2 2)4B 3)@zvn2 3)4B 4)@zvn2 4)4B 5)@zvn2 5)4B 6)@zvn2 6)4B 7)@zŠvn2 7)4B 8)@zΊvn2 8)4B 9)@zڊvn2 9)4>@z抦vn.@B 1)@zvn2 1)4B 2)@z1r|1j|1"@1:"@1v"1"vn2 3)4B 4)@zvn2 4)4B 5)@zvn2 5)4B 6)@zvn2 6)4B 7)@zvn2 7)4B 8)@zƊvn2 8)4B 9)@zҊvn2 9)4>@zފvn.@B 1)@zꊦvn2 1)4B 2)@zvn2 2)4B3)@2v|2|2r"2j"2"@2:"@zvn2 4)4B 5)@zvn2 5)4B 6)@zvn2 6)4B 7)@zvn2 7)4B 8)@zvn2 8)4B 9)@zʊvn2 9)4>@z֊vn.@B 1)@z⊦vn2 1)4B 2)@zvn2 2)4B 3)@zvn3|@3:|@3v"3"3r"3j"3. 
4)4B 5)@zvn2 5)4B 6)@zvn2 6)4B 7)@zvn2 7)4B 8)@zvn2 8)4B 9)@zŠvn2 9)4>@zΊvn.50)4B 1)@zڊvn2 1)4B 2)@z抦vn2 2)4B 3)@zvn2 3)4B 4)@z4r|4j|4|@4:"@4v"4"vn2 5)4B 6)@zvn2gB 7)@zvn2gBgzvn2gB 9)@zvn2g>@zƊvn.@BgzҊvn2gBgzފvn2gBgzꊦvn2gBgzvn2gBg5v|5|5r"5j"5"@>5@5:"@zvn2gB 7)@zvn2 7)4Bgzvn2gBgzvn2g>@zvn.@Bg5vʊvn2gBgz֊vn2gBgz⊦vn2gBgzvn2gBgzvn6|@6:|7g6v>6"6r"6j"2gBgzvn2 7)4Bgzvn2gBgzvn2g>@zvn.@BgzŠvn2gBgzΊvn2gBgzڊvn2 3)4Bgz抦vn2gBgzvn2 5)4Bgz7r|7j|7"8g7:"@7v"7"vn2gBgzvn2gBgzvn2g>90)@zvn.@Bgzvn2gB 2)@zƊvn2gBgzҊvn2gBgzފvn2gBgzꊦvn2 5)4Bgzvn2gBg8v|8|8r"8j"8"@8:"@zvn2gBgzvn2g:6@zvn4Bgzvn2gB 2)@zvn2gBgzʊvn2gB 4)@z֊vn2gB 5)@z⊦vn2gBgzvn2gB 7)@zvn9|60g96|6@9v"9"9r"9j"2gB 9)@zvn2g>@zvn.@B 1)@zvn2gBgzvn2gBgzŠvn2gBgzΊvn2gBgzڊvn2gBgz抦vn2gBgzvn2gBgz:r|:j|:"@::"@:v":"vn2g>@zvn.@Bgzvn2gBgzvn2gBgzvn2gBgzƊvn2gBgzҊvn2 5)4Bgzފvn2gBgzꊦvn2gBgzvn2gBg;v|;|;r";j";"@;:"@zvn.@Bgzvn2gBgzvn2gB 3)@zvn2gBgzvn2gBgzʊvn2gBgz֊vn2gB 7)@z⊦vn2gB 8)@zvn2gBgzvn<|@<:|@< vg<"2gBgzΊvn2gBgzڊvn2gBgz抦vn2gB 9)@zvn2g>@z=r|=j>="@=:"@=v"="vn2gBgzvn2gBgzvn2 3)4Bgzvn2gB 5)@zvn2 5)4BgzƊvn2gBgzҊvn2gBgzފvn2gB 9)@zꊦvn2g>@zvn.@Bg>v|>|>r">j">"@>:"62)@zvn2gBgzvn2gBgzvn2 4)4Bgzvn2gBgzvn2gBgzʊvn2gB 8)@z֊vn2gBgz⊦vn2g>@zvn.@Bgzvn?"D6@?:|@?v"?"?r"?j"2gBgzvn2gBgzvn2gBgzvn2gBgzvn2gB 7)@zŠvn2 7)4BgzΊvn2gBgzڊvn2g>@z抦vn.@Bgzvn2gBgz@r|@j|@"@@:"@@v"@"vn2gBgzvn2gBgzvn2gB 6)@zvn2gBgzvn2gBgzƊvn2gBgzҊvn2g>@zފvn.@Bgzꊦvn2gBgzvn2gBgAv|A|Ar"Aj"A"93)4A:"@zvn2gBgzvn2gBgzvn2gBgzvn2gBgzvn2gBgzʊvn2 9)4:7@z֊vn4Bgz⊦vn2gBgzvn2gB 3)@zvnB|7@B6|7@Bv"B"Br"Bj"2gBgzvn2gBgzvn2gBgzvn2gBgzvn2gBgzŠvn2g>@zΊvn.@Bgzڊvn2gBgz抦vn2gBgzvn2gBgzCr|Cj|C"@C:"@Cv"C"vn2gBgzvn2gBgzvn2gBgzvn2gBgzvn2g>@zƊvn.@BgzҊvn2gBgzފvn2gBgzꊦvn2gBgzvn2 4)4B5)@Dv|D|Dr"Dj"D"@D:"@zvn2 6)4Bgzvn2gBgzvn2gBgzvn2g>3zvn.@Bgzʊvn2gBgz֊vn2gBgz⊦vn2gBgzvn2gBgzvnE|@E:|@Ev"E"Er"Ej"2gBgzvn2gBgzvn2gBgzvn2g>@zvn.@BgzŠvn2gBgzΊvn2gBgzڊvn2gBgz抦vn2gBgzvn2 5)4BgzFr|Fj|F"@F:"@Fv"F"vn2 7)4Bgzvn2gBgzvn2g>@zvn.@Bgzvn2gBgzƊvn2gBgzҊvn2gBgzފvn2gBgzꊦvn2gBgzvn2gBgGv|G|Gr"Gj"G"@G:"@zvn2gB 9)@zvn2g>@zvn.@B 1)@zvn2gBgzvn2gB 3)@zʊvn2gBgz֊vn2gBgz⊦vn2gBgzvn2gBgzvnH"@>7@H:|@Hv"H"Hr"Hj"2gBgzvn2g>@zvn.@Bgzvn2gBgzvn2 2)4BgzŠvn2gBgzΊvn2gBgzڊvn2 5)4Bgz抦vn2gBgzvn2gBgzIr|Ij|I"@I:"@Iv"I"vn2g>@zvn.@Bgzvn2gBgzvn2gBgzvn2gBgzƊvn2gBgzҊvn2gBgzފvn2gBgzꊦvn2gBgzvn2 8)4BgJv|J|Jr"Jj"J"@J:"@zvn.@Bgj67810)@zLr|Lj|L"4L:"11)@Lv"L"vn2 1)4B 2)@zvn2 2)4B 3)@zvn2 3)4B 4)@zvn2 4)4B 5)@zvn2 5)4B 6)@zƊvn2 6)4B 7)@zҊvn2 7)4B 8)@zފvn2 8)4B 9)@zꊦvn2 9)4>20)@zvn.20)4B1)@Mv|M|Mr"Mj"M"21)4M:"22)@zvn2 2)4B 3)@zvn2 3)4B 4)@zvn2 4)4B 5)@zvn2 5)4B 6)@zvn2 6)4B 7)@zʊvn2 7)4B 8)@z֊vn2 8)4B 9)@z⊦vn2 9)4>30)@zvn.30)4B 1)@zvnN|31)4N:|32)@Nv"N"Nr"Nj"2 2)4B 3)@zvn2 3)4B 4)@zvn2 4)4B 5)@zvn2 5)4B 6)@zvn2 6)4B 7)@zŠvn2 7)4B 8)@zΊvn2 8)4B 9)@zڊvn2 9)4>40)@z抦vn.40)4B 1)@zvn2 1)4B 2)@zOr|Oj|O"42)4O:"43)@Ov"O"vn2 3)4B 4)@zvn24)4O>L 5)@zvn2 5)4B 6)@zvn2 6)4B 7)@zvn2 7)4B 8)@zƊvn2 8)4B 9)@zҊvn2 9)4>50)@zފvn.50)4B 1)@zꊦvn2 1)4B 2)@zvn2 2)4B3)@Pv|P|Pr"Pj"P"53)4P:54)@zvn2 4)4B 5)@zvn2 5)4B 6)@zvn2 6)4B 7)@zvn2 7)4B 8)@zvn2 8)4B 9)@zʊvn2 9)4>60)@z֊vn.60)4B 1)@z⊦vn2 1)4B 2)@zvn2 2)4B 3)@zvnQ".863)4Q:|64)@Qv"Q"Qr"Qj"2 4)4B 5)@zvn2 5)4B 6)@zvn2 6)4B 7)@zvn2 7)4B 8)@zvn2 8)4B 9)@zŠvn2 9)4>70)@zΊvn.70)4B 1)@zڊvn2 1)4B 2)@z抦vn2 2)4B 3)@zvn2 3)4B 4)@zRr|Rj|R"74)4R:"75)@Rv"R"vn2 5)4B 6)@zvn2 6)4B 7)@zvn2 7)4B 8)@zvn2 8)4B 9)@zvn2 9)4>80)@zƊvn.80)4B 1)@zҊvn2 1)4B 2)@zފvn22)4R:083)@zꊦvn2 3)4B 4)@zvn2 4)4B5)@Sv|S|Sr"Sj"S"85)4S> 6)@zvn2 6)4B 7)@zvn2 7)4B 8)@zvn2 8)4B 9)@zvn2 9)4>90)@zvn.90)4B 1)@zʊvn2 1)4B 2)@z֊vn2 2)4B 3)@z⊦vn2 3)4B 4)@zvn2 4)4B 5)@zvnT|@T:|@Tv"T"Tr"Tj"2 6)4B 7)@zvn2 7)4B 8)@zvn2 8)4B 9)@zvn2 9)4:9@zvn4T6>9@zŠvn2 
1)4B 2)@zΊvn2 2)4B 3)@zڊvn2 3)4B 4)@z抦vn2 4)4B 5)@zvn2 5)4B 6)@zUr|Uj|U"9@U>7)@Uv"U"vn2 7)4B 8)@zvn2 8)4B 9)@zvn2 9)4>@zvn.10)4B 1)@zvn2 1)4B 2)@zƊvn2 2)4B 3)@zҊvn2 3)4B 4)@zފvn2 4)4B 5)@zꊦvn2 5)4B 6)@zvn2 6)4B7)@Vv|V|Vr"Vj"V"@V:"@zvn2 8)4B 9)@zvn2 9)4>@zvn.20)4B 1)@zvn2 1)4B 2)@zvn2 2)4B 3)@zʊvn2 3)4B 4)@z֊vn2 4)4B 5)@z⊦vn2 5)4B 6)@zvn2 6)4B 7)@zvnW|@W:|@Wv"W"Wr"Wj"2 8)4B 9)@zvn2 9)4>@zvn.@B 1)@zvn2 1)4B 2)@zvn2 2)4B 3)@zŠvn2 3)4B 4)@zΊvn2 4)4B 5)@zڊvn2 5)4B 6)@z抦vn2 6)4B 7)@zvWj2 7)4B 8)@zXr|XjX"@X:"@Xv"X"vn2 9)4>@zvn.@B 1)@zvn2 1)4B 2)@zvn2 2)4B 3)@zvn2 3)4B 4)@zƊvn2 4)4B 5)@zҊvn2 5)4B 6)@zފvn2 6)4B 7)@zꊦvn2 7)4B 8)@zvn2 8)4B9)@Yv|Y|Yr"Yj"Y"@Y:"@zvn.@B 1)@zvn2 1)4B 2)@zvn2gB 3)@zvn2gBgzvn2gBgzʊvn2gBgz֊vn2gBgz⊦vn2gB 8)@zvn2gBgzvnZ|@Z:|@Z vgZ"Zr"Zj".@Bgzvn2gBgzvn2gBgzvn2 3)4Bgzvn2gB 5)@zŠvn2gB 6)@zΊvn2gBgzڊvn2gBgz抦vn2 8)4Bgzvn2g>@z[r|[j|["bE4[:"@[v"["vn2gBgzvn2gBgzvn2gBgzvn2 4)4Bgzvn2gBgzƊvn2gBgzҊvn2g[> 8)@zފvn2gBgzꊦvn2g>@zvn.@Bg\v|\|\r"\j"\"@\:@zvn2gB 3)@zvn2gB 4)@zvn2gBgzvn2gBgzvn2gBgzʊvn2gB 8)@z֊vn2gB 9)@z⊦vn2 9)4>@zvn.@Bgzvn]|@]:|@]v"]"]r"]j"2gBgzvn2gBgzvn2gBgzvn2 5)4Bgzvn2gBgzŠvn2gBgzΊvn2gB 9)@zڊvn2gH] SuperMan 123456>LҕG0@&gxHgoavro-2.10.1/fixtures/quickstop.avsc000066400000000000000000000003631412474230400176440ustar00rootroot00000000000000{"fields":[{"name":"ID","type":{"type":"long"}},{"name":"First","type":{"type":"string"}},{"name":"Last","type":{"type":"string"}},{"name":"Phone","type":{"type":"string"}},{"name":"Age","type":{"type":"int"}}],"name":"Person","type":"record"}goavro-2.10.1/fixtures/secondBlockCountZero.avro000066400000000000000000000001071412474230400217300ustar00rootroot00000000000000Objavro.schema{"type":"long"}0123456789abcdefab0123456789abcdefgoavro-2.10.1/fixtures/syncMarkerMismatch.avro000066400000000000000000000001061412474230400214340ustar00rootroot00000000000000Objavro.schema{"type":"long"}0123456789abcdefabinvalid-syncmarkgoavro-2.10.1/fixtures/temp0.avro000066400000000000000000000000001412474230400166460ustar00rootroot00000000000000goavro-2.10.1/fixtures/temp1.avro000066400000000000000000000001311412474230400166540ustar00rootroot00000000000000Objavro.codecdeflateavro.schema{"type":"long"}0123456789abcdefab0123456789abcdefgoavro-2.10.1/fixtures/temp2.avro000066400000000000000000000000621412474230400166600ustar00rootroot00000000000000Objavro.schema{"type":"long"}0123456789abcdefgoavro-2.10.1/fixtures/temp3.avro000066400000000000000000000001061412474230400166600ustar00rootroot00000000000000Objavro.schema{"type":"long"}0123456789abcdefT0123456789abcdefgoavro-2.10.1/fixtures/temp4.avro000066400000000000000000000001331412474230400166610ustar00rootroot00000000000000Objavro.schema{"type":"long"}0123456789abcdefT0123456789abcdef0123456789abcdefgoavro-2.10.1/floatingPoint.go000066400000000000000000000210231412474230400162330ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
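// A hedged illustration, not part of the original fixtures or sources, of how
// an OCF fixture such as fixtures/quickstop-deflate.avro (whose schema is the
// Person record in fixtures/quickstop.avsc above) is typically consumed with
// this package's OCF reader; the error handling below is only sketched:
//
//	fh, err := os.Open("fixtures/quickstop-deflate.avro")
//	if err != nil { /* handle error */ }
//	defer fh.Close()
//	ocfr, err := NewOCFReader(fh)
//	if err != nil { /* handle error */ }
//	for ocfr.Scan() {
//		datum, err := ocfr.Read()
//		if err != nil { /* handle error */ }
//		_ = datum // each datum decodes to a map[string]interface{} Person record
//	}
//	if err := ocfr.Err(); err != nil { /* handle error */ }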
package goavro import ( "bytes" "encoding/binary" "fmt" "io" "math" "strconv" ) const ( doubleEncodedLength = 8 // double requires 8 bytes floatEncodedLength = 4 // float requires 4 bytes ) //////////////////////////////////////// // Binary Decode //////////////////////////////////////// func doubleNativeFromBinary(buf []byte) (interface{}, []byte, error) { if len(buf) < doubleEncodedLength { return nil, nil, fmt.Errorf("cannot decode binary double: %s", io.ErrShortBuffer) } return math.Float64frombits(binary.LittleEndian.Uint64(buf[:doubleEncodedLength])), buf[doubleEncodedLength:], nil } func floatNativeFromBinary(buf []byte) (interface{}, []byte, error) { if len(buf) < floatEncodedLength { return nil, nil, fmt.Errorf("cannot decode binary float: %s", io.ErrShortBuffer) } return math.Float32frombits(binary.LittleEndian.Uint32(buf[:floatEncodedLength])), buf[floatEncodedLength:], nil } //////////////////////////////////////// // Binary Encode //////////////////////////////////////// func doubleBinaryFromNative(buf []byte, datum interface{}) ([]byte, error) { var value float64 switch v := datum.(type) { case float64: value = v case float32: value = float64(v) case int: if value = float64(v); int(value) != v { return nil, fmt.Errorf("cannot encode binary double: provided Go int would lose precision: %d", v) } case int64: if value = float64(v); int64(value) != v { return nil, fmt.Errorf("cannot encode binary double: provided Go int64 would lose precision: %d", v) } case int32: if value = float64(v); int32(value) != v { return nil, fmt.Errorf("cannot encode binary double: provided Go int32 would lose precision: %d", v) } default: return nil, fmt.Errorf("cannot encode binary double: expected: Go numeric; received: %T", datum) } buf = append(buf, 0, 0, 0, 0, 0, 0, 0, 0) binary.LittleEndian.PutUint64(buf[len(buf)-doubleEncodedLength:], math.Float64bits(value)) return buf, nil } func floatBinaryFromNative(buf []byte, datum interface{}) ([]byte, error) { var value float32 switch v := datum.(type) { case float32: value = v case float64: // Assume runtime can cast special floats correctly, and if there is a // loss of precision from float64 and float32, that should be expected // or at least understood by the client. 
value = float32(v) case int: if value = float32(v); int(value) != v { return nil, fmt.Errorf("cannot encode binary float: provided Go int would lose precision: %d", v) } case int64: if value = float32(v); int64(value) != v { return nil, fmt.Errorf("cannot encode binary float: provided Go int64 would lose precision: %d", v) } case int32: if value = float32(v); int32(value) != v { return nil, fmt.Errorf("cannot encode binary float: provided Go int32 would lose precision: %d", v) } default: return nil, fmt.Errorf("cannot encode binary float: expected: Go numeric; received: %T", datum) } // return floatingBinaryEncoder(buf, uint64(math.Float32bits(value)), floatEncodedLength) buf = append(buf, 0, 0, 0, 0) binary.LittleEndian.PutUint32(buf[len(buf)-floatEncodedLength:], uint32(math.Float32bits(value))) return buf, nil } //////////////////////////////////////// // Text Decode //////////////////////////////////////// func doubleNativeFromTextual(buf []byte) (interface{}, []byte, error) { return floatingTextDecoder(buf, 64) } func floatNativeFromTextual(buf []byte) (interface{}, []byte, error) { return floatingTextDecoder(buf, 32) } func floatingTextDecoder(buf []byte, bitSize int) (interface{}, []byte, error) { buflen := len(buf) if buflen >= 4 { if bytes.Equal(buf[:4], []byte("null")) { return math.NaN(), buf[4:], nil } if buflen >= 5 { if bytes.Equal(buf[:5], []byte("1e999")) { return math.Inf(1), buf[5:], nil } if buflen >= 6 { if bytes.Equal(buf[:6], []byte("-1e999")) { return math.Inf(-1), buf[6:], nil } } } } index, err := numberLength(buf, true) // NOTE: floatAllowed = true if err != nil { return nil, nil, err } datum, err := strconv.ParseFloat(string(buf[:index]), bitSize) if err != nil { return nil, nil, err } if bitSize == 32 { return float32(datum), buf[index:], nil } return datum, buf[index:], nil } func numberLength(buf []byte, floatAllowed bool) (int, error) { // ALGORITHM: increment index as long as bytes are valid for number state engine. var index, buflen, count int var b byte // STATE 0: begin, optional: - if buflen = len(buf); index == buflen { return 0, io.ErrShortBuffer } if buf[index] == '-' { if index++; index == buflen { return 0, io.ErrShortBuffer } } // STATE 1: if 0, goto 2; otherwise if 1-9, goto 3; otherwise bail if b = buf[index]; b == '0' { if index++; index == buflen { return index, nil // valid number } } else if b >= '1' && b <= '9' { if index++; index == buflen { return index, nil // valid number } // STATE 3: absorb zero or more digits for { if b = buf[index]; b < '0' || b > '9' { break } if index++; index == buflen { return index, nil // valid number } } } else { return 0, fmt.Errorf("unexpected byte: %q", b) } if floatAllowed { // STATE 2: if ., goto 4; otherwise goto 5 if buf[index] == '.' 
{ if index++; index == buflen { return 0, io.ErrShortBuffer } // STATE 4: absorb one or more digits for { if b = buf[index]; b < '0' || b > '9' { break } count++ if index++; index == buflen { return index, nil // valid number } } if count == 0 { // did not get at least one digit return 0, fmt.Errorf("unexpected byte: %q", b) } } // STATE 5: if e|e, goto 6; otherwise goto 7 if b = buf[index]; b == 'e' || b == 'E' { if index++; index == buflen { return 0, io.ErrShortBuffer } // STATE 6: if -|+, goto 8; otherwise goto 8 if b = buf[index]; b == '+' || b == '-' { if index++; index == buflen { return 0, io.ErrShortBuffer } } // STATE 8: absorb one or more digits count = 0 for { if b = buf[index]; b < '0' || b > '9' { break } count++ if index++; index == buflen { return index, nil // valid number } } if count == 0 { // did not get at least one digit return 0, fmt.Errorf("unexpected byte: %q", b) } } } // STATE 7: end return index, nil } //////////////////////////////////////// // Text Encode //////////////////////////////////////// func floatTextualFromNative(buf []byte, datum interface{}) ([]byte, error) { return floatingTextEncoder(buf, datum, 32) } func doubleTextualFromNative(buf []byte, datum interface{}) ([]byte, error) { return floatingTextEncoder(buf, datum, 64) } func floatingTextEncoder(buf []byte, datum interface{}, bitSize int) ([]byte, error) { var isFloat bool var someFloat64 float64 var someInt64 int64 switch v := datum.(type) { case float32: isFloat = true someFloat64 = float64(v) case float64: isFloat = true someFloat64 = v case int: if someInt64 = int64(v); int(someInt64) != v { if bitSize == 64 { return nil, fmt.Errorf("cannot encode textual double: provided Go int would lose precision: %d", v) } return nil, fmt.Errorf("cannot encode textual float: provided Go int would lose precision: %d", v) } case int64: someInt64 = v case int32: if someInt64 = int64(v); int32(someInt64) != v { if bitSize == 64 { return nil, fmt.Errorf("cannot encode textual double: provided Go int32 would lose precision: %d", v) } return nil, fmt.Errorf("cannot encode textual float: provided Go int32 would lose precision: %d", v) } default: if bitSize == 64 { return nil, fmt.Errorf("cannot encode textual double: expected: Go numeric; received: %T", datum) } return nil, fmt.Errorf("cannot encode textual float: expected: Go numeric; received: %T", datum) } if isFloat { if math.IsNaN(someFloat64) { return append(buf, "null"...), nil } if math.IsInf(someFloat64, 1) { return append(buf, "1e999"...), nil } if math.IsInf(someFloat64, -1) { return append(buf, "-1e999"...), nil } return strconv.AppendFloat(buf, someFloat64, 'g', -1, bitSize), nil } return strconv.AppendInt(buf, someInt64, 10), nil } goavro-2.10.1/floatingPoint_test.go000066400000000000000000000064451412474230400173050ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
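// A hedged example, not part of the original sources, of the special textual
// encodings implemented by floatingTextEncoder above: NaN is written as
// "null" and the infinities as 1e999 and -1e999, spellings the matching
// decoder recognizes before it falls back to strconv.ParseFloat.
//
//	codec, _ := NewCodec(`"double"`)
//	buf, _ := codec.TextualFromNative(nil, math.NaN()) // []byte("null")
//	buf, _ = codec.TextualFromNative(nil, math.Inf(1)) // []byte("1e999")
//	datum, _, _ := codec.NativeFromTextual([]byte("-1e999"))
//	// datum.(float64) is math.Inf(-1) at this point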
package goavro import ( "math" "testing" ) func TestSchemaPrimitiveCodecDouble(t *testing.T) { testSchemaPrimativeCodec(t, `"double"`) } func TestPrimitiveDoubleBinary(t *testing.T) { testBinaryEncodeFailBadDatumType(t, `"double"`, "some string") testBinaryDecodeFailShortBuffer(t, `"double"`, []byte("\x00\x00\x00\x00\x00\x00\xf0")) testBinaryCodecPass(t, `"double"`, 3.5, []byte("\x00\x00\x00\x00\x00\x00\f@")) testBinaryCodecPass(t, `"double"`, math.Inf(-1), []byte("\x00\x00\x00\x00\x00\x00\xf0\xff")) testBinaryCodecPass(t, `"double"`, math.Inf(1), []byte("\x00\x00\x00\x00\x00\x00\xf0\u007f")) testBinaryCodecPass(t, `"double"`, math.NaN(), []byte("\x01\x00\x00\x00\x00\x00\xf8\u007f")) } func TestPrimitiveDoubleText(t *testing.T) { testTextDecodeFailShortBuffer(t, `"double"`, []byte("")) testTextDecodeFailShortBuffer(t, `"double"`, []byte("-")) testTextCodecPass(t, `"double"`, -12.3, []byte("-12.3")) testTextCodecPass(t, `"double"`, -0.5, []byte("-0.5")) testTextCodecPass(t, `"double"`, -3.5, []byte("-3.5")) testTextCodecPass(t, `"double"`, 0, []byte("0")) testTextCodecPass(t, `"double"`, 0.5, []byte("0.5")) testTextCodecPass(t, `"double"`, 1, []byte("1")) testTextCodecPass(t, `"double"`, 19.7, []byte("19.7")) testTextCodecPass(t, `"double"`, math.Inf(-1), []byte("-1e999")) testTextCodecPass(t, `"double"`, math.Inf(1), []byte("1e999")) testTextCodecPass(t, `"double"`, math.NaN(), []byte("null")) testTextDecodePass(t, `"double"`, math.Copysign(0, -1), []byte("-0")) } func TestSchemaPrimitiveCodecFloat(t *testing.T) { testSchemaPrimativeCodec(t, `"float"`) } func TestPrimitiveFloatBinary(t *testing.T) { testBinaryEncodeFailBadDatumType(t, `"float"`, "some string") testBinaryDecodeFailShortBuffer(t, `"float"`, []byte("\x00\x00\x80")) testBinaryCodecPass(t, `"float"`, 3.5, []byte("\x00\x00\x60\x40")) testBinaryCodecPass(t, `"float"`, math.Inf(-1), []byte("\x00\x00\x80\xff")) testBinaryCodecPass(t, `"float"`, math.Inf(1), []byte("\x00\x00\x80\u007f")) testBinaryCodecPass(t, `"float"`, math.NaN(), []byte("\x00\x00\xc0\u007f")) } func TestPrimitiveFloatText(t *testing.T) { testTextDecodeFailShortBuffer(t, `"float"`, []byte("")) testTextDecodeFailShortBuffer(t, `"float"`, []byte("-")) testTextCodecPass(t, `"float"`, -12.3, []byte("-12.3")) testTextCodecPass(t, `"float"`, -0.5, []byte("-0.5")) testTextCodecPass(t, `"float"`, -3.5, []byte("-3.5")) testTextCodecPass(t, `"float"`, 0, []byte("0")) testTextCodecPass(t, `"float"`, 0.5, []byte("0.5")) testTextCodecPass(t, `"float"`, 1, []byte("1")) testTextCodecPass(t, `"float"`, 19.7, []byte("19.7")) testTextCodecPass(t, `"float"`, math.Inf(-1), []byte("-1e999")) testTextCodecPass(t, `"float"`, math.Inf(1), []byte("1e999")) testTextCodecPass(t, `"float"`, math.NaN(), []byte("null")) testTextDecodePass(t, `"float"`, math.Copysign(0, -1), []byte("-0")) } goavro-2.10.1/fuzz_test.go000066400000000000000000000417561412474230400154720ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
package goavro import ( "bytes" "strings" "testing" ) func TestCrashers_OCFReader(t *testing.T) { var crashers = map[string]string{ "scan: negative block sizes": "Obj\x01\x04\x16avro.schema\x96\x05{" + "\"type\":\"record\",\"nam" + "e\":\"c0000000\",\"00000" + "0000\":\"00000000000\"," + "\"fields\":[{\"name\":\"u" + "0000000\",\"type\":\"str" + "ing\",\"000\":\"00000000" + "0000\"},{\"name\":\"c000" + "000\",\"type\":\"string\"" + ",\"000\":\"000000000000" + "00000000000000000000" + "0\"},{\"name\":\"t000000" + "00\",\"type\":\"long\",\"0" + "00\":\"000000000000000" + "0000000000000000\"}]," + "\"0000\":\"000000000000" + "00000000000000000000" + "00000000\"}\x14000000000" + "0\b0000\x000000000000000" + "0000\xd90", } for testName, f := range crashers { ensureNoPanic(t, testName, func() { _, _ = NewOCFReader(strings.NewReader(f)) // ensure does not panic }) } } func TestCrashers_OCF_e2e(t *testing.T) { var crashers = map[string]string{ "map: initialSize overflow": "Obj\x01\x04\x14avro.codec\bnul" + "l\x16avro.schema\xa2\x0e{\"typ" + "e\":\"record\",\"name\":\"" + "test_schema\",\"fields" + "\":[{\"name\":\"string\"," + "\"type\":\"string\",\"doc" + "\":\"Meaningless strin" + "g of characters\"},{\"" + "name\":\"simple_map\",\"" + "type\":{\"type\":\"map\"," + "\"values\":\"int\"}},{\"n" + "ame\":\"complex_map\",\"" + "type\":{\"type\":\"map\"," + "\"values\":{\"type\":\"ma" + "p\",\"values\":\"string\"" + "}}},{\"name\":\"union_s" + "tring_null\",\"type\":[" + "\"null\",\"string\"]},{\"" + "name\":\"union_int_lon" + "g_null\",\"type\":[\"int" + "\",\"long\",\"null\"]},{\"" + "name\":\"union_float_d" + "ouble\",\"type\":[\"floa" + "t\",\"double\"]},{\"name" + "\":\"fixed3\",\"type\":{\"" + "type\":\"fixed\",\"name\"" + ":\"fixed3\",\"size\":3}}" + ",{\"name\":\"fixed2\",\"t" + "ype\":{\"type\":\"fixed\"" + ",\"name\":\"fixed2\",\"si" + "ze\":2}},{\"name\":\"enu" + "m\",\"type\":{\"type\":\"e" + "num\",\"name\":\"Suit\",\"" + "symbols\":[\"SPADES\",\"" + "HEARTS\",\"DIAMONDS\",\"" + "CLUBS\"]}},{\"name\":\"r" + "ecord\",\"type\":{\"type" + "\":\"record\",\"name\":\"r" + "ecord\",\"fields\":[{\"n" + "ame\":\"value_field\",\"" + "type\":\"string\"}],\"al" + "iases\":[\"Reco\x9adAlias" + "\"]}},{\"name\":\"array_" + "of_boolean\",\"type\":{" + "\"type\":\"array\",\"item" + "s\":\"boolean\"}},{\"nam" + "e\":\"bytes\",\"type\":\"b" + "ytes\"}]}\x00\xfeJ\x17\u007f\xb4r\x11\x0e\x96&\x0e" + "\xda<\xed\x86\xf6\x06\xfa\x05(OMG SPARK I" + "S AWESOME\x04\x06abc\x02\x06bcd\x0e" + "\x00\x02\x06key\x03\x80\x00\x02d\x02a\x02b\x00\x00\x01\x00\x00" + "\x00\x00\x00\x04\x00\xdb\x0fI@\x02\x03\x04\x11\x12\x00\xb6\x01Two" + " things are infinite" + ": the universe and h" + "uman stupidity; and " + "I'm not sure about " + "universe.\x06\x01\x00\x00\x00\x06ABCT\x00e" + "rran is IMBA!\x04\x06qqq\x84\x01" + "\x06mmm\x00\x00\x02\x06key\x04\x023\x024\x021\x02K" + "��~\x02\x84\x01\x02`\xaa\xaa\xaa\xaa\xaa\x1a@\a" + "\a\a\x01\x02\x06\x9e\x01Life did no\xef\xbf" + "\xbd\ttend to make us pe" + "rfect. 
Whoever is pe" + "rfect `elongs in a m" + "useum.\x00\x00$The cake is" + " a LIE!\x00\x02\x06key\x00\x00\x00\x04\x02\x00\x00" + "\x00\x00\x00\x00\x00\x00\x11\"\t\x10\x90\x04\x16TEST_ST" + "R123\x00\x04\x00\x02S\xfeJ\x17\u007f\xb4r\x11\x0e\x96&\x0e" + "\xda<\xed\x86\xf6", "map: initialSize overflow-2": "Obj\x01\xff\xff\xff\xff\xff\xff\xff\xff\xff\x010", "array: initialSize overflow": "Obj\x01\x04\x14avro.codec\bnul" + "l\x16avro.schema\xa2\x0e{\"typ" + "e\":\"record\",\"name\":\"" + "test_schema\",\"fields" + "\":[{\"name\":\"string\"," + "\"type\":\"string\",\"doc" + "\":\"Meaningless strin" + "g of characters\"},{\"" + "name\":\"simple_map\",\"" + "type\":{\"type\":\"map\"," + "\"values\":\"int\"}},{\"n" + "ame\":\"complex_map\",\"" + "type\":{\"type\":\"map\"," + "\"values\":{\"type\":\"ma" + "p\",\"values\":\"string\"" + "}}},{\"name\":\"union_s" + "tring_null\",\"type\":[" + "\"null\",\"string\"]},{\"" + "name\":\"union_int_lon" + "g_null\",\"type\":[\"int" + "\",\"long\",\"null\"]},{\"" + "name\":\"union_float_d" + "ouble\",\"type\":[\"floa" + "t\",\"double\"]},{\"name" + "\":\"fixed3\",\"type\":{\"" + "type\":\"fixed\",\"name\"" + ":\"fixed3\",\"size\":3}}" + ",{\"name\":\"fixed2\",\"t" + "ype\":{\"type\":\"fixed\"" + ",\"name\":\"fixed2\",\"si" + "ze\":2}},{\"name\":\"enu" + "m\",\"type\":{\"type\":\"e" + "num\",\"name\":\"Suit\",\"" + "symbols\":[\"SPADES\",\"" + "HEARTS\",\"DIAMONDS\",\"" + "CLUBS\"]}},{\"name\":\"r" + "ecord\",\"type\":{\"type" + "\":\"record\",\"name\":\"r" + "ecord\",\"fields\":[{\"n" + "ame\":\"value_field\",\"" + "type\":\"string\"}],\"al" + "iases\":[\"Reco\x9adAlias" + "\"]}},{\"name\":\"array_" + "of_boolean\",\"type\":{" + "\"type\":\"array\",\"item" + "s\":\"boolean\"}},{\"nam" + "e\":\"bytes\",\"type\":\"b" + "ytes\"}]}\x00\xfeJ\x17\u007f\xb4r\x11\x0e\x96&\x0e" + "\xda<\xed\x86\xf6\x06\xfa\x05(OMG SPARK I" + "S AWESOME\x04\x06abc\x02\x06bcd\x0e" + "\x00\x02\x06key\x03\x80\x00\x02d\x02a\x02b\x00\x00\x01\x00\x00" + "\x00\x00\x00\x04\x00\xdb\x0fI@\x02\x03\x04\x11\x12\x00\xb6\x01Two" + " things are infinite" + ": the universe and h" + "uman stupidity; and " + "I'm not sure about u" + "n������\xef" + "\xbf\xbd�is IMBA!\x04\x06qqq\x84\x01" + "\x06mmm\x00\x00\x02\x06key\x04\x023\x024\x021\x022" + "\x00\x00\x02\x06123\x02\x84\x01\x02`\xaa\xaa\xaa\xaa\xaa\x1a@\a" + "\a\a\x01\x02\x06\x9e\x01Life did no\xef\xbf" + "\xbd\ttend to make us pe" + "rfect. 
Whoever is pe" + "rfect `elongs in a m" + "useum.\x00\x00$The cake is" + " a LIE!\x00\x02\x06key\x00\x00\x00\x04\x02\x00\x00" + "\x00\x00\x00\x00\x00\x00\x11\"\t\x10\x90\x04\x16TEST_ST" + "R123\x00\x04\x00\x02S\xfeJ\x17\u007f\xb4r\x11\x0e\x96&\x0e" + "\xda<\xed\x86\xf6", "scan: blockSize overflow": "Obj\x01\x04\x14avro.codec\fsna" + "ppy\x16avro.schema\xf2\x05{\"t" + "ype\":\"record\",\"name\"" + ":\"twitter_schema\",\"n" + "amespace\":\"com.migun" + "o.avro\",\"fields\":[{\"" + "name\":\"username\",\"ty" + "pe\":\"string\",\"doc\":\"" + "Name of the user acc" + "ount on Twitter.com\"" + "},{\"name\":\"tweet\",\"t" + "ype\":\"string\",\"doc\":" + "\"The content of the " + "user's Twitter messa" + "ge\"},{\"name\":\"timest" + "amp\",\"type\":\"long\",\"" + "doc\":\"Unix epoch tim" + "e in milliseconds\"}]" + ",\"doc:\":\"A basic sch" + "ema for storing Twit" + "ter messages\"}\x00.\xe2\xf3\xee\x96" + "\nw2\xc3*5\\\x951\xa4\xae~\xa2\x8f\xdc\xf8\xa3H", "fixed: size conversion: positive float64 -> negative int": "Obj\x01\x04\x14avro.codec\x0edef" + "late\x16avro.schema\xa2\x0e{\"" + "type\":\"record\",\"name" + "\":\"test_schema\",\"fie" + "lds\":[{\"name\":\"strin" + "g\",\"type\":\"string\",\"" + "doc\":\"Meaningless st" + "ring of characters\"}" + ",{\"name\":\"simple_map" + "\",\"type\":{\"type\":\"ma" + "p\",\"values\":\"int\"}}," + "{\"name\":\"complex_map" + "\",\"type\":{\"type\":\"ma" + "p\",\"values\":{\"type\":" + "\"map\",\"values\":\"stri" + "ng\"}}},{\"name\":\"unio" + "n_string_null\",\"type" + "\":[\"null\",\"string\"]}" + ",{\"name\":\"union_int_" + "long_null\",\"type\":[\"" + "int\",\"long\",\"null\"]}" + ",{\"name\":\"union_floa" + "t_double\",\"type\":[\"f" + "loat\",\"double\"]},{\"n" + "ame\":\"fixed3\",\"type\"" + ":{\"type\":\"fixed\",\"na" + "me\":\"fixed3\",\"size\":" + "3}},{\"name\":\"fixed2\"" + ",\"type\":{\"type\":\"fix" + "ed\",\"name\":\"fixed2\"," + "\"size\":6938893903907" + "22837764769792556762" + "6953125,\"name\":\"Suit" + "\",\"symbols\":[\"SPADES" + "\",\"HEARTS\",\"DIAMONDS" + "\",\"CLUBS\"]}},{\"name\"" + ":\"record\",\"type\":{\"t" + "ype\":\"record\",\"name\"" + ":\"record\",\"fields\":[" + "{\"name\":\"value_field" + "\",\"type\":\"string\"}]," + "\"aliases\":[\"RecordAl" + "ias\"]}},{\"name\":\"arr" + "ay_of_boolean\",\"type" + "\":{\"type\":\"array\",\"i" + "tems\":\"boolean\"}},{\"" + "name\":\"bytes\",\"type\"" + ":\"bytes\"}]}\x00\x90\xfb\x1eO%\x06%B" + "\x03\x00s\x0f(\x89\x02\a\x06\x82&\x1d\x97K\xa8\x9dg\x15\x86\xf7" + "9l\x8f\xc74\x84\x10B\x88\xa1\x04\x04\x11\x1d8\x14g\xa9\"8" + "R\xe7N\x84\xef~[\xdf\xfd>\xac\xa0\x93Nl5\x95B\xb5\xa8" + "\x88U\x10\x8a\x04\xa1H\xb5\xb6b\x89ZL\xa5\x88x\x03+T\x87" + "\x8a\x11\xec@p\x1d\x87g\xef\xff\xfc\xff\xf7\xaf\xf5\xbe\xcf\xfb\xee+" + "~4\x10#]:Z\xfa\xbd\xbbO\xdc=\xbb=\x1cp#\xa6\x02" + "\xae\xa73\xf0\xc2\xd3\x0f\xff\xf6\x9eۓ/jK\xe4\x1dx\xb5}" + "\xbe\xf1\xab?>" + "F{z[\xcdY\x14w$\xec \x17U\x97]\x16\xc5\xcf\xc3\r" + "_j\xdeV\x05\xd6\xe7\xd9ᛧk\x04\xaaX\xf76ڡ\f" + "\xaf1\x10N\x94\xe4֮E\x88Rswi\x19\x95s\x1a\x06\xa5" + "\xab\xb0{a$E]!\x15\xba<[\x9bQ\xed\xc6ܤ$\x11" + "\xd9\fi$\xcdx\x9c\xa6\x89-\xbc\xe1m\xa4\xdcz0\x91t%" + "\x8b\xd9C딽YMFݕ\xee\xcd\bG|\x9c\xccX\x1a" + "\xf2\xbaI\x197V\x96\xd8i\xcc\u007f<ѪLKt\xcf|\xd9" + "U\x05\xe9:I\xaf5\xad\xc6U!\x16Tּe\xab\x8b\xech" + "\x8c\xb4N\xbb\x82j\x9a\U000f16a2I\x98\x96U\xb1l\x8e\x81\xe3" + "\x91\xaa0\xba#\x03\xce\xe3.rf\xa8D\x12\br\xa4ڬ\xe5" + "\xf2\xe6\xdaմ\x99\xe2\xe8kݶ\xb9\x84\xb5ea>\x88\xd8\xe6" + "\xe1l\xf4tv%\x17\xef\xa8\xf6\xc7=\x0e\xd7,\xde\xd1\xe7\\\xd9" + 
"MS\r\x1f\xa9Ԩ\xb6?\xe6~8\x9c^\xd3\x1bD\xc2\xe1\xf2" + "ӧ\xfe\xf1\xc6+\xf7^{\xe5\xbb\xee\xf0\xf2/\x9f\xbb\xf3ٯ" + "\xfc\xe8\x1b\xe3\xecH\xec\xd9\xc9\xc9\xc9\xe1\xfc\v\xcf\u007f❣\xab" + "\xe7g\xf8\x90\xb7\xfe\xf5\xcf\xfb'7\xadć\xf6\xb4D\x9a\xe5\xd5" + "\xef\xdf\u007f\xf6\xe4F\xee\x13\xd4\x02\xb9\x9a\xff\xebK\u007fzx\xf9\n" + "d¶\xe5o\xff\xe4\x9e>\x9c\xdfҶw\xbb\xd4\xecc\xcc~" + "\xfc\xfdI\xa7\x15\x88\b\xd3\x105\x95\x94c\xaf\x0eL\xe8,\xea\"" + ",\x87\x94V\x91\xbay\xb6\xb3\x8d\xb1z.Zu4\xb5\xc5}r" + "R\x93\xb1q>A\xdbA\xbdl(f\xd6w\xa9[T]\x92\x1d" + "\xa6\x8fX<\xb1Z҉\x1b\x1e\xeb\nU\x9b\xd74\xafֱD" + "H[\\\x87\xa0\x19\xdb;,w8\xed\x87\xf3\x046\x1c\x1fE)" + "\x051d\x11i\xd4(n\xf2\x94S\x18\x8aɲ\xf8uR\xacK" + "\x8c̦\x8eU\x1c.i\xbb\x05\x1cp\x84/\xbf\xf5\xbb\xfbw\xef" + "?x\xf3\xe1\xc9\xe1\xeb\x1f\xfb\xe0c\x8f}\xe0\xa3\x1foǫ|" + "\x1aU\xa9!\x97\x0e8\xbe\xc3\xe1\xfa\v/\xfe\xf6\x87\x9f\xb9\xf9\xe9" + "\xe7\xee\xfd<\x90\xdd͊R&C,\xeb$\x89\x11l\t*\xcf" + "\xeew\xad\xbcj\xa1k\xf7<\x13CV\x81\x14\x8bV@}?\u007f" + "\x14J\x90\xad\xb7>[\xb6:\xbe\xf8\xe4\xcf^?\xb9E)\x10t" + "\x8av[/\xf3\xfc\xab\x8f\xbft\xf2\xf9\x13[T\xe4\xa2*\x87j" + "\xcfd\x1b\xb0\x8d\x9a\xccRw~\x84\xcaJ\x94\x1dŶ)\x17\xd6" + "\xa0s\x84MΓ\x19\xba\xdc\xc3\xe5\xe0\n\xbc\xf9\xfa\x8f\xbf\x86\x1a" + "\x12B\xa1r\x9ey\xe6\xdf\u007f\u007f\xf7\xe1\xfcp\xbcbkk3\xaf" + "/\xbd\x8f\xe4\x1d3I}4\xcbJ\x92\xf8:\xccJ_@\xa9m" + "\xdcf\xe07\U000dc520/^ĵ\x9a\x80\f>\xebڭ\xf6" + "J\x18s{Q>\u009e=\x0e\x1dL%=\nt\xac\x97\xb2\xe3" + "\x1eX\x1a\xb2\xe2\xd7(\xef4\xc1\xe7\xe5\xe8f\x9b\xaf\x8a#\xc5]" + "\x90@\xf5,n[\x1aѶ%+\xbf\xc0̵\xe2\xacj\xd1T" + "\xb7\xe2\t\xa6\x8f\xb1\xc7\xdaFm\x8cWe2ߑ\xec\xca\xdb\xd4" + "\x8dТ&ϛ\xc6]\xd1\xd8\xd5\xe2\xc5\x1d5\x99\x03哢" + "\x8b\xa3\"\x12mhhp\xb2\xd5ac\xaa]\xc5\\\xecFY\xd2" + "\\\xf0DEr\x1c\xf8J*\x89\xac\x8c\xf3\xbeg\nx\xd6Zq" + "\xabi\x0f\xc9s!E\x90I\nE\xd5\xf7&\xa8\xb5\xa5\x00\x99\\" + "1$\x95\x03\xd9v\x0f\x99\xd20\x96\x1ec\xf5\xa9Bɣm\x8f" + "7\a\x03\\\t&ypF7i\xbc\v\xb58'i\xed\x14\x04" + "*S\xf3\xa9z\x8dMVE\x8av\"w^\x96\x12V\x80\x8a\x1e" + "\x8f\x88\xfa\xa5\xc8\xe1\xad\x13mr\x03\xcc\n\x13ᬤ\x8e\x85!" 
+ "\r\r\x05\x96Z\xdbh\xb2MF\xd0[\x8fiM\xa1\x95(o\xd8" + "^\x8eۤ\xc4\xcaVHɍ\x12\x938\xbe\xa0k\xa9\x9a1\x11" + "'m\x9d\xa0F\x94\xe8XZ\xc98njj\xa5\xf5\x98\xe8\x14\xba" + "\x05\xa1UU\x11\x97l\x93&\xacY\xcc\x11B\xaa3y\xc1ސ" + "\x9bp\xa98\x1f\x17\xa2?\xf2\x9d\xcb.)$\x06D\x87\x8b\xa0\x01" + "ӕ\xb7\xd9L\x9fC\x17\xccM%\xd4\bLv\x1f\xad@\n\xe3" + "9\xb5\x1a\xb4)i\xf6\xa4\x90CF\xa0\"l\xc0\x94\x84\xd9f\x99" + "\xb2|l\v\x16e\x01Zp\xe2\x06\xe9aC\x15\xdeɴ\r\x89" + "k\xe1\x149\x84\x9e\xe5\xd0\xd4V\x88(ֆSǜ\xa0\x83\xa9" + "a0{X[qX\xcf\xdd\x14\t\x9d\a[N\xa16\xb50U" + "\xa9\xba\x06\x92\xb5+K\xcb\x0e\r\xff\x12\xa3\xc4U\xac\u0381\xa9D" + " \x1a\x87\xdc\xf7ɚ\xa0T/\xcb\bc\xccT\xcaY)-_" + "\xdbV\xbd\xfd\xb4i4\xbc\xd0\r\xbeZ@B\a\vqr\x89d" + "\xd1*64\xa6\x1cuՅ\xccJ\x1d\xa2\xb5\xb1F\xe7{\x8c\"" + "\x91\xec5\n\x88K\xa8nr\x9ey\xef\xbew\xb0\x9b\x05 \t\xdd" + "\xbcCA\x05\xa5\xac\x88\x02\xef!ɍ\x91\xb13\xa3\xae//B" + "\b\xac\xf2\xc4\xf2\xaeR($\x8d\x9c{T<\xa86\xcc\v@\x8f" + "\x06҅oy3\xd4\\7\xcaB\xb0\x9d\x13Fg\xf2\x937-" + "\x97\n\x12\x93\x105d\xdd\xee\"\x19@\x9aY\xbd2\x0e\xc0\xc4\xc5" + "\xa8\x0fzb^\x13L\u007f\f\x9f\xb6\xa8fjE\xe6\x02>\x8d\xd7" + "ͽ\xee.-\u0099Wc\x99\xa2\xd1*\x99\xb6Ѧ\xac$\x88" + "\x8a\x99T\x94\x99\x06P\xab\x05C\xc5X\xccJN\fzp\xa4\xc6" + "}'9\xf1\x1e\x97\xa2\xc2\x10\xe8\xbc\xf35\xf3\u0604s\x99b`" + "\x94+̧\xd8W\x11\xe8\xbe\x1dr\xa3\xca\v\xac:\x8393\x16" + "\x93\x11\x9cs\r\xf7ra\xe42GV\b\xcdRݖ\x11'\xe6" + "|\xb1F\x916\xa1\x10\x8f\xfb\xf7\xc9`5Bs\xb6=\xb7\x16A" + "RG\x84O\x1e!\xed\x12_\xcbi\x96UfX\xa4\xaa^\xd8Z" + "B\xe3\xdcAх3\x1a\xca\x0e\xae\x8d9\x18\x1e\x86)\xe1\x01\xab" + "\xc1,x\a+h\xa6\xb3S\x1d7\xbaN\x8d\vZ\x1a\x14A\xc5" + "q7\"\x91\xfa\x1b\xabJ\xc7\xf4\xe7\xd9\x06\x9f\x94\xf2\xd4Թ\xc4" + "\xb6\xab\x04\xac\x17\x02\xef\xb8\xc3\xde\x18\x90\xdbM\x17\xb3j\xa2\xba\xff" + "\v\x80\x8b8V\xa8\xce\xd5.\xd3D\x82\xf6b\x03\xfa\xa5\xc3d\xb3" + "\x106X\x91{\xbb\xac8~\xe0\x15\xce\x18\x10jDg\x9b*F" + "h\xc3V\xe1\t2pV\xba}߹F\x83%f\xba\xa5s\xea" + "8\xf9\xc58>\xb8l\xa60\x06\x88\x1eVC\xc5\xe640\xab\x98" + "\u09a9\x99\x90\x15\xa0\xb0y\x18ΆD\x1e\xf6\x94u\xda|N" + "<+\x86\x1et\xa0\x9a\xd4\xed\x05\xabsD\x14\x06%\x94v2\xbc" + "\xcf2g\xea\x16\x06(vK+pa\x9a\xed\xb6cپ4X" + "\xb8\xb8@ad&\x91?i1b\x86]\x94\xf5\xc0\x1a\xa9!-" + "\xb43\b\x94\x11\x84\vʹ\x8c\x95\r\xf3'\x99^\xb9\x1d\x80\xed" + "'\xcdޖ\a\x13/\x8a\x96p3\xa6\x84\xcd̊\x14\xb3\x17\xaa" + "ٞQ\xe5\x80U\xb0\xe6ū\xaf\r\x92\x95c\xb8с \xeb" + "\xe6t\xaeT\xc4DG\xc4\x12$\xe8T\x83\x10dt\xbe\xf8Ve" + "\xbai\b\x820\x83\x1d \x91\x80\xee\xc9\x16\xf914\n?\xe2\x00" + "\xb4\x1e\x18\xf3\x8eZ\x02t\xf8fUČ\xa1\xa0\x86\x94\x91\xb3\xcd" + "td\xb6.\x89\uf06e\x8e\x1a\x99\x17\xa8\x8c\xf8z8\x15_\x86" + "\xc0J ;.\r!1\xb0i\n\xe8yH\xf06Av\x9c\x02" + "*R\x00Ŕ\xf4d\xa3D\x9b\xf4 ;\x8a\x9c\xfb&Y\xa9\xa1" + "\xf8\xe00\x8a\xab\x05<\x9e\xc3\xe8\xc3y\xf2:5q\x80\xa46\v" + "\xb3\xd4U\n\xc9_\xb4`\xca45<:\xa7\xbd\xc5lĘ\x14" + "\x8ee\x0e\bn\x9c\x8c\xf4}HTD\n\xb8\xfaX\xf3(\x03\xcf" + "\xe5\xb2\xf5\xc3b1\x97\x805\xdd#([\xf0\f\x96&\xb8\x16$" + "J\xb4<\a\xf0\xe9\x1c 0 { tb.Fatalf("BinaryDecode ought to have returned nil buffer: %v", buf) } nativeData[i] = nativeDatum } return nativeData } func nativeFromTextUsingV2(tb testing.TB, codec *Codec, textData [][]byte) []interface{} { tb.Helper() nativeData := make([]interface{}, len(textData)) for i, textDatum := range textData { nativeDatum, buf, err := codec.NativeFromTextual(textDatum) if err != nil { tb.Fatal(err) } if len(buf) > 0 { tb.Fatalf("TextDecode ought to have returned nil buffer: %v", buf) } nativeData[i] = nativeDatum } return nativeData } 
goavro-2.10.1/helpers_test.go000066400000000000000000000037571412474230400161350ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "io" "runtime" "sync" "testing" ) func benchmarkLowAndHigh(b *testing.B, callback func()) { b.Helper() // Run test case in parallel at relative low concurrency b.Run("Low", func(b *testing.B) { b.ResetTimer() b.RunParallel(func(pb *testing.PB) { for pb.Next() { callback() } }) }) // Run test case in parallel at relative high concurrency b.Run("High", func(b *testing.B) { concurrency := runtime.NumCPU() * 1000 wg := new(sync.WaitGroup) wg.Add(concurrency) b.ResetTimer() for c := 0; c < concurrency; c++ { go func() { defer wg.Done() for n := 0; n < b.N; n++ { callback() } }() } wg.Wait() }) } // ShortWriter returns a structure that wraps an io.Writer, but returns // io.ErrShortWrite when the number of bytes to write exceeds a preset limit. // // Copied with author's permission from https://github.com/karrick/gorill. // // bb := NopCloseBuffer() // sw := ShortWriter(bb, 16) // // n, err := sw.Write([]byte("short write")) // // n == 11, err == nil // // n, err := sw.Write([]byte("a somewhat longer write")) // // n == 16, err == io.ErrShortWrite func ShortWriter(w io.Writer, max int) io.Writer { return shortWriter{Writer: w, max: max} } func (s shortWriter) Write(data []byte) (int, error) { var short bool index := len(data) if index > s.max { index = s.max short = true } n, err := s.Writer.Write(data[:index]) if short { return n, io.ErrShortWrite } return n, err } type shortWriter struct { io.Writer max int } goavro-2.10.1/integer.go000066400000000000000000000134261412474230400150630ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
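// A hedged usage sketch, not taken from the original benchmarks, showing how
// benchmarkLowAndHigh from helpers_test.go above is meant to be driven: the
// callback is the unit of work measured at both the low and the high
// concurrency level, so it must be safe to call from many goroutines at once.
//
//	func BenchmarkNewCodecLongSketch(b *testing.B) {
//		benchmarkLowAndHigh(b, func() {
//			if _, err := NewCodec(`"long"`); err != nil {
//				panic(err) // avoid b.Fatal from non-benchmark goroutines
//			}
//		})
//	}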
package goavro import ( "fmt" "io" "strconv" ) const ( intDownShift = uint32(31) intFlag = byte(128) intMask = byte(127) longDownShift = uint32(63) ) //////////////////////////////////////// // Binary Decode //////////////////////////////////////// func intNativeFromBinary(buf []byte) (interface{}, []byte, error) { var offset, value int var shift uint for offset = 0; offset < len(buf); offset++ { b := buf[offset] value |= int(b&intMask) << shift if b&intFlag == 0 { return (int32(value>>1) ^ -int32(value&1)), buf[offset+1:], nil } shift += 7 } return nil, nil, io.ErrShortBuffer } func longNativeFromBinary(buf []byte) (interface{}, []byte, error) { var offset int var value uint64 var shift uint for offset = 0; offset < len(buf); offset++ { b := buf[offset] value |= uint64(b&intMask) << shift if b&intFlag == 0 { return (int64(value>>1) ^ -int64(value&1)), buf[offset+1:], nil } shift += 7 } return nil, nil, io.ErrShortBuffer } //////////////////////////////////////// // Binary Encode //////////////////////////////////////// func intBinaryFromNative(buf []byte, datum interface{}) ([]byte, error) { var value int32 switch v := datum.(type) { case int32: value = v case int: if value = int32(v); int(value) != v { return nil, fmt.Errorf("cannot encode binary int: provided Go int would lose precision: %d", v) } case int64: if value = int32(v); int64(value) != v { return nil, fmt.Errorf("cannot encode binary int: provided Go int64 would lose precision: %d", v) } case float64: if value = int32(v); float64(value) != v { return nil, fmt.Errorf("cannot encode binary int: provided Go float64 would lose precision: %f", v) } case float32: if value = int32(v); float32(value) != v { return nil, fmt.Errorf("cannot encode binary int: provided Go float32 would lose precision: %f", v) } default: return nil, fmt.Errorf("cannot encode binary int: expected: Go numeric; received: %T", datum) } encoded := uint64((uint32(value) << 1) ^ uint32(value>>intDownShift)) return integerBinaryEncoder(buf, encoded) } func longBinaryFromNative(buf []byte, datum interface{}) ([]byte, error) { var value int64 switch v := datum.(type) { case int64: value = v case int: value = int64(v) case int32: value = int64(v) case float64: if value = int64(v); float64(value) != v { return nil, fmt.Errorf("cannot encode binary long: provided Go float64 would lose precision: %f", v) } case float32: if value = int64(v); float32(value) != v { return nil, fmt.Errorf("cannot encode binary long: provided Go float32 would lose precision: %f", v) } default: return nil, fmt.Errorf("long: expected: Go numeric; received: %T", datum) } encoded := (uint64(value) << 1) ^ uint64(value>>longDownShift) return integerBinaryEncoder(buf, encoded) } func integerBinaryEncoder(buf []byte, encoded uint64) ([]byte, error) { // used by both intBinaryEncoder and longBinaryEncoder if encoded == 0 { return append(buf, 0), nil } for encoded > 0 { b := byte(encoded) & intMask encoded = encoded >> 7 if encoded != 0 { b |= intFlag // set high bit; we have more bytes } buf = append(buf, b) } return buf, nil } //////////////////////////////////////// // Text Decode //////////////////////////////////////// func longNativeFromTextual(buf []byte) (interface{}, []byte, error) { return integerTextDecoder(buf, 64) } func intNativeFromTextual(buf []byte) (interface{}, []byte, error) { return integerTextDecoder(buf, 32) } func integerTextDecoder(buf []byte, bitSize int) (interface{}, []byte, error) { index, err := numberLength(buf, false) // NOTE: floatAllowed = false if err != nil { 
return nil, nil, err } datum, err := strconv.ParseInt(string(buf[:index]), 10, bitSize) if err != nil { return nil, nil, err } if bitSize == 32 { return int32(datum), buf[index:], nil } return datum, buf[index:], nil } //////////////////////////////////////// // Text Encode //////////////////////////////////////// func longTextualFromNative(buf []byte, datum interface{}) ([]byte, error) { return integerTextEncoder(buf, datum, 64) } func intTextualFromNative(buf []byte, datum interface{}) ([]byte, error) { return integerTextEncoder(buf, datum, 32) } func integerTextEncoder(buf []byte, datum interface{}, bitSize int) ([]byte, error) { var someInt64 int64 switch v := datum.(type) { case int: someInt64 = int64(v) case int32: someInt64 = int64(v) case int64: someInt64 = v case float32: if someInt64 = int64(v); float32(someInt64) != v { if bitSize == 64 { return nil, fmt.Errorf("cannot encode textual long: provided Go float32 would lose precision: %f", v) } return nil, fmt.Errorf("cannot encode textual int: provided Go float32 would lose precision: %f", v) } case float64: if someInt64 = int64(v); float64(someInt64) != v { if bitSize == 64 { return nil, fmt.Errorf("cannot encode textual long: provided Go float64 would lose precision: %f", v) } return nil, fmt.Errorf("cannot encode textual int: provided Go float64 would lose precision: %f", v) } default: if bitSize == 64 { return nil, fmt.Errorf("cannot encode textual long: expected: Go numeric; received: %T", datum) } return nil, fmt.Errorf("cannot encode textual int: expected: Go numeric; received: %T", datum) } return strconv.AppendInt(buf, someInt64, 10), nil } goavro-2.10.1/integer_test.go000066400000000000000000000101531412474230400161140ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
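integer.go above encodes int and long with Avro's zig-zag variable-length scheme: the signed value is first mapped to an unsigned one via (n << 1) ^ (n >> 63) for longs (>> 31 for ints), so that values of small magnitude, negative or positive, become small numbers, and the result is then written seven bits per byte with the high bit of each byte marking a continuation. The standalone sketch below reproduces that arithmetic for illustration only (it is not the library's implementation) and matches byte sequences asserted in the tests that follow, such as -3 → 0x05 and 64 → 0x80 0x01.

// Standalone illustration of Avro zig-zag varint encoding for 64-bit values;
// it mirrors integerBinaryEncoder above but is not the library's own code.
package main

import "fmt"

func zigzagVarint(n int64) []byte {
    u := (uint64(n) << 1) ^ uint64(n>>63) // zig-zag: -3 -> 5, 64 -> 128
    if u == 0 {
        return []byte{0}
    }
    var out []byte
    for u > 0 {
        b := byte(u) & 0x7f // low seven bits
        u >>= 7
        if u != 0 {
            b |= 0x80 // set high bit: more bytes follow
        }
        out = append(out, b)
    }
    return out
}

func main() {
    fmt.Printf("% x\n", zigzagVarint(-3))  // 05
    fmt.Printf("% x\n", zigzagVarint(64))  // 80 01
    fmt.Printf("% x\n", zigzagVarint(-65)) // 81 01
}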
package goavro import ( "testing" ) func TestSchemaPrimitiveCodecInt(t *testing.T) { testSchemaPrimativeCodec(t, `"int"`) } func TestPrimitiveIntBinary(t *testing.T) { testBinaryEncodeFailBadDatumType(t, `"int"`, "some string") testBinaryDecodeFailShortBuffer(t, `"int"`, []byte{0xfd, 0xff, 0xff, 0xff}) testBinaryCodecPass(t, `"int"`, -1, []byte{0x01}) testBinaryCodecPass(t, `"int"`, -2147483647, []byte{0xfd, 0xff, 0xff, 0xff, 0xf}) testBinaryCodecPass(t, `"int"`, -3, []byte{0x05}) testBinaryCodecPass(t, `"int"`, -65, []byte("\x81\x01")) testBinaryCodecPass(t, `"int"`, 0, []byte{0x00}) testBinaryCodecPass(t, `"int"`, 1, []byte{0x02}) testBinaryCodecPass(t, `"int"`, 1016, []byte("\xf0\x0f")) testBinaryCodecPass(t, `"int"`, 1455301406, []byte{0xbc, 0x8c, 0xf1, 0xeb, 0xa}) testBinaryCodecPass(t, `"int"`, 2147483647, []byte{0xfe, 0xff, 0xff, 0xff, 0xf}) testBinaryCodecPass(t, `"int"`, 3, []byte("\x06")) testBinaryCodecPass(t, `"int"`, 64, []byte("\x80\x01")) testBinaryCodecPass(t, `"int"`, 66052, []byte("\x88\x88\x08")) testBinaryCodecPass(t, `"int"`, 8454660, []byte("\x88\x88\x88\x08")) } func TestPrimitiveIntText(t *testing.T) { testTextDecodeFailShortBuffer(t, `"int"`, []byte("")) testTextDecodeFailShortBuffer(t, `"int"`, []byte("-")) testTextCodecPass(t, `"int"`, -13, []byte("-13")) testTextCodecPass(t, `"int"`, 0, []byte("0")) testTextCodecPass(t, `"int"`, 13, []byte("13")) testTextDecodePass(t, `"int"`, -0, []byte("-0")) testTextEncodePass(t, `"int"`, -0, []byte("0")) // NOTE: -0 encodes as "0" } func TestSchemaPrimitiveCodecLong(t *testing.T) { testSchemaPrimativeCodec(t, `"long"`) } func TestPrimitiveLongBinary(t *testing.T) { testBinaryEncodeFailBadDatumType(t, `"long"`, "some string") testBinaryDecodeFailShortBuffer(t, `"long"`, []byte("\xff\xff\xff\xff")) testBinaryCodecPass(t, `"long"`, int64((1<<63)-1), []byte{0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x1}) testBinaryCodecPass(t, `"long"`, int64(-(1 << 63)), []byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x1}) testBinaryCodecPass(t, `"long"`, -2147483648, []byte("\xff\xff\xff\xff\x0f")) testBinaryCodecPass(t, `"long"`, -3, []byte("\x05")) testBinaryCodecPass(t, `"long"`, -65, []byte("\x81\x01")) testBinaryCodecPass(t, `"long"`, 0, []byte("\x00")) testBinaryCodecPass(t, `"long"`, 1082196484, []byte("\x88\x88\x88\x88\x08")) testBinaryCodecPass(t, `"long"`, int64(1359702038045356208), []byte{0xe0, 0xc2, 0x8b, 0xa1, 0x96, 0xf3, 0xd0, 0xde, 0x25}) testBinaryCodecPass(t, `"long"`, int64(138521149956), []byte("\x88\x88\x88\x88\x88\x08")) testBinaryCodecPass(t, `"long"`, int64(17730707194372), []byte("\x88\x88\x88\x88\x88\x88\x08")) testBinaryCodecPass(t, `"long"`, 2147483647, []byte("\xfe\xff\xff\xff\x0f")) testBinaryCodecPass(t, `"long"`, int64(2269530520879620), []byte("\x88\x88\x88\x88\x88\x88\x88\x08")) testBinaryCodecPass(t, `"long"`, 3, []byte("\x06")) testBinaryCodecPass(t, `"long"`, int64(5959107741628848600), []byte{0xb0, 0xe7, 0x8a, 0xe1, 0xe2, 0xba, 0x80, 0xb3, 0xa5, 0x1}) testBinaryCodecPass(t, `"long"`, 64, []byte("\x80\x01")) // https://github.com/linkedin/goavro/issues/49 testBinaryCodecPass(t, `"long"`, int64(-5513458701470791632), []byte("\x9f\xdf\x9f\x8f\xc7\xde\xde\x83\x99\x01")) } func TestPrimitiveLongText(t *testing.T) { testTextDecodeFailShortBuffer(t, `"long"`, []byte("")) testTextDecodeFailShortBuffer(t, `"long"`, []byte("-")) testTextCodecPass(t, `"long"`, -13, []byte("-13")) testTextCodecPass(t, `"long"`, 0, []byte("0")) testTextCodecPass(t, `"long"`, 13, []byte("13")) 
testTextDecodePass(t, `"long"`, -0, []byte("-0")) testTextEncodePass(t, `"long"`, -0, []byte("0")) // NOTE: -0 encodes as "0" } goavro-2.10.1/logical_type.go000066400000000000000000000364431412474230400161050ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "errors" "fmt" "math/big" "time" ) type toNativeFn func([]byte) (interface{}, []byte, error) type fromNativeFn func([]byte, interface{}) ([]byte, error) ////////////////////////////////////////////////////////////////////////////////////////////// // date logical type - to/from time.Time, time.UTC location ////////////////////////////////////////////////////////////////////////////////////////////// func nativeFromDate(fn toNativeFn) toNativeFn { return func(bytes []byte) (interface{}, []byte, error) { l, b, err := fn(bytes) if err != nil { return l, b, err } i, ok := l.(int32) if !ok { return l, b, fmt.Errorf("cannot transform to native date, expected int, received %T", l) } t := time.Date(1970, 1, 1, 0, 0, 0, 0, time.UTC).AddDate(0, 0, int(i)).UTC() return t, b, nil } } func dateFromNative(fn fromNativeFn) fromNativeFn { return func(b []byte, d interface{}) ([]byte, error) { switch val := d.(type) { case int, int32, int64, float32, float64: // "Language implementations may choose to represent logical types with an appropriate native type, although this is not required." 
// especially permitted default values depend on the field's schema type and goavro encodes default values using the field schema return fn(b, val) case time.Time: // rephrasing the avro 1.9.2 spec a date is actually stored as the duration since unix epoch in days // time.Unix() returns this duration in seconds and time.UnixNano() in nanoseconds // reviewing the source code, both functions are based on the internal function unixSec() // unixSec() returns the seconds since unix epoch as int64, whereby Unix() provides the greater range and UnixNano() the higher precision // As a date requires a precision of days Unix() provides more then enough precision and a greater range, including the go zero time numDays := val.Unix() / 86400 return fn(b, numDays) default: return nil, fmt.Errorf("cannot transform to binary date, expected time.Time or Go numeric, received %T", d) } } } ////////////////////////////////////////////////////////////////////////////////////////////// // time-millis logical type - to/from time.Time, time.UTC location ////////////////////////////////////////////////////////////////////////////////////////////// func nativeFromTimeMillis(fn toNativeFn) toNativeFn { return func(bytes []byte) (interface{}, []byte, error) { l, b, err := fn(bytes) if err != nil { return l, b, err } i, ok := l.(int32) if !ok { return l, b, fmt.Errorf("cannot transform to native time.Duration, expected int, received %T", l) } t := time.Duration(i) * time.Millisecond return t, b, nil } } func timeMillisFromNative(fn fromNativeFn) fromNativeFn { return func(b []byte, d interface{}) ([]byte, error) { switch val := d.(type) { case int, int32, int64, float32, float64: // "Language implementations may choose to represent logical types with an appropriate native type, although this is not required." // especially permitted default values depend on the field's schema type and goavro encodes default values using the field schema return fn(b, val) case time.Duration: duration := int32(val.Nanoseconds() / int64(time.Millisecond)) return fn(b, duration) default: return nil, fmt.Errorf("cannot transform to binary time-millis, expected time.Duration or Go numeric, received %T", d) } } } ////////////////////////////////////////////////////////////////////////////////////////////// // time-micros logical type - to/from time.Time, time.UTC location ////////////////////////////////////////////////////////////////////////////////////////////// func nativeFromTimeMicros(fn toNativeFn) toNativeFn { return func(bytes []byte) (interface{}, []byte, error) { l, b, err := fn(bytes) if err != nil { return l, b, err } i, ok := l.(int64) if !ok { return l, b, fmt.Errorf("cannot transform to native time.Duration, expected long, received %T", l) } t := time.Duration(i) * time.Microsecond return t, b, nil } } func timeMicrosFromNative(fn fromNativeFn) fromNativeFn { return func(b []byte, d interface{}) ([]byte, error) { switch val := d.(type) { case int, int32, int64, float32, float64: // "Language implementations may choose to represent logical types with an appropriate native type, although this is not required." 
// especially permitted default values depend on the field's schema type and goavro encodes default values using the field schema return fn(b, val) case time.Duration: duration := int32(val.Nanoseconds() / int64(time.Microsecond)) return fn(b, duration) default: return nil, fmt.Errorf("cannot transform to binary time-micros, expected time.Duration or Go numeric, received %T", d) } } } ////////////////////////////////////////////////////////////////////////////////////////////// // timestamp-millis logical type - to/from time.Time, time.UTC location ////////////////////////////////////////////////////////////////////////////////////////////// func nativeFromTimeStampMillis(fn toNativeFn) toNativeFn { return func(bytes []byte) (interface{}, []byte, error) { l, b, err := fn(bytes) if err != nil { return l, b, err } milliseconds, ok := l.(int64) if !ok { return l, b, fmt.Errorf("cannot transform native timestamp-millis, expected int64, received %T", l) } seconds := milliseconds / 1e3 nanoseconds := (milliseconds - (seconds * 1e3)) * 1e6 return time.Unix(seconds, nanoseconds).UTC(), b, nil } } func timeStampMillisFromNative(fn fromNativeFn) fromNativeFn { return func(b []byte, d interface{}) ([]byte, error) { switch val := d.(type) { case int, int32, int64, float32, float64: // "Language implementations may choose to represent logical types with an appropriate native type, although this is not required." // especially permitted default values depend on the field's schema type and goavro encodes default values using the field schema return fn(b, val) case time.Time: // While this code performs a few more steps than seem required, it is // written this way to allow the best time resolution without overflowing the int64 value. return fn(b, val.Unix()*1e3+int64(val.Nanosecond()/1e6)) default: return nil, fmt.Errorf("cannot transform to binary timestamp-millis, expected time.Time or Go numeric, received %T", d) } } } ////////////////////////////////////////////////////////////////////////////////////////////// // timestamp-micros logical type - to/from time.Time, time.UTC location ////////////////////////////////////////////////////////////////////////////////////////////// func nativeFromTimeStampMicros(fn toNativeFn) toNativeFn { return func(bytes []byte) (interface{}, []byte, error) { l, b, err := fn(bytes) if err != nil { return l, b, err } microseconds, ok := l.(int64) if !ok { return l, b, fmt.Errorf("cannot transform native timestamp-micros, expected int64, received %T", l) } // While this code performs a few more steps than seem required, it is // written this way to allow the best time resolution on UNIX and // Windows without overflowing the int64 value. Windows has a zero-time // value of 1601-01-01 UTC, and the number of nanoseconds since that // zero-time overflows 64-bit integers. seconds := microseconds / 1e6 nanoseconds := (microseconds - (seconds * 1e6)) * 1e3 return time.Unix(seconds, nanoseconds).UTC(), b, nil } } func timeStampMicrosFromNative(fn fromNativeFn) fromNativeFn { return func(b []byte, d interface{}) ([]byte, error) { switch val := d.(type) { case int, int32, int64, float32, float64: // "Language implementations may choose to represent logical types with an appropriate native type, although this is not required." 
// especially permitted default values depend on the field's schema type and goavro encodes default values using the field schema return fn(b, val) case time.Time: // While this code performs a few more steps than seem required, it is // written this way to allow the best time resolution on UNIX and // Windows without overflowing the int64 value. Windows has a zero-time // value of 1601-01-01 UTC, and the number of nanoseconds since that // zero-time overflows 64-bit integers. return fn(b, val.Unix()*1e6+int64(val.Nanosecond()/1e3)) default: return nil, fmt.Errorf("cannot transform to binary timestamp-micros, expected time.Time or Go numeric, received %T", d) } } } ///////////////////////////////////////////////////////////////////////////////////////////// // decimal logical-type - byte/fixed - to/from math/big.Rat // two's complement algorithm taken from: // https://groups.google.com/d/msg/golang-nuts/TV4bRVrHZUw/UcQt7S4IYlcJ by rog ///////////////////////////////////////////////////////////////////////////////////////////// type makeCodecFn func(st map[string]*Codec, enclosingNamespace string, schemaMap map[string]interface{}) (*Codec, error) func precisionAndScaleFromSchemaMap(schemaMap map[string]interface{}) (int, int, error) { p1, ok := schemaMap["precision"] if !ok { return 0, 0, errors.New("cannot create decimal logical type without precision") } p2, ok := p1.(float64) if !ok { return 0, 0, fmt.Errorf("cannot create decimal logical type with wrong precision type; expected: float64; received: %T", p1) } p3 := int(p2) if p3 <= 1 { return 0, 0, fmt.Errorf("cannot create decimal logical type when precision is less than one: %d", p3) } var s3 int // scale defaults to 0 if not set if s1, ok := schemaMap["scale"]; ok { s2, ok := s1.(float64) if !ok { return 0, 0, fmt.Errorf("cannot create decimal logical type with wrong precision type; expected: float64; received: %T", p1) } s3 = int(s2) if s3 < 0 { return 0, 0, fmt.Errorf("cannot create decimal logical type when scale is less than zero: %d", s3) } if s3 > p3 { return 0, 0, fmt.Errorf("cannot create decimal logical type when scale is larger than precision: %d > %d", s3, p3) } } return p3, s3, nil } var one = big.NewInt(1) func makeDecimalBytesCodec(st map[string]*Codec, enclosingNamespace string, schemaMap map[string]interface{}) (*Codec, error) { precision, scale, err := precisionAndScaleFromSchemaMap(schemaMap) if err != nil { return nil, err } if _, ok := schemaMap["name"]; !ok { schemaMap["name"] = "bytes.decimal" } c, err := registerNewCodec(st, schemaMap, enclosingNamespace) if err != nil { return nil, fmt.Errorf("Bytes ought to have valid name: %s", err) } c.binaryFromNative = decimalBytesFromNative(bytesBinaryFromNative, toSignedBytes, precision, scale) c.textualFromNative = decimalBytesFromNative(bytesTextualFromNative, toSignedBytes, precision, scale) c.nativeFromBinary = nativeFromDecimalBytes(bytesNativeFromBinary, precision, scale) c.nativeFromTextual = nativeFromDecimalBytes(bytesNativeFromTextual, precision, scale) return c, nil } func nativeFromDecimalBytes(fn toNativeFn, precision, scale int) toNativeFn { return func(bytes []byte) (interface{}, []byte, error) { d, b, err := fn(bytes) if err != nil { return d, b, err } bs, ok := d.([]byte) if !ok { return nil, bytes, fmt.Errorf("cannot transform to native decimal, expected []byte, received %T", d) } num := big.NewInt(0) fromSignedBytes(num, bs) denom := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(scale)), nil) r := new(big.Rat).SetFrac(num, denom) return r, b, nil } 
} func decimalBytesFromNative(fromNativeFn fromNativeFn, toBytesFn toBytesFn, precision, scale int) fromNativeFn { return func(b []byte, d interface{}) ([]byte, error) { r, ok := d.(*big.Rat) if !ok { return nil, fmt.Errorf("cannot transform to bytes, expected *big.Rat, received %T", d) } // Reduce accuracy to precision by dividing and multiplying by digit length num := big.NewInt(0).Set(r.Num()) denom := big.NewInt(0).Set(r.Denom()) i := new(big.Int).Mul(num, new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(scale)), nil)) // divide that by the denominator precnum := new(big.Int).Div(i, denom) bout, err := toBytesFn(precnum) if err != nil { return nil, err } return fromNativeFn(b, bout) } } func makeDecimalFixedCodec(st map[string]*Codec, enclosingNamespace string, schemaMap map[string]interface{}) (*Codec, error) { precision, scale, err := precisionAndScaleFromSchemaMap(schemaMap) if err != nil { return nil, err } if _, ok := schemaMap["name"]; !ok { schemaMap["name"] = "fixed.decimal" } c, err := makeFixedCodec(st, enclosingNamespace, schemaMap) if err != nil { return nil, err } size, err := sizeFromSchemaMap(c.typeName, schemaMap) if err != nil { return nil, err } c.binaryFromNative = decimalBytesFromNative(c.binaryFromNative, toSignedFixedBytes(size), precision, scale) c.textualFromNative = decimalBytesFromNative(c.textualFromNative, toSignedFixedBytes(size), precision, scale) c.nativeFromBinary = nativeFromDecimalBytes(c.nativeFromBinary, precision, scale) c.nativeFromTextual = nativeFromDecimalBytes(c.nativeFromTextual, precision, scale) return c, nil } func padBytes(bytes []byte, fixedSize uint) []byte { s := int(fixedSize) padded := make([]byte, s, s) if s >= len(bytes) { copy(padded[s-len(bytes):], bytes) } return padded } type toBytesFn func(n *big.Int) ([]byte, error) // fromSignedBytes sets the value of n to the big-endian two's complement // value stored in the given data. If data[0]&80 != 0, the number // is negative. If data is empty, the result will be 0. func fromSignedBytes(n *big.Int, data []byte) { n.SetBytes(data) if len(data) > 0 && data[0]&0x80 > 0 { n.Sub(n, new(big.Int).Lsh(one, uint(len(data))*8)) } } // toSignedBytes returns the big-endian two's complement // form of n. func toSignedBytes(n *big.Int) ([]byte, error) { switch n.Sign() { case 0: return []byte{0}, nil case 1: b := n.Bytes() if b[0]&0x80 > 0 { b = append([]byte{0}, b...) } return b, nil case -1: length := uint(n.BitLen()/8+1) * 8 b := new(big.Int).Add(n, new(big.Int).Lsh(one, length)).Bytes() // When the most significant bit is on a byte // boundary, we can get some extra significant // bits, so strip them off when that happens. if len(b) >= 2 && b[0] == 0xff && b[1]&0x80 != 0 { b = b[1:] } return b, nil } return nil, fmt.Errorf("toSignedBytes: error big.Int.Sign() returned unexpected value") } // toSignedFixedBytes returns the big-endian two's complement // form of n for a given length of bytes. func toSignedFixedBytes(size uint) func(*big.Int) ([]byte, error) { return func(n *big.Int) ([]byte, error) { switch n.Sign() { case 0: return []byte{0}, nil case 1: b := n.Bytes() if b[0]&0x80 > 0 { b = append([]byte{0}, b...) 
} return padBytes(b, size), nil case -1: length := size * 8 b := new(big.Int).Add(n, new(big.Int).Lsh(one, length)).Bytes() // Unlike a variable length byte length we need the extra bits to meet byte length return b, nil } return nil, fmt.Errorf("toSignedBytes: error big.Int.Sign() returned unexpected value") } } goavro-2.10.1/logical_type_test.go000066400000000000000000000233741412474230400171430ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "fmt" "math/big" "testing" "time" ) func TestSchemaLogicalType(t *testing.T) { testSchemaValid(t, `{"type": "long", "logicalType": "timestamp-millis"}`) testSchemaInvalid(t, `{"type": "bytes", "logicalType": "decimal"}`, "precision") testSchemaInvalid(t, `{"type": "fixed", "size": 16, "logicalType": "decimal"}`, "precision") } func TestStringLogicalTypeFallback(t *testing.T) { schema := `{"type": "string", "logicalType": "this_logical_type_does_not_exist"}` testSchemaValid(t, schema) testBinaryCodecPass(t, schema, "test string", []byte("\x16\x74\x65\x73\x74\x20\x73\x74\x72\x69\x6e\x67")) } func TestLongLogicalTypeFallback(t *testing.T) { schema := `{"type": "long", "logicalType": "this_logical_type_does_not_exist"}` testSchemaValid(t, schema) testBinaryCodecPass(t, schema, 12345, []byte("\xf2\xc0\x01")) } func TestTimeStampMillisLogicalTypeEncode(t *testing.T) { schema := `{"type": "long", "logicalType": "timestamp-millis"}` testBinaryDecodeFail(t, schema, []byte(""), "short buffer") t.Skip("this test is broken") testBinaryEncodeFail(t, schema, "test", "cannot transform binary timestamp-millis, expected time.Time") testBinaryCodecPass(t, schema, time.Date(2006, 1, 2, 15, 04, 05, 565000000, time.UTC), []byte("\xfa\x82\xac\xba\x91\x42")) } func TestTimeStampMillisLogicalTypeUnionEncode(t *testing.T) { schema := `{"type": ["null", {"type": "long", "logicalType": "timestamp-millis"}]}` testBinaryEncodeFail(t, schema, Union("string", "test"), "cannot encode binary union: no member schema types support datum: allowed types: [null long.timestamp-millis]") testBinaryCodecPass(t, schema, Union("long.timestamp-millis", time.Date(2006, 1, 2, 15, 04, 05, 565000000, time.UTC)), []byte("\x02\xfa\x82\xac\xba\x91\x42")) } func TestTimeStampMicrosLogicalTypeEncode(t *testing.T) { schema := `{"type": "long", "logicalType": "timestamp-micros"}` testBinaryDecodeFail(t, schema, []byte(""), "short buffer") t.Skip("this test is broken") testBinaryEncodeFail(t, schema, "test", "cannot transform binary timestamp-micros, expected time.Time") testBinaryCodecPass(t, schema, time.Date(2006, 1, 2, 15, 04, 05, 565283000, time.UTC), []byte("\xc6\x8d\xf7\xe7\xaf\xd8\x84\x04")) } func TestTimeStampMicrosLogicalTypeUnionEncode(t *testing.T) { schema := `{"type": ["null", {"type": "long", "logicalType": "timestamp-micros"}]}` testBinaryEncodeFail(t, schema, Union("string", "test"), "cannot encode binary union: no member schema types support datum: allowed types: [null long.timestamp-micros]") testBinaryCodecPass(t, schema, Union("long.timestamp-micros", time.Date(2006, 1, 2, 15, 04, 05, 565283000, time.UTC)), 
[]byte("\x02\xc6\x8d\xf7\xe7\xaf\xd8\x84\x04")) } func TestTimeMillisLogicalTypeEncode(t *testing.T) { schema := `{"type": "int", "logicalType": "time-millis"}` testBinaryDecodeFail(t, schema, []byte(""), "short buffer") testBinaryEncodeFail(t, schema, "test", "cannot transform to binary time-millis, expected time.Duration") testBinaryCodecPass(t, schema, 66904022*time.Millisecond, []byte("\xac\xff\xe6\x3f")) } func TestTimeMillisLogicalTypeUnionEncode(t *testing.T) { schema := `{"type": ["null", {"type": "int", "logicalType": "time-millis"}]}` testBinaryEncodeFail(t, schema, Union("string", "test"), "cannot encode binary union: no member schema types support datum: allowed types: [null int.time-millis]") testBinaryCodecPass(t, schema, Union("int.time-millis", 66904022*time.Millisecond), []byte("\x02\xac\xff\xe6\x3f")) } func TestTimeMicrosLogicalTypeEncode(t *testing.T) { schema := `{"type": "long", "logicalType": "time-micros"}` testBinaryDecodeFail(t, schema, []byte(""), "short buffer") testBinaryEncodeFail(t, schema, "test", "cannot transform to binary time-micros, expected time.Duration") t.Skip("this test is broken") testBinaryCodecPass(t, schema, 66904022566*time.Microsecond, []byte("\xcc\xf8\xd2\xbc\xf2\x03")) } func TestTimeMicrosLogicalTypeUnionEncode(t *testing.T) { schema := `{"type": ["null", {"type": "long", "logicalType": "time-micros"}]}` testBinaryEncodeFail(t, schema, Union("string", "test"), "cannot encode binary union: no member schema types support datum: allowed types: [null long.time-micros]") t.Skip("this test is broken") testBinaryCodecPass(t, schema, Union("long.time-micros", 66904022566*time.Microsecond), []byte("\x02\xcc\xf8\xd2\xbc\xf2\x03")) } func TestDateLogicalTypeEncode(t *testing.T) { schema := `{"type": "int", "logicalType": "date"}` testBinaryDecodeFail(t, schema, []byte(""), "short buffer") t.Skip("this test is broken") testBinaryEncodeFail(t, schema, "test", "cannot transform to binary date, expected time.Time, received string") testBinaryCodecPass(t, schema, time.Date(2006, 1, 2, 0, 0, 0, 0, time.UTC), []byte("\xbc\xcd\x01")) } func testGoZeroTime(t *testing.T, schema string, expected []byte) { t.Helper() testBinaryEncodePass(t, schema, time.Time{}, expected) codec, err := NewCodec(schema) if err != nil { t.Fatal(err) } value, remaining, err := codec.NativeFromBinary(expected) if err != nil { t.Fatalf("schema: %s; %s", schema, err) } // remaining ought to be empty because there is nothing remaining to be // decoded if actual, expected := len(remaining), 0; actual != expected { t.Errorf("schema: %s; Remaining; Actual: %#v; Expected: %#v", schema, actual, expected) } zeroTime, ok := value.(time.Time) if !ok { t.Fatalf("schema: %s, NativeFromBinary: expected time.Time, got %T", schema, value) } if !zeroTime.IsZero() { t.Fatalf("schema: %s, Check: time.Time{}.IsZero(), Actual: %t, Expected: true", schema, zeroTime.IsZero()) } } func TestDateGoZero(t *testing.T) { testGoZeroTime(t, `{"type": "int", "logicalType": "date"}`, []byte{0xf3, 0xe4, 0x57}) } func TestTimeStampMillisGoZero(t *testing.T) { testGoZeroTime(t, `{"type": "long", "logicalType": "timestamp-millis"}`, []byte{0xff, 0xdf, 0xe6, 0xa2, 0xe2, 0xa0, 0x1c}) } func TestTimeStampMicrosGoZero(t *testing.T) { testGoZeroTime(t, `{"type": "long", "logicalType": "timestamp-micros"}`, []byte{0xff, 0xff, 0xdd, 0xf2, 0xdf, 0xff, 0xdf, 0xdc, 0x1}) } func TestDecimalBytesLogicalTypeEncode(t *testing.T) { schema := `{"type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}` 
testBinaryCodecPass(t, schema, big.NewRat(617, 50), []byte("\x04\x04\xd2")) testBinaryCodecPass(t, schema, big.NewRat(-617, 50), []byte("\x04\xfb\x2e")) testBinaryCodecPass(t, schema, big.NewRat(0, 1), []byte("\x02\x00")) // Test with a large decimal of precision 77 and scale 38 largeDecimalSchema := `{"type": "bytes", "logicalType": "decimal", "precision": 77, "scale": 38}` n, _ := new(big.Int).SetString("12345678901234567890123456789012345678911111111111111111111111111111111111111", 10) d, _ := new(big.Int).SetString("100000000000000000000000000000000000000", 10) largeRat := new(big.Rat).SetFrac(n, d) testBinaryCodecPass(t, largeDecimalSchema, largeRat, []byte("\x40\x1b\x4b\x68\x19\x26\x11\xfa\xea\x20\x8f\xca\x21\x62\x7b\xe9\xda\xee\x32\x19\x83\x83\x95\x5d\xe8\x13\x1f\x4b\xf1\xc7\x1c\x71\xc7")) } func TestDecimalFixedLogicalTypeEncode(t *testing.T) { schema := `{"type": "fixed", "size": 12, "logicalType": "decimal", "precision": 4, "scale": 2}` testBinaryCodecPass(t, schema, big.NewRat(617, 50), []byte("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04\xd2")) testBinaryCodecPass(t, schema, big.NewRat(-617, 50), []byte("\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\x2e")) testBinaryCodecPass(t, schema, big.NewRat(25, 4), []byte("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x71")) testBinaryCodecPass(t, schema, big.NewRat(33, 100), []byte("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x21")) schema0scale := `{"type": "fixed", "size": 12, "logicalType": "decimal", "precision": 4, "scale": 0}` // Encodes to 12 due to scale: 0 testBinaryEncodePass(t, schema0scale, big.NewRat(617, 50), []byte("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0c")) testBinaryDecodePass(t, schema0scale, big.NewRat(12, 1), []byte("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0c")) } func TestDecimalBytesLogicalTypeInRecordEncode(t *testing.T) { schema := `{"type": "record", "name": "myrecord", "fields" : [ {"name": "mydecimal", "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}]}` testBinaryCodecPass(t, schema, map[string]interface{}{"mydecimal": big.NewRat(617, 50)}, []byte("\x04\x04\xd2")) } func ExampleUnion_logicalType() { // Supported logical types and their native go types: // * timestamp-millis - time.Time // * timestamp-micros - time.Time // * time-millis - time.Duration // * time-micros - time.Duration // * date - int // * decimal - big.Rat codec, err := NewCodec(`["null", {"type": "long", "logicalType": "timestamp-millis"}]`) if err != nil { fmt.Println(err) } // Note the usage of type.logicalType i.e. `long.timestamp-millis` to denote the type in a union. This is due to the single string naming format // used by goavro. Decimal can be both bytes.decimal or fixed.decimal bytes, err := codec.BinaryFromNative(nil, map[string]interface{}{"long.timestamp-millis": time.Date(2006, 1, 2, 15, 4, 5, 0, time.UTC)}) if err != nil { fmt.Println(err) } decoded, _, err := codec.NativeFromBinary(bytes) if err != nil { fmt.Println(err) } out := decoded.(map[string]interface{}) fmt.Printf("%#v\n", out["long.timestamp-millis"].(time.Time).String()) // Output: "2006-01-02 15:04:05 +0000 UTC" } goavro-2.10.1/map.go000066400000000000000000000257711412474230400142110ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. 
You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "errors" "fmt" "io" "math" "reflect" ) func makeMapCodec(st map[string]*Codec, namespace string, schemaMap map[string]interface{}, cb *codecBuilder) (*Codec, error) { // map type must have values valueSchema, ok := schemaMap["values"] if !ok { return nil, errors.New("Map ought to have values key") } valueCodec, err := buildCodec(st, namespace, valueSchema, cb) if err != nil { return nil, fmt.Errorf("Map values ought to be valid Avro type: %s", err) } return &Codec{ typeName: &name{"map", nullNamespace}, nativeFromBinary: func(buf []byte) (interface{}, []byte, error) { var err error var value interface{} // block count and block size if value, buf, err = longNativeFromBinary(buf); err != nil { return nil, nil, fmt.Errorf("cannot decode binary map block count: %s", err) } blockCount := value.(int64) if blockCount < 0 { // NOTE: A negative block count implies there is a long encoded // block size following the negative block count. We have no use // for the block size in this decoder, so we read and discard // the value. if blockCount == math.MinInt64 { // The minimum number for any signed numerical type can // never be made positive return nil, nil, fmt.Errorf("cannot decode binary map with block count: %d", blockCount) } blockCount = -blockCount // convert to its positive equivalent if _, buf, err = longNativeFromBinary(buf); err != nil { return nil, nil, fmt.Errorf("cannot decode binary map block size: %s", err) } } // Ensure block count does not exceed some sane value. if blockCount > MaxBlockCount { return nil, nil, fmt.Errorf("cannot decode binary map when block count exceeds MaxBlockCount: %d > %d", blockCount, MaxBlockCount) } // NOTE: While the attempt of a RAM optimization shown below is not // necessary, many encoders will encode all items in a single block. // We can optimize amount of RAM allocated by runtime for the array // by initializing the array for that number of items. mapValues := make(map[string]interface{}, blockCount) for blockCount != 0 { // Decode `blockCount` datum values from buffer for i := int64(0); i < blockCount; i++ { // first decode the key string if value, buf, err = stringNativeFromBinary(buf); err != nil { return nil, nil, fmt.Errorf("cannot decode binary map key: %s", err) } key := value.(string) // string decoder always returns a string if _, ok := mapValues[key]; ok { return nil, nil, fmt.Errorf("cannot decode binary map: duplicate key: %q", key) } // then decode the value if value, buf, err = valueCodec.nativeFromBinary(buf); err != nil { return nil, nil, fmt.Errorf("cannot decode binary map value for key %q: %s", key, err) } mapValues[key] = value } // Decode next blockCount from buffer, because there may be more blocks if value, buf, err = longNativeFromBinary(buf); err != nil { return nil, nil, fmt.Errorf("cannot decode binary map block count: %s", err) } blockCount = value.(int64) if blockCount < 0 { // NOTE: A negative block count implies there is a long // encoded block size following the negative block count. We // have no use for the block size in this decoder, so we // read and discard the value. 
if blockCount == math.MinInt64 { // The minimum number for any signed numerical type can // never be made positive return nil, nil, fmt.Errorf("cannot decode binary map with block count: %d", blockCount) } blockCount = -blockCount // convert to its positive equivalent if _, buf, err = longNativeFromBinary(buf); err != nil { return nil, nil, fmt.Errorf("cannot decode binary map block size: %s", err) } } // Ensure block count does not exceed some sane value. if blockCount > MaxBlockCount { return nil, nil, fmt.Errorf("cannot decode binary map when block count exceeds MaxBlockCount: %d > %d", blockCount, MaxBlockCount) } } return mapValues, buf, nil }, binaryFromNative: func(buf []byte, datum interface{}) ([]byte, error) { mapValues, err := convertMap(datum) if err != nil { return nil, fmt.Errorf("cannot encode binary map: %s", err) } keyCount := int64(len(mapValues)) var alreadyEncoded, remainingInBlock int64 for k, v := range mapValues { if remainingInBlock == 0 { // start a new block remainingInBlock = keyCount - alreadyEncoded if remainingInBlock > MaxBlockCount { // limit block count to MacBlockCount remainingInBlock = MaxBlockCount } buf, _ = longBinaryFromNative(buf, remainingInBlock) } // only fails when given non string, so elide error checking buf, _ = stringBinaryFromNative(buf, k) // encode the value if buf, err = valueCodec.binaryFromNative(buf, v); err != nil { return nil, fmt.Errorf("cannot encode binary map value for key %q: %v: %s", k, v, err) } remainingInBlock-- alreadyEncoded++ } return longBinaryFromNative(buf, 0) // append tailing 0 block count to signal end of Map }, nativeFromTextual: func(buf []byte) (interface{}, []byte, error) { return genericMapTextDecoder(buf, valueCodec, nil) // codecFromKey == nil }, textualFromNative: func(buf []byte, datum interface{}) ([]byte, error) { return genericMapTextEncoder(buf, datum, valueCodec, nil) }, }, nil } // genericMapTextDecoder decodes a JSON text blob to a native Go map, using the // codecs from codecFromKey, and if a key is not found in that map, from // defaultCodec if provided. If defaultCodec is nil, this function returns an // error if it encounters a map key that is not present in codecFromKey. If // codecFromKey is nil, every map value will be decoded using defaultCodec, if // possible. func genericMapTextDecoder(buf []byte, defaultCodec *Codec, codecFromKey map[string]*Codec) (map[string]interface{}, []byte, error) { var value interface{} var err error var b byte lencodec := len(codecFromKey) mapValues := make(map[string]interface{}, lencodec) if buf, err = advanceAndConsume(buf, '{'); err != nil { return nil, nil, err } if buf, _ = advanceToNonWhitespace(buf); len(buf) == 0 { return nil, nil, io.ErrShortBuffer } // NOTE: Special case empty map if buf[0] == '}' { return mapValues, buf[1:], nil } // NOTE: Also terminates when read '}' byte. for len(buf) > 0 { // decode key string value, buf, err = stringNativeFromTextual(buf) if err != nil { return nil, nil, fmt.Errorf("cannot decode textual map: expected key: %s", err) } key := value.(string) // Is key already used? 
if _, ok := mapValues[key]; ok { return nil, nil, fmt.Errorf("cannot decode textual map: duplicate key: %q", key) } // Find a codec for the key fieldCodec := codecFromKey[key] if fieldCodec == nil { fieldCodec = defaultCodec } if fieldCodec == nil { return nil, nil, fmt.Errorf("cannot decode textual map: cannot determine codec: %q", key) } // decode colon if buf, err = advanceAndConsume(buf, ':'); err != nil { return nil, nil, err } // decode value if buf, _ = advanceToNonWhitespace(buf); len(buf) == 0 { return nil, nil, io.ErrShortBuffer } value, buf, err = fieldCodec.nativeFromTextual(buf) if err != nil { return nil, nil, fmt.Errorf("%s for key: %q", err, key) } // set map value for key mapValues[key] = value // either comma or closing curly brace if buf, _ = advanceToNonWhitespace(buf); len(buf) == 0 { return nil, nil, io.ErrShortBuffer } switch b = buf[0]; b { case '}': return mapValues, buf[1:], nil case ',': // no-op default: return nil, nil, fmt.Errorf("cannot decode textual map: expected ',' or '}'; received: %q", b) } // NOTE: consume comma from above if buf, _ = advanceToNonWhitespace(buf[1:]); len(buf) == 0 { return nil, nil, io.ErrShortBuffer } } return nil, nil, io.ErrShortBuffer } // genericMapTextEncoder encodes a native Go map to a JSON text blob, using the // codecs from codecFromKey, and if a key is not found in that map, from // defaultCodec if provided. If defaultCodec is nil, this function returns an // error if it encounters a map key that is not present in codecFromKey. If // codecFromKey is nil, every map value will be encoded using defaultCodec, if // possible. func genericMapTextEncoder(buf []byte, datum interface{}, defaultCodec *Codec, codecFromKey map[string]*Codec) ([]byte, error) { mapValues, err := convertMap(datum) if err != nil { return nil, fmt.Errorf("cannot encode textual map: %s", err) } var atLeastOne bool buf = append(buf, '{') for key, value := range mapValues { atLeastOne = true // Find a codec for the key fieldCodec := codecFromKey[key] if fieldCodec == nil { fieldCodec = defaultCodec } if fieldCodec == nil { return nil, fmt.Errorf("cannot encode textual map: cannot determine codec: %q", key) } // Encode key string buf, err = stringTextualFromNative(buf, key) if err != nil { return nil, err } buf = append(buf, ':') // Encode value buf, err = fieldCodec.textualFromNative(buf, value) if err != nil { // field was specified in datum; therefore its value was invalid return nil, fmt.Errorf("cannot encode textual map: value for %q does not match its schema: %s", key, err) } buf = append(buf, ',') } if atLeastOne { return append(buf[:len(buf)-1], '}'), nil } return append(buf, '}'), nil } // convertMap converts datum to map[string]interface{} if possible. func convertMap(datum interface{}) (map[string]interface{}, error) { mapValues, ok := datum.(map[string]interface{}) if ok { return mapValues, nil } // NOTE: When given a map of any other type, zip values to items as a // convenience to client. v := reflect.ValueOf(datum) if v.Kind() != reflect.Map { return nil, fmt.Errorf("cannot create map[string]interface{}: expected map[string]...; received: %T", datum) } // NOTE: Two better alternatives to the current algorithm are: // (1) mutate the reflection tuple underneath to convert the // map[string]int, for example, to map[string]interface{}, with // O(1) complexity. // (2) use copy builtin to zip the data items over with O(n) complexity, // but more efficient than what's below. 
mapValues = make(map[string]interface{}, v.Len()) for _, key := range v.MapKeys() { k, ok := key.Interface().(string) if !ok { // bail when map key type is not string return nil, fmt.Errorf("cannot create map[string]interface{}: expected map[string]...; received: %T", datum) } mapValues[string(k)] = v.MapIndex(key).Interface() } return mapValues, nil } goavro-2.10.1/map_test.go000066400000000000000000000167241412474230400152460ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "fmt" "log" "testing" ) func TestMapSchema(t *testing.T) { // NOTE: This schema also used to read and write files in OCF format testSchemaValid(t, `{"type":"map","values":"bytes"}`) testSchemaInvalid(t, `{"type":"map","value":"int"}`, "Map ought to have values key") testSchemaInvalid(t, `{"type":"map","values":"integer"}`, "Map values ought to be valid Avro type") testSchemaInvalid(t, `{"type":"map","values":3}`, "Map values ought to be valid Avro type") testSchemaInvalid(t, `{"type":"map","values":int}`, "invalid character") // type name must be quoted } func TestMapDecodeInitialBlockCountCannotDecode(t *testing.T) { testBinaryDecodeFail(t, `{"type":"map","values":"int"}`, nil, "block count") } func TestMapDecodeInitialBlockCountZero(t *testing.T) { testBinaryDecodePass(t, `{"type":"map","values":"int"}`, map[string]interface{}{}, []byte{0}) } func TestMapDecodeInitialBlockCountNegative(t *testing.T) { testBinaryDecodePass(t, `{"type":"map","values":"int"}`, map[string]interface{}{"k1": 3}, []byte{1, 2, 4, 'k', '1', 6, 0}) } func TestMapDecodeInitialBlockCountTooLarge(t *testing.T) { testBinaryDecodeFail(t, `{"type":"map","values":"int"}`, morePositiveThanMaxBlockCount, "block count") } func TestMapDecodeInitialBlockCountNegativeTooLarge(t *testing.T) { testBinaryDecodeFail(t, `{"type":"map","values":"int"}`, append(moreNegativeThanMaxBlockCount, byte(0)), "block count") } func TestMapDecodeInitialBlockCountTooNegative(t *testing.T) { testBinaryDecodeFail(t, `{"type":"map","values":"int"}`, append(mostNegativeBlockCount, byte(0)), "block count") } func TestMapDecodeNextBlockCountCannotDecode(t *testing.T) { testBinaryDecodeFail(t, `{"type":"map","values":"int"}`, []byte{1, 2, 4, 'k', '1', 6}, "block count") } func TestMapDecodeNextBlockCountNegative(t *testing.T) { c, err := NewCodec(`{"type":"map","values":"int"}`) if err != nil { t.Fatal(err) } decoded, _, err := c.NativeFromBinary([]byte{1, 2, 4, 'k', '1', 6, 1, 8, 4, 'k', '2', 0x1a, 0}) if err != nil { t.Fatal(err) } decodedMap, ok := decoded.(map[string]interface{}) if !ok { t.Fatalf("GOT: %v; WANT: %v", ok, true) } value, ok := decodedMap["k1"] if !ok { t.Errorf("GOT: %v; WANT: %v", ok, true) } if actual, expected := value.(int32), int32(3); actual != expected { t.Errorf("GOT: %v; WANT: %v", actual, expected) } value, ok = decodedMap["k2"] if !ok { t.Errorf("GOT: %v; WANT: %v", ok, true) } if actual, expected := value.(int32), int32(13); actual != expected { t.Errorf("GOT: %v; WANT: %v", actual, expected) } } func TestMapDecodeNextBlockCountTooLarge(t *testing.T) { testBinaryDecodeFail(t, 
`{"type":"map","values":"int"}`, append([]byte{1, 2, 4, 'k', '1', 6}, morePositiveThanMaxBlockCount...), "block count") } func TestMapDecodeNextBlockCountNegativeTooLarge(t *testing.T) { testBinaryDecodeFail(t, `{"type":"map","values":"int"}`, append(append([]byte{1, 2, 4, 'k', '1', 6}, moreNegativeThanMaxBlockCount...), 2), "block count") } func TestMapDecodeNextBlockCountTooNegative(t *testing.T) { testBinaryDecodeFail(t, `{"type":"map","values":"int"}`, append(append([]byte{1, 2, 4, 'k', '1', 6}, mostNegativeBlockCount...), 2), "block count") } func TestMapDecodeFail(t *testing.T) { schema := `{"type":"map","values":"boolean"}` testBinaryDecodeFail(t, schema, nil, "cannot decode binary map block count") // leading block count testBinaryDecodeFail(t, schema, []byte("\x01"), "cannot decode binary map block size") // when block count < 0 testBinaryDecodeFail(t, schema, []byte("\x02\x04"), "cannot decode binary map key") testBinaryDecodeFail(t, schema, []byte("\x02\x04"), "cannot decode binary map key") testBinaryDecodeFail(t, schema, []byte("\x02\x04a"), "cannot decode binary map key") testBinaryDecodeFail(t, schema, []byte("\x02\x04ab"), `cannot decode binary map value for key "ab"`) testBinaryDecodeFail(t, schema, []byte("\x02\x04ab\x02"), "boolean: expected") testBinaryDecodeFail(t, schema, []byte("\x02\x04ab\x01"), "cannot decode binary map block count") // trailing block count testBinaryDecodeFail(t, schema, []byte("\x04\x04ab\x00\x04ab\x00\x00"), "duplicate key") } func TestMap(t *testing.T) { testBinaryCodecPass(t, `{"type":"map","values":"null"}`, map[string]interface{}{"ab": nil}, []byte("\x02\x04ab\x00")) testBinaryCodecPass(t, `{"type":"map","values":"boolean"}`, map[string]interface{}{"ab": true}, []byte("\x02\x04ab\x01\x00")) } func TestMapTextDecodeFail(t *testing.T) { schema := `{"type":"map","values":"string"}` testTextDecodeFail(t, schema, []byte(` "string" : "silly" , "bytes" : "silly" } `), "expected: '{'") testTextDecodeFail(t, schema, []byte(` { 16 : "silly" , "bytes" : "silly" } `), "expected initial \"") testTextDecodeFail(t, schema, []byte(` { "string" , "silly" , "bytes" : "silly" } `), "expected: ':'") testTextDecodeFail(t, schema, []byte(` { "string" : 13 , "bytes" : "silly" } `), "expected initial \"") testTextDecodeFail(t, schema, []byte(` { "string" : "silly" : "bytes" : "silly" } `), "expected ',' or '}'") testTextDecodeFail(t, schema, []byte(` { "string" : "silly" "bytes" : "silly" } `), "expected ',' or '}'") testTextDecodeFail(t, schema, []byte(` { "string" : "silly" , "bytes" : "silly" `), "short buffer") testTextDecodeFail(t, schema, []byte(` { "string" : "silly" `), "short buffer") testTextDecodeFail(t, schema, []byte(`{"key1":"\u0001\u2318 ","key1":"value2"}`), "duplicate key") } func TestMapTextCodecPass(t *testing.T) { schema := `{"type":"map","values":"string"}` datum := map[string]interface{}{"key1": "⌘ "} testTextCodecPass(t, schema, make(map[string]interface{}), []byte(`{}`)) // empty map testTextEncodePass(t, schema, datum, []byte(`{"key1":"\u0001\u2318 "}`)) testTextDecodePass(t, schema, datum, []byte(` { "key1" : "\u0001\u2318 " }`)) } func TestMapBinaryReceiveSliceInt(t *testing.T) { testBinaryCodecPass(t, `{"type":"map","values":"int"}`, map[string]int{}, []byte("\x00")) testBinaryCodecPass(t, `{"type":"map","values":"int"}`, map[string]int{"k1": 13}, []byte("\x02\x04k1\x1a\x00")) testBinaryEncodeFail(t, `{"type":"map","values":"int"}`, map[int]int{42: 13}, "cannot create map[string]interface{}") } func TestMapTextualReceiveSliceInt(t 
*testing.T) { testTextCodecPass(t, `{"type":"map","values":"int"}`, map[string]int{}, []byte(`{}`)) testTextCodecPass(t, `{"type":"map","values":"int"}`, map[string]int{"k1": 13}, []byte(`{"k1":13}`)) testTextEncodeFail(t, `{"type":"map","values":"int"}`, map[int]int{42: 13}, "cannot create map[string]interface{}") } func ExampleMap() { codec, err := NewCodec(`{ "name": "r1", "type": "record", "fields": [{ "name": "f1", "type": {"type":"map","values":"double"} }] }`) if err != nil { log.Fatal(err) } buf, err := codec.TextualFromNative(nil, map[string]interface{}{ "f1": map[string]float64{ "k1": 3.5, }, }) if err != nil { log.Fatal(err) } fmt.Println(string(buf)) // Output: {"f1":{"k1":3.5}} } goavro-2.10.1/name.go000066400000000000000000000101731412474230400143420ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "errors" "fmt" "strings" ) const nullNamespace = "" // ErrInvalidName is the error returned when one or more parts of an Avro name // is invalid. type ErrInvalidName struct { Message string } func (e ErrInvalidName) Error() string { return "schema name ought to " + e.Message } // NOTE: This function designed to work with name components, after they have // been split on the period rune. func isRuneInvalidForFirstCharacter(r rune) bool { return (r < 'A' || r > 'Z') && (r < 'a' || r > 'z') && r != '_' } func isRuneInvalidForOtherCharacters(r rune) bool { return isRuneInvalidForFirstCharacter(r) && (r < '0' || r > '9') } func checkNameComponent(s string) error { err := checkString(s) if err != nil { return &ErrInvalidName{err.Error()} } return err } func checkString(s string) error { if len(s) == 0 { return errors.New("be non-empty string") } if strings.IndexFunc(s[:1], isRuneInvalidForFirstCharacter) != -1 { return errors.New("start with [A-Za-z_]: " + s) } if strings.IndexFunc(s[1:], isRuneInvalidForOtherCharacters) != -1 { return errors.New("have second and remaining characters contain only [A-Za-z0-9_]: " + s) } return nil } // name describes an Avro name in terms of its full name and namespace. type name struct { fullName string // the instance's Avro name namespace string // for use when building new name from existing one } // newName returns a new Name instance after first ensuring the arguments do not // violate any of the Avro naming rules. func newName(n, ns, ens string) (*name, error) { var nn name if index := strings.LastIndexByte(n, '.'); index > -1 { // inputName does contain a dot, so ignore everything else and use it as the full name nn.fullName = n nn.namespace = n[:index] } else { // inputName does not contain a dot, therefore is not the full name if ns != nullNamespace { // if namespace provided in the schema in the same schema level, use it nn.fullName = ns + "." + n nn.namespace = ns } else if ens != nullNamespace { // otherwise if enclosing namespace provided, use it nn.fullName = ens + "." 
+ n nn.namespace = ens } else { // otherwise no namespace, so use null namespace, the empty string nn.fullName = n } } // verify all components of the full name for adherence to Avro naming rules for i, component := range strings.Split(nn.fullName, ".") { if i == 0 && RelaxedNameValidation && component == "" { continue } if err := checkNameComponent(component); err != nil { return nil, err } } return &nn, nil } var ( // RelaxedNameValidation causes name validation to allow the first component // of an Avro namespace to be the empty string. RelaxedNameValidation bool ) func newNameFromSchemaMap(enclosingNamespace string, schemaMap map[string]interface{}) (*name, error) { var nameString, namespaceString string name, ok := schemaMap["name"] if !ok { return nil, errors.New("schema ought to have name key") } nameString, ok = name.(string) if !ok || nameString == nullNamespace { return nil, fmt.Errorf("schema name ought to be non-empty string; received: %T: %v", name, name) } if namespace, ok := schemaMap["namespace"]; ok { namespaceString, ok = namespace.(string) if !ok { return nil, fmt.Errorf("schema namespace, if provided, ought to be a string; received: %T: %v", namespace, namespace) } } return newName(nameString, namespaceString, enclosingNamespace) } func (n *name) String() string { return n.fullName } // short returns the name without the prefixed namespace. func (n *name) short() string { if index := strings.LastIndexByte(n.fullName, '.'); index > -1 { return n.fullName[index+1:] } return n.fullName } goavro-2.10.1/name_test.go000066400000000000000000000061671412474230400154110ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
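name.go above builds an Avro full name from the schema's name, its namespace, and the enclosing namespace, and the exported RelaxedNameValidation flag permits the first namespace component to be empty (a leading dot). Below is a hedged sketch of the flag through the public API; the record schema is an assumption made up for the example.

// Sketch only: with RelaxedNameValidation a namespace such as ".org.foo" is
// accepted, yielding the full name ".org.foo.X". The schema is illustrative.
package main

import (
    "fmt"
    "log"

    "github.com/linkedin/goavro/v2"
)

func main() {
    goavro.RelaxedNameValidation = true

    codec, err := goavro.NewCodec(`{
        "type": "record",
        "name": "X",
        "namespace": ".org.foo",
        "fields": [{"name": "f1", "type": "long"}]
    }`)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(codec.Schema())
}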
package goavro // NOTE: part of goavro package because it tests private functionality import ( "testing" ) func TestNameStartsInvalidCharacter(t *testing.T) { _, err := newName("&X", "org.foo", nullNamespace) if _, ok := err.(ErrInvalidName); err == nil && !ok { t.Errorf("GOT: %#v, WANT: %#v", err, ErrInvalidName{"start with [A-Za-z_]"}) } } func TestNameContainsInvalidCharacter(t *testing.T) { _, err := newName("X&", "org.foo.bar", nullNamespace) if _, ok := err.(ErrInvalidName); err == nil && !ok { t.Errorf("GOT: %#v, WANT: %#v", err, ErrInvalidName{"start with [A-Za-z_]"}) } } func TestNamespaceContainsInvalidCharacter(t *testing.T) { defer func() { RelaxedNameValidation = false }() RelaxedNameValidation = true n, err := newName("X", ".org.foo", nullNamespace) if err != nil { t.Fatal(err) } if actual, expected := n.fullName, ".org.foo.X"; actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } if actual, expected := n.namespace, ".org.foo"; actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } } func TestNameAndNamespaceProvided(t *testing.T) { n, err := newName("X", "org.foo", nullNamespace) if err != nil { t.Fatal(err) } if actual, expected := n.fullName, "org.foo.X"; actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } if actual, expected := n.namespace, "org.foo"; actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } } func TestNameWithDotIgnoresNamespace(t *testing.T) { n, err := newName("org.bar.X", "some.ignored.namespace", nullNamespace) if err != nil { t.Fatal(err) } if actual, expected := n.fullName, "org.bar.X"; actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } if actual, expected := n.namespace, "org.bar"; actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } } func TestNameWithoutDotsButWithEmptyNamespaceAndEnclosingName(t *testing.T) { n, err := newName("X", nullNamespace, "org.foo") if err != nil { t.Fatal(err) } if actual, expected := n.fullName, "org.foo.X"; actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } if actual, expected := n.namespace, "org.foo"; actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } } func TestNewNameFromSchemaMap(t *testing.T) { n, err := newNameFromSchemaMap(nullNamespace, map[string]interface{}{ "name": "foo", "namespace": "", "type": map[string]interface{}{}, }) ensureError(t, err) if got, want := n.fullName, "foo"; got != want { t.Errorf("GOT: %q; WANT: %q", got, want) } if got, want := n.namespace, ""; got != want { t.Errorf("GOT: %q; WANT: %q", got, want) } } goavro-2.10.1/null.go000066400000000000000000000026221412474230400143740ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
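TestNamespaceContainsInvalidCharacter above exercises RelaxedNameValidation through the private newName helper; the same flag is consulted when NewCodec validates a schema's namespace, so a hedged sketch using only exported identifiers might look like the following. The schema is illustrative, and the defer is only there to restore the package-level default:

package main

import (
	"fmt"

	"github.com/linkedin/goavro/v2"
)

func main() {
	schema := `{"type":"record","name":"X","namespace":".org.foo","fields":[{"name":"f1","type":"int"}]}`

	// With strict validation (the default) the empty first namespace
	// component introduced by the leading dot is rejected.
	if _, err := goavro.NewCodec(schema); err != nil {
		fmt.Println("strict:", err)
	}

	// Relaxed validation allows the leading dot.
	goavro.RelaxedNameValidation = true
	defer func() { goavro.RelaxedNameValidation = false }()
	if _, err := goavro.NewCodec(schema); err == nil {
		fmt.Println("relaxed: codec created")
	}
}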
package goavro import ( "bytes" "errors" "fmt" "io" ) var nullBytes = []byte("null") func nullNativeFromBinary(buf []byte) (interface{}, []byte, error) { return nil, buf, nil } func nullBinaryFromNative(buf []byte, datum interface{}) ([]byte, error) { if datum != nil { return nil, fmt.Errorf("cannot encode binary null: expected: Go nil; received: %T", datum) } return buf, nil } func nullNativeFromTextual(buf []byte) (interface{}, []byte, error) { if len(buf) < 4 { return nil, nil, fmt.Errorf("cannot decode textual null: %s", io.ErrShortBuffer) } if bytes.Equal(buf[:4], nullBytes) { return nil, buf[4:], nil } return nil, nil, errors.New("cannot decode textual null: expected: null") } func nullTextualFromNative(buf []byte, datum interface{}) ([]byte, error) { if datum != nil { return nil, fmt.Errorf("cannot encode textual null: expected: Go nil; received: %T", datum) } return append(buf, nullBytes...), nil } goavro-2.10.1/null_test.go000066400000000000000000000016031412474230400154310ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import "testing" func TestSchemaPrimitiveNullCodec(t *testing.T) { testSchemaPrimativeCodec(t, `"null"`) } func TestPrimitiveNullBinary(t *testing.T) { testBinaryEncodeFailBadDatumType(t, `"null"`, false) testBinaryCodecPass(t, `"null"`, nil, nil) } func TestPrimitiveNullText(t *testing.T) { testTextEncodeFailBadDatumType(t, `"null"`, false) testTextCodecPass(t, `"null"`, nil, []byte("null")) } goavro-2.10.1/number_recover_test.go000066400000000000000000000036771412474230400175110ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
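The null codec accepts only a Go nil and reads or writes the literal null in the textual encoding. A short sketch of the round trip through the exported Codec methods (v2 import path assumed):

package main

import (
	"fmt"

	"github.com/linkedin/goavro/v2"
)

func main() {
	codec, err := goavro.NewCodec(`"null"`)
	if err != nil {
		panic(err)
	}

	// Textual encoding of nil is the four bytes "null".
	text, err := codec.TextualFromNative(nil, nil)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", text)

	// Decoding returns nil and consumes those four bytes.
	datum, rest, err := codec.NativeFromTextual(text)
	if err != nil {
		panic(err)
	}
	fmt.Println(datum == nil, len(rest))

	// Any non-nil datum is rejected.
	if _, err := codec.BinaryFromNative(nil, false); err != nil {
		fmt.Println("error:", err)
	}
}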
package goavro import ( "reflect" "testing" ) func testPrimitiveRecoverNative(t *testing.T, schema string, value interface{}) { t.Helper() codec, err := NewCodec(schema) if err != nil { t.Fatalf("Schema: %s; %s", schema, err) } // native -> binary -> native binary, err := codec.BinaryFromNative(nil, value) if err != nil { t.Fatalf("Datum: %v; %s", value, err) } native, _, err := codec.NativeFromBinary(binary) if err != nil { t.Fatalf("Datum: %s; %s", binary, err) } if reflect.TypeOf(value) != reflect.TypeOf(native) { t.Fatalf("Datum: %v expected type %T but was value %v of type %T", value, value, native, native) } // native -> textual -> native textual, err := codec.TextualFromNative(nil, value) if err != nil { t.Fatalf("Datum: %v; %s", value, err) } native, _, err = codec.NativeFromTextual(textual) if err != nil { t.Fatalf("Datum: %s; %s", textual, err) } if reflect.TypeOf(value) != reflect.TypeOf(native) { t.Fatalf("Datum: %v expected type %T but was value %v of type %T", value, value, native, native) } } func TestPrimitiveRecoverInt(t *testing.T) { testPrimitiveRecoverNative(t, `"int"`, int32(1010)) } func TestPrimitiveRecoverLong(t *testing.T) { testPrimitiveRecoverNative(t, `"long"`, int64(8128953)) } func TestPrimitiveRecoverFloat(t *testing.T) { testPrimitiveRecoverNative(t, `"float"`, float32(-8.937134)) } func TestPrimitiveRecoverDouble(t *testing.T) { testPrimitiveRecoverNative(t, `"double"`, float64(5.247290238727473)) } goavro-2.10.1/ocf.go000066400000000000000000000146121412474230400141730ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "bytes" "crypto/rand" "errors" "fmt" "io" ) const ( // CompressionNullLabel is used when OCF blocks are not compressed. CompressionNullLabel = "null" // CompressionDeflateLabel is used when OCF blocks are compressed using the // deflate algorithm. CompressionDeflateLabel = "deflate" // CompressionSnappyLabel is used when OCF blocks are compressed using the // snappy algorithm. CompressionSnappyLabel = "snappy" ) // compressionID are values used to specify compression algorithm used to compress // and decompress Avro Object Container File (OCF) streams. 
type compressionID uint8 const ( compressionNull compressionID = iota compressionDeflate compressionSnappy ) const ( ocfBlockConst = 24 // Each OCF block has two longs prefix, and sync marker suffix ocfHeaderSizeConst = 48 // OCF header is usually about 48 bytes longer than its compressed schema ocfMagicString = "Obj\x01" ocfMetadataSchema = `{"type":"map","values":"bytes"}` ocfSyncLength = 16 ) var ( ocfMagicBytes = []byte(ocfMagicString) ocfMetadataCodec *Codec ) func init() { ocfMetadataCodec, _ = NewCodec(ocfMetadataSchema) } type ocfHeader struct { codec *Codec compressionID compressionID syncMarker [ocfSyncLength]byte metadata map[string][]byte } func newOCFHeader(config OCFConfig) (*ocfHeader, error) { var err error header := new(ocfHeader) // // avro.codec // switch config.CompressionName { case "": header.compressionID = compressionNull case CompressionNullLabel: header.compressionID = compressionNull case CompressionDeflateLabel: header.compressionID = compressionDeflate case CompressionSnappyLabel: header.compressionID = compressionSnappy default: return nil, fmt.Errorf("cannot create OCF header using unrecognized compression algorithm: %q", config.CompressionName) } // // avro.schema // if config.Codec != nil { header.codec = config.Codec } else if config.Schema == "" { return nil, fmt.Errorf("cannot create OCF header without either Codec or Schema specified") } else { if header.codec, err = NewCodec(config.Schema); err != nil { return nil, fmt.Errorf("cannot create OCF header: %s", err) } } header.metadata = config.MetaData // // The 16-byte, randomly-generated sync marker for this file. // _, err = rand.Read(header.syncMarker[:]) if err != nil { return nil, err } return header, nil } func readOCFHeader(ior io.Reader) (*ocfHeader, error) { // // magic bytes // magic := make([]byte, 4) _, err := io.ReadFull(ior, magic) if err != nil { return nil, fmt.Errorf("cannot read OCF header magic bytes: %s", err) } if !bytes.Equal(magic, ocfMagicBytes) { return nil, fmt.Errorf("cannot read OCF header with invalid magic bytes: %#q", magic) } // // metadata // metadata, err := metadataBinaryReader(ior) if err != nil { return nil, fmt.Errorf("cannot read OCF header metadata: %s", err) } // // avro.codec // // NOTE: Avro specification states that `null` cID is used by // default when "avro.codec" was not included in the metadata header. The // specification does not talk about the case when "avro.codec" was included // with the empty string as its value. I believe it is an error for an OCF // file to provide the empty string as the cID algorithm. While it // is trivially easy to gracefully handle here, I'm not sure whether this // happens a lot, and don't want to accept bad input unless we have // significant reason to do so. 
var cID compressionID value, ok := metadata["avro.codec"] if ok { switch avroCodec := string(value); avroCodec { case CompressionNullLabel: cID = compressionNull case CompressionDeflateLabel: cID = compressionDeflate case CompressionSnappyLabel: cID = compressionSnappy default: return nil, fmt.Errorf("cannot read OCF header using unrecognized compression algorithm from avro.codec: %q", avroCodec) } } // // create goavro.Codec from specified avro.schema // value, ok = metadata["avro.schema"] if !ok { return nil, errors.New("cannot read OCF header without avro.schema") } codec, err := NewCodec(string(value)) if err != nil { return nil, fmt.Errorf("cannot read OCF header with invalid avro.schema: %s", err) } header := &ocfHeader{codec: codec, compressionID: cID, metadata: metadata} // // read and store sync marker // if n, err := io.ReadFull(ior, header.syncMarker[:]); err != nil { return nil, fmt.Errorf("cannot read OCF header without sync marker: only read %d of %d bytes: %s", n, ocfSyncLength, err) } // // header is valid // return header, nil } func writeOCFHeader(header *ocfHeader, iow io.Writer) (err error) { // // avro.codec // var avroCodec string switch header.compressionID { case compressionNull: avroCodec = CompressionNullLabel case compressionDeflate: avroCodec = CompressionDeflateLabel case compressionSnappy: avroCodec = CompressionSnappyLabel default: return fmt.Errorf("should not get here: cannot write OCF header using unrecognized compression algorithm: %d", header.compressionID) } // // avro.schema // // Create buffer for OCF header. The first four bytes are magic, and we'll // use copy to fill them in, so initialize buffer's length with 4, and its // capacity equal to length of avro schema plus a constant. schema := header.codec.Schema() buf := make([]byte, 4, len(schema)+ocfHeaderSizeConst) _ = copy(buf, ocfMagicBytes) // // file metadata, including the schema // meta := make(map[string]interface{}) for k, v := range header.metadata { meta[k] = v } meta["avro.schema"] = []byte(schema) meta["avro.codec"] = []byte(avroCodec) buf, err = ocfMetadataCodec.BinaryFromNative(buf, meta) if err != nil { return fmt.Errorf("should not get here: cannot write OCF header: %s", err) } // // 16-byte sync marker // buf = append(buf, header.syncMarker[:]...) // emit OCF header _, err = iow.Write(buf) if err != nil { return fmt.Errorf("cannot write OCF header: %s", err) } return nil } goavro-2.10.1/ocf_reader.go000066400000000000000000000213011412474230400155060ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "bytes" "compress/flate" "encoding/binary" "errors" "fmt" "hash/crc32" "io" "io/ioutil" "github.com/golang/snappy" ) // OCFReader structure is used to read Object Container Files (OCF). 
type OCFReader struct { header *ocfHeader block []byte // buffer from which decoding takes place rerr error // most recent error that took place while reading bytes (unrecoverable) ior io.Reader readReady bool // true after Scan and before Read remainingBlockItems int64 // count of encoded data items remaining in block buffer to be decoded } // NewOCFReader initializes and returns a new structure used to read an Avro // Object Container File (OCF). // // func example(ior io.Reader) error { // // NOTE: Wrap provided io.Reader in a buffered reader, which improves the // // performance of streaming file data. // br := bufio.NewReader(ior) // ocfr, err := goavro.NewOCFReader(br) // if err != nil { // return err // } // for ocfr.Scan() { // datum, err := ocfr.Read() // if err != nil { // return err // } // fmt.Println(datum) // } // return ocfr.Err() // } func NewOCFReader(ior io.Reader) (*OCFReader, error) { header, err := readOCFHeader(ior) if err != nil { return nil, fmt.Errorf("cannot create OCFReader: %s", err) } return &OCFReader{header: header, ior: ior}, nil } //MetaData returns the file metadata map found within the OCF file func (ocfr *OCFReader) MetaData() map[string][]byte { return ocfr.header.metadata } // Codec returns the codec found within the OCF file. func (ocfr *OCFReader) Codec() *Codec { return ocfr.header.codec } // CompressionName returns the name of the compression algorithm found within // the OCF file. func (ocfr *OCFReader) CompressionName() string { switch ocfr.header.compressionID { case compressionNull: return CompressionNullLabel case compressionDeflate: return CompressionDeflateLabel case compressionSnappy: return CompressionSnappyLabel default: return "should not get here: unrecognized compression algorithm" } } // Err returns the last error encountered while reading the OCF file. See // `NewOCFReader` documentation for an example. func (ocfr *OCFReader) Err() error { return ocfr.rerr } // Read consumes one datum value from the Avro OCF stream and returns it. Read // is designed to be called only once after each invocation of the Scan method. // See `NewOCFReader` documentation for an example. func (ocfr *OCFReader) Read() (interface{}, error) { // NOTE: Test previous error before testing readReady to prevent overwriting // previous error. if ocfr.rerr != nil { return nil, ocfr.rerr } if !ocfr.readReady { ocfr.rerr = errors.New("Read called without successful Scan") return nil, ocfr.rerr } ocfr.readReady = false // decode one datum value from block var datum interface{} datum, ocfr.block, ocfr.rerr = ocfr.header.codec.NativeFromBinary(ocfr.block) if ocfr.rerr != nil { return false, ocfr.rerr } ocfr.remainingBlockItems-- return datum, nil } // RemainingBlockItems returns the number of items remaining in the block being // processed. func (ocfr *OCFReader) RemainingBlockItems() int64 { return ocfr.remainingBlockItems } // Scan returns true when there is at least one more data item to be read from // the Avro OCF. Scan ought to be called prior to calling the Read method each // time the Read method is invoked. See `NewOCFReader` documentation for an // example. func (ocfr *OCFReader) Scan() bool { ocfr.readReady = false if ocfr.rerr != nil { return false } // NOTE: If there are no more remaining data items from the existing block, // then attempt to slurp in the next block. 
if ocfr.remainingBlockItems <= 0 { if count := len(ocfr.block); count != 0 { ocfr.rerr = fmt.Errorf("extra bytes between final datum in previous block and block sync marker: %d", count) return false } // Read the block count and update the number of remaining items for // this block ocfr.remainingBlockItems, ocfr.rerr = longBinaryReader(ocfr.ior) if ocfr.rerr != nil { if ocfr.rerr == io.EOF { ocfr.rerr = nil // merely end of file, rather than error } else { ocfr.rerr = fmt.Errorf("cannot read block count: %s", ocfr.rerr) } return false } if ocfr.remainingBlockItems <= 0 { ocfr.rerr = fmt.Errorf("cannot decode when block count is not greater than 0: %d", ocfr.remainingBlockItems) return false } if ocfr.remainingBlockItems > MaxBlockCount { ocfr.rerr = fmt.Errorf("cannot decode when block count exceeds MaxBlockCount: %d > %d", ocfr.remainingBlockItems, MaxBlockCount) } var blockSize int64 blockSize, ocfr.rerr = longBinaryReader(ocfr.ior) if ocfr.rerr != nil { ocfr.rerr = fmt.Errorf("cannot read block size: %s", ocfr.rerr) return false } if blockSize <= 0 { ocfr.rerr = fmt.Errorf("cannot decode when block size is not greater than 0: %d", blockSize) return false } if blockSize > MaxBlockSize { ocfr.rerr = fmt.Errorf("cannot decode when block size exceeds MaxBlockSize: %d > %d", blockSize, MaxBlockSize) return false } // read entire block into buffer ocfr.block = make([]byte, blockSize) _, ocfr.rerr = io.ReadFull(ocfr.ior, ocfr.block) if ocfr.rerr != nil { ocfr.rerr = fmt.Errorf("cannot read block: %s", ocfr.rerr) return false } switch ocfr.header.compressionID { case compressionNull: // no-op case compressionDeflate: // NOTE: flate.NewReader wraps with io.ByteReader if argument does // not implement that interface. rc := flate.NewReader(bytes.NewBuffer(ocfr.block)) ocfr.block, ocfr.rerr = ioutil.ReadAll(rc) if ocfr.rerr != nil { _ = rc.Close() return false } if ocfr.rerr = rc.Close(); ocfr.rerr != nil { return false } case compressionSnappy: index := len(ocfr.block) - 4 // last 4 bytes is crc32 of decoded block if index <= 0 { ocfr.rerr = fmt.Errorf("cannot decompress snappy without CRC32 checksum: %d", len(ocfr.block)) return false } decoded, err := snappy.Decode(nil, ocfr.block[:index]) if err != nil { ocfr.rerr = fmt.Errorf("cannot decompress: %s", err) return false } actualCRC := crc32.ChecksumIEEE(decoded) expectedCRC := binary.BigEndian.Uint32(ocfr.block[index : index+4]) if actualCRC != expectedCRC { ocfr.rerr = fmt.Errorf("snappy CRC32 checksum mismatch: %x != %x", actualCRC, expectedCRC) return false } ocfr.block = decoded default: ocfr.rerr = fmt.Errorf("should not get here: cannot compress block using unrecognized compression: %d", ocfr.header.compressionID) return false } // read and ensure sync marker matches sync := make([]byte, ocfSyncLength) var n int if n, ocfr.rerr = io.ReadFull(ocfr.ior, sync); ocfr.rerr != nil { ocfr.rerr = fmt.Errorf("cannot read sync marker: read %d out of %d bytes: %s", n, ocfSyncLength, ocfr.rerr) return false } if !bytes.Equal(sync, ocfr.header.syncMarker[:]) { ocfr.rerr = fmt.Errorf("sync marker mismatch: %v != %v", sync, ocfr.header.syncMarker) return false } } ocfr.readReady = true return true } // SkipThisBlockAndReset can be called after an error occurs while reading or // decoding datum values from an OCF stream. OCF specifies each OCF stream // contain one or more blocks of data. Each block consists of a block count, the // number of bytes for the block, followed be the possibly compressed // block. 
Inside each decompressed block is all of the binary encoded datum // values concatenated together. In other words, OCF framing is at a block level // rather than a datum level. If there is an error while reading or decoding a // datum, the reader is not able to skip to the next datum value, because OCF // does not have any markers for where each datum ends and the next one // begins. Therefore, the reader is only able to skip this datum value and all // subsequent datum values in the current block, move to the next block and // start decoding datum values there. func (ocfr *OCFReader) SkipThisBlockAndReset() { // ??? is it an error to call method unless the reader has had an error ocfr.remainingBlockItems = 0 ocfr.block = ocfr.block[:0] ocfr.rerr = nil } goavro-2.10.1/ocf_reader_test.go000066400000000000000000000111321412474230400165460ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "bytes" "testing" ) // readOCFHeader, magic bytes func TestReadOCFHeaderMagicBytes(t *testing.T) { _, err := NewOCFReader(bytes.NewBuffer([]byte("Obj"))) // missing fourth byte ensureError(t, err, "cannot create OCF") _, err = NewOCFReader(bytes.NewBuffer([]byte("...."))) ensureError(t, err, "cannot create OCF") } // // cannot read OCF header // func testCannotReadOCFHeader(t *testing.T, input []byte, expected ...string) { t.Helper() _, err := NewOCFReader(bytes.NewBuffer(append([]byte("Obj\x01"), input...))) ensureError(t, err, append([]string{"cannot read OCF header"}, expected...)...) 
} // readOCFHeader, metadataBinaryReader, block count func TestReadOCFHeaderMetadataBinaryReaderBlockCount(t *testing.T) { testCannotReadOCFHeader(t, nil, "cannot read map block count", "EOF") testCannotReadOCFHeader(t, mostNegativeBlockCount, "cannot read map with block count") testCannotReadOCFHeader(t, []byte("\x01"), "cannot read map block size", "EOF") testCannotReadOCFHeader(t, morePositiveThanMaxBlockCount, "cannot read map when block count exceeds") } // readOCFHeader, metadataBinaryReader, bytesBinaryReader func TestReadOCFHeaderMetadataBinaryReaderMapKey(t *testing.T) { testCannotReadOCFHeader(t, []byte("\x02"), "cannot read map key", "cannot read bytes", "cannot read size", "EOF") testCannotReadOCFHeader(t, []byte("\x02\x01"), "cannot read map key", "cannot read bytes", "size is negative") testCannotReadOCFHeader(t, append([]byte("\x02"), morePositiveThanMaxBlockCount...), "cannot read map key", "cannot read bytes", "size exceeds MaxBlockSize") testCannotReadOCFHeader(t, append([]byte("\x02"), mostNegativeBlockCount...), "cannot read map key", "cannot read bytes", "size is negative") testCannotReadOCFHeader(t, append([]byte("\x02"), moreNegativeThanMaxBlockCount...), "cannot read map key", "cannot read bytes", "size is negative") testCannotReadOCFHeader(t, []byte("\x02\x02"), "cannot read map key", "cannot read bytes", "EOF") testCannotReadOCFHeader(t, []byte("\x02\x04k1\x04v1\x02\x04k1"), "cannot read map", "duplicate key") testCannotReadOCFHeader(t, []byte("\x04\x04k1\x04v1\x04k1"), "cannot read map", "duplicate key") } func TestReadOCFHeaderMetadataBinaryReaderMapValue(t *testing.T) { testCannotReadOCFHeader(t, []byte("\x02\x04k1"), "cannot read map value for key", "cannot read bytes", "EOF") // have already tested all other binaryBytesReader errors above testCannotReadOCFHeader(t, []byte("\x02\x04k1\x04v1"), "cannot read map block count", "EOF") testCannotReadOCFHeader(t, append([]byte("\x02\x04k1\x04v1"), mostNegativeBlockCount...), "cannot read map with block count") testCannotReadOCFHeader(t, []byte("\x02\x04k1\x04v1"), "cannot read map block count", "EOF") testCannotReadOCFHeader(t, []byte("\x02\x04k1\x04v1\x01"), "cannot read map block size", "EOF") testCannotReadOCFHeader(t, append(append([]byte("\x02\x04k1\x04v1"), moreNegativeThanMaxBlockCount...), []byte("\x02")...), "cannot read map when block count exceeds") testCannotReadOCFHeader(t, append([]byte("\x02\x04k1\x04v1"), morePositiveThanMaxBlockCount...), "cannot read map when block count exceeds") } // readOCFHeader, avro.codec func TestReadOCFHeaderMetadataAvroCodecUnknown(t *testing.T) { testCannotReadOCFHeader(t, []byte("\x02\x14avro.codec\x06bad\x00"), "cannot read OCF header", "unrecognized compression", "bad") } // readOCFHeader, avro.schema func TestReadOCFHeaderMetadataAvroSchemaMissing(t *testing.T) { testCannotReadOCFHeader(t, []byte("\x00"), "without avro.schema") testCannotReadOCFHeader(t, []byte("\x02\x16avro.schema\x04{}\x00"), "invalid avro.schema") } // readOCFHeader, sync marker func TestReadOCFHeaderMetadataSyncMarker(t *testing.T) { testCannotReadOCFHeader(t, []byte("\x02\x16avro.schema\x1e{\"type\":\"null\"}\x00"), "sync marker", "EOF") } // TODO: writeOCFHeader // // OCFReader // // func testOCFReader(t *testing.T, schema string, input []byte, expected ...string) { // _, err := NewOCFReader(bytes.NewBuffer(append([]byte("Obj\x01"), input...))) // ensureError(t, err, append([]string{"any prefix?"}, expected...)...) 
// } // func TestOCFReaderRead(t *testing.T) { // testOCFReader(t, // } goavro-2.10.1/ocf_test.go000066400000000000000000000052131412474230400152270ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "bytes" "fmt" "testing" ) // testOCFRoundTripWithHeaders has OCFWriter write to a buffer using specified // compression algorithm, then attempt to read it back func testOCFRoundTrip(t *testing.T, compressionName string) { testOCFRoundTripWithHeaders(t, compressionName, nil) } // testOCFRoundTripWithHeaders has OCFWriter write to a buffer using specified // compression algorithm and headers, then attempt to read it back func testOCFRoundTripWithHeaders(t *testing.T, compressionName string, headers map[string][]byte) { schema := `{"type":"long"}` bb := new(bytes.Buffer) ocfw, err := NewOCFWriter(OCFConfig{ W: bb, CompressionName: compressionName, Schema: schema, MetaData: headers, }) if err != nil { t.Fatal(err) } valuesToWrite := []int64{13, 42, -12, -1234} if err = ocfw.Append(valuesToWrite); err != nil { t.Fatal(err) } ocfr, err := NewOCFReader(bb) if err != nil { t.Fatal(err) } var valuesRead []int64 for ocfr.Scan() { value, err := ocfr.Read() if err != nil { t.Fatal(err) } valuesRead = append(valuesRead, value.(int64)) } if err = ocfr.Err(); err != nil { t.Fatal(err) } if actual, expected := len(valuesRead), len(valuesToWrite); actual != expected { t.Errorf("GOT: %v; WANT: %v", actual, expected) } for i := 0; i < len(valuesRead); i++ { if actual, expected := valuesRead[i], valuesToWrite[i]; actual != expected { t.Errorf("GOT: %v; WANT: %v", actual, expected) } } readMeta := ocfr.MetaData() for k, v := range headers { expected := fmt.Sprintf("%s", v) actual := fmt.Sprintf("%s", readMeta[k]) if actual != expected { t.Errorf("GOT: %v; WANT: %v (%v)", actual, expected, k) } } } func TestOCFWriterCompressionNull(t *testing.T) { testOCFRoundTrip(t, CompressionNullLabel) } func TestOCFWriterCompressionDeflate(t *testing.T) { testOCFRoundTrip(t, CompressionDeflateLabel) } func TestOCFWriterCompressionSnappy(t *testing.T) { testOCFRoundTrip(t, CompressionSnappyLabel) } func TestOCFWriterWithApplicationMetaData(t *testing.T) { testOCFRoundTripWithHeaders(t, CompressionNullLabel, map[string][]byte{"foo": []byte("BOING"), "goo": []byte("zoo")}) } goavro-2.10.1/ocf_writer.go000066400000000000000000000217741412474230400155760ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "bytes" "compress/flate" "encoding/binary" "errors" "fmt" "hash/crc32" "io" "io/ioutil" "os" "github.com/golang/snappy" ) // OCFConfig is used to specify creation parameters for OCFWriter. 
type OCFConfig struct { // W specifies the `io.Writer` to which to send the encoded data, // (required). If W is `*os.File`, then creating an OCF for writing will // attempt to read any existing OCF header and use the schema and // compression codec specified by the existing header, then advance the file // position to the tail end of the file for appending. W io.Writer // Codec specifies the Codec to use for the new OCFWriter, (optional). If // the W parameter above is an `*os.File` which contains a Codec, the Codec // in the existing file will be used instead. Otherwise if this Codec // parameter is specified, it will be used. If neither the W parameter above // is an `*os.File` with an existing Codec, nor this Codec parameter is // specified, the OCFWriter will create a new Codec from the schema string // specified by the Schema parameter below. Codec *Codec // Schema specifies the Avro schema for the data to be encoded, (optional). // If neither the W parameter above is an `*os.File` with an existing Codec, // nor the Codec parameter above is specified, the OCFWriter will create a // new Codec from the schema string specified by this Schema parameter. Schema string // CompressionName specifies the compression codec used, (optional). If // omitted, defaults to "null" codec. When appending to an existing OCF, // this field is ignored. CompressionName string //MetaData specifies application specific meta data to be added to //the OCF file. When appending to an existing OCF, this field //is ignored MetaData map[string][]byte } // OCFWriter is used to create a new or append to an existing Avro Object // Container File (OCF). type OCFWriter struct { header *ocfHeader iow io.Writer } // NewOCFWriter returns a new OCFWriter instance that may be used for appending // binary Avro data, either by appending to an existing OCF file or creating a // new OCF file. func NewOCFWriter(config OCFConfig) (*OCFWriter, error) { var err error ocf := &OCFWriter{iow: config.W} switch config.W.(type) { case nil: return nil, errors.New("cannot create OCFWriter when W is nil") case *os.File: file := config.W.(*os.File) stat, err := file.Stat() if err != nil { return nil, fmt.Errorf("cannot create OCFWriter: %s", err) } // NOTE: When upstream provides a new file, it will already exist but // have a size of 0 bytes. if stat.Size() > 0 { // attempt to read existing OCF header if ocf.header, err = readOCFHeader(file); err != nil { return nil, fmt.Errorf("cannot create OCFWriter: %s", err) } // prepare for appending data to existing OCF if err = ocf.quickScanToTail(file); err != nil { return nil, fmt.Errorf("cannot create OCFWriter: %s", err) } return ocf, nil // happy case for appending to existing OCF } } // create new OCF header based on configuration parameters if ocf.header, err = newOCFHeader(config); err != nil { return nil, fmt.Errorf("cannot create OCFWriter: %s", err) } if err = writeOCFHeader(ocf.header, config.W); err != nil { return nil, fmt.Errorf("cannot create OCFWriter: %s", err) } return ocf, nil // another happy case for creation of new OCF } // quickScanToTail advances the stream reader to the tail end of the // file. Rather than reading each encoded block, optionally decompressing it, // and then decoding it, this method reads the block count, ignoring it, then // reads the block size, then skips ahead to the followig block. It does this // repeatedly until attempts to read the file return io.EOF. 
func (ocfw *OCFWriter) quickScanToTail(ior io.Reader) error { sync := make([]byte, ocfSyncLength) for { // Read and validate block count blockCount, err := longBinaryReader(ior) if err != nil { if err == io.EOF { return nil // merely end of file, rather than error } return fmt.Errorf("cannot read block count: %s", err) } if blockCount <= 0 { return fmt.Errorf("cannot read when block count is not greater than 0: %d", blockCount) } if blockCount > MaxBlockCount { return fmt.Errorf("cannot read when block count exceeds MaxBlockCount: %d > %d", blockCount, MaxBlockCount) } // Read block size blockSize, err := longBinaryReader(ior) if err != nil { return fmt.Errorf("cannot read block size: %s", err) } if blockSize <= 0 { return fmt.Errorf("cannot read when block size is not greater than 0: %d", blockSize) } if blockSize > MaxBlockSize { return fmt.Errorf("cannot read when block size exceeds MaxBlockSize: %d > %d", blockSize, MaxBlockSize) } // Advance reader to end of block if _, err = io.CopyN(ioutil.Discard, ior, blockSize); err != nil { return fmt.Errorf("cannot seek to next block: %s", err) } // Read and validate sync marker var n int if n, err = io.ReadFull(ior, sync); err != nil { return fmt.Errorf("cannot read sync marker: read %d out of %d bytes: %s", n, ocfSyncLength, err) } if !bytes.Equal(sync, ocfw.header.syncMarker[:]) { return fmt.Errorf("sync marker mismatch: %v != %v", sync, ocfw.header.syncMarker) } } } // Append appends one or more data items to an OCF file in a block. If there are // more data items in the slice than MaxBlockCount allows, the data slice will // be chunked into multiple blocks, each not having more than MaxBlockCount // items. func (ocfw *OCFWriter) Append(data interface{}) error { arrayValues, err := convertArray(data) if err != nil { return err } // Chunk data so no block has more than MaxBlockCount items. for int64(len(arrayValues)) > MaxBlockCount { if err := ocfw.appendDataIntoBlock(arrayValues[:MaxBlockCount]); err != nil { return err } arrayValues = arrayValues[MaxBlockCount:] } return ocfw.appendDataIntoBlock(arrayValues) } func (ocfw *OCFWriter) appendDataIntoBlock(data []interface{}) error { var block []byte // working buffer for encoding data values var err error // Encode and concatenate each data item into the block for _, datum := range data { if block, err = ocfw.header.codec.BinaryFromNative(block, datum); err != nil { return fmt.Errorf("cannot translate datum to binary: %v; %s", datum, err) } } switch ocfw.header.compressionID { case compressionNull: // no-op case compressionDeflate: // compress into new bytes buffer. bb := bytes.NewBuffer(make([]byte, 0, len(block))) cw, _ := flate.NewWriter(bb, flate.DefaultCompression) // writing bytes to cw will compress bytes and send to bb. 
if _, err := cw.Write(block); err != nil { return err } if err := cw.Close(); err != nil { return err } block = bb.Bytes() case compressionSnappy: compressed := snappy.Encode(nil, block) // OCF requires snappy to have CRC32 checksum after each snappy block compressed = append(compressed, 0, 0, 0, 0) // expand slice by 4 bytes so checksum will fit binary.BigEndian.PutUint32(compressed[len(compressed)-4:], crc32.ChecksumIEEE(block)) // checksum of decompressed block block = compressed default: return fmt.Errorf("should not get here: cannot compress block using unrecognized compression: %d", ocfw.header.compressionID) } // create file data block buf := make([]byte, 0, len(block)+ocfBlockConst) // pre-allocate block bytes buf, _ = longBinaryFromNative(buf, len(data)) // block count (number of data items) buf, _ = longBinaryFromNative(buf, len(block)) // block size (number of bytes in block) buf = append(buf, block...) // serialized objects buf = append(buf, ocfw.header.syncMarker[:]...) // sync marker _, err = ocfw.iow.Write(buf) return err } // Codec returns the codec used by OCFWriter. This function provided because // upstream may be appending to existing OCF which uses a different schema than // requested during instantiation. func (ocfw *OCFWriter) Codec() *Codec { return ocfw.header.codec } // CompressionName returns the name of the compression algorithm used by // OCFWriter. This function provided because upstream may be appending to // existing OCF which uses a different compression algorithm than requested // during instantiation. the OCF file. func (ocfw *OCFWriter) CompressionName() string { switch ocfw.header.compressionID { case compressionNull: return CompressionNullLabel case compressionDeflate: return CompressionDeflateLabel case compressionSnappy: return CompressionSnappyLabel default: return "should not get here: unrecognized compression algorithm" } } goavro-2.10.1/ocf_writer_test.go000066400000000000000000000215221412474230400166240ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
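Putting OCFWriter and OCFReader together: the sketch below, which assumes the v2 module import path and uses an illustrative metadata key, writes a snappy-compressed OCF into a bytes.Buffer with one Append call and then reads the values back using the Scan/Read loop shown in the OCFReader documentation.

package main

import (
	"bytes"
	"fmt"

	"github.com/linkedin/goavro/v2"
)

func main() {
	var buf bytes.Buffer

	ocfw, err := goavro.NewOCFWriter(goavro.OCFConfig{
		W:               &buf,
		Schema:          `{"type":"long"}`,
		CompressionName: goavro.CompressionSnappyLabel,
		MetaData:        map[string][]byte{"app": []byte("demo")}, // illustrative key
	})
	if err != nil {
		panic(err)
	}
	if err = ocfw.Append([]int64{13, 42, -12}); err != nil {
		panic(err)
	}

	ocfr, err := goavro.NewOCFReader(&buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(ocfr.CompressionName(), string(ocfr.MetaData()["app"]))
	for ocfr.Scan() {
		datum, err := ocfr.Read()
		if err != nil {
			panic(err)
		}
		fmt.Println(datum)
	}
	if err = ocfr.Err(); err != nil {
		panic(err)
	}
}

When W is an *os.File that already holds an OCF header, NewOCFWriter instead reuses the schema and compression recorded in that header and, via quickScanToTail, seeks past the existing blocks so that Append extends the file.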
package goavro import ( "bytes" "io" "os" "testing" ) // createTestFile is used to create a new test file fixture with provided data func createTestFile(t *testing.T, pathname string, data []byte) { t.Helper() nf, err := os.Create(pathname) if err != nil { t.Fatal(err) } if _, err = nf.Write(data); err != nil { t.Fatal(err) } if err = nf.Close(); err != nil { t.Fatal(err) } } // NOTE: already tested readOCFHeader func TestNewOCFWriterWhenNotFileNewOCFHeader(t *testing.T) { // when config.W nil _, err := NewOCFWriter(OCFConfig{}) ensureError(t, err, "cannot create OCFWriter", "when W is nil") // when config.CompressionName invalid _, err = NewOCFWriter(OCFConfig{W: new(bytes.Buffer), CompressionName: "*invalid*compression*algorithm*"}) ensureError(t, err, "cannot create OCFWriter", "unrecognized compression algorithm") // when config.Schema doesn't compile _, err = NewOCFWriter(OCFConfig{W: new(bytes.Buffer), CompressionName: "null", Schema: "invalid-schema"}) ensureError(t, err, "cannot create OCFWriter", "cannot unmarshal schema") _, err = NewOCFWriter(OCFConfig{W: new(bytes.Buffer), CompressionName: "null", Schema: `{}`}) ensureError(t, err, "cannot create OCFWriter", "missing type") _, err = NewOCFWriter(OCFConfig{W: new(bytes.Buffer), CompressionName: "null"}) ensureError(t, err, "cannot create OCFWriter", "without either Codec or Schema specified") } func TestNewOCFWriterWhenNotFileWriteOCFHeader(t *testing.T) { _, err := NewOCFWriter(OCFConfig{ W: ShortWriter(new(bytes.Buffer), 3), CompressionName: "null", Schema: `{"type":"int"}`}, ) ensureError(t, err, "cannot write OCF header", "short write") } func TestNewOCFWriterWhenFileEmpty(t *testing.T) { // NOTE: When given an empty file, NewOCFWriter ought to behave exactly as // if it's merely given an non-file io.Writer. fh, err := os.OpenFile("fixtures/temp0.avro", os.O_CREATE|os.O_RDWR|os.O_TRUNC, 0666) if err != nil { t.Fatal(err) } _, err = NewOCFWriter(OCFConfig{ W: fh, CompressionName: "*invalid*", Schema: `{"type":"int"}`}, ) ensureError(t, err, "cannot create OCFWriter", "unrecognized compression algorithm") } func TestNewOCFWriterWhenFileNotEmptyWhenCannotReadOCFHeader(t *testing.T) { fh, err := os.Open("fixtures/bad-header.avro") if err != nil { t.Fatal(err) } _, err = NewOCFWriter(OCFConfig{ W: fh, CompressionName: "*invalid*", Schema: `{"type":"int"}`}, ) ensureError(t, err, "cannot create OCFWriter", "cannot read OCF header") } func testNewOCFWriterWhenFile(t *testing.T, pathname string, expected ...string) { t.Helper() fh, err := os.Open(pathname) if err != nil { t.Fatal(err) } defer func() { if err := fh.Close(); err != nil { t.Fatal(err) } }() _, err = NewOCFWriter(OCFConfig{W: fh}) ensureError(t, err, append([]string{"cannot create OCFWriter"}, expected...)...) 
} func TestNewOCFWriterWhenFileNotEmptyWhenCannotQuickScanToTail(t *testing.T) { testNewOCFWriterWhenFile(t, "fixtures/firstBlockCountNotGreaterThanZero.avro", "block count is not greater") testNewOCFWriterWhenFile(t, "fixtures/blockCountExceedsMaxBlockCount.avro", "block count exceeds") testNewOCFWriterWhenFile(t, "fixtures/cannotReadBlockSize.avro", "cannot read block size") testNewOCFWriterWhenFile(t, "fixtures/blockSizeNotGreaterThanZero.avro", "block size is not greater than 0") testNewOCFWriterWhenFile(t, "fixtures/blockSizeExceedsMaxBlockSize.avro", "block size exceeds") testNewOCFWriterWhenFile(t, "fixtures/cannotDiscardBlockBytes.avro", "cannot seek to next block", "EOF") testNewOCFWriterWhenFile(t, "fixtures/cannotReadSyncMarker.avro", "cannot read sync marker", "EOF") testNewOCFWriterWhenFile(t, "fixtures/syncMarkerMismatch.avro", "sync marker mismatch") testNewOCFWriterWhenFile(t, "fixtures/secondBlockCountZero.avro", "block count is not greater") } func TestNewOCFWriterWhenFileNotEmptyWhenProvidedDifferentCompressionAndSchema(t *testing.T) { createTestFile(t, "fixtures/temp1.avro", []byte("Obj\x01\x04\x14avro.codec\x0edeflate\x16avro.schema\x1e{\"type\":\"long\"}\x000123456789abcdef\x02\x04ab0123456789abcdef")) fh, err := os.Open("fixtures/temp1.avro") if err != nil { t.Fatal(err) } defer func() { if err := fh.Close(); err != nil { t.Fatal(err) } }() ocfw, err := NewOCFWriter(OCFConfig{ W: fh, Schema: `{"type":"int"}`, CompressionName: "null", }) if err != nil { t.Fatal(err) } if actual, expected := ocfw.Codec().Schema(), `{"type":"long"}`; actual != expected { t.Errorf("GOT: %v; WANT: %v", actual, expected) } if actual, expected := ocfw.CompressionName(), CompressionDeflateLabel; actual != expected { t.Errorf("GOT: %v; WANT: %v", actual, expected) } } func TestOCFWriterAppendWhenCannotWrite(t *testing.T) { testPathname := "fixtures/temp2.avro" createTestFile(t, testPathname, []byte("Obj\x01\x02\x16avro.schema\x1e{\"type\":\"long\"}\x000123456789abcdef")) appender, err := os.OpenFile(testPathname, os.O_RDONLY, 0666) // open for read only will cause expected error when attempt to append if err != nil { t.Fatal(err) } defer func(ioc io.Closer) { if err := ioc.Close(); err != nil { t.Fatal(err) } }(appender) ocfw, err := NewOCFWriter(OCFConfig{W: appender}) if err != nil { t.Fatal(err) } err = ocfw.Append([]interface{}{13, 42}) ensureError(t, err, testPathname) } func TestOCFWriterAppendSomeItemsToNothing(t *testing.T) { testPathname := "fixtures/temp3.avro" createTestFile(t, testPathname, []byte("Obj\x01\x02\x16avro.schema\x1e{\"type\":\"long\"}\x000123456789abcdef")) appender, err := os.OpenFile(testPathname, os.O_RDWR, 0666) if err != nil { t.Fatal(err) } defer func(ioc io.Closer) { if err := ioc.Close(); err != nil { t.Fatal(err) } }(appender) ocfw, err := NewOCFWriter(OCFConfig{W: appender}) if err != nil { t.Fatal(err) } if err = ocfw.Append([]interface{}{13, 42}); err != nil { t.Fatal(err) } // let's make sure data is there reader, err := os.Open(testPathname) if err != nil { t.Fatal(err) } defer func(ioc io.Closer) { if err := ioc.Close(); err != nil { t.Fatal(err) } }(reader) ocfr, err := NewOCFReader(reader) if err != nil { t.Fatal(err) } var values []int64 for ocfr.Scan() { value, err := ocfr.Read() if err != nil { t.Fatal(err) } values = append(values, value.(int64)) } if err := ocfr.Err(); err != nil { t.Fatal(err) } if actual, expected := len(values), 2; actual != expected { t.Errorf("GOT: %v; WANT: %v", actual, expected) } if actual, expected := values[0], 
int64(13); actual != expected { t.Errorf("GOT: %v; WANT: %v", actual, expected) } if actual, expected := values[1], int64(42); actual != expected { t.Errorf("GOT: %v; WANT: %v", actual, expected) } } func TestOCFWriterAppendSomeItemsToSomeItems(t *testing.T) { testPathname := "fixtures/temp4.avro" createTestFile(t, testPathname, []byte("Obj\x01\x02\x16avro.schema\x1e{\"type\":\"long\"}\x000123456789abcdef\x04\x04\x1a\x540123456789abcdef")) appender, err := os.OpenFile(testPathname, os.O_RDWR, 0666) if err != nil { t.Fatal(err) } defer func(ioc io.Closer) { if err := ioc.Close(); err != nil { t.Fatal(err) } }(appender) ocfw, err := NewOCFWriter(OCFConfig{W: appender}) if err != nil { t.Fatal(err) } if err = ocfw.Append([]interface{}{-10, -100}); err != nil { t.Fatal(err) } // let's make sure data is there reader, err := os.Open(testPathname) if err != nil { t.Fatal(err) } defer func(ioc io.Closer) { if err := ioc.Close(); err != nil { t.Fatal(err) } }(reader) ocfr, err := NewOCFReader(reader) if err != nil { t.Fatal(err) } var values []int64 for ocfr.Scan() { value, err := ocfr.Read() if err != nil { t.Fatal(err) } values = append(values, value.(int64)) } if err := ocfr.Err(); err != nil { t.Fatal(err) } if actual, expected := len(values), 4; actual != expected { t.Fatalf("GOT: %v; WANT: %v", actual, expected) } if actual, expected := values[0], int64(13); actual != expected { t.Errorf("GOT: %v; WANT: %v", actual, expected) } if actual, expected := values[1], int64(42); actual != expected { t.Errorf("GOT: %v; WANT: %v", actual, expected) } if actual, expected := values[2], int64(-10); actual != expected { t.Errorf("GOT: %v; WANT: %v", actual, expected) } if actual, expected := values[3], int64(-100); actual != expected { t.Errorf("GOT: %v; WANT: %v", actual, expected) } } goavro-2.10.1/rabin.go000066400000000000000000000211301412474230400145100ustar00rootroot00000000000000package goavro import ( "encoding/binary" "fmt" "io" ) // rabinEmpty is a constant used to initialize the crc64Table, and to compute // the CRC-64-AVRO fingerprint of every object schema. const rabinEmpty = uint64(0xc15d213aa4d7a795) // rabinTable is never modified after initialization but its values are read to // compute the CRC-64-AVRO fingerprint of every schema its given. 
var rabinTable = [256]uint64{ 0, 3238593523956797946, 6477187047913595892, 8435907220062204430, 12954374095827191784, 11472609148414072338, 16871814440124408860, 14327483619285186022, 16515860097293205755, 14539261057490653441, 13607494391182877455, 10387063993012335349, 6265406319754774291, 8791864835633305321, 1085550678754862311, 2585467722461443357, 5247393906202824413, 7215812591205457703, 1239030555549527337, 4449591751341063379, 18092457712352332085, 15556728100436498639, 11742789833002527425, 10234164645493242683, 12530812639509548582, 9302088354573213660, 17583729671266610642, 15633189885995973672, 2171101357509724622, 3661574416647526452, 5170935444922886714, 7724537325157989312, 10494787812405648826, 13642865964979244096, 14431625182410915406, 16480541316673728436, 2478061111099054674, 1049933365183482792, 8899183502682126758, 6300970840149272668, 8399466921467862337, 6368420890995002555, 3275086581351513781, 108854135608684367, 14364169659802000041, 16980263386864569171, 11435870349096892765, 12845837170396948647, 15669858317114364775, 17692196227407282845, 9265331945857609875, 12422293323479818601, 7688114635962061967, 5062151678603773301, 3698085083440658299, 2279937883717887617, 4342202715019449244, 1203395666939462246, 7323148833295052904, 5282940851558637970, 10341870889845773428, 11778178981837571470, 15449074650315978624, 18057156506771531386, 11669866394404287583, 10160817855121008037, 17874829710049597355, 15339802717267265105, 1311848476550706103, 4523114428088083021, 5464845951130112067, 7432843562972398009, 4956122222198109348, 7509300761534850398, 2099866730366965584, 3591042414950500010, 17798367005364253516, 15848531969535615670, 12601941680298545336, 9372796311334617410, 16798933842935724674, 14253900473960229752, 12736841781990005110, 11255500115345754252, 6550173162703027562, 8509314479008689296, 217708271217368734, 3455596968422674276, 870833084869474937, 2370047569572014979, 6194214610827729293, 8721096401170761847, 13822387873690697105, 10602378625989962859, 16587157392570359397, 14609853536892473247, 3483332339477899749, 2064482512161650719, 7616958077116566033, 4991418462803860459, 9480190278288059917, 12637572737790640119, 15741190762473065977, 17762823925471730691, 15376229271924123934, 17983608511393921252, 10124303357207546602, 11561034798826117904, 7396170166881316598, 5356383260452470540, 4559875767435775234, 1420363961462201592, 8684405430038898488, 6085769495188764354, 2406791333878924492, 979366144819647798, 14646297666590105808, 16695918618875998506, 10565881703117275940, 13713538703073841886, 11362911691697612739, 12772455230081578553, 14146576876296094775, 16763373153642681805, 3347869283551649835, 182341662412566993, 8616954185191982047, 6585487012709290533, 13933329357911598997, 17126321439046432367, 11006435164953838689, 12992741788688209307, 8257930048646602877, 6803747195591438727, 3132703159877387145, 542775339377431155, 2623696953101412206, 619515277774763668, 9046228856176166042, 5871394916501263712, 10929691902260224134, 13501751302614184316, 14865687125944796018, 16338017159720129160, 9912244444396218696, 11925134239902742706, 15018601523069700796, 18202706530865158982, 4199733460733931168, 1637543290675756890, 7182084829901000020, 5717935174548446382, 7834929158557182387, 4632665972928804937, 3844057317981030983, 1849042541720329149, 16103865201353027163, 17549867708331900833, 9700748483321744815, 12280807109898935381, 5834933197202143791, 8937414855024798677, 655924238275353051, 2732422975565056033, 16374796089197559239, 
14974255385173568573, 13465025131935292979, 10821211621719183305, 13100346325406055124, 11041713811386575662, 17018628958017378592, 13897997918303815898, 435416542434737468, 3097107305413864646, 6911193936845348552, 8293578696285179698, 1741666169738949874, 3808479038558283016, 4740095139144029958, 7870595381236532988, 12388429221655458586, 9736009554713699040, 17442192802341523694, 16068516186704462100, 18239503069743100937, 15127152172900050419, 11888425678624364541, 9803746554456753671, 5681455845848806369, 7073288438148047387, 1673934641775824917, 4308477092595991023, 6966664678955799498, 5503217582476919344, 4128965024323301438, 1566351579938693572, 15233916154233132066, 18417600011429070296, 9982836925607720918, 11996431537128302124, 9627165335515697969, 12207926510359495371, 15886756170769674437, 17332335396841578815, 3917464579278591193, 1922028658990515491, 8051932600676513581, 4850374241660872407, 2917466598601071895, 327962119137676525, 8187398044598779619, 6732512565967646489, 11221777246008269567, 13207379120439233285, 14004037317153847563, 17197450482186430705, 14792340333762633196, 16265093719173729302, 10712766520904941080, 13284123302255603682, 9119751534871550468, 5944212839312182270, 2840727922924403184, 836967320887912458, 17368810860077796976, 15995557527495450506, 12171538990377528708, 9518416773021940862, 4813582667757848984, 7943378085384837218, 1958732289639295596, 4025966300338256790, 1458733299300535947, 4093699022299389809, 5610888623004134783, 7002018658576923781, 12103802978479819107, 10018419036150929561, 18310175810188503703, 15198246066092718957, 13391477134206599341, 10748366240846565719, 16157651908532642649, 14756687855020634787, 729366649650267973, 2805444311502067391, 6051901489239909553, 9155087905094251851, 6695738567103299670, 8078825954266321324, 364683324825133986, 3025950744619954776, 17233908370383964094, 14112856248920397380, 13170974025418581066, 11113046258555286960, } // rabin returns an unsigned 64-bit integer Rabin fingerprint for buf. NOTE: // This is only used during Codec instantiation to calculate the Rabin // fingerprint of the canonical schema. func rabin(buf []byte) uint64 { fp := rabinEmpty for i := 0; i < len(buf); i++ { fp = (fp >> 8) ^ rabinTable[(byte(fp)^buf[i])&0xff] // unsigned right shift >>> } return fp } const soeMagicPrefix = 2 // 2-byte prefix for SOE encoded data const soeHeaderLen = soeMagicPrefix + 8 // 2-byte prefix plus 8-byte fingerprint // FingerprintFromSOE returns the unsigned 64-bit Rabin fingerprint from the // header of a buffer that encodes a Single-Object Encoded datum. This function // is designed to be used to lookup a Codec that can decode the contents of the // buffer. Once a Codec is found that has the matching Rabin fingerprint, its // NativeFromBinary method may be used to decode the remaining bytes returned as // the second return value. On failure this function returns an // ErrNotSingleObjectEncoded error. // // func decode(codex map[uint64]*goavro.Codec, buf []byte) error { // // Perform a sanity check on the buffer, then return the Rabin fingerprint // // of the schema used to encode the data. // fingerprint, newBuf, err := goavro.FingerprintFromSOE(buf) // if err != nil { // return err // } // // // Get a previously stored Codec from the codex map. // codec, ok := codex[fingerprint] // if !ok { // return fmt.Errorf("unknown codec: %#x", fingerprint) // } // // // Use the fetched Codec to decode the buffer as a SOE. 
// // // // Faster because SOE magic prefix and schema fingerprint already // // checked and used to fetch the Codec. Just need to decode the binary // // bytes remaining after the prefix were removed. // datum, _, err := codec.NativeFromBinary(newBuf) // if err != nil { // return err // } // // _, err = fmt.Println(datum) // return err // } func FingerprintFromSOE(buf []byte) (uint64, []byte, error) { if len(buf) < soeHeaderLen { // Not enough bytes to encode schema fingerprint. return 0, nil, ErrNotSingleObjectEncoded(io.ErrShortBuffer.Error()) } if buf[0] != 0xC3 || buf[1] != 0x01 { // Currently only one SOE prefix is recognized. return 0, nil, ErrNotSingleObjectEncoded(fmt.Sprintf("unknown SOE prefix: %#x", buf[:soeMagicPrefix])) } // Only recognizes single-object encodings format version 1. return binary.LittleEndian.Uint64(buf[soeMagicPrefix:]), buf[soeHeaderLen:], nil } goavro-2.10.1/rabin_test.go000066400000000000000000000006511412474230400155540ustar00rootroot00000000000000package goavro import ( "testing" ) func TestRabin(t *testing.T) { t.Run("int", func(t *testing.T) { if got, want := rabin([]byte(`"int"`)), uint64(0x7275d51a3f395c8f); got != want { t.Errorf("GOT: %#x; WANT: %#x", got, want) } }) t.Run("string", func(t *testing.T) { if got, want := rabin([]byte(`"string"`)), uint64(0x8f014872634503c7); got != want { t.Errorf("GOT: %#x; WANT: %#x", got, want) } }) } goavro-2.10.1/race_test.go000066400000000000000000000103711412474230400153730ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
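FingerprintFromSOE only checks the two-byte magic prefix and returns the little-endian Rabin fingerprint plus the remaining bytes; choosing a Codec and decoding is left to the caller. The sketch below assumes the v2 module import path, reuses the `"int"` fingerprint asserted in rabin_test.go above, and hand-rolls an illustrative soeEncode helper to build the single-object frame:

package main

import (
	"encoding/binary"
	"fmt"

	"github.com/linkedin/goavro/v2"
)

// intFingerprint is the CRC-64-AVRO fingerprint of the schema `"int"`,
// as asserted in rabin_test.go.
const intFingerprint = uint64(0x7275d51a3f395c8f)

// soeEncode builds a single-object frame by hand: the 0xC3 0x01 magic
// prefix, an 8-byte little-endian fingerprint, then the Avro binary body.
// It is an illustrative helper, not part of the goavro API.
func soeEncode(fingerprint uint64, body []byte) []byte {
	buf := make([]byte, 10, 10+len(body))
	buf[0], buf[1] = 0xC3, 0x01
	binary.LittleEndian.PutUint64(buf[2:], fingerprint)
	return append(buf, body...)
}

func main() {
	codec, err := goavro.NewCodec(`"int"`)
	if err != nil {
		panic(err)
	}
	codex := map[uint64]*goavro.Codec{intFingerprint: codec}

	// Encode an int datum and wrap it in the single-object frame.
	body, err := codec.BinaryFromNative(nil, 7)
	if err != nil {
		panic(err)
	}
	soe := soeEncode(intFingerprint, body)

	// Recover the fingerprint, look up the codec, and decode the rest.
	fingerprint, rest, err := goavro.FingerprintFromSOE(soe)
	if err != nil {
		panic(err)
	}
	datum, _, err := codex[fingerprint].NativeFromBinary(rest)
	if err != nil {
		panic(err)
	}
	fmt.Println(datum)
}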
package goavro import ( "fmt" "sync" "testing" ) func TestRaceEncodeEncodeArray(t *testing.T) { codec, err := NewCodec(`{"type":"record","name":"record1","fields":[{"name":"field1","type":"array","items":"long"}]}`) if err != nil { t.Fatal(err) } var consumers, producers sync.WaitGroup consumers.Add(1) producers.Add(2) done := make(chan error, 10) go func() { defer consumers.Done() for err := range done { t.Error(err) } }() go func() { defer producers.Done() for i := 0; i < 10000; i++ { if _, err := codec.BinaryFromNative(nil, map[string]interface{}{"field1": []int{i}}); err != nil { done <- err return } } }() go func() { defer producers.Done() for i := 0; i < 10000; i++ { rec := map[string]interface{}{ "field1": []interface{}{i}, } if _, err := codec.BinaryFromNative(nil, rec); err != nil { done <- err return } } }() producers.Wait() close(done) consumers.Wait() } func TestRaceEncodeEncodeRecord(t *testing.T) { codec, err := NewCodec(`{"type":"record","name":"record1","fields":[{"type":"long","name":"field1"}]}`) if err != nil { t.Fatal(err) } var consumers, producers sync.WaitGroup consumers.Add(1) producers.Add(2) done := make(chan error, 10) go func() { defer consumers.Done() for err := range done { t.Error(err) } }() go func() { defer producers.Done() for i := 0; i < 10000; i++ { rec := map[string]interface{}{"field1": i} if _, err := codec.BinaryFromNative(nil, rec); err != nil { done <- err return } } }() go func() { defer producers.Done() for i := 0; i < 10000; i++ { rec := map[string]interface{}{"field1": i} if _, err := codec.BinaryFromNative(nil, rec); err != nil { done <- err return } } }() producers.Wait() close(done) consumers.Wait() } func TestRaceCodecConstructionDecode(t *testing.T) { codec, err := NewCodec(`{"type": "long"}`) if err != nil { t.Fatal(err) } comms := make(chan []byte, 1000) var consumers sync.WaitGroup consumers.Add(1) done := make(chan error, 10) go func() { defer consumers.Done() for err := range done { t.Error(err) } }() go func() { defer close(comms) for i := 0; i < 10000; i++ { // Completely unrelated stateful objects were causing races if i%100 == 0 { _, _ = NewCodec(`{"type": "long"}`) } buf, err := codec.BinaryFromNative(nil, i) if err != nil { done <- err return } comms <- buf } }() go func() { defer close(done) var i int64 for buf := range comms { datum, _, err := codec.NativeFromBinary(buf) if err != nil { done <- err return } result := datum.(int64) // Avro long values always decoded as int64 if result != i { done <- fmt.Errorf("GOT: %v; WANT: %v", result, i) return } i++ } }() consumers.Wait() } func TestRaceCodecConstruction(t *testing.T) { comms := make(chan []byte, 1000) done := make(chan error, 1000) go func() { defer close(comms) recordSchemaJSON := `{"type": "long"}` codec, err := NewCodec(recordSchemaJSON) if err != nil { done <- err return } for i := 0; i < 10000; i++ { buf, err := codec.BinaryFromNative(nil, i) if err != nil { done <- err return } comms <- buf } }() go func() { defer close(done) recordSchemaJSON := `{"type": "long"}` codec, err := NewCodec(recordSchemaJSON) if err != nil { done <- err return } var i int64 for encoded := range comms { decoded, _, err := codec.NativeFromBinary(encoded) if err != nil { done <- err return } result := decoded.(int64) // Avro long values always decoded as int64 if result != i { done <- fmt.Errorf("GOT: %v; WANT: %v", result, i) return } i++ } }() for err := range done { if err != nil { t.Fatal(err) } } } 
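The race tests above drive a single Codec from multiple goroutines at once. A condensed sketch of that sharing pattern outside the testing package, with one goroutine encoding and the main goroutine decoding through a channel:

package main

import (
	"fmt"
	"sync"

	"github.com/linkedin/goavro/v2"
)

func main() {
	codec, err := goavro.NewCodec(`"long"`)
	if err != nil {
		panic(err)
	}

	// One goroutine encodes while the main goroutine decodes,
	// both using the same codec concurrently.
	encoded := make(chan []byte, 16)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer close(encoded)
		for i := int64(0); i < 100; i++ {
			buf, err := codec.BinaryFromNative(nil, i)
			if err != nil {
				panic(err)
			}
			encoded <- buf
		}
	}()

	var sum int64
	for buf := range encoded {
		datum, _, err := codec.NativeFromBinary(buf)
		if err != nil {
			panic(err)
		}
		sum += datum.(int64) // Avro long values decode as int64
	}
	wg.Wait()
	fmt.Println(sum)
}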
goavro-2.10.1/record.go000066400000000000000000000217701412474230400147050ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "fmt" ) func makeRecordCodec(st map[string]*Codec, enclosingNamespace string, schemaMap map[string]interface{}, cb *codecBuilder) (*Codec, error) { // NOTE: To support recursive data types, create the codec and register it // using the specified name, and fill in the codec functions later. c, err := registerNewCodec(st, schemaMap, enclosingNamespace) if err != nil { return nil, fmt.Errorf("Record ought to have valid name: %s", err) } fields, ok := schemaMap["fields"] if !ok { return nil, fmt.Errorf("Record %q ought to have fields key", c.typeName) } fieldSchemas, ok := fields.([]interface{}) if !ok || fieldSchemas == nil { return nil, fmt.Errorf("Record %q fields ought to be non-nil array: %v", c.typeName, fields) } codecFromFieldName := make(map[string]*Codec) codecFromIndex := make([]*Codec, len(fieldSchemas)) nameFromIndex := make([]string, len(fieldSchemas)) defaultValueFromName := make(map[string]interface{}, len(fieldSchemas)) for i, fieldSchema := range fieldSchemas { fieldSchemaMap, ok := fieldSchema.(map[string]interface{}) if !ok { return nil, fmt.Errorf("Record %q field %d ought to be valid Avro named type; received: %v", c.typeName, i+1, fieldSchema) } // NOTE: field names are not registered in the symbol table, because // field names are not individually addressable codecs. 
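// (Editor's illustration.) Registering the record codec before its fields are
// built is what makes self-referential schemas, such as the "LongList" schema
// used by the record tests, resolvable:
//
//	{"type":"record","name":"LongList",
//	 "fields":[{"name":"next","type":["null","LongList"],"default":null}]}
//
// By the time the "next" field is compiled, the name "LongList" already
// resolves in the symbol table, even though its encode and decode functions
// are only filled in after this loop completes.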
fieldCodec, err := buildCodec(st, c.typeName.namespace, fieldSchemaMap, cb) if err != nil { return nil, fmt.Errorf("Record %q field %d ought to be valid Avro named type: %s", c.typeName, i+1, err) } // However, when creating a full name for the field name, be sure to use // record's namespace n, err := newNameFromSchemaMap(c.typeName.namespace, fieldSchemaMap) if err != nil { return nil, fmt.Errorf("Record %q field %d ought to have valid name: %v", c.typeName, i+1, fieldSchemaMap) } fieldName := n.short() if _, ok := codecFromFieldName[fieldName]; ok { return nil, fmt.Errorf("Record %q field %d ought to have unique name: %q", c.typeName, i+1, fieldName) } if defaultValue, ok := fieldSchemaMap["default"]; ok { typeNameShort := fieldCodec.typeName.short() switch typeNameShort { case "boolean": v, ok := defaultValue.(bool) if !ok { return nil, fmt.Errorf("Record %q field %q: default value ought to encode using field schema: %s", c.typeName, fieldName, err) } defaultValue = bool(v) case "bytes": v, ok := defaultValue.(string) if !ok { return nil, fmt.Errorf("Record %q field %q: default value ought to encode using field schema: %s", c.typeName, fieldName, err) } defaultValue = []byte(v) case "double": v, ok := defaultValue.(float64) if !ok { return nil, fmt.Errorf("Record %q field %q: default value ought to encode using field schema: %s", c.typeName, fieldName, err) } defaultValue = float64(v) case "float": v, ok := defaultValue.(float64) if !ok { return nil, fmt.Errorf("Record %q field %q: default value ought to encode using field schema: %s", c.typeName, fieldName, err) } defaultValue = float32(v) case "int": v, ok := defaultValue.(float64) if !ok { return nil, fmt.Errorf("Record %q field %q: default value ought to encode using field schema: %s", c.typeName, fieldName, err) } defaultValue = int32(v) case "long": v, ok := defaultValue.(float64) if !ok { return nil, fmt.Errorf("Record %q field %q: default value ought to encode using field schema: %s", c.typeName, fieldName, err) } defaultValue = int64(v) case "string": v, ok := defaultValue.(string) if !ok { return nil, fmt.Errorf("Record %q field %q: default value ought to encode using field schema: %s", c.typeName, fieldName, err) } defaultValue = string(v) case "union": // When codec is union, then default value ought to encode using // first schema in union. 
NOTE: To support a null default // value, the string literal "null" must be coerced to a `nil` if defaultValue == "null" { defaultValue = nil } // NOTE: To support record field default values, union schema // set to the type name of first member // TODO: change to schemaCanonical below defaultValue = Union(fieldCodec.schemaOriginal, defaultValue) default: debug("fieldName: %q; type: %q; defaultValue: %T(%#v)\n", fieldName, c.typeName, defaultValue, defaultValue) } // attempt to encode default value using codec _, err = fieldCodec.binaryFromNative(nil, defaultValue) if err != nil { return nil, fmt.Errorf("Record %q field %q: default value ought to encode using field schema: %s", c.typeName, fieldName, err) } defaultValueFromName[fieldName] = defaultValue } nameFromIndex[i] = fieldName codecFromIndex[i] = fieldCodec codecFromFieldName[fieldName] = fieldCodec } c.binaryFromNative = func(buf []byte, datum interface{}) ([]byte, error) { valueMap, ok := datum.(map[string]interface{}) if !ok { return nil, fmt.Errorf("cannot encode binary record %q: expected map[string]interface{}; received: %T", c.typeName, datum) } // records encoded in order fields were defined in schema for i, fieldCodec := range codecFromIndex { fieldName := nameFromIndex[i] // NOTE: If field value was not specified in map, then set // fieldValue to its default value (which may or may not have been // specified). fieldValue, ok := valueMap[fieldName] if !ok { if fieldValue, ok = defaultValueFromName[fieldName]; !ok { return nil, fmt.Errorf("cannot encode binary record %q field %q: schema does not specify default value and no value provided", c.typeName, fieldName) } } var err error buf, err = fieldCodec.binaryFromNative(buf, fieldValue) if err != nil { return nil, fmt.Errorf("cannot encode binary record %q field %q: value does not match its schema: %s", c.typeName, fieldName, err) } } return buf, nil } c.nativeFromBinary = func(buf []byte) (interface{}, []byte, error) { recordMap := make(map[string]interface{}, len(codecFromIndex)) for i, fieldCodec := range codecFromIndex { name := nameFromIndex[i] var value interface{} var err error value, buf, err = fieldCodec.nativeFromBinary(buf) if err != nil { return nil, nil, fmt.Errorf("cannot decode binary record %q field %q: %s", c.typeName, name, err) } recordMap[name] = value } return recordMap, buf, nil } c.nativeFromTextual = func(buf []byte) (interface{}, []byte, error) { var mapValues map[string]interface{} var err error // NOTE: Setting `defaultCodec == nil` instructs genericMapTextDecoder // to return an error when a field name is not found in the // codecFromFieldName map. 
mapValues, buf, err = genericMapTextDecoder(buf, nil, codecFromFieldName) if err != nil { return nil, nil, fmt.Errorf("cannot decode textual record %q: %s", c.typeName, err) } if actual, expected := len(mapValues), len(codecFromFieldName); actual != expected { // set missing field keys to their respective default values, then // re-check number of keys for fieldName, defaultValue := range defaultValueFromName { if _, ok := mapValues[fieldName]; !ok { mapValues[fieldName] = defaultValue } } if actual, expected = len(mapValues), len(codecFromFieldName); actual != expected { return nil, nil, fmt.Errorf("cannot decode textual record %q: only found %d of %d fields", c.typeName, actual, expected) } } return mapValues, buf, nil } c.textualFromNative = func(buf []byte, datum interface{}) ([]byte, error) { // NOTE: Ensure only schema defined field names are encoded; and if // missing in datum, either use the provided field default value or // return an error. sourceMap, ok := datum.(map[string]interface{}) if !ok { return nil, fmt.Errorf("cannot encode textual record %q: expected map[string]interface{}; received: %T", c.typeName, datum) } destMap := make(map[string]interface{}, len(codecFromIndex)) for fieldName := range codecFromFieldName { fieldValue, ok := sourceMap[fieldName] if !ok { defaultValue, ok := defaultValueFromName[fieldName] if !ok { return nil, fmt.Errorf("cannot encode textual record %q field %q: schema does not specify default value and no value provided", c.typeName, fieldName) } fieldValue = defaultValue } destMap[fieldName] = fieldValue } datum = destMap // NOTE: Setting `defaultCodec == nil` instructs genericMapTextEncoder // to return an error when a field name is not found in the // codecFromFieldName map. return genericMapTextEncoder(buf, datum, nil, codecFromFieldName) } return c, nil } goavro-2.10.1/record_test.go000066400000000000000000000542101412474230400157370ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
package goavro import ( "bytes" "fmt" "testing" ) func TestRecordName(t *testing.T) { testSchemaInvalid(t, `{"type":"record","fields":[{"name":"name","type":"string"},{"name":"age","type":"int"}]}`, "Record ought to have valid name: schema ought to have name key") testSchemaInvalid(t, `{"type":"record","name":3}`, "Record ought to have valid name: schema name ought to be non-empty string") testSchemaInvalid(t, `{"type":"record","name":""}`, "Record ought to have valid name: schema name ought to be non-empty string") testSchemaInvalid(t, `{"type":"record","name":"&foo","fields":[{"name":"name","type":"string"},{"name":"age","type":"int"}]}`, "Record ought to have valid name: schema name ought to start with") testSchemaInvalid(t, `{"type":"record","name":"foo&","fields":[{"name":"name","type":"string"},{"name":"age","type":"int"}]}`, "Record ought to have valid name: schema name ought to have second and remaining") } func TestRecordFields(t *testing.T) { testSchemaInvalid(t, `{"type":"record","name":"r1"}`, `Record "r1" ought to have fields key`) testSchemaInvalid(t, `{"type":"record","name":"r1","fields":3}`, `Record "r1" fields ought to be non-nil array`) testSchemaInvalid(t, `{"type":"record","name":"r1","fields":null}`, `Record "r1" fields ought to be non-nil array`) } func TestRecordFieldInvalid(t *testing.T) { testSchemaInvalid(t, `{"type":"record","name":"r1","fields":[3]}`, `Record "r1" field 1 ought to be valid Avro named type`) testSchemaInvalid(t, `{"type":"record","name":"r1","fields":[""]}`, `Record "r1" field 1 ought to be valid Avro named type`) testSchemaInvalid(t, `{"type":"record","name":"r1","fields":[{}]}`, `Record "r1" field 1 ought to be valid Avro named type`) testSchemaInvalid(t, `{"type":"record","name":"r1","fields":[{"type":"int"}]}`, `Record "r1" field 1 ought to have valid name`) testSchemaInvalid(t, `{"type":"record","name":"r1","fields":[{"name":"f1"}]}`, `Record "r1" field 1 ought to be valid Avro named type`) testSchemaInvalid(t, `{"type":"record","name":"r1","fields":[{"name":"f1","type":"integer"}]}`, `Record "r1" field 1 ought to be valid Avro named type`) testSchemaInvalid(t, `{"type":"record","name":"r1","fields":[{"name":"f1","type":"int"},{"name":"f1","type":"long"}]}`, `Record "r1" field 2 ought to have unique name`) } func TestSchemaRecord(t *testing.T) { testSchemaValid(t, `{ "name": "person", "type": "record", "fields": [ { "name": "height", "type": "long" }, { "name": "weight", "type": "long" }, { "name": "name", "type": "string" } ] }`) } func TestSchemaRecordFieldWithDefaults(t *testing.T) { testSchemaValid(t, `{ "name": "person", "type": "record", "fields": [ { "name": "height", "type": "long" }, { "name": "weight", "type": "long" }, { "name": "name", "type": "string" }, { "name": "hacker", "type": "boolean", "default": false } ] }`) } func TestRecordDecodedEmptyBuffer(t *testing.T) { testBinaryDecodeFailShortBuffer(t, `{"type":"record","name":"foo","fields":[{"name":"field1","type":"int"}]}`, nil) } func TestRecordFieldTypeHasPrimitiveName(t *testing.T) { codec, err := NewCodec(`{ "type": "record", "name": "r1", "namespace": "com.example", "fields": [ { "name": "f1", "type": "string" }, { "name": "f2", "type": { "type": "int" } } ] }`) ensureError(t, err) datumIn := map[string]interface{}{ "f1": "thirteen", "f2": 13, } buf, err := codec.BinaryFromNative(nil, datumIn) ensureError(t, err) if expected := []byte{ 0x10, // field1 size = 8 't', 'h', 'i', 'r', 't', 'e', 'e', 'n', 0x1a, // field2 == 13 }; !bytes.Equal(buf, expected) { t.Errorf("GOT: 
%#v; WANT: %#v", buf, expected) } // round trip datumOut, buf, err := codec.NativeFromBinary(buf) ensureError(t, err) if actual, expected := len(buf), 0; actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } datumOutMap, ok := datumOut.(map[string]interface{}) if !ok { t.Errorf("GOT: %#v; WANT: %#v", ok, true) } if actual, expected := len(datumOutMap), len(datumIn); actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } for k, v := range datumIn { if actual, expected := fmt.Sprintf("%v", datumOutMap[k]), fmt.Sprintf("%v", v); actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } } } func TestSchemaRecordRecursive(t *testing.T) { testSchemaValid(t, `{ "type": "record", "name": "recursive", "fields": [ { "name": "label", "type": "string" }, { "name": "children", "type": { "type": "array", "items": "recursive" } } ] }`) } func TestSchemaNamespaceRecursive(t *testing.T) { testSchemaValid(t, `{ "type": "record", "name": "Container", "namespace": "namespace1", "fields": [ { "name": "contained", "type": { "type": "record", "name": "MutuallyRecursive", "fields": [ { "name": "label", "type": "string" }, { "name": "children", "type": { "type": "array", "items": { "type": "record", "name": "MutuallyRecursive", "namespace": "namespace2", "fields": [ { "name": "value", "type": "int" }, { "name": "children", "type": { "type": "array", "items": "namespace1.MutuallyRecursive" } }, { "name": "morechildren", "type": { "type": "array", "items": "MutuallyRecursive" } } ] } } }, { "name": "anotherchild", "type": "namespace2.MutuallyRecursive" } ] } } ] }`) } func TestSchemaRecordNamespaceComposite(t *testing.T) { testSchemaValid(t, `{ "type": "record", "namespace": "x", "name": "Y", "fields": [ { "name": "e", "type": { "type": "record", "name": "Z", "fields": [ { "name": "f", "type": "x.Z" } ] } } ] }`) } func TestSchemaRecordNamespaceFullName(t *testing.T) { testSchemaValid(t, `{ "type": "record", "name": "x.Y", "fields": [ { "name": "e", "type": { "type": "record", "name": "Z", "fields": [ { "name": "f", "type": "x.Y" }, { "name": "g", "type": "x.Z" } ] } } ] }`) } func TestSchemaRecordNamespaceEnum(t *testing.T) { testSchemaValid(t, `{"type": "record", "name": "org.apache.avro.tests.Hello", "fields": [ {"name": "f1", "type": {"type": "enum", "name": "MyEnum", "symbols": ["Foo", "Bar", "Baz"]}}, {"name": "f2", "type": "org.apache.avro.tests.MyEnum"}, {"name": "f3", "type": "MyEnum"}, {"name": "f4", "type": {"type": "enum", "name": "other.namespace.OtherEnum", "symbols": ["one", "two", "three"]}}, {"name": "f5", "type": "other.namespace.OtherEnum"}, {"name": "f6", "type": {"type": "enum", "name": "ThirdEnum", "namespace": "some.other", "symbols": ["Alice", "Bob"]}}, {"name": "f7", "type": "some.other.ThirdEnum"} ]}`) } func TestSchemaRecordNamespaceFixed(t *testing.T) { testSchemaValid(t, `{"type": "record", "name": "org.apache.avro.tests.Hello", "fields": [ {"name": "f1", "type": {"type": "fixed", "name": "MyFixed", "size": 16}}, {"name": "f2", "type": "org.apache.avro.tests.MyFixed"}, {"name": "f3", "type": "MyFixed"}, {"name": "f4", "type": {"type": "fixed", "name": "other.namespace.OtherFixed", "size": 18}}, {"name": "f5", "type": "other.namespace.OtherFixed"}, {"name": "f6", "type": {"type": "fixed", "name": "ThirdFixed", "namespace": "some.other", "size": 20}}, {"name": "f7", "type": "some.other.ThirdFixed"} ]}`) } func TestRecordNamespace(t *testing.T) { c, err := NewCodec(`{ "type": "record", "name": "org.foo.Y", "fields": [ { "name": "X", 
"type": { "type": "fixed", "size": 4, "name": "fixed_4" } }, { "name": "Z", "type": { "type": "fixed_4" } } ] }`) ensureError(t, err) datumIn := map[string]interface{}{ "X": []byte("abcd"), "Z": []byte("efgh"), } buf, err := c.BinaryFromNative(nil, datumIn) ensureError(t, err) if expected := []byte("abcdefgh"); !bytes.Equal(buf, expected) { t.Errorf("GOT: %#v; WANT: %#v", buf, expected) } // round trip datumOut, buf, err := c.NativeFromBinary(buf) ensureError(t, err) if actual, expected := len(buf), 0; actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } datumOutMap, ok := datumOut.(map[string]interface{}) if !ok { t.Errorf("GOT: %#v; WANT: %#v", ok, true) } if actual, expected := len(datumOutMap), len(datumIn); actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } for k, v := range datumIn { if actual, expected := fmt.Sprintf("%s", datumOutMap[k]), fmt.Sprintf("%s", v); actual != expected { t.Errorf("GOT: %#v; WANT: %#v", actual, expected) } } } func TestRecordEncodeFail(t *testing.T) { schema := `{ "type": "record", "name": "r1", "fields": [ {"name": "f1", "type": "string"}, {"name": "f2", "type": "string"} ] }` testBinaryEncodeFail(t, schema, map[string]interface{}{"f1": "foo"}, `field "f2": schema does not specify default value and no value provided`) testBinaryEncodeFail(t, schema, map[string]interface{}{"f1": "foo", "f2": 13}, `field "f2": value does not match its schema`) } func TestRecordTextDecodeFail(t *testing.T) { schema := `{"name":"r1","type":"record","fields":[{"name":"string","type":"string"},{"name":"bytes","type":"bytes"}]}` testTextDecodeFail(t, schema, []byte(` "string" : "silly" , "bytes" : "silly" } `), "expected: '{'") testTextDecodeFail(t, schema, []byte(` { 16 : "silly" , "bytes" : "silly" } `), "expected initial \"") testTextDecodeFail(t, schema, []byte(` { "badName" : "silly" , "bytes" : "silly" } `), "cannot determine codec") testTextDecodeFail(t, schema, []byte(` { "string" , "silly" , "bytes" : "silly" } `), "expected: ':'") testTextDecodeFail(t, schema, []byte(` { "string" : 13 , "bytes" : "silly" } `), "expected initial \"") testTextDecodeFail(t, schema, []byte(` { "string" : "silly" : "bytes" : "silly" } `), "expected ',' or '}'") testTextDecodeFail(t, schema, []byte(` { "string" : "silly" , "bytes" : "silly" `), "short buffer") testTextDecodeFail(t, schema, []byte(` { "string" : "silly" `), "short buffer") testTextDecodeFail(t, schema, []byte(` { "string" : "silly" } `), "only found 1 of 2 fields") } func TestRecordTextCodecPass(t *testing.T) { silly := "⌘ " testTextEncodePass(t, `{"name":"r1","type":"record","fields":[{"name":"string","type":"string"}]}`, map[string]interface{}{"string": silly}, []byte(`{"string":"\u0001\u2318 "}`)) testTextEncodePass(t, `{"name":"r1","type":"record","fields":[{"name":"bytes","type":"bytes"}]}`, map[string]interface{}{"bytes": []byte(silly)}, []byte(`{"bytes":"\u0001\u00E2\u008C\u0098 "}`)) testTextDecodePass(t, `{"name":"r1","type":"record","fields":[{"name":"string","type":"string"},{"name":"bytes","type":"bytes"}]}`, map[string]interface{}{"string": silly, "bytes": []byte(silly)}, []byte(` { "string" : "\u0001\u2318 " , "bytes" : "\u0001\u00E2\u008C\u0098 " }`)) } func TestRecordFieldDefaultValue(t *testing.T) { testSchemaValid(t, `{"type":"record","name":"r1","fields":[{"name":"f1","type":"int","default":13}]}`) testSchemaValid(t, `{"type":"record","name":"r1","fields":[{"name":"f1","type":"string","default":"foo"}]}`) testSchemaInvalid(t, 
`{"type":"record","name":"r1","fields":[{"name":"f1","type":"int","default":"foo"}]}`, "default value ought to encode using field schema") } func TestRecordFieldUnionDefaultValue(t *testing.T) { testSchemaValid(t, `{"type":"record","name":"r1","fields":[{"name":"f1","type":["int","null"],"default":13}]}`) testSchemaValid(t, `{"type":"record","name":"r1","fields":[{"name":"f1","type":["null","int"],"default":null}]}`) } func TestRecordFieldUnionInvalidDefaultValue(t *testing.T) { testSchemaInvalid(t, `{"type":"record","name":"r1","fields":[{"name":"f1","type":["null","int"],"default":13}]}`, "default value ought to encode using field schema") testSchemaInvalid(t, `{"type":"record","name":"r1","fields":[{"name":"f1","type":["int","null"],"default":null}]}`, "default value ought to encode using field schema") } func TestRecordRecursiveRoundTrip(t *testing.T) { codec, err := NewCodec(` { "type": "record", "name": "LongList", "fields" : [ {"name": "next", "type": ["null", "LongList"], "default": null} ] } `) ensureError(t, err) // NOTE: May omit fields when using default value initial := `{"next":{"LongList":{}}}` // NOTE: Textual encoding will show all fields, even those with values that // match their default values final := `{"next":{"LongList":{"next":null}}}` // Convert textual Avro data (in Avro JSON format) to native Go form datum, _, err := codec.NativeFromTextual([]byte(initial)) ensureError(t, err) // Convert native Go form to binary Avro data buf, err := codec.BinaryFromNative(nil, datum) ensureError(t, err) // Convert binary Avro data back to native Go form datum, _, err = codec.NativeFromBinary(buf) ensureError(t, err) // Convert native Go form to textual Avro data buf, err = codec.TextualFromNative(nil, datum) ensureError(t, err) if actual, expected := string(buf), final; actual != expected { t.Fatalf("GOT: %v; WANT: %v", actual, expected) } } func ExampleRecordRecursiveRoundTrip() { codec, err := NewCodec(` { "type": "record", "name": "LongList", "fields" : [ {"name": "next", "type": ["null", "LongList"], "default": null} ] } `) if err != nil { fmt.Println(err) } // NOTE: May omit fields when using default value textual := []byte(`{"next":{"LongList":{"next":{"LongList":{}}}}}`) // Convert textual Avro data (in Avro JSON format) to native Go form native, _, err := codec.NativeFromTextual(textual) if err != nil { fmt.Println(err) } // Convert native Go form to binary Avro data binary, err := codec.BinaryFromNative(nil, native) if err != nil { fmt.Println(err) } // Convert binary Avro data back to native Go form native, _, err = codec.NativeFromBinary(binary) if err != nil { fmt.Println(err) } // Convert native Go form to textual Avro data textual, err = codec.TextualFromNative(nil, native) if err != nil { fmt.Println(err) } // NOTE: Textual encoding will show all fields, even those with values that // match their default values fmt.Println(string(textual)) // Output: {"next":{"LongList":{"next":{"LongList":{"next":null}}}}} } func ExampleBinaryFromNative() { codec, err := NewCodec(` { "type": "record", "name": "LongList", "fields" : [ {"name": "next", "type": ["null", "LongList"], "default": null} ] } `) if err != nil { fmt.Println(err) } // Convert native Go form to binary Avro data binary, err := codec.BinaryFromNative(nil, map[string]interface{}{ "next": map[string]interface{}{ "LongList": map[string]interface{}{ "next": map[string]interface{}{ "LongList": map[string]interface{}{ // NOTE: May omit fields when using default value }, }, }, }, }) if err != nil { fmt.Println(err) } 
fmt.Printf("%#v", binary) // Output: []byte{0x2, 0x2, 0x0} } func ExampleNativeFromBinary() { codec, err := NewCodec(` { "type": "record", "name": "LongList", "fields" : [ {"name": "next", "type": ["null", "LongList"], "default": null} ] } `) if err != nil { fmt.Println(err) } // Convert native Go form to binary Avro data binary := []byte{0x2, 0x2, 0x0} native, _, err := codec.NativeFromBinary(binary) if err != nil { fmt.Println(err) } fmt.Printf("%v", native) // Output: map[next:map[LongList:map[next:map[LongList:map[next:]]]]] } func ExampleNativeFromTextual() { codec, err := NewCodec(` { "type": "record", "name": "LongList", "fields" : [ {"name": "next", "type": ["null", "LongList"], "default": null} ] } `) if err != nil { fmt.Println(err) } // Convert native Go form to text Avro data text := []byte(`{"next":{"LongList":{"next":{"LongList":{"next":null}}}}}`) native, _, err := codec.NativeFromTextual(text) if err != nil { fmt.Println(err) } fmt.Printf("%v", native) // Output: map[next:map[LongList:map[next:map[LongList:map[next:]]]]] } func ExampleTextualFromNative() { codec, err := NewCodec(` { "type": "record", "name": "LongList", "fields" : [ {"name": "next", "type": ["null", "LongList"], "default": null} ] } `) if err != nil { fmt.Println(err) } // Convert native Go form to text Avro data text, err := codec.TextualFromNative(nil, map[string]interface{}{ "next": map[string]interface{}{ "LongList": map[string]interface{}{ "next": map[string]interface{}{ "LongList": map[string]interface{}{ // NOTE: May omit fields when using default value }, }, }, }, }) if err != nil { fmt.Println(err) } fmt.Printf("%s", text) // Output: {"next":{"LongList":{"next":{"LongList":{"next":null}}}}} } func TestRecordFieldFixedDefaultValue(t *testing.T) { testSchemaValid(t, `{"type": "record", "name": "r1", "fields":[{"name": "f1", "type": {"type": "fixed", "name": "someFixed", "size": 1}, "default": "\u0000"}]}`) } func TestRecordFieldDefaultValueTypes(t *testing.T) { t.Run("success", func(t *testing.T) { codec, err := NewCodec(`{"type": "record", "name": "r1", "fields":[{"name": "someBoolean", "type": "boolean", "default": true},{"name": "someBytes", "type": "bytes", "default": "0"},{"name": "someDouble", "type": "double", "default": 0},{"name": "someFloat", "type": "float", "default": 0},{"name": "someInt", "type": "int", "default": 0},{"name": "someLong", "type": "long", "default": 0},{"name": "someString", "type": "string", "default": "0"}]}`) ensureError(t, err) r1, _, err := codec.NativeFromTextual([]byte("{}")) ensureError(t, err) r1m := r1.(map[string]interface{}) someBoolean := r1m["someBoolean"] if _, ok := someBoolean.(bool); !ok { t.Errorf("GOT: %T; WANT: []byte", someBoolean) } someBytes := r1m["someBytes"] if _, ok := someBytes.([]byte); !ok { t.Errorf("GOT: %T; WANT: []byte", someBytes) } someDouble := r1m["someDouble"] if _, ok := someDouble.(float64); !ok { t.Errorf("GOT: %T; WANT: float64", someDouble) } someFloat := r1m["someFloat"] if _, ok := someFloat.(float32); !ok { t.Errorf("GOT: %T; WANT: float32", someFloat) } someInt := r1m["someInt"] if _, ok := someInt.(int32); !ok { t.Errorf("GOT: %T; WANT: int32", someInt) } someLong := r1m["someLong"] if _, ok := someLong.(int64); !ok { t.Errorf("GOT: %T; WANT: int64", someLong) } someString := r1m["someString"] if _, ok := someString.(string); !ok { t.Errorf("GOT: %T; WANT: string", someString) } }) t.Run("provided default is wrong type", func(t *testing.T) { t.Run("long", func(t *testing.T) { _, err := NewCodec(`{"type": "record", "name": 
"r1", "fields":[{"name": "someLong", "type": "long", "default": "0"},{"name": "someInt", "type": "int", "default": 0},{"name": "someFloat", "type": "float", "default": 0},{"name": "someDouble", "type": "double", "default": 0}]}`) ensureError(t, err, "field schema") }) t.Run("int", func(t *testing.T) { _, err := NewCodec(`{"type": "record", "name": "r1", "fields":[{"name": "someLong", "type": "long", "default": 0},{"name": "someInt", "type": "int", "default": "0"},{"name": "someFloat", "type": "float", "default": 0},{"name": "someDouble", "type": "double", "default": 0}]}`) ensureError(t, err, "field schema") }) t.Run("float", func(t *testing.T) { _, err := NewCodec(`{"type": "record", "name": "r1", "fields":[{"name": "someLong", "type": "long", "default": 0},{"name": "someInt", "type": "int", "default": 0},{"name": "someFloat", "type": "float", "default": "0"},{"name": "someDouble", "type": "double", "default": 0}]}`) ensureError(t, err, "field schema") }) t.Run("double", func(t *testing.T) { _, err := NewCodec(`{"type": "record", "name": "r1", "fields":[{"name": "someLong", "type": "long", "default": 0},{"name": "someInt", "type": "int", "default": 0},{"name": "someFloat", "type": "float", "default": 0},{"name": "someDouble", "type": "double", "default": "0"}]}`) ensureError(t, err, "field schema") }) }) t.Run("union of int and long", func(t *testing.T) { t.Skip("FIXME: should encode default value as int64 rather than float64") codec, err := NewCodec(`{"type":"record","name":"r1","fields":[{"name":"f1","type":["int","long"],"default":13}]}`) ensureError(t, err) r1, _, err := codec.NativeFromTextual([]byte("{}")) ensureError(t, err) r1m := r1.(map[string]interface{}) someUnion := r1m["f1"] someMap, ok := someUnion.(map[string]interface{}) if !ok { t.Fatalf("GOT: %T; WANT: map[string]interface{}", someUnion) } if got, want := len(someMap), 1; got != want { t.Errorf("GOT: %v; WANT: %v", got, want) } t.Logf("someMap: %#v", someMap) for k, v := range someMap { // The "int" type is the first type option of the union. if got, want := k, "int"; got != want { t.Errorf("GOT: %v; WANT: %v", got, want) } switch tv := v.(type) { case int64: if got, want := tv, int64(13); got != want { t.Errorf("GOT: %v; WANT: %v", got, want) } default: t.Errorf("GOT: %T; WANT: int64", v) } } }) } goavro-2.10.1/schema_test.go000066400000000000000000000165311412474230400157250ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
package goavro import ( "fmt" "testing" ) func testSchemaPrimativeCodec(t *testing.T, primitiveTypeName string) { t.Helper() if _, err := NewCodec(primitiveTypeName); err != nil { t.Errorf("Bare primitive type: Schema: %q; Actual: %#v; Expected: %#v", primitiveTypeName, err, nil) } full := fmt.Sprintf(`{"type":%s}`, primitiveTypeName) if _, err := NewCodec(full); err != nil { t.Errorf("Full primitive type: Schema: %q; Actual: %#v; Expected: %#v", full, err, nil) } extra := fmt.Sprintf(`{"type":%s,"ignoredKey":"ignoredValue"}`, primitiveTypeName) if _, err := NewCodec(extra); err != nil { t.Errorf("Full primitive type with extra attributes: Schema: %q; Actual: %#v; Expected: %#v", extra, err, nil) } } func testSchemaInvalid(t *testing.T, schema, errorMessage string) { t.Helper() _, err := NewCodec(schema) ensureError(t, err, errorMessage) } func testSchemaValid(t *testing.T, schema string) { t.Helper() _, err := NewCodec(schema) if err != nil { t.Errorf("GOT: %v; WANT: %v", err, nil) } } func TestSchemaFailInvalidType(t *testing.T) { testSchemaInvalid(t, `{"type":"flubber"}`, "unknown type name") } func TestSchemaWeather(t *testing.T) { testSchemaValid(t, ` {"type": "record", "name": "test.Weather", "doc": "A weather reading.", "fields": [ {"name": "station", "type": "string", "order": "ignore"}, {"name": "time", "type": "long"}, {"name": "temp", "type": "int"} ] } `) } func TestSchemaFooBarSpecificRecord(t *testing.T) { testSchemaValid(t, ` { "type": "record", "name": "FooBarSpecificRecord", "namespace": "org.apache.avro", "fields": [ {"name": "id", "type": "int"}, {"name": "name", "type": "string"}, {"name": "nicknames", "type": {"type": "array", "items": "string"}}, {"name": "relatedids", "type": {"type": "array", "items": "int"}}, {"name": "typeEnum", "type": ["null", { "type": "enum", "name": "TypeEnum", "namespace": "org.apache.avro", "symbols" : ["a","b", "c"] }], "default": null } ] } `) } func TestSchemaInterop(t *testing.T) { testSchemaValid(t, ` {"type": "record", "name":"Interop", "namespace": "org.apache.avro", "fields": [ {"name": "intField", "type": "int"}, {"name": "longField", "type": "long"}, {"name": "stringField", "type": "string"}, {"name": "boolField", "type": "boolean"}, {"name": "floatField", "type": "float"}, {"name": "doubleField", "type": "double"}, {"name": "bytesField", "type": "bytes"}, {"name": "nullField", "type": "null"}, {"name": "arrayField", "type": {"type": "array", "items": "double"}}, {"name": "mapField", "type": {"type": "map", "values": {"type": "record", "name": "Foo", "fields": [{"name": "label", "type": "string"}]}}}, {"name": "unionField", "type": ["boolean", "double", {"type": "array", "items": "bytes"}]}, {"name": "enumField", "type": {"type": "enum", "name": "Kind", "symbols": ["A","B","C"]}}, {"name": "fixedField", "type": {"type": "fixed", "name": "MD5", "size": 16}}, {"name": "recordField", "type": {"type": "record", "name": "Node", "fields": [ {"name": "label", "type": "string"}, {"name": "children", "type": {"type": "array", "items": "Node"}}]}} ] } `) } func TestSchemaFixedNameCanBeUsedLater(t *testing.T) { schema := `{"type":"record","name":"record1","fields":[ {"name":"field1","type":{"type":"fixed","name":"fixed_4","size":4}}, {"name":"field2","type":"fixed_4"}]}` datum := map[string]interface{}{ "field1": []byte("abcd"), "field2": []byte("efgh"), } testBinaryEncodePass(t, schema, datum, []byte("abcdefgh")) } // func ExampleCodecSchema() { // schema := `{"type":"map","values":{"type":"enum","name":"foo","symbols":["alpha","bravo"]}}` 
// codec, err := NewCodec(schema) // if err != nil { // fmt.Println(err) // } // fmt.Println(codec.Schema()) // // Output: {"type":"map","values":{"name":"foo","type":"enum","symbols":["alpha","bravo"]}} // } func TestMapValueTypeEnum(t *testing.T) { schema := `{"type":"map","values":{"type":"enum","name":"foo","symbols":["alpha","bravo"]}}` datum := map[string]interface{}{"someKey": "bravo"} expected := []byte{ 0x2, // blockCount = 1 pair 0xe, // key size = 7 's', 'o', 'm', 'e', 'K', 'e', 'y', 0x2, // value = index 1 ("bravo") 0, // blockCount = 0 pairs } testBinaryCodecPass(t, schema, datum, expected) } func TestMapValueTypeRecord(t *testing.T) { schema := `{"type":"map","values":{"type":"record","name":"foo","fields":[{"name":"field1","type":"string"},{"name":"field2","type":"int"}]}}` datum := map[string]interface{}{ "map-key": map[string]interface{}{ "field1": "unlucky", "field2": 13, }, } expected := []byte{ 0x2, // blockCount = 1 key-value pair in top level map 0xe, // first key size = 7 'm', 'a', 'p', '-', 'k', 'e', 'y', // first key = "map-key" // this key's value is a record, which is encoded by concatenated its field values 0x0e, // field one string size = 7 'u', 'n', 'l', 'u', 'c', 'k', 'y', 0x1a, // 13 0, // map has no more blocks } // cannot decode because order of map key enumeration random, and records // are returned as a Go map testBinaryEncodePass(t, schema, datum, expected) } func TestDefaultValueOughtToEncodeUsingFieldSchemaOK(t *testing.T) { testSchemaValid(t, ` { "namespace": "universe.of.things", "type": "record", "name": "Thing", "fields": [ { "name": "attributes", "type": [ "null", { "type": "array", "items": { "namespace": "universe.of.things", "type": "record", "name": "attribute", "fields": [ { "name": "name", "type": "string" }, { "name": "value", "type": "string" } ] } } ], "default": "null" } ] }`) } func TestUnionOfRecordsDefaultValueOughtToEncodeUsingFieldSchemaOK(t *testing.T) { testSchemaValid(t, ` { "type": "record", "name": "Thing", "namespace": "universe.of.things", "fields": [ { "name": "layout", "type": [ { "type": "record", "name": "AnotherThing", "namespace": "another.universe.of.things", "fields": [ { "name": "text", "type": "string", "default": "someText" } ] }, { "type": "record", "name": "AnotherThing2", "namespace": "another.universe.of.things", "fields": [ { "name": "text", "type": "string", "default": "someOtherText" } ] } ], "default": { "another.universe.of.things.AnotherThing": { "text": "someDefaultText" } } } ] }`) } goavro-2.10.1/text.go000066400000000000000000000025131412474230400144050ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "fmt" "io" "unicode" ) // advanceAndConsume advances to non whitespace and returns an error if the next // non whitespace byte is not what is expected. 
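// For example (illustration added by the editor):
//
//	advanceAndConsume([]byte("  : 42"), ':') // returns []byte(" 42"), nil
//	advanceAndConsume([]byte("  , 42"), ':') // returns nil and the error `expected: ':'; actual: ','`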
func advanceAndConsume(buf []byte, expected byte) ([]byte, error) { var err error if buf, err = advanceToNonWhitespace(buf); err != nil { return nil, err } if actual := buf[0]; actual != expected { return nil, fmt.Errorf("expected: %q; actual: %q", expected, actual) } return buf[1:], nil } // advanceToNonWhitespace consumes bytes from buf until non-whitespace character // is found. It returns error when no more bytes remain, because its purpose is // to scan ahead to the next non-whitespace character. func advanceToNonWhitespace(buf []byte) ([]byte, error) { for i, b := range buf { if !unicode.IsSpace(rune(b)) { return buf[i:], nil } } return nil, io.ErrShortBuffer } goavro-2.10.1/text_test.go000066400000000000000000000172271412474230400154540ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "bytes" "fmt" "math" "testing" ) func testTextDecodeFail(t *testing.T, schema string, buf []byte, errorMessage string) { t.Helper() c, err := NewCodec(schema) if err != nil { t.Fatal(err) } value, newBuffer, err := c.NativeFromTextual(buf) ensureError(t, err, errorMessage) if value != nil { t.Errorf("GOT: %v; WANT: %v", value, nil) } if !bytes.Equal(buf, newBuffer) { t.Errorf("GOT: %v; WANT: %v", newBuffer, buf) } } func testTextEncodeFail(t *testing.T, schema string, datum interface{}, errorMessage string) { t.Helper() c, err := NewCodec(schema) if err != nil { t.Fatal(err) } buf, err := c.TextualFromNative(nil, datum) ensureError(t, err, errorMessage) if buf != nil { t.Errorf("GOT: %v; WANT: %v", buf, nil) } } func testTextEncodeFailBadDatumType(t *testing.T, schema string, datum interface{}) { t.Helper() testTextEncodeFail(t, schema, datum, "received: ") } func testTextDecodeFailShortBuffer(t *testing.T, schema string, buf []byte) { t.Helper() testTextDecodeFail(t, schema, buf, "short buffer") } func testTextDecodePass(t *testing.T, schema string, datum interface{}, encoded []byte) { t.Helper() codec, err := NewCodec(schema) if err != nil { t.Fatalf("schema: %s; %s", schema, err) } toNativeAndCompare(t, schema, datum, encoded, codec) } func testJSONDecodePass(t *testing.T, schema string, datum interface{}, encoded []byte) { t.Helper() codec, err := NewCodecFrom(schema, &codecBuilder{ buildCodecForTypeDescribedByMap, buildCodecForTypeDescribedByString, buildCodecForTypeDescribedBySliceJSON, }) if err != nil { t.Fatalf("schema: %s; %s", schema, err) } toNativeAndCompare(t, schema, datum, encoded, codec) } func toNativeAndCompare(t *testing.T, schema string, datum interface{}, encoded []byte, codec *Codec) { t.Helper() decoded, remaining, err := codec.NativeFromTextual(encoded) if err != nil { t.Fatalf("schema: %s; %s", schema, err) } // remaining ought to be empty because there is nothing remaining to be // decoded if actual, expected := len(remaining), 0; actual != expected { t.Errorf("schema: %s; Datum: %#v; Actual: %v; Expected: %v", schema, datum, actual, expected) } const ( _ = iota isInt = iota isFloat32 = iota isFloat64 = iota isMap = iota isSlice = iota isString = iota ) var datumType int var datumInt int64 var datumFloat32 
float32 var datumFloat64 float64 var datumMap map[string]interface{} var datumSlice []interface{} var datumString string switch v := datum.(type) { case float64: datumFloat64 = v datumType = isFloat64 case float32: datumFloat32 = v datumType = isFloat32 case int: datumInt = int64(v) datumType = isInt case int32: datumInt = int64(v) datumType = isInt case int64: datumInt = v datumType = isInt case string: datumString = v datumType = isString case []interface{}: datumSlice = v datumType = isSlice case map[string]interface{}: datumMap = v datumType = isMap } var decodedType int var decodedInt int64 var decodedFloat32 float32 var decodedFloat64 float64 var decodedMap map[string]interface{} var decodedSlice []interface{} var decodedString string switch v := decoded.(type) { case float64: decodedFloat64 = v decodedType = isFloat64 case float32: decodedFloat32 = v decodedType = isFloat32 case int: decodedInt = int64(v) decodedType = isInt case int32: decodedInt = int64(v) decodedType = isInt case int64: decodedInt = v decodedType = isInt case string: decodedString = v decodedType = isString case []interface{}: decodedSlice = v decodedType = isSlice case map[string]interface{}: decodedMap = v decodedType = isMap } if datumType == isInt && decodedType == isInt { if datumInt != decodedInt { t.Errorf("numerical comparison: schema: %s; Datum: %v; Actual: %v; Expected: %v", schema, datum, decodedInt, datumInt) } return } // NOTE: Special handling when both datum and decoded values are floating // point to test whether both are NaN, -Inf, or +Inf. if datumType == isFloat64 && decodedType == isFloat64 { if !(math.IsNaN(datumFloat64) && math.IsNaN(decodedFloat64)) && !(math.IsInf(datumFloat64, 1) && math.IsInf(decodedFloat64, 1)) && !(math.IsInf(datumFloat64, -1) && math.IsInf(decodedFloat64, -1)) && datumFloat64 != decodedFloat64 { t.Errorf("numerical comparison: schema: %s; Datum: %v; Actual: %v; Expected: %v", schema, datum, decodedFloat64, datumFloat64) } return } if datumType == isFloat32 && decodedType == isFloat32 { a := float64(datumFloat32) b := float64(decodedFloat32) if !(math.IsNaN(a) && math.IsNaN(b)) && !(math.IsInf(a, 1) && math.IsInf(b, 1)) && !(math.IsInf(a, -1) && math.IsInf(b, -1)) && a != b { t.Errorf("numerical comparison: schema: %s; Datum: %v; Actual: %v; Expected: %v", schema, datum, decodedFloat32, datumFloat32) } return } if datumType == isMap && decodedType == isMap { if actual, expected := len(decodedMap), len(datumMap); actual != expected { t.Fatalf("map comparison: length mismatch; Actual: %v; Expected: %v", actual, expected) } for key, datumValue := range datumMap { decodedValue, ok := decodedMap[key] if !ok { t.Fatalf("map comparison: decoded missing key: %q: Actual: %v; Expected: %v", key, decodedMap, datumMap) } if actual, expected := fmt.Sprintf("%v", decodedValue), fmt.Sprintf("%v", datumValue); actual != expected { t.Errorf("map comparison: values differ for key: %q; Actual: %v; Expected: %v", key, actual, expected) } } return } if datumType == isSlice && decodedType == isSlice { if actual, expected := len(decodedMap), len(datumMap); actual != expected { t.Fatalf("slice comparison: length mismatch; Actual: %v; Expected: %v", actual, expected) } for i, datumValue := range datumSlice { decodedValue := decodedSlice[i] if actual, expected := fmt.Sprintf("%v", decodedValue), fmt.Sprintf("%v", datumValue); actual != expected { t.Errorf("slice comparison: values differ for index: %d: Actual: %v; Expected: %v", i+1, actual, expected) } } return } if datumType == isString && 
decodedType == isString { if actual, expected := decodedString, datumString; actual != expected { t.Errorf("string comparison: Actual: %v; Expected: %v", actual, expected) } return } if actual, expected := fmt.Sprintf("%v", decoded), fmt.Sprintf("%v", datum); actual != expected { t.Errorf("schema: %s; Datum: %v; Actual: %s; Expected: %s", schema, datum, actual, expected) } } func testTextEncodePass(t *testing.T, schema string, datum interface{}, expected []byte) { t.Helper() codec, err := NewCodec(schema) if err != nil { t.Fatalf("Schma: %q %s", schema, err) } actual, err := codec.TextualFromNative(nil, datum) if err != nil { t.Fatalf("schema: %s; Datum: %v; %s", schema, datum, err) } if !bytes.Equal(actual, expected) { t.Errorf("schema: %s; Datum: %v; Actual: %+q; Expected: %+q", schema, datum, actual, expected) } } // testTextCodecPass does a bi-directional codec check, by encoding datum to // bytes, then decoding bytes back to datum. func testTextCodecPass(t *testing.T, schema string, datum interface{}, buf []byte) { t.Helper() testTextDecodePass(t, schema, datum, buf) testTextEncodePass(t, schema, datum, buf) } goavro-2.10.1/union.go000066400000000000000000000317761412474230400145660ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "bytes" "encoding/json" "errors" "fmt" "sort" ) // codecInfo is a set of quick lookups it holds all the lookup info for the // all the schemas we need to handle the list of types for this union type codecInfo struct { allowedTypes []string codecFromIndex []*Codec codecFromName map[string]*Codec indexFromName map[string]int } // Union wraps a datum value in a map for encoding as a Union, as required by // Union encoder. // // When providing a value for an Avro union, the encoder will accept `nil` for a // `null` value. If the value is non-`nil`, it must be a // `map[string]interface{}` with a single key-value pair, where the key is the // Avro type name and the value is the datum's value. As a convenience, the // `Union` function wraps any datum value in a map as specified above. 
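// For example, Union("string", "some string") returns
// map[string]interface{}{"string": "some string"}, while Union("null", nil)
// returns nil, matching the special handling of `null` values noted above.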
// // func ExampleUnion() { // codec, err := goavro.NewCodec(`["null","string","int"]`) // if err != nil { // fmt.Println(err) // } // buf, err := codec.TextualFromNative(nil, goavro.Union("string", "some string")) // if err != nil { // fmt.Println(err) // } // fmt.Println(string(buf)) // // Output: {"string":"some string"} // } func Union(name string, datum interface{}) interface{} { if datum == nil && name == "null" { return nil } return map[string]interface{}{name: datum} } // makeCodecInfo takes the schema array // and builds some lookup indices // returning a codecInfo func makeCodecInfo(st map[string]*Codec, enclosingNamespace string, schemaArray []interface{}, cb *codecBuilder) (codecInfo, error) { allowedTypes := make([]string, len(schemaArray)) // used for error reporting when encoder receives invalid datum type codecFromIndex := make([]*Codec, len(schemaArray)) codecFromName := make(map[string]*Codec, len(schemaArray)) indexFromName := make(map[string]int, len(schemaArray)) for i, unionMemberSchema := range schemaArray { unionMemberCodec, err := buildCodec(st, enclosingNamespace, unionMemberSchema, cb) if err != nil { return codecInfo{}, fmt.Errorf("Union item %d ought to be valid Avro type: %s", i+1, err) } fullName := unionMemberCodec.typeName.fullName if _, ok := indexFromName[fullName]; ok { return codecInfo{}, fmt.Errorf("Union item %d ought to be unique type: %s", i+1, unionMemberCodec.typeName) } allowedTypes[i] = fullName codecFromIndex[i] = unionMemberCodec codecFromName[fullName] = unionMemberCodec indexFromName[fullName] = i } return codecInfo{ allowedTypes: allowedTypes, codecFromIndex: codecFromIndex, codecFromName: codecFromName, indexFromName: indexFromName, }, nil } func nativeFromBinary(cr *codecInfo) func(buf []byte) (interface{}, []byte, error) { return func(buf []byte) (interface{}, []byte, error) { var decoded interface{} var err error decoded, buf, err = longNativeFromBinary(buf) if err != nil { return nil, nil, err } index := decoded.(int64) // longDecoder always returns int64, so elide error checking if index < 0 || index >= int64(len(cr.codecFromIndex)) { return nil, nil, fmt.Errorf("cannot decode binary union: index ought to be between 0 and %d; read index: %d", len(cr.codecFromIndex)-1, index) } c := cr.codecFromIndex[index] decoded, buf, err = c.nativeFromBinary(buf) if err != nil { return nil, nil, fmt.Errorf("cannot decode binary union item %d: %s", index+1, err) } if decoded == nil { // do not wrap a nil value in a map return nil, buf, nil } // Non-nil values are wrapped in a map with single key set to type name of value return Union(cr.allowedTypes[index], decoded), buf, nil } } func binaryFromNative(cr *codecInfo) func(buf []byte, datum interface{}) ([]byte, error) { return func(buf []byte, datum interface{}) ([]byte, error) { switch v := datum.(type) { case nil: index, ok := cr.indexFromName["null"] if !ok { return nil, fmt.Errorf("cannot encode binary union: no member schema types support datum: allowed types: %v; received: %T", cr.allowedTypes, datum) } return longBinaryFromNative(buf, index) case map[string]interface{}: if len(v) != 1 { return nil, fmt.Errorf("cannot encode binary union: non-nil Union values ought to be specified with Go map[string]interface{}, with single key equal to type name, and value equal to datum value: %v; received: %T", cr.allowedTypes, datum) } // will execute exactly once for key, value := range v { index, ok := cr.indexFromName[key] if !ok { return nil, fmt.Errorf("cannot encode binary union: no member schema 
types support datum: allowed types: %v; received: %T", cr.allowedTypes, datum) } c := cr.codecFromIndex[index] buf, _ = longBinaryFromNative(buf, index) return c.binaryFromNative(buf, value) } } return nil, fmt.Errorf("cannot encode binary union: non-nil Union values ought to be specified with Go map[string]interface{}, with single key equal to type name, and value equal to datum value: %v; received: %T", cr.allowedTypes, datum) } } func nativeFromTextual(cr *codecInfo) func(buf []byte) (interface{}, []byte, error) { return func(buf []byte) (interface{}, []byte, error) { if len(buf) >= 4 && bytes.Equal(buf[:4], []byte("null")) { if _, ok := cr.indexFromName["null"]; ok { return nil, buf[4:], nil } } var datum interface{} var err error datum, buf, err = genericMapTextDecoder(buf, nil, cr.codecFromName) if err != nil { return nil, nil, fmt.Errorf("cannot decode textual union: %s", err) } return datum, buf, nil } } func textualFromNative(cr *codecInfo) func(buf []byte, datum interface{}) ([]byte, error) { return func(buf []byte, datum interface{}) ([]byte, error) { switch v := datum.(type) { case nil: _, ok := cr.indexFromName["null"] if !ok { return nil, fmt.Errorf("cannot encode textual union: no member schema types support datum: allowed types: %v; received: %T", cr.allowedTypes, datum) } return append(buf, "null"...), nil case map[string]interface{}: if len(v) != 1 { return nil, fmt.Errorf("cannot encode textual union: non-nil Union values ought to be specified with Go map[string]interface{}, with single key equal to type name, and value equal to datum value: %v; received: %T", cr.allowedTypes, datum) } // will execute exactly once for key, value := range v { index, ok := cr.indexFromName[key] if !ok { return nil, fmt.Errorf("cannot encode textual union: no member schema types support datum: allowed types: %v; received: %T", cr.allowedTypes, datum) } buf = append(buf, '{') var err error buf, err = stringTextualFromNative(buf, key) if err != nil { return nil, fmt.Errorf("cannot encode textual union: %s", err) } buf = append(buf, ':') c := cr.codecFromIndex[index] buf, err = c.textualFromNative(buf, value) if err != nil { return nil, fmt.Errorf("cannot encode textual union: %s", err) } return append(buf, '}'), nil } } return nil, fmt.Errorf("cannot encode textual union: non-nil values ought to be specified with Go map[string]interface{}, with single key equal to type name, and value equal to datum value: %v; received: %T", cr.allowedTypes, datum) } } func buildCodecForTypeDescribedBySlice(st map[string]*Codec, enclosingNamespace string, schemaArray []interface{}, cb *codecBuilder) (*Codec, error) { if len(schemaArray) == 0 { return nil, errors.New("Union ought to have one or more members") } cr, err := makeCodecInfo(st, enclosingNamespace, schemaArray, cb) if err != nil { return nil, err } rv := &Codec{ // NOTE: To support record field default values, union schema set to the // type name of first member // TODO: add/change to schemaCanonical below schemaOriginal: cr.codecFromIndex[0].typeName.fullName, typeName: &name{"union", nullNamespace}, nativeFromBinary: nativeFromBinary(&cr), binaryFromNative: binaryFromNative(&cr), nativeFromTextual: nativeFromTextual(&cr), textualFromNative: textualFromNative(&cr), } return rv, nil } // Standard JSON // // The default avro library supports a json that would result from your data into json // instead of serializing it into binary // // JSON in the wild differs from that in one critical way - unions // the avro spec requires unions to have their type 
indicated // which means every value that is of a union type // is actually sent as a small map {"string", "some string"} // instead of simply as the value itself, which is the way of wild JSON // https://avro.apache.org/docs/current/spec.html#json_encoding // // In order to use this to avro encode standard json the unions have to be rewritten // so the can encode into unions as expected by the avro schema // // so the technique is to read in the json in the usual way // when a union type is found, read the next json object // try to figure out if it fits into any of the types // that are specified for the union per the supplied schema // if so, then wrap the value into a map and return the expected Union // // the json is morphed on the read side // and then it will remain avro-json object // avro data is not serialized back into standard json // the data goes to avro-json and stays that way func buildCodecForTypeDescribedBySliceJSON(st map[string]*Codec, enclosingNamespace string, schemaArray []interface{}, cb *codecBuilder) (*Codec, error) { if len(schemaArray) == 0 { return nil, errors.New("Union ought to have one or more members") } cr, err := makeCodecInfo(st, enclosingNamespace, schemaArray, cb) if err != nil { return nil, err } rv := &Codec{ // NOTE: To support record field default values, union schema set to the // type name of first member // TODO: add/change to schemaCanonical below schemaOriginal: cr.codecFromIndex[0].typeName.fullName, typeName: &name{"union", nullNamespace}, nativeFromBinary: nativeFromBinary(&cr), binaryFromNative: binaryFromNative(&cr), nativeFromTextual: nativeAvroFromTextualJson(&cr), textualFromNative: textualFromNative(&cr), } return rv, nil } func checkAll(allowedTypes []string, cr *codecInfo, buf []byte) (interface{}, []byte, error) { for _, name := range cr.allowedTypes { if name == "null" { // skip null since we know we already got type float64 continue } theCodec, ok := cr.codecFromName[name] if !ok { continue } rv, rb, err := theCodec.NativeFromTextual(buf) if err != nil { continue } return map[string]interface{}{name: rv}, rb, nil } return nil, buf, fmt.Errorf("could not decode any json data in input %v", string(buf)) } func nativeAvroFromTextualJson(cr *codecInfo) func(buf []byte) (interface{}, []byte, error) { return func(buf []byte) (interface{}, []byte, error) { reader := bytes.NewReader(buf) dec := json.NewDecoder(reader) var m interface{} // i should be able to grab the next json "value" with decoder.Decode() // https://pkg.go.dev/encoding/json#Decoder.Decode // that dec.More() loop will give the next // whatever then dec.Decode(&m) // if m is interface{} // it goes one legit json object at a time like this // json.Delim: [ // Q:map[string]interface {}: map[Name:Ed Text:Knock knock.] // Q:map[string]interface {}: map[Name:Sam Text:Who's there?] // Q:map[string]interface {}: map[Name:Ed Text:Go fmt.] // Q:map[string]interface {}: map[Name:Sam Text:Go fmt who?] // Q:map[string]interface {}: map[Name:Ed Text:Go fmt yourself!] 
// string: eewew // bottom:json.Delim: ] // // so right here, grab whatever this object is // grab the object specified as the value // and try to figure out what it is and handle it err := dec.Decode(&m) if err != nil { return nil, buf, err } allowedTypes := cr.allowedTypes switch m.(type) { case nil: if len(buf) >= 4 && bytes.Equal(buf[:4], []byte("null")) { if _, ok := cr.codecFromName["null"]; ok { return nil, buf[4:], nil } } case float64: // dec.Decode turns them all into float64 // avro spec knows about int, long (variable length zig-zag) // and then float and double (32 bits, 64 bits) // https://avro.apache.org/docs/current/spec.html#binary_encode_primitive // // double // doubleNativeFromTextual // float // floatNativeFromTextual // long // longNativeFromTextual // int // intNativeFromTextual // sorted so it would be // double, float, int, long // that makes the priorities right by chance sort.Strings(cr.allowedTypes) case map[string]interface{}: // try to decode it as a map // because a map should fail faster than a record // if that fails assume record and return it sort.Strings(cr.allowedTypes) } return checkAll(allowedTypes, cr, buf) } } goavro-2.10.1/union_test.go000066400000000000000000000362341412474230400156170ustar00rootroot00000000000000// Copyright [2019] LinkedIn Corp. Licensed under the Apache License, Version // 2.0 (the "License"); you may not use this file except in compliance with the // License. You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, WITHOUT // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. package goavro import ( "bytes" "fmt" "math" "strconv" "testing" ) func TestSchemaUnion(t *testing.T) { testSchemaInvalid(t, `[{"type":"enum","name":"e1","symbols":["alpha","bravo"]},"e1"]`, "Union item 2 ought to be unique type") testSchemaInvalid(t, `[{"type":"enum","name":"com.example.one","symbols":["red","green","blue"]},{"type":"enum","name":"one","namespace":"com.example","symbols":["dog","cat"]}]`, "Union item 2 ought to be unique type") } func TestUnion(t *testing.T) { testBinaryCodecPass(t, `["null"]`, Union("null", nil), []byte("\x00")) testBinaryCodecPass(t, `["null","int"]`, Union("null", nil), []byte("\x00")) testBinaryCodecPass(t, `["int","null"]`, Union("null", nil), []byte("\x02")) testBinaryCodecPass(t, `["null","int"]`, Union("int", 3), []byte("\x02\x06")) testBinaryCodecPass(t, `["null","long"]`, Union("long", 3), []byte("\x02\x06")) testBinaryCodecPass(t, `["int","null"]`, Union("int", 3), []byte("\x00\x06")) testBinaryEncodePass(t, `["int","null"]`, Union("int", 3), []byte("\x00\x06")) // can encode a bare 3 testBinaryEncodeFail(t, `[{"type":"enum","name":"colors","symbols":["red","green","blue"]},{"type":"enum","name":"animals","symbols":["dog","cat"]}]`, Union("colors", "bravo"), "value ought to be member of symbols") testBinaryEncodeFail(t, `[{"type":"enum","name":"colors","symbols":["red","green","blue"]},{"type":"enum","name":"animals","symbols":["dog","cat"]}]`, Union("animals", "bravo"), "value ought to be member of symbols") testBinaryCodecPass(t, `[{"type":"enum","name":"colors","symbols":["red","green","blue"]},{"type":"enum","name":"animals","symbols":["dog","cat"]}]`, Union("colors", "green"), []byte{0, 2}) testBinaryCodecPass(t, 
		`[{"type":"enum","name":"colors","symbols":["red","green","blue"]},{"type":"enum","name":"animals","symbols":["dog","cat"]}]`, Union("animals", "cat"), []byte{2, 2})
}

func TestUnionRejectInvalidType(t *testing.T) {
	testBinaryEncodeFailBadDatumType(t, `["null","long"]`, 3)
	testBinaryEncodeFailBadDatumType(t, `["null","int","long","float"]`, float64(3.5))
	testBinaryEncodeFailBadDatumType(t, `["null","long"]`, Union("int", 3))
	testBinaryEncodeFailBadDatumType(t, `["null","int","long","float"]`, Union("double", float64(3.5)))
}

func TestUnionWillCoerceTypeIfPossible(t *testing.T) {
	testBinaryCodecPass(t, `["null","long","float","double"]`, Union("long", int32(3)), []byte("\x02\x06"))
	testBinaryCodecPass(t, `["null","int","float","double"]`, Union("int", int64(3)), []byte("\x02\x06"))
	testBinaryCodecPass(t, `["null","int","long","double"]`, Union("double", float32(3.5)), []byte("\x06\x00\x00\x00\x00\x00\x00\f@"))
	testBinaryCodecPass(t, `["null","int","long","float"]`, Union("float", float64(3.5)), []byte("\x06\x00\x00\x60\x40"))
}

func TestUnionNumericCoercionGuardsPrecision(t *testing.T) {
	testBinaryEncodeFail(t, `["null","int","long","double"]`, Union("int", float32(3.5)), "lose precision")
}

func TestUnionWithArray(t *testing.T) {
	testBinaryCodecPass(t, `["null",{"type":"array","items":"int"}]`, Union("null", nil), []byte("\x00"))
	testBinaryCodecPass(t, `["null",{"type":"array","items":"int"}]`, Union("array", []interface{}{}), []byte("\x02\x00"))
	testBinaryCodecPass(t, `["null",{"type":"array","items":"int"}]`, Union("array", []interface{}{1}), []byte("\x02\x02\x02\x00"))
	testBinaryCodecPass(t, `["null",{"type":"array","items":"int"}]`, Union("array", []interface{}{1, 2}), []byte("\x02\x04\x02\x04\x00"))
	testBinaryCodecPass(t, `[{"type": "array", "items": "string"}, "null"]`, Union("null", nil), []byte{2})
	testBinaryCodecPass(t, `[{"type": "array", "items": "string"}, "null"]`, Union("array", []string{"foo"}), []byte("\x00\x02\x06foo\x00"))
	testBinaryCodecPass(t, `[{"type": "array", "items": "string"}, "null"]`, Union("array", []string{"foo", "bar"}), []byte("\x00\x04\x06foo\x06bar\x00"))
}

func TestUnionWithMap(t *testing.T) {
	testBinaryCodecPass(t, `["null",{"type":"map","values":"string"}]`, Union("null", nil), []byte("\x00"))
	testBinaryCodecPass(t, `["string",{"type":"map","values":"string"}]`, Union("map", map[string]interface{}{"He": "Helium"}), []byte("\x02\x02\x04He\x0cHelium\x00"))
	testBinaryCodecPass(t, `["string",{"type":"array","items":"string"}]`, Union("string", "Helium"), []byte("\x00\x0cHelium"))
}

func TestUnionMapRecordFitsInRecord(t *testing.T) {
	// union value may be either map or a record
	codec, err := NewCodec(`["null",{"type":"map","values":"double"},{"type":"record","name":"com.example.record","fields":[{"name":"field1","type":"int"},{"name":"field2","type":"float"}]}]`)
	if err != nil {
		t.Fatal(err)
	}

	// the provided datum value could be encoded by either the map or the record
	// schemas above
	datum := map[string]interface{}{
		"field1": 3,
		"field2": 3.5,
	}
	datumIn := Union("com.example.record", datum)

	buf, err := codec.BinaryFromNative(nil, datumIn)
	if err != nil {
		t.Fatal(err)
	}
	if !bytes.Equal(buf, []byte{
		0x04,                   // prefer record (union item 2) over map (union item 1)
		0x06,                   // field1 == 3
		0x00, 0x00, 0x60, 0x40, // field2 == 3.5
	}) {
		t.Errorf("GOT: %#v; WANT: %#v", buf, []byte{byte(2)})
	}

	// round trip
	datumOut, buf, err := codec.NativeFromBinary(buf)
	if err != nil {
		t.Fatal(err)
	}
	if actual, expected := len(buf), 0; actual != expected {
		t.Errorf("GOT: %#v; WANT: %#v", actual, expected)
	}
	datumOutMap, ok := datumOut.(map[string]interface{})
	if !ok {
		t.Fatalf("GOT: %#v; WANT: %#v", ok, false)
	}
	if actual, expected := len(datumOutMap), 1; actual != expected {
		t.Fatalf("GOT: %#v; WANT: %#v", actual, expected)
	}
	datumValue, ok := datumOutMap["com.example.record"]
	if !ok {
		t.Fatalf("GOT: %#v; WANT: %#v", datumOutMap, "have `com.example.record` key")
	}
	datumValueMap, ok := datumValue.(map[string]interface{})
	if !ok {
		t.Errorf("GOT: %#v; WANT: %#v", ok, true)
	}
	if actual, expected := len(datumValueMap), len(datum); actual != expected {
		t.Errorf("GOT: %#v; WANT: %#v", actual, expected)
	}
	for k, v := range datum {
		if actual, expected := fmt.Sprintf("%v", datumValueMap[k]), fmt.Sprintf("%v", v); actual != expected {
			t.Errorf("GOT: %#v; WANT: %#v", actual, expected)
		}
	}
}

func TestUnionRecordFieldWhenNull(t *testing.T) {
	schema := `{ "type": "record", "name": "r1", "fields": [ {"name": "f1", "type": [{"type": "array", "items": "string"}, "null"]} ] }`
	testBinaryCodecPass(t, schema, map[string]interface{}{"f1": Union("array", []interface{}{})}, []byte("\x00\x00"))
	testBinaryCodecPass(t, schema, map[string]interface{}{"f1": Union("array", []string{"bar"})}, []byte("\x00\x02\x06bar\x00"))
	testBinaryCodecPass(t, schema, map[string]interface{}{"f1": Union("array", []string{})}, []byte("\x00\x00"))
	testBinaryCodecPass(t, schema, map[string]interface{}{"f1": Union("null", nil)}, []byte("\x02"))
	testBinaryCodecPass(t, schema, map[string]interface{}{"f1": nil}, []byte("\x02"))
}

func TestUnionText(t *testing.T) {
	testTextEncodeFail(t, `["null","int"]`, Union("null", 3), "expected")
	testTextCodecPass(t, `["null","int"]`, Union("null", nil), []byte("null"))
	testTextCodecPass(t, `["null","int"]`, Union("int", 3), []byte(`{"int":3}`))
	testTextCodecPass(t, `["null","int","string"]`, Union("string", "😂 "), []byte(`{"string":"\u0001\uD83D\uDE02 "}`))
}

func ExampleUnion() {
	codec, err := NewCodec(`["null","string","int"]`)
	if err != nil {
		fmt.Println(err)
	}
	buf, err := codec.TextualFromNative(nil, Union("string", "some string"))
	if err != nil {
		fmt.Println(err)
	}
	fmt.Println(string(buf))
	// Output: {"string":"some string"}
}

func ExampleUnion3() {
	// Imagine a record field with the following union type. I have seen this
	// sort of type in many schemas. I have been told the reasoning behind it
	// is that when the writer wants to encode data that cannot be written as
	// a JSON number, it encodes the value as a string and allows the reader
	// to parse the string accordingly.
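	//
	// For reference: in Avro's JSON encoding the union member is wrapped in a
	// single-entry object, so a finite value would arrive as {"double":6.7}
	// while the string fallback arrives as {"string":"NaN"}, which is what
	// this example decodes below.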
	codec, err := NewCodec(`["null","double","string"]`)
	if err != nil {
		fmt.Println(err)
	}
	native, _, err := codec.NativeFromTextual([]byte(`{"string":"NaN"}`))
	if err != nil {
		fmt.Println(err)
	}
	value := math.NaN()
	if native == nil {
		fmt.Print("decoded null: ")
	} else {
		for k, v := range native.(map[string]interface{}) {
			switch k {
			case "double":
				fmt.Print("decoded double: ")
				value = v.(float64)
			case "string":
				fmt.Print("decoded string: ")
				s := v.(string)
				switch s {
				case "NaN":
					value = math.NaN()
				case "+Infinity":
					value = math.Inf(1)
				case "-Infinity":
					value = math.Inf(-1)
				default:
					var err error
					value, err = strconv.ParseFloat(s, 64)
					if err != nil {
						fmt.Println(err)
					}
				}
			}
		}
	}
	fmt.Println(value)
	// Output: decoded string: NaN
}

func ExampleJSONUnion() {
	codec, err := NewCodec(`["null","string","int"]`)
	if err != nil {
		fmt.Println(err)
	}
	buf, err := codec.TextualFromNative(nil, Union("string", "some string"))
	if err != nil {
		fmt.Println(err)
	}
	fmt.Println(string(buf))
	// Output: {"string":"some string"}
}

//
// The following examples show the way to put a new codec into use.
// Currently the only new codec is one that supports standard JSON, which does
// not indicate unions in any way, so standard JSON data needs to be guided
// into Avro unions.

// Show how to use the default codec via the NewCodecFrom mechanism.
func ExampleCustomCodec() {
	codec, err := NewCodecFrom(`"string"`, &codecBuilder{
		buildCodecForTypeDescribedByMap,
		buildCodecForTypeDescribedByString,
		buildCodecForTypeDescribedBySlice,
	})
	if err != nil {
		fmt.Println(err)
	}
	buf, err := codec.TextualFromNative(nil, "some string 22")
	if err != nil {
		fmt.Println(err)
	}
	fmt.Println(string(buf))
	// Output: "some string 22"
}

// Use the standard JSON codec instead.
func ExampleJSONStringToTextual() {
	codec, err := NewCodecFrom(`["null","string","int"]`, &codecBuilder{
		buildCodecForTypeDescribedByMap,
		buildCodecForTypeDescribedByString,
		buildCodecForTypeDescribedBySliceJSON,
	})
	if err != nil {
		fmt.Println(err)
	}
	buf, err := codec.TextualFromNative(nil, Union("string", "some string"))
	if err != nil {
		fmt.Println(err)
	}
	fmt.Println(string(buf))
	// Output: {"string":"some string"}
}

func ExampleJSONStringToNative() {
	codec, err := NewCodecFrom(`["null","string","int"]`, &codecBuilder{
		buildCodecForTypeDescribedByMap,
		buildCodecForTypeDescribedByString,
		buildCodecForTypeDescribedBySliceJSON,
	})
	if err != nil {
		fmt.Println(err)
	}
	// send in a legit JSON string
	t, _, err := codec.NativeFromTextual([]byte("\"some string one\""))
	if err != nil {
		fmt.Println(err)
	}
	// see it parse into a map like the avro encoder does
	o, ok := t.(map[string]interface{})
	if !ok {
		fmt.Printf("its a %T not a map[string]interface{}", t)
	}
	// pull out the string to show it is all good
	_v := o["string"]
	v, ok := _v.(string)
	fmt.Println(v)
	// Output: some string one
}

func TestUnionJSON(t *testing.T) {
	testJSONDecodePass(t, `["null","int"]`, nil, []byte("null"))
	testJSONDecodePass(t, `["null","int","long"]`, Union("int", 3), []byte(`3`))
	testJSONDecodePass(t, `["null","long","int"]`, Union("int", 3), []byte(`3`))
	testJSONDecodePass(t, `["null","int","long"]`, Union("long", 333333333333333), []byte(`333333333333333`))
	testJSONDecodePass(t, `["null","long","int"]`, Union("long", 333333333333333), []byte(`333333333333333`))
	testJSONDecodePass(t, `["null","float","int","long"]`, Union("float", 6.77), []byte(`6.77`))
	testJSONDecodePass(t, `["null","int","float","long"]`, Union("float", 6.77), []byte(`6.77`))
	testJSONDecodePass(t, `["null","double","int","long"]`, Union("double", 6.77),
		[]byte(`6.77`))
	testJSONDecodePass(t, `["null","int","float","double","long"]`, Union("double", 6.77), []byte(`6.77`))
	testJSONDecodePass(t, `["null",{"type":"array","items":"int"}]`, Union("array", []interface{}{1, 2}), []byte(`[1,2]`))
	testJSONDecodePass(t, `["null",{"type":"map","values":"int"}]`, Union("map", map[string]interface{}{"k1": 13}), []byte(`{"k1":13}`))
	testJSONDecodePass(t, `["null",{"name":"r1","type":"record","fields":[{"name":"field1","type":"string"},{"name":"field2","type":"string"}]}]`, Union("r1", map[string]interface{}{"field1": "value1", "field2": "value2"}), []byte(`{"field1": "value1", "field2": "value2"}`))
	testJSONDecodePass(t, `["null","boolean"]`, Union("boolean", true), []byte(`true`))
	testJSONDecodePass(t, `["null","boolean"]`, Union("boolean", false), []byte(`false`))
	testJSONDecodePass(t, `["null",{"type":"enum","name":"e1","symbols":["alpha","bravo"]}]`, Union("e1", "bravo"), []byte(`"bravo"`))
	testJSONDecodePass(t, `["null", "bytes"]`, Union("bytes", []byte("")), []byte("\"\""))
	testJSONDecodePass(t, `["null", "bytes", "string"]`, Union("bytes", []byte("")), []byte("\"\""))
	testJSONDecodePass(t, `["null", "string", "bytes"]`, Union("string", "value1"), []byte(`"value1"`))
	testJSONDecodePass(t, `["null", {"type":"enum","name":"e1","symbols":["alpha","bravo"]}, "string"]`, Union("e1", "bravo"), []byte(`"bravo"`))
	testJSONDecodePass(t, `["null", {"type":"fixed","name":"f1","size":4}]`, Union("f1", []byte(`abcd`)), []byte(`"abcd"`))
	testJSONDecodePass(t, `"string"`, "abcd", []byte(`"abcd"`))
	testJSONDecodePass(t, `{"type":"record","name":"kubeEvents","fields":[{"name":"field1","type":"string","default":""}]}`, map[string]interface{}{"field1": "value1"}, []byte(`{"field1":"value1"}`))
	testJSONDecodePass(t, `{"type":"record","name":"kubeEvents","fields":[{"name":"field1","type":"string","default":""},{"name":"field2","type":"string"}]}`, map[string]interface{}{"field1": "", "field2": "deef"}, []byte(`{"field2": "deef"}`))
	testJSONDecodePass(t, `{"type":"record","name":"kubeEvents","fields":[{"name":"field1","type":["string","null"],"default":""}]}`, map[string]interface{}{"field1": Union("string", "value1")}, []byte(`{"field1":"value1"}`))
	testJSONDecodePass(t, `{"type":"record","name":"kubeEvents","fields":[{"name":"field1","type":["string","null"],"default":""}]}`, map[string]interface{}{"field1": nil}, []byte(`{"field1":null}`))
	// union of null, which has minimal syntax
	testJSONDecodePass(t, `{"type":"record","name":"LongList","fields":[{"name":"next","type":["null","LongList"],"default":null}]}`, map[string]interface{}{"next": nil}, []byte(`{"next": null}`))
	// record containing union of record (recursive record)
	testJSONDecodePass(t, `{"type":"record","name":"LongList","fields":[{"name":"next","type":["null","LongList"],"default":null}]}`, map[string]interface{}{"next": Union("LongList", map[string]interface{}{"next": nil})}, []byte(`{"next":{"next":null}}`))
	testJSONDecodePass(t, `{"type":"record","name":"LongList","fields":[{"name":"next","type":["null","LongList"],"default":null}]}`, map[string]interface{}{"next": Union("LongList", map[string]interface{}{"next": Union("LongList", map[string]interface{}{"next": nil})})}, []byte(`{"next":{"next":{"next":null}}}`))
}
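
// ExampleJSONNumberToNative is an illustrative sketch (the function name is
// new here, not part of the original suite). It mirrors ExampleJSONStringToNative
// above, but feeds the JSON-oriented codec a bare JSON number so the union
// member has to be inferred from the schema's members.
func ExampleJSONNumberToNative() {
	codec, err := NewCodecFrom(`["null","int","long"]`, &codecBuilder{
		buildCodecForTypeDescribedByMap,
		buildCodecForTypeDescribedByString,
		buildCodecForTypeDescribedBySliceJSON,
	})
	if err != nil {
		fmt.Println(err)
	}
	// plain JSON carries no union wrapper around the 3
	native, _, err := codec.NativeFromTextual([]byte(`3`))
	if err != nil {
		fmt.Println(err)
	}
	// the decoded native value is the usual single-entry map keyed by member name
	union, ok := native.(map[string]interface{})
	if !ok {
		fmt.Printf("its a %T not a map[string]interface{}", native)
	}
	fmt.Printf("%v\n", union["int"])
	// Output: 3
}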