GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007

Copyright (C) 2007 Free Software Foundation, Inc.
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The GNU General Public License is a free, copyleft license for software and other kinds of works.

The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things.

To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others.

For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it.

For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions.

Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users.

Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. To prevent this, the GPL assures that patents cannot be used to render the program non-free. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 0. Definitions. "This License" refers to version 3 of the GNU General Public License. "Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks. "The Program" refers to any copyrightable work licensed under this License. Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations. To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work. A "covered work" means either the unmodified Program or a work based on the Program. To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well. To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion. 1. Source Code. The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work. A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language. The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. 
The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work. The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source. The Corresponding Source for a work in source code form is that same work. 2. Basic Permissions. All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law. You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you. Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary. 3. Protecting Users' Legal Rights From Anti-Circumvention Law. No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures. When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures. 4. Conveying Verbatim Copies. 
You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program. You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee. 5. Conveying Modified Source Versions. You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: a) The work must carry prominent notices stating that you modified it, and giving a relevant date. b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices". c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate. 6. Conveying Non-Source Forms. You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. 
c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work. A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product. "Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made. If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM). 
The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying. 7. Additional Terms. "Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions. When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission. Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms: a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or d) Limiting the use for publicity purposes of names of licensors or authors of the material; or e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors. All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying. 
If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms. Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way. 8. Termination. You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11). However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10. 9. Acceptance Not Required for Having Copies. You are not required to accept this License in order to receive or run a copy of the Program. Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so. 10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts. You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. 
For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it. 11. Patents. A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. The work thus licensed is called the contributor's "contributor version". A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. "Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. 
You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Use with the GNU Affero General Public License. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation. If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

16. Limitation of Liability.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

17. Interpretation of Sections 15 and 16.

If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee.

END OF TERMS AND CONDITIONS


ESPEAKEDIT

This is the initial SourceForge release of the espeakedit program. It is used to prepare and compile phoneme data for the eSpeak speech synthesizer. See docs/editor.html.

In addition to its own functions, espeakedit is compiled including the source files from the speak program, so it uses the same synthesis routines when it produces sound from phoneme data. Sometimes I change the format of the compiled phoneme or dictionary data that espeakedit produces and that speak uses, so it is important to have compatible versions of the two programs.

DEPENDENCIES

espeakedit has a GUI built with the wxWidgets library (wxGTK version 2.6, www.wxwidgets.org). It needs the following packages (these are the names from the Ubuntu "Dapper" repository):

To run: libwxgtk2.6-0, libportaudio0, sox
To compile, it also needs: libwxgtk2.6-dev, libportaudio-dev

The binary has been compiled to use V18 of the PortAudio library. If you have V19 you will need to recompile espeakedit, after copying portaudio19.h to replace the original portaudio.h file in the src directory.

COMPILING

I have now included a Makefile in the src directory, so it should compile, if the wxWidgets include files (from the libwxgtk2.6-dev package) are present, by using:

   make

DOCUMENTATION

Not much yet. docs/editor.html is a quick run-through of the main functions. docs/editor-if.html gives some details of the user interface. If you want to use espeakedit to add or improve phoneme data, and you have questions, please email me.

DATA

The directory "phsource" contains the master phonemes file, "phonemes", the additional phoneme files for various languages, and all the sound files needed to compile the phoneme data into espeak-data/phondata, phontab and phonindex.

PRAAT

I use praat (from www.praat.org) to view and analyse speech, either sound recordings or output from the eSpeak synthesizer.
I make a modification to Praat to add a function which analyses a sample of speech and produces a file containing a sequence of time-slice spectra. This file can be loaded into espeakedit to display, and used as a basis for matching or creating vowel and other voiced sounds. Details of the modification are in the directory praat-mod.

Unfortunately I am currently unable to compile and run praat on my computer, even an unmodified copy. It compiles OK, but won't run (it gets stuck with 100% CPU usage without displaying any GUI). I don't know why this is. I have an old binary (built before I last upgraded my Linux) and that runs OK.
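For convenience, here is a minimal sketch of the install-and-build steps described under DEPENDENCIES and COMPILING above. It assumes a Debian/Ubuntu-style system; the package names are the ones quoted above and may be spelled differently in other distributions:

   # packages needed to run espeakedit
   sudo apt-get install libwxgtk2.6-0 libportaudio0 sox

   # additional packages needed to compile it
   sudo apt-get install libwxgtk2.6-dev libportaudio-dev

   # build espeakedit
   cd src
   make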

6. ADDING OR IMPROVING A LANGUAGE


Most of the work doesn't need any programming knowledge. Just an understanding of the language, an awareness of its features, patience and attention to detail. Wikipedia is a good source of basic phonetic information, eg http://en.wikipedia.org/wiki/Vowel.

In many cases it should be fairly easy to add a rough implementation of a new language, hopefully enough to be intelligible. After that it's a gradual process of improvement.


6.1 Language Code

Generally, the language's international ISO 639-1 code is used to identify the language. It is used in the filenames which contain the language's data. In the examples below the code "fr" is used as an example. Replace this with the code of your language.

If the language does not have a 2-letter ISO_639-1 code, then use the 3-letter ISO_639-3 code. Language codes may differ from country codes.

It is possible to have different variants of a language for different dialects. For example, the sounds of some phonemes may be changed, or some of the pronunciation rules may differ.


6.2 Language Files

The following files are needed for your language: a voice file (see 6.3), a phoneme definition file ph_xxxx (see 6.4), and the dictionary source files fr_rules and fr_list (see 6.5).

The fr_rules and fr_list files are compiled to produce the file espeak-data/fr_dict, which eSpeak uses when it is speaking.


6.3 Voice File

Each language needs a voice file in espeak-data/voices or espeak-data/voices/test. The filename of the default voice for a language should be the same as the language code (eg. "fr" for French).

Details of the contents of voice files are given in voices.html.

The simplest voice file would contain just 2 lines to give the language name and language code, eg:

  name french
  language fr

This language code specifies which phoneme table and dictionary are used (i.e. phonemetable fr and espeak-data/fr_dict). If needed, these can be overridden by phonemes and dictionary attributes in the voice file. For example, you may want to start the implementation of a new language by using the phoneme table of an existing language.
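To illustrate those attributes, a voice file for a language that is still under development might look like the sketch below. The phonemes and dictionary lines are optional, and the values are only an example of the syntax (here they borrow the base phoneme table and name the fr dictionary); see voices.html for the full list of attributes:

  name french
  language fr
  phonemes base
  dictionary fr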


6.4 Phoneme Definition File

You must first decide on the set of phonemes (vowel and consonant sounds) for the language. These should be defined in a phoneme definition file ph_xxxx, where "xxxx" is the name of your language. A reference to this file is then included at the end of the master phoneme file, phsource/phonemes, eg:

  phonemetable  fr  base
  include  ph_french

This example defines a phoneme table "fr" which inherits the contents of phoneme table "base". Its contents are found in the file ph_french.

The base phoneme table contains definitions of a basic set of consonants, and also some "control" phonemes such as stress marks and pauses. These are defined in phsource/phonemes. The phoneme table for a language will inherit these, or alternatively it may inherit the phoneme table of another language which in turn inherits the base phoneme table.

The phonemes file for the language defines those additional phonemes which are not inherited (generally the vowels and diphthongs, plus any additional consonants that are needed), or phonemes whose definitions differ from the inherited version (eg. the redefinition of a consonant).

Details of phonemes files are given in phontab.html.
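As a rough illustration of what such a definition looks like, a vowel phoneme in ph_french might be written along the following lines. This is only a sketch: the attribute names and the keyframe file vowel/a are illustrative, and phontab.html together with the existing entries in phsource/phonemes are the authority for the exact syntax in your version:

  phoneme a
    vwl  starttype #a  endtype #a
    length 170
    FMT(vowel/a)      // keyframe file from phsource/vowel
  endphoneme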

The Compile phoneme data function of the espeakedit program compiles the phonemes files of all languages to produce the files espeak-data/phontab, phonindex, and phondata which are used by eSpeak.

For many languages, the consonant phonemes which are already available in eSpeak, together with the available vowel files which can be used to define vowel phonemes, will be sufficient. At least for an initial implementation.


6.5 Dictionary Files

Once the language's phonemes have been defined, then pronunciation dictionary data can be produced in order to translate the language's source text into phonemes. This consists of two source files: fr_rules (the spelling to phoneme rules) and fr_list (an exceptions list, and attributes of certain words). The corresponding compiled data file is espeak-data/fr_dict which is produced from fr_rules and fr_list sources by the command:

   espeak --compile=fr

Or by using the espeakedit program.

Details of the contents of the dictionary files are given in dictionary.html.

The fr_list file contains the exceptions: words (and attributes of words) whose pronunciation or stress is given explicitly rather than being derived from the fr_rules spelling-to-phoneme rules, such as function words, irregular words, and the names of letters and symbols.
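For example, early entries in fr_list might look like the lines below. These are purely illustrative (the words and markers are not taken from the real French data); the $2 and $u stress markers and the full syntax of list entries are described in dictionary.html:

     bonjour    $2      // stress on the second syllable
     le         $u      // an unstressed word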


6.6 Program Code

The behaviour of the eSpeak program is also controlled by various language-specific options, for example the language's default rule for which syllable of a word is stressed.

The function SetTranslator() at the start of the source code file tr_languages.cpp recognizes the language code and sets the appropriate options. For a new language, you would add its language code and the required options in SetTranslator(). However, this may not be necessary during testing because most of the options can also be set in the voice file in espeak-data/voices (see Voice files).


6.7 Improving a Language

Listen carefully to the eSpeak voice. Try to identify what sounds wrong and what needs to be improved.

If you are interested in working on a language, please contact me so that I can set up the initial data and discuss the features of the language.

For most of the eSpeak voices, I do not speak or understand the language, and I do not know how it should sound. I can only make improvements as a result of feedback from speakers of that language. If you want to help to improve a language, listen carefully and try to identify individual errors: in the spelling-to-phoneme translation, in the position of stressed syllables within words, in the sound of phonemes, or in the rhythm and vowel lengths.



ANALYSIS


(Further notes are needed)

Recordings of spoken words and phrases can be analysed to try and make eSpeak match a language more closely. Unlike most other (larger and better quality) synthesizers, eSpeak's data is not produced directly from recorded sounds. To use an analogy, it's like a drawing or sketch compared with a photograph. Or vector graphics compared with a bitmap image. It's smaller, less accurate, with less subtlety, but it can sometimes show some aspects of the picture more clearly than a more accurate image.

Recording Sounds

Recordings should be made while speaking slowly, clearly, and firmly and loudly (but not shouting). Speak about half a metre from the microphone. Try to avoid background noise and hum interference from electrical power cables.

Praat

I use a modified version of the praat program (www.praat.org) to view and analyse both sound recordings and output from eSpeak. The modification adds a new function (Spectrum->To_eSpeak) which analyses a voiced sound and produces a file which can be loaded into espeakedit. Details of the modification are in the "praat-mod" directory in the espeakedit package. The analysis contains a sequence of frames, one per cycle at the speech's fundamental frequency. Each frame is a short-time spectrum, together with praat's estimation of the f1 to f5 formant frequencies at the time of that cycle. I also use Praat's New->Record_mono_sound function to make sound recordings.

Vowels and Diphthongs

Analysing a Recording

Make a recording, with a male voice, and trim it in Praat to keep just the required vowel sound. Then use the new Spectrum->To_eSpeak modification (this was named To_Spectrogram2 in earlier versions) to analyse the sound. It produces a file named "spectrum.dat". Load the "spectrum.dat" file into espeakedit. Espeakedit has two Open functions, File->Open and File->Open2. They are the same, except that they remember different paths. I generally use File->Open2 for reading the "spectrum.dat" file. The data is displayed in espeakedit as a sequence of spectrum frames (see editor.html).

Tone Quality

It can be difficult to match the tonal quality of a new vowel to be compatible with existing vowel files. This is determined by the relative heights and widths of the formant peaks. These vary depending on how the recording was made, the microphone, and the strength and tone of the voice. Also the positions of the higher peaks (F3 upwards) can vary depending on the characteristics of the speaker's voice. Formant peaks correspond to resonances within the mouth and throat, and they depend on its size and shape. With a female voice, all the formants (F1 upwards) are generally shifted to higher frequencies. For these reasons, it's best to use a male voice, and to use its analysed spectra only as guidance. Rather than construct formant-peaks entirely to match the analysed data, instead copy keyframes from a similar existing vowel. Then make small adjustments to match the position of the F1, F2, F3 formant peaks and hopefully produce the required vowel sound.

Using an Existing Vowel File

Choose a similar vowel file from phsource/vowel and open it into espeakedit. It may be useful to use phsource/vowel/vowelchart as a map to show how vowel files compare with each other. You can select a keyframe from the vowel file and use CTRL-C and CTRL-V to copy the green formant peaks onto a frame of the new spectrum sequence. Then adjust the peaks to match the new frame. Press F1 to hear the sound of the formant peaks in the selected frame. The F0 peak is provided in order to adjust the correct balance of low frequencies, below the F1 peak. If the sound is too muffled, or conversely, too "thin", try adjusting the amplitude or position of the F0 peak.

Length and Amplitude

Use an existing vowel file as a guide for how to set the amplitude and length of the keyframes. At the right of each keyframe, its length is shown in mS and under that is its relative (RMS) amplitude. The second keyframe should be marked with a red marker (use CTRL-M to toggle this). This divides the vowel into the front-part (with one frame), and the rest. Use F2 to play the sound of the new vowel sequence. It will also produce a WAV file (the default name is speech.wav) which you can read into praat to see whether it has a sensible shape.

Using the New Vowel

Make a new directory (eg. vwl_xx) in phsource for your new vowels. Save the spectrum sequence with a name which you have chosen for it. You can then edit the phoneme file for your language (eg. phsource/ph_xxx), and change a phoneme to refer to your new vowel file. Then do Data->Compile_Phoneme_Data from espeakedit's menubar to re-compile the phoneme data.
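Once the data has been recompiled, one quick way to hear the new vowel on its own is to speak it as phoneme input (described in the command documentation below). This is just a sketch: "xx" stands for your language's voice name and "a" for whatever mnemonic you gave the new phoneme:

   espeak -v xx "[['a 'a 'a]]"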

2.1 INSTALLATION


2.1.1 Linux and other Posix systems

There are two versions of the command line program. They both have the same command parameters (see below).
  1. espeak uses the speech engine in the libespeak shared library. The libespeak library must first be installed.

  2. speak is a stand-alone version which includes its own copy of the speech engine.
Place the espeak or speak executable file in the command path, eg in /usr/local/bin

Place the "espeak-data" directory in /usr/share as /usr/share/espeak-data.
Alternatively if it is placed in the user's home directory (i.e. /home/<user>/espeak-data) then that will be used instead.
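As a sketch, and assuming the release has been unpacked into the current directory, the copy commands for a system-wide install (or, alternatively, a per-user install) would be something like:

   # system-wide
   sudo cp espeak /usr/local/bin/
   sudo cp -r espeak-data /usr/share/espeak-data

   # or, for the current user only
   cp -r espeak-data ~/espeak-data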

Dependencies

espeak uses the PortAudio sound library (version 18), so you will need to have the libportaudio0 library package installed. It may be already, since it's used by other software, such as OpenOffice.org and the Audacity sound editor.

Some Linux distributions (eg. SuSE 10) have version 19 of PortAudio, which has a slightly different API. The speak program can be compiled to use version 19 of PortAudio by copying the file portaudio19.h to portaudio.h before compiling.

The speak program may be compiled without using PortAudio, by removing the line

   #define USE_PORTAUDIO
in the file speech.h.

 


2.1.2 Use with KDE Text-to-Speech (KTTS)

To add to KDE-Text-to-Speech Manager (KTTSMgr), use it as a "Command" talker with "command for speaking texts" set to:
cat %f | espeak --stdin -w %w -v en -s190

In this example, "en" is the voice name, "190" is the speed.

Note:

 


2.1.3 Windows

The installer: setup_espeak.exe installs the SAPI5 version of eSpeak. It also installs a command line program espeak in the espeak directory.

 


2.2 COMMAND OPTIONS


2.2.1 Examples

To use at the command line, type:
  espeak "This is a test"
or
  espeak -f <text file>

Or just type
  espeak
followed by text on subsequent lines. Each line is spoken when RETURN is pressed.

Use espeak -x to see the corresponding phoneme codes.
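The options described in the next section can be combined. For instance, this purely illustrative command reads a text file with the French voice at a slower speed and writes the result to a WAV file instead of playing it (the file names are hypothetical):

  espeak -v fr -s 130 -f article.txt -w article.wav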

 


2.2.2 The Command Line Options

espeak [options] ["text words"]
Text input can be taken either from a file, from a string in the command, or from stdin.

-f <text file>
Speaks a text file.

--stdin
Takes the text input from stdin.

If neither -f nor --stdin is given, then the text input is taken from "text words" (a text string within double quotes).
If that is not present then text is taken from stdin, but each line is treated as a separate sentence.

-a <integer>
Sets amplitude (volume) in a range of 0 to 200. The default is 100.

-p <integer>
Adjusts the pitch in a range of 0 to 99. The default is 50.

-s <integer>
Sets the speed in words-per-minute (approximate values for the default English voice, others may differ slightly). The default value is 175. I generally use a faster speed of 200. The range is 80 to 450; larger values are rounded down to the maximum.

-b <integer>
Input text character format.

1   UTF-8. This is the default.

2   The 8-bit character set which corresponds to the language (eg. Latin-2 for Polish).

4   16 bit Unicode.

Without this option, eSpeak assumes text is UTF-8, but will automatically switch to the 8-bit character set if it finds an illegal UTF-8 sequence.

-g <integer>
Word gap. This option inserts a pause between words. The value is the length of the pause, in units of 10 mS (at the default speed of 170 wpm).

-h or --help
Shows a summary of the command line options. The first line of output gives the eSpeak version number.

-k <integer>
Indicate words which begin with capital letters.

1   eSpeak uses a click sound to indicate when a word starts with a capital letter, or double click if word is all capitals.

2   eSpeak speaks the word "capital" before a word which begins with a capital letter.

Other values:   eSpeak increases the pitch for words which begin with a capital letter. The greater the value, the greater the increase in pitch. Try -k20.

-l <integer>
Line-break length, default value 0. If set, then lines which are shorter than this are treated as separate clauses and spoken separately with a break between them. This can be useful for some text files, but bad for others.

-m
Indicates that the text contains SSML (Speech Synthesis Markup Language) tags or other XML tags. Those SSML tags which are supported are interpreted. Other tags, including HTML, are ignored, except that some HTML tags such as <hr> <h2> and <li> ensure a break in the speech.

-q
Quiet. No sound is generated. This may be useful with options such as -x and --pho.

-v <voice filename>[+<variant>]
Sets a Voice for the speech, usually to select a language. eg:
   espeak -vaf
To use the Afrikaans voice. A modifier after the voice name can be used to vary the tone of the voice, eg:
   espeak -vaf+3
The variants are +m1 +m2 +m3 +m4 +m5 +m6 +m7 for male voices and +f1 +f2 +f3 +f4 which simulate female voices by using higher pitches. Other variants include +croak and +whisper.

<voice filename> is a file within the espeak-data/voices directory.
<variant> is a file within the espeak-data/voices/!v directory.

Voice files can specify a language, alternative pronunciations or phoneme sets, different pitches, tonal qualities, and prosody for the voice. See the voices.html file.

Voice names which start with mb- are for use with Mbrola diphone voices, see mbrola.html

Some languages may need additional dictionary data, see languages.html

-w <wave file>
Writes the speech output to a file in WAV format, rather than speaking it.

-x
The phoneme mnemonics, into which the input text is translated, are written to stdout.

-X
As -x, but in addition, details are shown of the pronunciation rule and dictionary list lookup. This can be useful to see why a certain pronunciation is being produced. Each matching pronunciation rule is listed, together with its score, the highest scoring rule being used in the translation. "Found:" indicates the word was found in the dictionary lookup list, and "Flags:" means the word was found with only properties and not a pronunciation. You can see when a word has been retranslated after removing a prefix or suffix.
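For example, to see why a particular word gets the pronunciation it does, without generating any sound, -X can be combined with -q (the word here is arbitrary):

  espeak -q -X "through"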

-z
This option removes the end-of-sentence pause which normally occurs at the end of the text.

--stdout
Writes the speech output to stdout as it is produced, rather than speaking it. The data starts with a WAV file header which indicates the sample rate and format of the data. The length field is set to zero because the length of the data is unknown when the header is produced.
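Because the output is a WAV stream, it can be piped directly into another program. For example, on a system which has the ALSA aplay utility (an assumption about your system, not part of eSpeak), the speech can be played with:

  espeak --stdout "hello world" | aplay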

--compile [=<voice name>]
Compile the pronunciation rule and dictionary lookup data from their source files in the current directory. The Voice determines which language's files are compiled. For example, if it's an English voice, then en_rules, en_list, and en_extra (if present), are compiled to replace en_dict in the espeak-data directory. If no Voice is specified then the default Voice is used.

--compile-debug [=<voice name>]
The same as --compile, but source line numbers from the *_rules file are included. These are included in the rules trace when the -X option is used.

--ipa
Writes phonemes to stdout, using the International Phonetic Alphabet (IPA).

--path [="<directory path>"]
Specifies the directory which contains the espeak-data directory.

--pho
When used with an mbrola voice (eg. -v mb-en1), it writes mbrola phoneme data (.pho file format) to stdout. This includes the mbrola phoneme names with duration and pitch information, in a form which is suitable as input to this mbrola voice. The --phonout option can be used to write this data to a file.

--phonout [="<filename>"]
If specified, the output from -x, -X, --ipa, and --pho options is written to this file, rather than to stdout.
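For example, this illustrative command generates mbrola phoneme data for the mb-en1 voice and writes it to the file test.pho without producing any sound; the resulting file is suitable as input to that mbrola voice:

  espeak -v mb-en1 -q --pho --phonout=test.pho "This is a test"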

--punct [="<characters>"]
Speaks the names of punctuation characters when they are encountered in the text. If <characters> are given, then only those listed punctuation characters are spoken, eg. --punct=".,;?"

--split [=<minutes>]
Used with -w, it starts a new WAV file every <minutes> minutes, at the next sentence boundary.

--voices [=<language code>]
Lists the available voices.
If =<language code> is present then only those voices which are suitable for that language are listed.
--voices=mbrola lists the voices which use mbrola diphone voices. These are not included in the default --voices list
--voices=variant lists the available voice variants (voice modifiers).

 


2.2.3 The Input Text

HTML Input
If the -m option is used to indicate marked-up text, then HTML can be spoken directly.

Phoneme Input
As well as plain text, phoneme mnemonics can be used in the text input to espeak. They are enclosed within double square brackets. Spaces are used to separate words and all stressed syllables must be marked explicitly.

  eg:   espeak -v en "[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]"

This command will speak: "This is some phonetic text input".



4. TEXT TO PHONEME TRANSLATION


4.1 Translation Files

There is a separate set of pronunciation files for each language, their names starting with the language name.

There are two separate methods for translating words into phonemes: spelling-to-phoneme rules, held in the file <language>_rules, and a lookup dictionary of exceptions and word attributes, held in the file <language>_list.

These two files are compiled into the file <language>_dict  in the espeak-data directory (eg. espeak-data/en_dict)

 


4.2 Phoneme names

Each of the language's phonemes is represented by a mnemonic of 1, 2, 3, or 4 characters. Together with a number of utility codes (eg. stress marks and pauses), these are defined in the phoneme data file (see *spec not yet available*).

The utility 'phonemes' include the stress marks, which are placed before the syllable they apply to (' for primary stress, , for secondary stress, % for an unstressed syllable), pauses, and || which marks a word boundary within a phoneme string.

It is not necessary to specify the stress of every syllable. Stress markers are only needed in order to change the effect of the language's default stress rule.

The phonemes which are used to represent a language's sounds are based loosely on the Kirshenbaum ascii character representation of the International Phonetic Alphabet www.kirshenbaum.net/IPA/ascii-ipa.pdf

 


4.3 Pronunciation Rules

The rules in the <language>_rules  file specify the phonemes which are used to pronounce each letter, or sequence of letters. Some rules only apply when the letter or letters are preceded by, or followed by, other specified letters.

To find the pronunciation of a word, the rules are searched and any which match the letters at the current position in the word are given a score depending on how many letters are matched. The pronunciation from the best matching rule is chosen. The pointer into the source word is then advanced past those letters which have been matched and the process is repeated until all the letters of the word have been processed.

4.3.1 Rule Groups

The rules are organized in groups, each starting with a ".group" line. There is a group for each letter, and optionally a group for each of some common 2-letter sequences. When matching a word, firstly the 2-letter group for the two letters at the current position in the word (if such a group exists) is searched, and then the single-letter group. The highest scoring rule in either of those two groups is used.

4.3.2 Rules

Each rule is on a separate line, and has the syntax:

     [<pre>)]  <match>  [(<post>]      <phoneme string>

eg.

         o         0
         oo        u:
     b)  oo (k     U

i.e. "oo" is pronounced as [u:], but when it is also preceded by "b" and followed by "k", it is pronounced [U] (as in "book").

In the case of a single-letter group, the first character of <match> must be the group letter. In the case of a 2-letter group, the first two characters of <match> must be the group letters. The second and third rules above may be in either .group o or .group oo.

Alphabetic characters in the <pre>, <match>, and <post> parts must be lower case, and matching is case-insensitive. Some upper case letters are used in <pre> and <post> with special meanings.

4.3.3 Special characters in <phoneme string>:

4.3.4 Special Characters in both <pre> and <post>:

The sets of letters indicated by A, B, C, E, F, G may be defined differently for each language.

Examples of rules:

     _)  a         // "a" at the start of a word
         a (CC     // "a" followed by two consonants
         a (C%     // "a" followed by a double consonant (the same letter twice)
         a (/%     // "a" followed by a percent sign
     %C) a         // "a" preceded by a double consonant (the same letter twice)

4.3.5 Special characters only in <pre>:

eg.
     @@)  bi      // "bi" preceded by at least two syllables
     @@a) bi      // "bi" preceded by at least 2 syllables and following 'a'
Note that matching characters in the <pre> part do not affect the syllable counting.

4.3.6 Special characters only in <post>:

eg.
   @) ly (_$2   lI      // "ly", at end of a word with at least one other
                        //   syllable, is a suffix pronounced [lI].  Remove
                        //   it and retranslate the word.

   _) un (@P2   ¬Vn     // "un" at the start of a word is an unstressed
                        //   prefix pronounced [Vn]
   _) un (i     ju:     // ... except in words starting "uni"
   _) un (inP2  ,Vn     // ... but it is for words starting "unin"
S and P must be at the end of the <post> string.

S<number> may be followed by additional letters (eg. S2ei ). Some of these are probably specific to English, but similar functions could be made for other languages.

P<number> may be followed by additional letters (eg. P3v ).

 


4.4 Pronunciation Dictionary List

The <language>_list file contains a list of words whose pronunciations are given explicitly, rather than determined by the Pronunciation Rules. The <language>_extra file, if present, is also used and its contents are taken as coming after those in <language>_list.

Also the list can be used to specify the stress pattern, or other properties, of a word.

If the Pronunciation Rules are applied to a word and indicate a standard prefix or suffix, then the word is looked up again in the Pronunciation Dictionary List after the prefix or suffix has been removed.

Lines in the dictionary list have the form:

     <word>       <phoneme string>

eg.
     book      bUk
Rather than a full pronunciation, just the stress may be given, to change where it would be otherwise placed by the Pronunciation Rules:
     berlin       $2      // stress on second syllable
     absolutely   $3      // stress on third syllable
     for          $u      // an unstressed word

4.4.1 Multiple Words

A pronunciation may also be specified for a group of words, when these appear together. Up to four words may be given, enclosed in brackets. This may be used to change the pronunciation or stress pattern when these words occur together,
    (de jure)    deI||dZ'U@rI2   // note || used as a word break in the phoneme string
or to run them together, pronounced as a single word
    (of a)       @v@
or to give them a flag when they occur together
    (such as)    sVtS||a2z   $pause	   // precede with a pause
Hyphenated words in the <language>_list  file must also be enclosed within brackets, because the two parts are considered as separate words.

4.4.2 Special characters in <phoneme string>:

4.4.3 Flags

A word (or group of words) may be given one or more flags, either instead of, or as well as, the phonetic translation. The last group of flags, which distinguish grammatical forms, is probably English-specific, but something similar may be useful in other languages. They are a crude attempt to improve the accuracy of pairs like ob'ject (verb) v 'object (noun) and read (present) v read (past).

The dictionary list is searched from bottom to top. The first match that satisfies any conditions is used (i.e. the one lowest down the list). So if we have:

    to    t@               // unstressed version
    to    tu:   $atend     // stressed version
then if "to" is at the end of the clause, we get [tu:], if not then we get [t@].

4.4.4 Translating a Word to another Word

Rather than specifying the pronunciation of a word by a phoneme string, you can specify another "sounds like" word.

Use the attribute $text eg.

    cough    coff   $text
Alternatively, use the command $textmode on a line by itself to turn this on for all subsequent entries in the file, until it's turned off by $phonememode. eg.

    $textmode
    cough     coff
    through   threw
    $phonememode
This feature cannot be used for the special entries in the _list files which start with an underscore, such as numbers.

Currently "textmode" entries are only recognized for complete words, and not for stems from which a prefix or suffix has been removed (eg. the word "coughs" would not match the example above).

 


4.5 Conditional Rules

Rules in a _rules file and entries in a _list file can be made conditional. They apply only to some voices. This can be useful to specify different pronunciations for different variants of a language (dialects or accents).

Conditional rules have   ?   and a condition number at the start of the line in the _rules or _list file. This means that the rule only applies if that condition number is specified in a dictrules line in the voice file.

If the rule starts with   ?!   then the rule only applies if the condition number is not specified in the voice file. eg.

   ?3     can't     kant    // only use this if the voice has:  dictrules 3
   ?!3    rather    rA:D3   // only use if the voice doesn't have:  dictrules 3
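The corresponding line in the voice file (see voices.html) is:

   dictrules 3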

 


4.6 Numbers and Character Names

4.6.1 Letter names

The names of individual letters can be given either in the _rules or _list file. Sometimes an individual letter is also used as a word in the language and its pronunciation as a word differs from its letter name. If so, it should be listed in the _list file, preceded by an underscore, to give the letter name (as distinct from its pronunciation as a word). eg. in English:
   _a   eI

4.6.2 Numbers

The operation of the TranslateNumber() function is controlled by the language's langopts.numbers option. This constructs spoken numbers from fragments according to various options which can be set for each language. The number fragments are given in the _list file.
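For example, the digit fragments are given by entries which start with an underscore. The following entries are a hypothetical illustration only; each language defines its own fragment names and phoneme strings in its _list file:

     _0    z'i@roU    // hypothetical illustration
     _1    w'Vn
     _2    t'u: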

 


4.7 Character Substitution

Character substitutions can be specified by using a .replace section at the start of the _rules file. Each line specifies either one or two alphabetic characters to be replaced by another one or two alphabetic characters. This substitution is done to a word before it is translated using the spelling-to-phoneme rules. Only the lower-case version of the characters needs to be specified. eg.

  .replace
    ô   ő   // (Hungarian) allow the use of o-circumflex instead of o-double-acute
    û   ű

    cx   ĉ   // (Esperanto) allow "cx" as an alternative to c-circumflex

    fi   fi   // replace a single character ligature by two characters

docs/docindex.html

eSpeak - Documents

Home

Usage

Languages

Voice Files

Voice files specify a language and other characteristics of a voice.

Mbrola Voices

eSpeak can be used as a front-end for Mbrola diphone voices.

Pronunciation Dictionary

  • How to add pronunciation corrections.
  • How to build up pronunciation rules for a new language.

Adding a Language

How to add or improve a language.

Phonemes

The list of phoneme mnemonics for English, for use in the Pronunciation Dictionary.

Phoneme Tables

The tables of the phonemes used by each language, with their properties and sound production.

Intonation

Different intonation "tunes" may be defined for different languages for clauses which end in full-stop, comma, question-mark, and exclamation-mark.

eSpeak Library API

API definition and header file for a shared library version of eSpeak.

Markup tags

SSML (Speech Synthesis Markup Language) and HTML tags recognized by eSpeak.

The espeakedit program

GUI software to edit vowel files and to compile the phoneme data for use by eSpeak.
docs/editor.html


ESPEAKEDIT PROGRAM


The espeakedit program is used to prepare phoneme data for the eSpeak speech synthesizer.

It has two main functions:

  • editing the keyframe data files ("vowel files") which define the sounds of vowels and voiced consonants, and
  • compiling the phoneme data and the languages' dictionary data into the espeak-data directory, for use by eSpeak.


Installation

espeakedit needs the following packages:
(The package names mentioned here are those from the Ubuntu "Dapper" Linux distribution). In addition, a modified version of praat (www.praat.org) is used to view and analyse WAV sound files. This needs the package libmotif3 to run and libmotif-dev to compile.

Quick Guide

This will quickly illustrate the main features. Details of the interface and key commands are given in editor_if.html

For more detailed information on analysing sound recordings and preparing phoneme definitions and keyframe data see analyse.html (to be written).

Compiling Phoneme Data

  1. Run the espeakedit program.

  2. Select Data->Compile phoneme data from the menu bar. Dialog boxes will ask you to locate the directory (phsource) which contains the master phonemes file, and the directory (dictsource) which contains the dictionary files (en_rules, en_list, etc). Once specified, espeakedit will remember their locations, although they can be changed later from Options->Paths.

  3. A message in the status line at the bottom of the espeakedit window will indicate whether there are any errors in the phoneme data, and how many languages' dictionary files have been compiled. The compiled data is placed into the espeak-data directory, ready for use by the speak program. If errors are found in the phoneme data, they are listed in a file error_log in the phsource directory.
  4. NOTE: espeakedit can be used from the command line to compile the phoneme data, with the command: espeakedit --compile

  5. Select Tools->Make vowels chart->From compiled phoneme data. This will look for the vowels in the compiled phoneme data of each language and produce a vowel chart (.png file) in phsource/vowelcharts. These charts plot the vowels' F1 (formant 1) frequency against their F2 frequency, which corresponds approximately to their open/close and front/back positions. The colour in the circle for each vowel indicates its F3 frequency, red indicates a low F3, through yellow and green to blue and violet for a high F3. In the case of a diphthong, a line is drawn from the circle to the position of the end of the vowel.

Keyframe Sequences

  1. Select File->Open from the menu bar and select a vowel file, phsource/vowel/a. This will open a tab in the espeakedit window which contains a sequence of 4 keyframes. Each keyframe shows a black graph, which is the outline of an original analysed spectrum from a sound recording, and also a green line, which shows the formant peaks which have been added (using the black graph as a guide) and which produce the sound.

  2. Click in the "a" tab window and then press the F2 key. This will produce and play the sound of the keyframe sequence. The first time you do this, you'll get a save dialog asking where you want the WAV file to be saved. Once you give a location all future sounds will be stored in that same location, although it can be changed from Options->Paths.

  3. Click on the second of the four frames, the one with the red square. Press F1. That plays the sound of just that frame.

  4. Press the 1 (number one) key. That selects formant F1 and a red triangle appears under the F1 formant peak to indicate that it's selected. Also an = sign appears next to formant 1 in the formants list in the left panel of the window.

  5. Press the left-arrow key a couple of times to move the F1 peak to the left. The red triangle and its associated green formant peak move to a lower frequency. Its numeric value in the formants list in the left panel decreases.

  6. Press the F1 key again. The frame will give a slightly different vowel sound. As you move the F1 peak slightly up and down and then press F1 again, the sound changes. Similarly if you press the 2 key to select the F2 formant, then moving that will also change the sound. If you move the F1 peak down to about 700 Hz (and reduce its height a bit with the down-arrow key) and move F2 up to 1400 Hz, then you'll hear a "er" schwa [@] sound instead of the original [a].

  7. Select File->Open and choose phsource/vowel/aI. This opens a new tab labelled "aI" which contains more frames. This is the [aI] diphthong and if you click in the tab window and press F2 you'll hear the English word "eye". If you click on each frame in turn and press F1 then you can hear each of the keyframes in turn. They sound different, starting with an [A] sound (as in "palm"), going through something like [@] in "her" and ending with something like [I] in "kit" (or perhaps a French é). Together they make the diphthong [aI].

Text and Prosody Windows

  1. Click on the Text tab in the left panel. Two text windows appear in the panel with buttons Translate and Speak below them.

  2. Type some text into the top window and click the Translate button. The phonetic translation will appear in the lower window.

  3. Click the Speak button. The text will be spoken and a Prosody tab will open in the main window.

  4. Click on a vowel phoneme which is displayed in the Prosody tab. A red line appears under it to indicate that it has been selected.

  5. Use the up-arrow or down-arrow key to move the vowel's blue pitch contour up or down. Then click the Speak button again to hear the effect of the altered pitch. If the adjacent phoneme also has a pitch contour then you may hear a discontinuity in the sound if it no longer matches with the one which you have moved.

  6. Hold down the Ctrl key while using the up-arrow or down-arrow keys. The gradient of the pitch contour will change.

  7. Click with the right mouse button over a phoneme. A menu allows you to select a different pitch envelope shape. Details of the currently selected phoneme appear in the Status line at the bottom of the window. The Stress number gives the stress level of the phoneme (see voices.html for a list).

  8. Click the Translate button. This re-translates the text and restores the original pitches.

  9. Click on a vowel phoneme in the Prosody window and use the < and > keys to shorten or lengthen it.

The Prosody window can be used to experiment with different phoneme lengths and different intonation.


docs/editor_if.html

USER INTERFACE - FORMANT EDITOR


Frame Sequence Display

The eSpeak editor can display a number of frame sequences in tabbed windows. Each frame can contain a short-time frequency spectrum, covering the period of one cycle at the sound's pitch. Frames can also show:

Text Tab

Enter text in the top left text window. Click the Translate button to see the phonetic transcription in the text window below. Then click the Speak button to speak the text and show the results in the Prosody tab, if that is open.

If changes are made in the Prosody tab, then clicking Speak will speak the modified prosody while Translate will revert to the default prosody settings for the text.

To enter phonetic symbols (Kirshenbaum encoding) in the top left text window, enclose them within [[ ]].

Spect Tab

The "Spect" tab in the left panel of the eSpeak editor shows information about the currently selected frame and sequence.

Key Commands

 


USER INTERFACE - PROSODY EDITOR


docs/images/lips.png
docs/images/sand-light.jpg
docs/index.html

eSpeak text to speech

(email)   jonsd at users dot sourceforge.net

eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows.   http://espeak.sourceforge.net

eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.

eSpeak is available as:

  • A command line program (Linux and Windows) to speak text from a file or from stdin.
  • A shared library version for use by other programs. (On Windows this is a DLL).
  • A SAPI5 version for Windows, so it can be used with screen-readers and other programs that support the Windows SAPI5 interface.
  • eSpeak has been ported to other platforms, including Solaris and Mac OSX.
Features.
  • Includes different Voices, whose characteristics can be altered.
  • Can produce speech output as a WAV file.
  • SSML (Speech Synthesis Markup Language) is supported (not complete), and also HTML.
  • Compact size. The program and its data, including many languages, totals about 1.4 Mbytes.
  • Can be used as a front-end to MBROLA diphone voices, see mbrola.html. eSpeak converts text to phonemes with pitch and length information.
  • Can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
  • Potential for other languages. Several are included in varying stages of progress. Help from native speakers for these or other languages is welcome.
  • Development tools are available for producing and tuning phoneme data.
  • Written in C.

I regularly use eSpeak to listen to blogs and news sites. I prefer the sound through a domestic stereo system rather than small computer speakers, which can sound rather harsh.


Languages. The eSpeak speech synthesizer supports several languages, however in many cases these are initial drafts and need more work to improve them. Assistance from native speakers is welcome for these, or other new languages. Please contact me if you want to help.

eSpeak does text to speech synthesis for the following languages, some better than others. Afrikaans, Albanian, Armenian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Finnish, French, German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Italian, Kurdish, Latvian, Lojban, Macedonian, Mandarin, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, Swedish, Tamil, Turkish, Vietnamese, Welsh.


The latest development version is at: espeak.sf.net/test/latest.html.
espeakedit is a GUI program used to prepare and compile phoneme data. It is now available for download. Documentation is currently sparse, but if you want to use it to add or improve language support, let me know.
History. Originally known as speak and originally written for Acorn/RISC_OS computers starting in 1995. This version is an enhancement and re-write, including a relaxation of the original memory and processing power constraints, and with support for additional languages.
docs/intonation.html

INTONATION


In eSpeak's standard intonation model, a "tune" is applied to each clause depending on its punctuation. Other intonation models may be used for some languages, such as tone languages.

Named tunes are defined in the text file: phsource/intonation. This file must be compiled for use by eSpeak by using the espeakedit program, using the menu option: Compile -> Compile intonation data.

Clauses

The tunes which are used for a language can be specified by using a tunes statement in a voice file in espeak-data/voices. eg:

tunes   s1  c1  q1  e1

Its parameters are four tune names, which are used for clauses which end in:

  1. Full-stop.
  2. Comma.
  3. Question mark.
  4. Exclamation mark.

A clause consists of the following parts:

  • Pre-head.
    These are any unstressed syllables before the first stressed syllable.

  • Head.
    This is the part from the first stressed syllable up to the last syllable before the nucleus.

  • Nucleus.
    This is stressed syllable which is the focus of the clause. eSpeak chooses the last stressed syllable of the clause.

  • Tail.
    These are the syllables after the nucleus.

Tune definitions

Here is an example tune definition from the file phsource/intonation.

tune s1
prehead   46 57
headenv   fall 16
head       4 80 55 -8 -5
headextend 0 63 38 13 0
nucleus  fall 70 18 24 12
nucleus0 fall 64 8
endtune

It contains:

tune <tune name>
Starts the definition of a tune. The tune name can be used in a tunes statements in voice files.

endtune <tune name>
Ends the definition of a tune.

prehead <start pitch> <end pitch>
Gives the pitch path for any series of unstressed syllables before the first stressed syllable.

headenv <envelope> <height>
Gives the pitch envelope which is used for stressed syllables in the head (before the nucleus), including onset and headlast syllables if these are specified. height gives a pitch range for the envelope.

head <steps> <start pitch> <end pitch> <unstressed start> <unstressed end>
start pitch and end pitch give a pitch path for the stressed syllables of the head. steps is the maximum number of stressed syllables for which this applies. If there are additional stressed syllables, then the headextend statement is used for them.

unstressed start and unstressed end give a pitch path for unstressed syllables between two stressed syllables. Their values are relative to the pitch of the previous stressed syllable. Values are usually negative, meaning that the unstressed syllables have lower pitch than the previous stressed syllable.

headextend <percentage list>
If the head contains more stressed syllables than is specified by steps, then percentage list is used. It contains up to 8 numbers which are used repeatedly for the additional stressed syllables. A value of 0 corresponds to the lower of the start pitch and end pitch values of the head statement. 100 corresponds to the higher value. Negative values and values greater than 100 are allowed.

nucleus <envelope> <top pitch> <bottom pitch> <tail start> <tail end>
This gives the pitch envelope and pitch range of the last stressed syllable of the clause. tail start and tail end give a pitch path for the unstressed syllables which are after the last stressed syllable.

nucleus0 <envelope> <top pitch> <bottom pitch>
This is used instead of nucleus if there are no unstressed syllables after the last stressed syllable. In this case, the pitch changes of the nucleus and the tail are both included in the nucleus.

The following attributes may also be included:

onset <pitch> <unstressed start> <unstressed end>
This specifies the pitch for the first stressed syllable of the head. If the onset statement is present, then the head statement is used for the stressed syllables after the first.

headlast <pitch> <unstressed start> <unstressed end>
This specifies the pitch for the last stressed syllable of the head (i.e. the stressed syllable before the nucleus).
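For illustration (the tune name, and the onset and headlast values, are invented for this sketch; the other values are copied from the s1 example above), a tune which also fixes the first and last stressed syllables of the head could be written as:

tune s1x
prehead    46 57
onset      75 -4 -3
headenv    fall 16
head       4 80 55 -8 -5
headlast   60 -5 -4
headextend 0 63 38 13 0
nucleus    fall 70 18 24 12
nucleus0   fall 64 8
endtune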

docs/languages.html

3. LANGUAGES


Help Needed

Many of these are just experimental attempts at these languages, produced after a quick reading of the corresponding article on wikipedia.org. They will need work or advice from native speakers to improve them. Please contact me if you want to advise or assist with these or other languages.

The sound of some phonemes may be poorly implemented, particularly [r] since I'm English and therefore unable to make a "proper" [r] sound.

A major factor is the rhythm or cadence. An Italian speaker told me the Italian voice improved from "difficult to understand" to "good" by changing the relative length of stressed syllables. Identifying unstressed function words in the xx_list file is also important to make the speech flow well. See Adding or Improving a Language.

Character sets

Languages recognise text either as UTF8 or alternatively in an 8-bit character set which is appropriate for that language. For example, for Polish this is Latin2, for Russian it is KOI8-R. This choice can be overridden by a line in the voices file to specify an ISO 8859 character set, eg. for Russian the line:
     charset 5
will mean that ISO 8859-5 is used as the 8-bit character set rather than KOI8-R.

In the case of a language which uses a non-Latin character set (eg. Greek or Russian) if the text contains a word with Latin characters then that particular word will be pronounced using English pronunciation rules and English phonemes. Speaking entirely English text using a Greek or Russian voice will sound OK, but each word is spoken separately so it won't flow properly.

Sample texts in various languages can be found at http://<language>.wikipedia.org and www.gutenberg.org

3.1 Voice Files

A number of Voice files are provided in the espeak-data/voices directory. You can select one of these with the -v <voice filename> parameter to the speak command, eg:
   espeak -vaf
to speak using the Afrikaans voice.

Language voices generally start with the 2 letter ISO 639-1 code for the language. If the language does not have an ISO 639-1 code, then the 3 letter ISO 639-3 code can be used.

For details of the voice files see Voices.

Default Voice

    default
    This voice is used if none is specified in the speak command. Copy your preferred voice to "default" so you can use the speak command without the need to specify a voice.
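For example, on Linux (assuming the espeak-data directory in use is /usr/share/espeak-data), the English voice can be made the default with:

   cp /usr/share/espeak-data/voices/en /usr/share/espeak-data/voices/default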

3.2 English Voices

    en
    is the standard default English voice.

    en-us
    American English.

    en-sc
    English with a Scottish accent.

    en-n
    en-rp
    en-wm

    are different English voices. These can be considered caricatures of various British accents: Northern, Received Pronunciation, West Midlands respectively.

3.3 Voice Variants

To make alternative voices for a language, you can make additional voice files in espeak-data/voices which contains commands to change various voice and pronunciation attributes. See voices.html.

Alternatively there are some preset voice variants which can be applied to any of the language voices, by appending + and a variant name. Their effects are defined by files in espeak-data/voices/!v.

The variants are +m1 +m2 +m3 +m4 +m5 +m6 +m7 for male voices, +f1 +f2 +f3 +f4 +f5 for female voices, and +croak +whisper for other effects. For example:

   espeak -ven+m3
The available voice variants can be listed with:
   espeak --voices=variant

3.4 Other Languages

The eSpeak speech synthesizer does text to speech for the following additional languages, some better than others.

    af  Afrikaans
    This has been worked on by native speakers and it should be OK.

    bs  Bosnian
    Usable, but I'm unsure whether wrong stressed syllables are a problem. It accepts both Latin and Cyrillic characters. This voice is similar to sr Serbian and hr Croatian

    ca  Catalan

    cs  Czech
    Usable.

    da  Danish
    Usable.

    de  German
    This has improved from earlier versions. Remaining problems are stress placement (which, like English, is irregular), prosody, and the use of compound words, where correct detection of the sub-word boundaries would probably be needed for accurate pronunciation.

    el  Greek
    Stress position is marked in text and spelling is fairly regular, so it shouldn't be too bad. It uses a different alphabet and switches to English pronunciation for words which contain Latin characters a-z.

    eo  Esperanto
    Esperanto has simple and regular pronunciation rules, so it should be OK.

    es  Spanish
    Spanish has good spelling rules, so it should be OK.

    es-la  Spanish - Latin America
    This contains a few changes from es, notably the pronunciation of "z","ce","ci".

    fi  Finnish
    This has had assistance from native speakers and should be usable.

    fr  French
    This has been improved by a native speaker, and should be OK.

    hr  Croatian
    Usable, but I'm unsure whether wrong stressed syllables are a problem. It accepts both Latin and Cyrillic characters. This voice is similar to sr Serbian and bs Bosnian

    hu  Hungarian
    This has had assistance from a native speaker and it should be OK.

    it  Italian
    This has had some feedback from a native speaker but more work is needed. Spelling is fairly regular, but stress marks and vowel accents are often omitted from text, so for some words the dictionary/exceptions list will need to determine the stress position or whether to use open/close [e] or [E] and [o] or [O].

    kn  Kannada
    Not much feedback yet, but I'm told that it sounds reasonable.

    ku  Kurdish
    Not much work yet, but Kurdish has good spelling rules so it should be OK.

    lv  Latvian
    This has had assistance from a native speaker and it should be OK.

    nl  Dutch
    Needs improvement of the spelling-to-phoneme rules.

    pl  Polish
    Usable.

    pt  Portuguese (Brazil)
    Brazilian Portuguese. This has had assistance from a native speaker and it should be OK. Like Italian there is further work to do about the ambiguity in the spelling between open/close "e" and "o" vowels.

    pt-pt  Portuguese (European)

    ro  Romanian
    Probably OK. More work is needed to improve the position of stress within words.

    sk  Slovak
    This has had assistance from a native speaker, so it should be OK.

    sr  Serbian
    Usable. Wrong stressed syllables may be a problem. It accepts both Latin and Cyrillic characters. This voice is similar to hr Croatian and bs  Bosnian

    sv  Swedish
    This has now had some work done on the pronunciation rules, so it should be useable.

    sw  Swahili
    Not much feedback yet, but the spelling and stress rules are fairly regular, so it's probably usable.

    ta  Tamil
    This has had assistance from a native speaker, so it should be OK.

    tr  Turkish
    Not much work yet, but I'm told it sounds reasonable.

    zh  Mandarin Chinese
    This speaks Pinyin text and Chinese characters. There is only a simple one-to-one translation of Chinese characters to a single Pinyin pronunciation. There is no attempt yet at recognising different pronunciations of Chinese characters in context, or of recognising sequences of characters as "words". The eSpeak installation includes a basic set of Chinese characters. More are available in an additional data file for Mandarin Chinese at: http://espeak.sourceforge.net/data/.

3.5 Provisional Languages

These languages are only initial naive implementations which have had little or no feedback and improvement from native speakers.

    cy  Welsh
    An initial guess, awaiting feedback.

    grc  Ancient Greek
    Includes a short pause between words to help understanding.

    hi  Hindi
    This is interesting because it uses the Devanagari characters. I'm not sure about Hindi stress rules, and I expect the sound of aspirated/unaspirated consonant pairs needs improvement.

    hy  Armenian
    Needs feedback from native speakers. The hy-west voice has different pronunciation of some consonants for Western Armenian pronunciation.

    id  Indonesian
    An initial guess, no feedback yet.

    is  Icelandic
    An initial guess, awaiting feedback.

    jbo  Lojban
    An artificial language.

    ka  Georgian
    An initial guess, awaiting feedback.

    la  Latin
    Stress rules are implemented, but it needs text where long vowels are marked with macrons.

    mk  Macedonian
    This is similar to hr Croatian, so it's probably usable. It accepts both Latin and Cyrillic characters.

    no  Norwegian
    An initial guess, awaiting feedback.

    ru  Russian
    So far it's just an initial attempt with basic pronunciation rules. Work is needed especially on the consonants. Russian has two versions of most consonants, "hard" and "soft" (palatalised) and in most cases eSpeak doesn't yet make a proper distinction.
    Russian stress position is unpredictable so a large lookup dictionary is needed of those words where eSpeak doesn't guess correctly. To avoid increasing the size of the basic eSpeak package, this is available separately at: http://espeak.sourceforge.net/data/

    sq  Albanian
    Some initial feedback, but needs more work.

    vi  Vietnamese
    This is interesting because it's a tone language. I don't know how it should sound, so it's just a guess and I need feedback.

    zh-yue  Cantonese Chinese
    Just a naive simple one-to-one translation from single Simplified Chinese characters to phonetic equivalents in Cantonese. There is limited attempt at disambiguation, grouping characters into words, or adjusting tones according to their surrounding syllables. This voice needs Chinese character to phonetic translation data, which is available as a separate download for Cantonese at: http://espeak.sourceforge.net/data/.
    The voice can also read Jyutping romanised text.

3.6 Mbrola Voices

Some additional voices, whose names start with mb- (for example mb-en1), use eSpeak as a front-end to Mbrola diphone voices. eSpeak does the spelling-to-phoneme translation and intonation. See mbrola.html.

docs/mbrola.html


MBROLA VOICES


The Mbrola project is a collection of diphone voices for speech synthesis. They do not include any text-to-phoneme translation, so this must be done by another program. The Mbrola voices are cost-free but are not open source. They are available from the Mbrola website at:
http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html

eSpeak can be used as a front-end to Mbrola. It provides the spelling-to-phoneme translation and intonation, which Mbrola then uses to generate speech sound.

Voice Names

To use a Mbrola voice, eSpeak needs information to translate from its own phonemes to the equivalent Mbrola phonemes. This has been set up for only some voices so far.

The eSpeak voices which use Mbrola are named as:
  mb-xxx

where xxx is the name of a Mbrola voice (eg. mb-en1 for the Mbrola "en1" English voice). These voice files are in eSpeak's directory espeak-data/voices/mbrola.

The installation instructions below use the Mbrola voice "en1" as an example. You can use other mbrola voices for which there is an equivalent eSpeak voice in espeak-data/voices/mbrola.

There are some additional eSpeak Mbrola voices which speak English text using a Mbrola voice for a different language. These contain the name of the Mbrola voice with a suffix -en. For example, the voice mb-de4-en will speak English text with a German accent by using the Mbrola de4 voice.
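For example, to speak English text with a German accent:

   espeak -v mb-de4-en "Hello world"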

Windows Installation

The SAPI5 version of eSpeak uses the mbrola.dll.
  1. Install eSpeak. Include the voice mb-en1 in the list of voices during the eSpeak installation.

  2. Install the PC/Windows version of Mbrola (MbrolaTools35.exe) from: http://www.tcts.fpms.ac.be/synthesis/mbrola/bin/pcwin/MbrolaTools35.exe.

  3. Get the en1 voice from: http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html unpack the archive, and copy the "en1" data file (not the whole "en1" directory) into C:/Program Files/eSpeak/espeak-data/mbrola.

  4. Use the voice espeak-MB-EN1 from the list of SAPI5 voices.

Linux Installation

From eSpeak version 1.44 onwards, eSpeak calls the mbrola program directly, rather than passing phoneme data to it using a pipe.
  1. To install the Linux Mbrola binary, download: http://www.tcts.fpms.ac.be/synthesis/mbrola/bin/pclinux/mbr301h.zip. Unpack the archive, and copy and rename the file from: mbrola-linux-i386 to mbrola somewhere in your executable path (eg. /usr/bin/mbrola ).

  2. Get the en1 voice from: http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html. Unpack the archive, and copy the "en1" data file (not the whole "en1" directory) to /usr/share/mbrola/en1.

    eSpeak will look for mbrola voices firstly in espeak-data/mbrola and then in /usr/share/mbrola

  3. If you use the eSpeak voice such as "mb-en1" then eSpeak will use the mbrola "en1" voice, eg:
    espeak -v mb-en1 "Hello world"

    To generate mbrola phoneme data (.pho file) you can use:
    espeak -v mb-en1 -q --pho "Hello world"
    or
    espeak -v mb-en1 -q --pho --phonout=out.pho "Hello world"

Mbrola Voice Files

eSpeak's voice files for Mbrola voices are in directory espeak-data/voices/mbrola. They contain a line:
  mbrola <voice> <translation>
eg.
  mbrola en1 en1_phtrans
  • <voice> is the name of the Mbrola voice.

  • <translation> is a translation file to convert between eSpeak phonemes and the equivalent Mbrola phonemes. These are kept in: espeak-data/mbrola_ph
They are binary files which are compiled, using espeakedit, from source files in phsource/mbrola, see below.

Mbrola Phoneme Translation Data

Mbrola phoneme translation files specify translations from eSpeak phoneme names to mbrola phoneme names. They are referenced from voice files.

The source files are in phsource/mbrola. These are compiled using the espeakedit program (Compile->Compile mbrola phonemes list) to produce data files in espeak-data/mbrola_ph which are used by eSpeak.

Each line in the mbrola phoneme translation file contains:

<control> <espeak ph1> <espeak ph2> <percent> <mbrola ph1> [<mbrola ph2>]

  • <control>
    • bit 0   skip the next phoneme
    • bit 1   match this and Previous phoneme
    • bit 2   only at the start of a word
    • bit 3   don't match two phonemes across a word boundary

  • <espeak ph1>
    The eSpeak phoneme which is to be translated to an mbrola phoneme.

  • <espeak ph2>
    If this field is not NULL, then the match only occurs if this field matches the next phoneme. If control bit 1 is set, then the previous rather than the next phoneme is matched. This field may also have the following values:
    VWL   matches any Vowel phoneme.

  • <percent>
    If this field is zero then only one mbrola phoneme is used. If this field is non-zero, then two mbrola phonemes are used, and this value gives the percentage length of the first mbrola phoneme.

  • <mbrola ph1>
    The mbrola phoneme to which the eSpeak phoneme is translated. This field may be NULL.

  • <mbrola ph2>
    The second mbrola phoneme. This field is only used if the <percent> field is not zero.

The list is searched from start to finish, until a match is found. Therefore, a line with a more specific match condition should appear before a line which matches the same eSpeak phoneme but with a more general condition.
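As an illustrative sketch (the mbrola phoneme names here are invented; the real names depend on the mbrola voice), the following lines translate eSpeak [@] and [aI]:

   2   @    r     0    @2
   0   @    NULL  0    @
   0   aI   NULL  50   a  I

The first line, which has the extra condition that the previous phoneme is [r] (control bit 1), is placed before the more general [@] line. The [aI] line splits the diphthong into two mbrola phonemes, giving 50 percent of the length to the first.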

The file dictsource/dict_phonemes lists the eSpeak phonemes which are used for each language. Translations for all these should be given in the mbrola phoneme translation file. In addition, some phonemes which are referenced from phoneme files (eg. phsource/ph_language, phsource/phonemes) in lines such as:

   beforenotvowel   l/
   reduceto  a#  0
should also be included, even though they don't appear in dictsource/dict_phonemes.

If the language's *_list or *_rules files include rules to speak words "as English", the mbrola phoneme translation file should include rules which translate English phonemes into near equivalents, so that they can be spoken by the mbrola voice.

docs/phonemes.html


PHONEMES


In general a different set of phonemes can be defined for each language.

In most cases different languages inherit the same basic set of consonants. They can add to these or modify them as needed.

The phoneme mnemonics are based on the scheme by Kirshenbaum which represents International Phonetic Alphabet symbols using ascii characters. See: www.kirshenbaum.net/IPA/ascii-ipa.pdf.

Phoneme mnemonics can be used directly in the text input to espeak. They are enclosed within double square brackets. Spaces are used to separate words, and all stressed syllables must be marked explicitly. eg:
[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]

English Consonants

[p] [b]
[t] [d]
[tS] church     [dZ] judge
[k] [g]

[f] [v]
[T] thin        [D] this
[s] [z]
[S] shop        [Z] pleasure
[h]

[m] [n]
[N] sing
[l]             [r] red   (Omitted if not immediately followed by a vowel).
[j] yes         [w]

Some Additional Consonants
[C] German ich       [x] German buch
[l^] Italian gli     [n^] Spanish ñ

English Vowels

These are the phonemes which are used by the English spelling-to-phoneme translations (en_rules and en_list). In some varieties of English different phonemes may have the same sound, but they are kept separate because they may differ in another variety.

In rhotic accents, such as General American, the phonemes [3:], [A@], [e@], [i@], [O@], [U@] include the "r" sound.

[@]     alpha       schwa
[3]     better      rhotic schwa. In British English this is the same as [@], but it includes 'r' colouring in American and other rhotic accents. In these cases a separate [r] should not be included unless it is followed immediately by another vowel.
[3:]    nurse
[@L]    simple
[@2]    the         Used only for "the".
[@5]    to          Used only for "to".

[a]     trap
[aa]    bath        This is [a] in some accents, [A:] in others.
[a2]    about       This may be [@] or may be a more open schwa.
[A:]    palm
[A@]    start

[E]     dress
[e@]    square

[I]     kit
[I2]    intend      As [I], but also indicates an unstressed syllable.
[i]     happy       An unstressed "i" sound at the end of a word.
[i:]    fleece
[i@]    near

[0]     lot

[V]     strut

[u:]    goose
[U]     foot
[U@]    cure

[O:]    thought
[O@]    north
[o@]    force

[aI]    price
[eI]    face
[OI]    choice
[aU]    mouth
[oU]    goat
[aI@]   science
[aU@]   hour

Some Additional Vowels

Other languages will have their own vowel definitions, eg:
[e]     German eh, French é
[o]     German oo, French o
[y]     German ü, French u
[Y]     German ö, French oe
docs/phontab.html

PHONEME TABLES


A phoneme table defines all the phonemes which are used by a language, together with their properties and the data for their production as sounds.

Generally each language has its own phoneme table, although additional phoneme tables can be used for different voices within the language. These alternatives are referenced from Voice files.

A phoneme table does not need to define all the phonemes used by a language. It can inherit the phonemes from a previously defined phoneme table. For example, a phoneme table may redefine (or add) some of the vowels that it uses, but inherit most of its consonants from a standard set.

The source files for the phoneme data are in the "phsource" directory in the espeakedit download package. "Vowel files", which are referenced in FMT(), VowelStart(), and VowelEnding() instructions are made using the espeakedit program.

 


Phoneme files

The phoneme tables are defined in a master phoneme file, named phonemes. This starts with the base phoneme table followed by phoneme tables for other languages and voices. These inherit phonemes from the base table or previously defined tables.

In addition to phoneme definitions, the phoneme file can contain the following:

include <filename>
Includes the text of the specified file at this point. This allows different phoneme tables to be kept in different text files, for convenience. <filename> is a relative path. The included file can itself contain include statements.

phonemetable <name> <parent>
Starts a new phoneme table, and ends the previous table.
<name> Is the name of this phoneme table. This name is used in Voice files.
<parent> Is the name of a previously defined phoneme table whose phoneme definitions are inherited by this one. The name base indicates the first (base) phoneme table.

 


Phoneme definitions

Note: These new Phoneme definitions apply to eSpeak version 1.42.20 and later.

A phoneme table contains a list of phoneme definitions. Each starts with the keyword phoneme and the phoneme name (this is the name used in the pronunciation rules in a language's *_rules and *_list files), and ends with the keyword endphoneme. For example:

  phoneme aI
    vowel
    starttype #a endtype #i
    length 230
    FMT(vowels/ai)
  endphoneme

  phoneme s
    vls alv frc sibilant
    voicingswitch z
    lengthmod 3
    Vowelin  f1=0  f2=1700 -300 300  f3=-100 80
    Vowelout f1=0  f2=1700 -300 250  f3=-100 80  rms=20

    IF nextPh(isPause) THEN
      WAV(ufric/s_)
    ELIF nextPh(p) OR nextPh(t) OR nextPh(k) THEN
      WAV(ufric/s!)
    ENDIF
    WAV(ufric/s)
  endphoneme

A phoneme definition contains both static properties and executed instructions. The instructions may contain conditional statements, so that the effect of the phoneme may be different depending on adjacent phonemes, whether the syllable is stressed, etc.

The instructions of a phoneme are interpreted in two different phases. In the first phase, the instructions may change the phoneme and replace it by a different phoneme. In the second phase, instructions are used to produce the sound for the phoneme.

The import_phoneme statement can be used to copy a previously defined phoneme from a specified phoneme table. For example:

  phoneme t
    import_phoneme base/t[
  endphoneme 
means: phoneme t in this phoneme table is a copy of phoneme t[ from phoneme table "base". A length instruction can be used after import_phoneme to vary the length from the original.

 


Phoneme Properties

Within the phoneme definition the following lines may occur: ( (V) indicates only for vowels, (C) only for consonants)

    Type. One of these must be present.
    vowel
    liquid      semi-vowels, such as:  r, l, j, w
    nasal       nasal eg:  m, n, N
    stop        stop eg:  p, b, t, d, k, g
    frc         fricative eg:  f, v, T, D, s, z, S, Z, C, x
    afr         affricate eg:  tS, dZ
    pause
    stress      used for stress symbols, eg: ' , = %
    virtual     Used to represent a class of phonemes.
    Properties:
    vls         (C) voiceless eg. p, t, k, f, s
    vcd         (C) voiced eg. b, d, g, v, z
    sibilant    (C) eg: s, z, S, Z, tS, dZ
    palatal     (C) A palatal or palatalized consonant.
    rhotic      (C) An "r" type consonant.
    unstressed  (V) This vowel is always unstressed, unless explicitly marked otherwise.
    nolink      Prevent any linking from the previous phoneme.
    nopause     Used in a liquid or nasal phoneme to prevent eSpeak inserting a short pause if a word starts with this phoneme and the previous word ends with a vowel.
    trill       (C) Apply trill to the voicing.
    Place of Articulation (C):
    blb  bi-labial       ldb  labio-dental      dnt  dental
    alv  alveolar        rfx  retroflex         pla  palato-alveolar
    pal  palatal         vel  velar             lbv  labio-velar
    uvl  uvular          phr  pharyngeal        glt  glottal

    starttype <phoneme>
    Allocates this phoneme to a group so that conditions such as nextPh(#e) can test for any of a group of phonemes. Pre-defined groups for use for vowels are: #@ #a #e #i #o #u. Additional groups can be defined as phonemes with type "virtual".

    endtype <phoneme>
    Allocates this phoneme to a group so that conditions such as prevPh(#e) can test for any of a group of phonemes. Pre-defined groups for use for vowels are: #@ #a #e #i #o #u. Additional groups can be defined as phonemes with type "virtual".

    lengthmod <integer>
    (C) Determines how this consonant affects the length of the previous vowel. This value is used as index into the length_mods table in the CalcLengths() function in the eSpeak program.

    voicingswitch <phoneme>
    This is used for some languages to change between voiced and unvoiced phonemes.

 


Phoneme Instructions

Phoneme Instructions may be included within conditional statements.

During the first phase of phoneme interpretation, an instruction which causes a change to a different phoneme will terminate the instructions. During the second phase, FMT() and WAV() instructions will terminate the instructions.

    length <length>
    The relative length of the phoneme, typically about 140 for a short vowel and from 200 to 300 for a long vowel or diphthong. A length() instruction is needed for vowels. It is optional for consonants.

    ipa <ipa string>
    In many cases, eSpeak makes IPA (International Phonetic Alphabet) phoneme names automatically from eSpeak phoneme names. If this is not correct, then the phoneme definition can include an ipa instruction to specify the correct IPA name. IPA strings may include non-ascii characters. They may also include characters specified by their character codes in the form U+ followed by 4 hexadecimal digits. For example a string: aU+0303 indicates 'a' with a 'combining tilde'.

    WAV(<wav file>, <amplitude>)
     <wav file> is a path to a WAV file (22 kHz, 16 bits, mono) within phsource/ which will be played to produce the sound. This method is used for unvoiced consonants. <wavefile> does not include a .WAV filename extension, although the file to which it refers may or may not have one.
    <amplitude> is optional. It is a percentage change to the amplitude of the WAV file. So, WAV(ufric/s, 50) means: play file 'ufric/s.wav' at 50% amplitude.

    FMT(<vowel file>, <amplitude>)
    <vowel file> is a path to a file (within phsource/) which defines how to generate the sound (a vowel or voiced consonant) from a sequence of formant values. Vowel files are made using the espeakedit program.
    <amplitude> is optional. It is a percentage change to the amplitude of the sound which is synthesized from the FMT() instruction.

    FMT(<vowel file>, <amplitude>) addWav(<wav file>, <amplitude>)
    For voiced consonants, a FMT() instruction may be followed by an addWav() instruction. addWav() has the same format as a WAV() instruction, but the WAV file is mixed with the sound which is synthesized from the FMT() instruction.

    VowelStart(<vowel file>, <length adjust>)
    This is used to modify the start of a vowel when it follows a sonorant consonant (such as [l] or [j]). It replaces the first frame of the <vowel file> which is specified in a FMT() instruction by this <vowel file>, and adjusts the length of the original by a signed value <length adjust>. The VowelStart() instruction may be specified either in the phoneme definition of the vowel, or in the phoneme definition of the sonorant consonant which precedes the vowel. The former takes precedence.

    VowelEnding(<vowel file>, <length adjust>)
    This is used to modify the end of a vowel when it is followed by a sonorant consonant (such as [l] or [j]). It is appended to the <vowel file> which is specified in a FMT() instruction by this <vowel file>, and adjusts the length of the original by a signed value <length adjust>. The VowelEnding() instruction may be specified either in the phoneme definition of the vowel, or in the phoneme definition of the sonorant consonant which follows the vowel. The former takes precedence.

    Vowelin <vowel transition data>
    (C) Specifies the effects of this consonant on the formants of a following vowel. See "vowel transitions", below.

    Vowelout <vowel transition data>
    (C) Specifies the effects of this consonant on the formants of a preceding vowel. See "vowel transitions", below.

    ChangePhoneme(<phoneme>)
    Change to the specified phoneme.

    ChangeIfDiminished(<phoneme>)
    Change to the specified phoneme (such as schwa, @) if this syllable has "diminished" stress.

    ChangeIfUnstressed(<phoneme>)
    Change to the specified phoneme if this syllable has "diminished" or "unstressed" stress.

    ChangeIfNotStressed(<phoneme>)
    Change to the specified phoneme if this syllable does not have "primary" stress.

    ChangeIfStressed(<phoneme>)
    Change to the specified phoneme if this syllable has "primary" stress.

    IfNextVowelAppend(<phoneme>)
    If the following phoneme is a vowel then this additional phoneme will be inserted before it.

    RETURN
    Ends executions of instructions.

    CALL <phoneme table>/<phoneme>
    Executes the instructions of the specified phoneme.

 


Conditional Statements

Phoneme definitions can contain conditional statements such as:
  IF <condition> THEN
    <statements>
  ENDIF
or more generally:
  IF <condition> THEN
    <statements>
  ELIF <condition> THEN
    <statements>
  ...
  ELSE
    <statements>
  ENDIF
where the ELSE and ELIF parts are optional.

Multiple conditions may be joined with AND or OR, but not a mixture of ANDs and ORs.

Condition Can be:

    prevPh(<attribute>)
    Test the previous phoneme

    prevPhW(<attribute>)
    Test the previous phoneme, but only within the same word. Returns false if there is no previous phoneme in the word.

    thisPh(<attribute>)
    Test this current phoneme

    nextPh(<attribute>)
    Test the following phoneme

    nextPhW(<attribute>)
    Test the following phoneme, but only within the same word. Returns false if there is no following phoneme in the word.

    next2Ph(<attribute>)
    Test the phoneme after the next phoneme.

    nextVowel(<attribute>)
    Test the next vowel after the current phoneme, but only within the same word. Returns false if there is none.

    PreVoicing()
    This is used as part of the instructions for voiced stop consonants (eg. [d] [g]). If true then produce a voiced murmur before the stop.

    KlattSynth()
    Returns true if the voice is using the Klatt synthesizer rather than the eSpeak synthesizer.
Attributes
    Note: Additional attributes could be added to eSpeak if needed.
    <phoneme name>
    True if the phoneme has this phoneme name.

    <phoneme group>
    True if the phoneme has this starttype (or if it has this endtype if it's used in prevPh() ). The pre-defined phoneme groups are #@, #a, #e, #i, #o, #u.

    isPause
    True if the phoneme is a pause.

    isPause2
    nextPh(isPause2) is used to test whether the next phoneme is not a vowel or liquid consonant within the same word.

    isVowel
    isNotVowel
    isLiquid
    isNasal
    isVFricative
    These test the phoneme type.

    isPalatal
    isRhotic
    These test whether the phoneme has this property.

    isWordStart
    notWordStart
    These test whether this is the first phoneme in a word.

    isWordEnd
    True if this is the final phoneme in a word.

    isFinalVowel
    True if this is the last vowel in a word.

    isAfterStress
    True if this phoneme is after the stressed vowel in a word.

    isVoiced
    True if this phoneme is a vowel or a voiced consonant.

    isDiminished
    True if the syllable stress is "diminished"

    isUnstressed
    True if the syllable stress is "diminished" or "unstressed"

    isNotStressed
    True if the syllable stress is not "primary stress".

    isStressed
    True if the syllable stress is "primary stress".

    isMaxStress
    True if this is the highest stressed syllable in the word.
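
For example, word-position and stress attributes can be combined with the conditions above to pick between variants of a phoneme. The sketch below is illustrative only; the FMT() file names are invented:

  IF thisPh(isWordStart) AND prevPh(isPause) THEN
    FMT(r3/r_strong)      // a stronger variant at the start of an utterance
  ELIF thisPh(isAfterStress) THEN
    FMT(r3/r_weak)
  ELSE
    FMT(r3/r)
  ENDIF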

 


Sound Specifications

There are three ways to produce sounds:
  • Playing a WAV file, by using a WAV() instruction. This is used for unvoiced consonants such as [p] [t] [s].
  • Generating a wave from a sequence of formant parameters, by using a FMT() instruction. This is used for vowels and also for sonorants such as [l] [j] [n].
  • A mixture of these. A stored WAV file is mixed with a wave generated from formant parameters. Use a FMT() instruction followed by addWav(). This is used for voiced stops and fricatives such as [b] [g] [v] [z].
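
In outline, the three forms might look like this. This is a sketch only: the type and place properties are omitted, and the WAV/FMT file paths are placeholders rather than entries from a real phoneme table.

  phoneme s             // unvoiced consonant: a stored WAV file only
    // (properties omitted)
    WAV(ufric/s)
  endphoneme

  phoneme l             // sonorant: formant synthesis only
    // (properties omitted)
    FMT(l/l)
  endphoneme

  phoneme z             // voiced fricative: formants mixed with a stored WAV file
    // (properties omitted)
    FMT(voc/z) addWav(ufric/s)
  endphoneme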

 


Vowel Transitions

These specify how a consonant affects an adjacent vowel. A consonant may cause a transition in the vowel's formants as the mouth changes shape between the consonant and the vowel. The following attributes may be specified. Note that the maximum rate of change of formant frequencies is limited by the speak program.

    len=<integer>
    Nominal length of the transition in ms. If omitted a default value is used.
    rms=<integer>
    Adjusts the amplitude of the vowel at the end of the transition. If omitted a default value is used.
    f1=<integer>
    0:   f1 formant frequency unchanged.
    1:   f1 formant frequency decreases.
    2:   f1 formant frequency decreases more.
    f2=<freq> <min> <max>
    <freq>:   The frequency towards which the f2 formant moves (Hz).
    <min>:   Signed integer (Hz).  The minimum f2 frequency change.
    <max>:   Signed integer (Hz).  The maximum f2 frequency change.
    f3=<change> <amplitude>
    <change>:   Signed integer (Hz).  Frequency change of the f3, f4, and f5 formants.
    <amplitude>:   Amplitude of the f3, f4, and f5 formants at the end of the transition. 100 = no change.
    brk
    Break. Do not merge the synthesized wave of the consonant into the vowel. This will produce a discontinuity in the formants.
    rate
    Allow a greater maximum rate of change of formant frequencies.
    glstop
    Indicates a glottal stop.
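
For example, a consonant might end a preceding vowel with a transition such as the one below. This is only a sketch: the numeric values are illustrative and are not taken from any real phoneme definition.

    Vowelout len=50 rms=20 f1=1 f2=1700 -300 300 f3=-100 80 brk
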
espeakedit-1.46.02/docs/speak_lib.h

#ifndef SPEAK_LIB_H
#define SPEAK_LIB_H

/***************************************************************************
 *   Copyright (C) 2005 to 2010 by Jonathan Duddington                     *
 *   email: jonsd@users.sourceforge.net                                    *
 *                                                                         *
 *   This program is free software; you can redistribute it and/or modify  *
 *   it under the terms of the GNU General Public License as published by  *
 *   the Free Software Foundation; either version 3 of the License, or     *
 *   (at your option) any later version.                                   *
 *                                                                         *
 *   This program is distributed in the hope that it will be useful,       *
 *   but WITHOUT ANY WARRANTY; without even the implied warranty of        *
 *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the         *
 *   GNU General Public License for more details.                          *
 *                                                                         *
 *   You should have received a copy of the GNU General Public License     *
 *   along with this program; if not, see:                                 *
 *   <http://www.gnu.org/licenses/>.                                       *
 ***************************************************************************/

/*************************************************************/
/* This is the header file for the library version of espeak */
/*                                                           */
/*************************************************************/

#include <stdio.h>
#include <stddef.h>

#define ESPEAK_API_REVISION  6
/*
Revision 2
   Added parameter "options" to eSpeakInitialize()

Revision 3
   Added espeakWORDGAP to espeak_PARAMETER

Revision 4
   Added flags parameter to espeak_CompileDictionary()

Revision 5
   Added espeakCHARS_16BIT

Revision 6
   Added macros: espeakRATE_MINIMUM, espeakRATE_MAXIMUM, espeakRATE_NORMAL
*/

/********************/
/*  Initialization  */
/********************/

// values for 'value' in espeak_SetParameter(espeakRATE, value, 0), nominally in words-per-minute
#define espeakRATE_MINIMUM  80
#define espeakRATE_MAXIMUM  450
#define espeakRATE_NORMAL   175

typedef enum {
  espeakEVENT_LIST_TERMINATED = 0, // Retrieval mode: terminates the event list.
  espeakEVENT_WORD = 1,            // Start of word
  espeakEVENT_SENTENCE = 2,        // Start of sentence
  espeakEVENT_MARK = 3,            // Mark
  espeakEVENT_PLAY = 4,            // Audio element
  espeakEVENT_END = 5,             // End of sentence or clause
  espeakEVENT_MSG_TERMINATED = 6,  // End of message
  espeakEVENT_PHONEME = 7,         // Phoneme, if enabled in espeak_Initialize()
  espeakEVENT_SAMPLERATE = 8       // internal use, set sample rate
} espeak_EVENT_TYPE;

typedef struct {
  espeak_EVENT_TYPE type;
  unsigned int unique_identifier; // message identifier (or 0 for key or character)
  int text_position;              // the number of characters from the start of the text
  int length;                     // word length, in characters (for espeakEVENT_WORD)
  int audio_position;             // the time in mS within the generated speech output data
  int sample;                     // sample id (internal use)
  void* user_data;                // pointer supplied by the calling program
  union {
    int number;                   // used for WORD and SENTENCE events. For PHONEME events this is the phoneme mnemonic.
    const char *name;             // used for MARK and PLAY events.  UTF8 string
  } id;
} espeak_EVENT;
/*
   When a message is supplied to espeak_synth, the request is buffered and espeak_synth returns.
   When the message is really processed, the callback function will be repeatedly called.

   In RETRIEVAL mode, the callback function supplies to the calling program the audio data and
   an event list terminated by 0 (LIST_TERMINATED).

   In PLAYBACK mode, the callback function is called as soon as an event happens.

   For example suppose that the following message is supplied to espeak_Synth:
   "hello, hello."
   * Once processed in RETRIEVAL mode, it could lead to 3 calls of the callback function:

   ** Block 1: