MBROLA

(Multi-Band Resynthesis OverLap Add)

System documentation

Edition 6.0 - Mbrola release 3.01g

October 20th, 1998

"It would be a considerable invention indeed, that of a machine able to mimic speech, with its sounds and articulations. I think it is not impossible."

Leonhard Euler (1761)

by Vincent Pagel and Thierry Dutoit

 

MBROLA release 3.01g, table of contents

1. MBROLA Sources General condition of use

2. A brief description of MBROLA

3. Distribution

4. Installation and Tests

4.1 On Unix

4.2 On PCs/Dos

4.3 On PC/Windows

4.3.1 Black magic

4.4 Using the standalone binary

4.4.1 Changing the pitch

4.4.2 Using Pipes

4.4.3 Renaming and Cloning phonemes

4.5 Machine dependent hints for best using Mbrola

4.5.1 On MSDOS

4.5.2 On modern Unix systems such as Solaris or HPUX or Linux

4.5.3 On Sun4 ( old audio interface )

4.5.4 On VAX or AXP workstations

5. Default Parser Manual

5.1 Input file format

5.1.1 Changing the Frequency Ratio or Time Ratio

5.1.2 Flush the output stream

5.2 Limitations of MBROLA

6. Programmer's Manual

6.1 Philosophy and architecture

6.1.1 Encapsulation of Object's attributes

6.1.2 Inheritance and Polymorphism

Inheritance and cross-reference graph

6.2 Application Programming Interface

6.2.1 One channel mode

6.2.2 Multi channel mode

6.2.3 Designing and plugging your own parser

7. Mbrola architecture

7.1 File: Misc/common.h

7.2 File: Misc/incdll.h

7.3 File: Misc/mbralloc.h

7.4 File: Misc/vp_error.h

7.5 File: Misc/audio.h

7.6 File: Database/database.h

7.7 File: Database/database_bacon.h

7.8 File: Database/database_old.h

7.9 File: Database/diphone_info.h

7.10 File: Database/hash_tab.h

7.11 File: Database/little_big.h

7.12 File: Database/rename_list.h

7.13 File: Engine/diphone.h

7.14 File: Engine/mbrola.h

7.15 File: Parser/fifo.h

7.16 File: Parser/input.h

7.17 File: Parser/input_fifo.h

7.18 File: Parser/input_file.h

7.19 File: Parser/parser.h

7.20 File: Parser/parser_input.h

7.21 File: Parser/phonbuff.h

7.22 File: Parser/phone.h

7.23 File: Standalone/synth.h

7.24 File: LibOneChannel/onechannel.h

7.25 File: LibMultiChannel/multichannel.h

7.26 Index of symbols

8. Support

  1. MBROLA Sources General condition of use

The source code of MBROLA may only be used to produce the object code sold by your company. It is confidential and should remain safely locked, as well as its documentation.

  2. A brief description of MBROLA

MBROLA v3.01 is a speech synthesizer based on the concatenation of diphones. One synthesis channel takes a list of phonemes as input, together with prosodic information (duration of phonemes and a piecewise linear description of pitch), and produces speech samples on 16 bits (linear), at the sampling frequency of the diphone database. It is therefore not a Text-To-Speech synthesizer, since it does not accept raw text as input.

It is distributed as a ZIP file whose name respects the format "mbrXXXX.zip", where XXXX represents the version number (e.g. "mbr3.01e.zip").

It may be compiled in 3 modes depending on which stream drives the process:

In library or DLL mode, we now distinguish between one channel and multi channel Mbrola. In the first mode, one database is associated with one and only one synthesis channel, which generally suits end-user applications. In the second mode, one can run many synthesis channel instantiations with one or more Database instances and many phonetic input streams. This second solution is suited to multi channel telecom TTS applications.

In all those compilation modes MBROLA requires a language/voice database to run properly. For your internal use (i.e. non-commercial) you can test the voices made available on the MBROLA project homepage:

http://tcts.fpms.ac.be/synthesis

Refer to your contract to check your rights for commercial exploitation of the different Diphone Databases.

  3. Distribution

    Since release 3.01, Mbrola has been rewritten in pure ANSI C, in an object-like programming style with a strong encapsulation of data (strong because we have respected the fences we put!). One file in the distribution is generally equivalent to one object (pointer to a struct). You can find an exhaustive description in the programmer's section 6.

    This distribution of MBROLA contains the following files:

    Makefile: Unix makefile for Gnu Make (gmake command)

    DOCUMENTATION/Programmer/documentation301e.doc: this document

    DOCUMENTATION/Programmer/HISTORY.txt: history of revisions

    DOCUMENTATION/User/readme.txt: standalone version manual

    Database: handling of different database formats

    Database/database.c: functions to read diphones in the speech database

    Database/database.h

    Database/database_bacon.c: functions to read compressed diphone databases

    Database/database_bacon.h

    Database/database_old.c: functions to read diphone databases older than 2.06

    Database/database_old.h

    Database/diphone_info.c: description of the diphone structures

    Database/diphone_info.h

    Database/hash_tab.c: hash table of DiphoneInfo (access to the diphone database)

    Database/hash_tab.h

    Database/little_big.c: handles the little and big endian numeric conversions

    Database/little_big.h

    Database/rename_list.c: list of phoneme pairs (used for renaming and cloning)

    Database/rename_list.h

    Parser: functions to read phonemes in the input stream

    Parser/fifo.c: First In First Out with chars

    Parser/fifo.h

    Parser/input.h: define abstract input stream

    Parser/input_fifo.c: instantiation of input.h with Fifo

    Parser/input_fifo.h

    Parser/input_file.c: instantiation of input.h with File

    Parser/input_file.h

    Parser/parser.h: define abstract phoneme parser

    Parser/parser_input.c: instantiation of parser.h with Input

    Parser/parser_input.h

    Parser/phonbuff.c: handle a phoneme buffer for pitch interpolation

    Parser/phonbuff.h

    Parser/phone.c: phoneme type

    Parser/phone.h

    Engine: Mbrola synthesis engine

    Engine/diphone.c: diphone with info for synthesis

    Engine/diphone.h

    Engine/mbrola.c: mbrola algorithm (Ola, Smoothing...)

    Engine/mbrola.h

    Misc: Miscellaneous functions basically unrelated to synthesis

    Misc/audio.c: audio output and audio file header (au, wav, aiff, raw)

    Misc/audio.h

    Misc/common.c: useful little functions (uppercase, swab...)

    Misc/common.h

    Misc/g711.c: G711 audio coding (ALAW and MULAW)

    Misc/g711.h

    Misc/incdll.h: external definitions used outside of the Mbrola package

    Misc/mbralloc.c: memory allocators are here and ONLY here

    Misc/mbralloc.h

    Misc/vp_error.c: deals with fatal error and warnings

    Misc/vp_error.h: macros for debugging purposes

    Standalone: Standalone compilation front-end

    Standalone/Posix

    Standalone/Posix/getopt.c: provided for non-POSIX Unixes

    Standalone/Posix/getopt.h

    Standalone/synth.c: front-end for the compilation in the standalone mode. Main()

    Standalone/synth.h

    LibOneChannel: library providing one MBROLA synthesis channel

    LibOneChannel/demo1.c: small demonstration program running with the library

    LibOneChannel/demo1b.c: small demo showing error handling with the library

    LibOneChannel/onechannel.c: library providing one mbrola channel at a time

    LibOneChannel/onechannel.h

    LibOneChannel/lib1.c: wrapper file to build the library lib1.c (mono channel)

    LibMultiChannel: library for multi MBROLA synthesis channel for telecom

    LibMultiChannel/multichannel.c: many synthesis channel from one dba

    LibMultiChannel/multichannel.h

    LibMultiChannel/demo2.c: demo using lib2

    LibMultiChannel/lib2.c: wrapper file to build the library lib2.c (multi channel)

    VisualC++: compilation projects for Microsoft Visual C++

    VisualC++/DLL: Visual C++ project to build the DLL

    VisualC++/DLL_USE: sample program using the DLL

    VisualC++/Standalone: Visual C++ project to build a standalone binary

    Bin: directory containing the output of the compilation with Make under Unix architectures.

  4. Installation and Tests

    4.1 On Unix

You must first unzip the distribution file mbrXXXX.zip, where XXXX stands for the version number:

unzip mbrXXXX.zip

Mbrola can be compiled with the 'gmake' (gnu make) command on the following platforms:

SUN Sparc 5/S5R4 (Solaris2.4)

HPUX9.0 and HPUX10.0

VAX/VMS V6.2 (V5.5-2 won't work)

DECALPHA(AXP)/VMS 6.2

AlphaStation 200 4/233

AlphaStation 200 4/166

IBM RS6000 Aix 4.12

PC/LINUX 1.2.11

PCPentium120/Solaris2.4

OS/2

BeBox

QNX OS

Since Mbrola is written in standard ANSI C, other POSIX compliant UNIX platforms are supported as well. Please let us know when Mbrola works on a machine/system not listed here.

Before you compile anything you must define some symbols depending on the architecture you're working with:

Depending on the compilation mode you want, comment or uncomment the following lines of the Makefile:

#CFLAGS += -DDEBUG

#CFLAGS += -DDEBUG_HASH

#CFLAGS += -DLITTLE_ENDIAN

CFLAGS += -DBIG_ENDIAN

You can add any definitions to the CFLAGS (compilation flags) variable of the Makefile, as in the following example:

optimized compilation on a Sun Station :

CFLAGS= -Wall -DBIG_ENDIAN -O6

debug mode on a VAX/VMS :

CFLAGS= -Wall -DLITTLE_ENDIAN -DVMS -g -DDEBUG
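If you are unsure whether your machine calls for LITTLE_ENDIAN or BIG_ENDIAN, a small test can tell you. The sketch below is not part of the MBROLA distribution, and the function names host_is_little_endian and print_endian_flag are our own:

```c
#include <stdio.h>

/* Hypothetical helper, not part of MBROLA: returns 1 on a little-endian
 * host (define LITTLE_ENDIAN), 0 on a big-endian one (define BIG_ENDIAN). */
int host_is_little_endian(void)
{
    unsigned short probe = 0x0102;
    /* Look at the byte stored at the lowest address of the probe. */
    return *(unsigned char *)&probe == 0x02;
}

/* Print the Makefile line that matches the host byte order. */
void print_endian_flag(void)
{
    printf("CFLAGS += -D%s\n",
           host_is_little_endian() ? "LITTLE_ENDIAN" : "BIG_ENDIAN");
}
```

Call print_endian_flag() from a throwaway main() on the target machine and copy the printed line into the Makefile.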

By default the compiler is set with CC = gcc ; though on many platforms cc may also work. As the hardware manufacturer generally provides cc, it is preferred when possible since the object code performance can be higher by an order of magnitude.

You can type :

The intermediate object code goes into a Bin directory, which is created if it does not exist yet.

    4.2 On PCs/Dos

      On PC/Dos platforms, use "pkunzip synthXXXX.zip" to restore the files (don't forget to restore the embedded paths in the archive).

      Mbrola can be compiled with Microsoft Visual C++ (4.0 or higher), or Borland C++ (4.5 or higher), on the following platforms:

      PC486/DOS6 (but other PC/DOS should do, too)

      PC486/Windows 3.1

      PC486/Windows 95

      PC-Pentium/Windows 98

      PC-Pentium/Windows NT

      Always check that in your project the following preprocessor directives are defined: LITTLE_ENDIAN and DOS. A project to build such a release with Visual C++ is provided under VisualC++/Standalone.

    4.3 On PC/Windows

      First proceed as for the PC/DOS platforms. Once synthXXXX is installed you can start building a DLL in the VisualC++\DLL directory. MbrolaDll.dsw is a Microsoft Visual C++ 5.0 project file to build a DLL. In any project you make to build a DLL with Mbrola, don't forget to define the DLL, LITTLE_ENDIAN and DOS preprocessor symbols.

      The Mbrola source files and a wrapper DLL interface are included in the project, so it should compile smoothly. In case you have to build a new project from scratch, remember that you should include only files from either LibOneChannel/ or LibMultiChannel/. Never include files from Standalone/, as this directory is only relevant for the standalone mode (see the section above for an exe binary).

      Several compilation modes are available, the "Win32 Bacon Static" is a good one to start with (Bacon compression scheme is included, DLL are statically linked).

      In the directory VisualC++/DLL_USE , little sample programs are given that use the Mbrola DLL.

      4.3.1 Black magic

      There is a strange bug in Visual C++: when you compile the project you sometimes get:

      Linking...

      nafxcw.lib(dllmodul.cbj) : error LNK2005: _DllMain@12 already defined in LIBCMT.lib(dllmain.cbj)

      nafxcw.lib(afxmem.cbj) : error LNK2005: "void * __cdecl operator new(unsigned int)" (??2@YAPAXI@Z) already defined in LIBCMT.lib(new.cbj)

      nafxcw.lib(afxmem.cbj) : error LNK2005: "void __cdecl operator delete(void *)" (??3@YAXPAX@Z) already defined in LIBCMT.lib(delete.cbj)

      nafxcw.lib(dllmodul.cbj) : warning LNK4006: _DllMain@12 already defined in LIBCMT.lib(dllmain.cbj); second definition ignored

      nafxcw.lib(afxmem.cbj) : warning LNK4006: "void * __cdecl operator new(unsigned int)" (??2@YAPAXI@Z) already defined in LIBCMT.lib(new.cbj); second definition ignored

      nafxcw.lib(afxmem.cbj) : warning LNK4006: "void __cdecl operator delete(void *)" (??3@YAXPAX@Z) already defined in LIBCMT.lib(delete.cbj); second definition ignored

      Creating library MbrolaDl/Mbrola.lib and object MbrolaDl/Mbrola.exp

      Output\Release_Static\Mbrola.dll : fatal error LNK1169: one or more multiply defined symbols found

      Error executing link.exe.

      Mbrola.dll - 4 error(s), 7 warning(s)

      Solution: remove one file from the project and include it again in the list of source files, and build the project again. The problem vanishes.

    4.4 Using the standalone binary

      You are now ready to test the program. First try "synth" on its own to get an information screen about the copyright. Then, for a help screen on how to use the standalone version of the software, try:

      synth -h

      You get a help screen like the following:

      > USAGE: ./synth [COMMAND LINE OPTIONS] database pho_file+ output_file

      >

      >A - instead of pho_file or output_file means stdin or stdout

      >Extension of output_file ( raw, au, wav, aiff ) tells the wanted audio format

      >

      > Options can be any of the following:

      > -i = display the database information if any

      > -e = IGNORE fatal errors on unknown diphone

      > -c CC = set COMMENT char (escape sequence in pho files)

      > -F FC = set FLUSH command name

      > -v VR = VOLUME ratio, float ratio applied to ouput samples

      > -f FR = FREQ ratio, float ratio applied to pitch points

      > -t TR = TIME ratio, float ratio applied to phone durations

      > -l VF = VOICE freq, target freq for voice quality

      > -R RL = Phoneme RENAME list of the form a A b B ...

      > -C CL = Phoneme CLONE list of the form a A b B ...

      >

      > -I IF = Initialization file containing one command per line

      > CLONE, RENAME, VOICE, TIME, FREQ, VOLUME, FLUSH,

      > COMMENT, and IGNORE are available

      Now in order to go further, you need to get a version of an MBROLA language/voice database from the MBROLA project homepage. Let us assume you have copied the FR1 database and referred to the accompanying fr1.txt file for its installation.

      Then try:

      synth fr1/fr1 fr1/TEST/bonjour.pho bonjour.wav

      it uses the format:

      synth diphone_database command_file1 command_file2 ... output_file

      and creates a sound file for the word 'bonjour' (Hello! in French)

      Basically the output file is composed of signed 16-bit integers, corresponding to samples at the sampling frequency of the MBROLA voice/language database (16 kHz for Fr1, the diphone database supplied by the authors of MBROLA). MBROLA can produce different audio file formats: .au, .wav, .aiff, .aif, and .raw files, depending on the output_file extension. If the extension is not recognized, the format is RAW (no header). We recommend .wav for Windows, and .au for Unix platforms.
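      Because a RAW file carries no header, whatever reads it must already know the sample format. As a rough illustration, the duration of such a file follows directly from its size. The helper name raw_duration_ms is ours, and a single channel of 16-bit samples is assumed:

```c
/* Hypothetical helper: duration in milliseconds of a headerless (RAW)
 * file holding one channel of 16-bit samples, given its size in bytes
 * and the sampling rate of the voice database (16000 Hz for Fr1). */
long raw_duration_ms(long size_in_bytes, long sampling_rate)
{
    long samples = size_in_bytes / 2;      /* 2 bytes per 16-bit sample */
    return samples * 1000L / sampling_rate;
}
```

      With a 16 kHz database, a 32000-byte RAW file therefore holds one second of speech.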

      To display information about the phoneme set used by the database, type:

      synth -i fr1/fr1

      It displays the phonetic alphabet as well as copyright information about the database.

      Option -e makes Mbrola ignore wrong or missing diphone sequences (they are replaced by silence), which can be quite useful when debugging your TTS. It is equivalent to the "IGNORE" directive in the initialization file (N.B. it replaces the obsolete ";;E=OFF", which is no longer supported in .pho files).

       

      4.4.1 Changing the pitch

        Optional time and frequency ratios let you shorten or lengthen synthetic speech and transpose it:

        synth -t 1.2 -f 0.8 -v 0.7 fr1/fr1 TEST/bonjour.pho bonjour.wav

        or its equivalent in the initialization file:

        TIME 1.2

        FREQ 0.8

        for instance, will result in a RIFF Wav file bonjour.wav 1.2 times longer than the previous one (slower rate), and containing speech in which all fundamental frequency values have been multiplied by 0.8 (sounds lower). You can also set the values of these coefficients directly in a .pho file by adding special escape sequences such as:

        ;; F=0.8

        ;; T=1.2

        You can change the voice characteristics with the -l parameter. If the sampling rate of your database is 16000, indicating -l 18000 allows you to shorten the vocal tract by a ratio of 16/18 (child's voice or woman's voice, depending on the voice you're working on). With -l 10000, you can lengthen the vocal tract by a ratio of 16/10 (namely the voice of a Troll). The same command in an initialization file becomes "VOICE 10000".
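        Taking the first example above as the rule (database sampling rate divided by the -l value), the ratio can be computed as follows. The function name is ours, and the rule itself is our reading of the text, not something taken from the MBROLA sources:

```c
/* Hypothetical helper: vocal tract length ratio obtained with
 * "-l voice_freq" for a database sampled at sampling_rate Hz.
 * Below 1 the tract is shortened, above 1 it is lengthened. */
double vocal_tract_ratio(double sampling_rate, double voice_freq)
{
    return sampling_rate / voice_freq;
}
```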

        Option "-v" gives a VolumeRatio that multiplies each output sample. In the example below, each sample is multiplied by 0.7 (the loudness goes down). Warning: setting VolumeRatio too high generates saturation.

        synth -v 0.7 fr1/fr1 TEST/bonjour.pho bonjour.wav

        or add the line "VOLUME 0.7" in an initialization file
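        To see why a large VolumeRatio saturates, consider what happens to a single 16-bit sample. The sketch below is an illustration only: the function name is ours, and it assumes out-of-range values are clipped to the 16-bit limits, which may differ from what MBROLA does internally.

```c
/* Hypothetical helper: scale one 16-bit sample by a volume ratio.
 * Values that leave the 16-bit range are clipped, which is heard as
 * saturation. Illustration only, not the code used inside MBROLA. */
short apply_volume(short sample, float volume_ratio)
{
    float scaled = (float)sample * volume_ratio;
    if (scaled > 32767.0f)
        return 32767;
    if (scaled < -32768.0f)
        return -32768;
    return (short)scaled;
}
```

        A sample of 30000 with a ratio of 2.0 cannot be represented on 16 bits and ends up pinned at 32767.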

        The -c option lets you specify which symbol will be used as an escape sequence for comments and commands in .pho files. The default value is the semi-colon ';', but you may want to change this if your phonetic alphabet uses this symbol, as in:

        synth -c ! fr1/fr1 TEST/test1.pho test2.pho test.wav

        equivalent to "COMMENT !" in an initialization file

        The -F option lets you specify which symbol will be used to Flush the audio output. The default value is #; you may change the symbol, as in:

        mbrola -F FLUSH_COMMAND fr1/fr1 test.pho test.wav

        equivalent to "FLUSH FLUSH_COMMAND" in the initialization file.

      4.4.2 Using Pipes

        A "-" in place of command_file or output_file means stdin or stdout. On multitasking machines, it is easy to run the synthesizer in real time to obtain audio output from the audio device, by using pipes.

      4.4.3 Renaming and Cloning phonemes

      It may happen that the language-processing module connected to MBROLA doesn't use the same phonemic alphabet as the voice used. The Renaming and Cloning mechanisms help you to quickly solve such problems (without adding extra CPU load). The only limitation about phoneme names is that they can't contain blank characters.

      If, for instance, phoneme a in the mbrola voice you use is called my_a in your alphabet, and phoneme b is called my_b, then the following command solves the problem:

      synth -R "a my_a b my_b" fr1/fr1 test.pho test.wav

      You can give as many renaming pairs as you want. Circular definition is not a problem. E.g. "a b b c" will rename original [a] into [b] and original [b] into [c] independently ([a] won't be renamed to [c]).

      LIMITATION: you can't rename a phoneme into another that already exists.
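      The "independent" behaviour of the pairs "a b b c" can be pictured as a single lookup on the original name. The sketch below only illustrates that rule; it is not the MBROLA implementation, and the function name is ours:

```c
#include <string.h>

/* Hypothetical illustration of the renaming rule: every ORIGINAL name is
 * looked up once in the list of (old, new) pairs, so the pairs
 * "a b" and "b c" do not chain. Not the MBROLA implementation. */
const char *rename_phoneme(const char *original,
                           const char *pairs[][2], int nb_pairs)
{
    int i;
    for (i = 0; i < nb_pairs; i++)
        if (strcmp(original, pairs[i][0]) == 0)
            return pairs[i][1];   /* renamed once, never re-examined */
    return original;              /* not in the list: name unchanged */
}
```

      With the pairs {"a","b"} and {"b","c"}, the original [a] comes out as [b] and the original [b] comes out as [c].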

      The cloning mechanism does exactly the same thing, except that the old phoneme still exists afterwards. This is useful if you have 2 allophones in your alphabet, but the Mbrola voice only provides one.

      Imagine for instance, that you make the distinction between the voiced [r] and its unvoiced counterpart [r0] and that you are using a syllabic version [r=]. If as a first approximation using [r] for both is OK, then you may use an Mbrola voice that only provides one version of [r] by running:

      synth -C "r r0 r r=" fr1/fr1 test.pho test.wav

      which tells the synthesizer that [r0] and [r=] should be both synthesized as [r]. You can write a long cloning list of phoneme pairs to fit your needs.

      Renaming and cloning cost some CPU time when they are set up, since the complete diphone hash table has to be rebuilt, but once the renaming or cloning has occurred there is absolutely NO RELATED PERFORMANCE DROP. Using this feature is therefore more efficient than a pre-processor, though a simple phoneme mapping cannot always solve incompatibilities.

      Before renaming anything as #, check section 5.1.2

      When you have long cloning and renaming lists, you can conveniently write them into an initialization file according to the following format:

      RENAME a my_a

      RENAME b my_b

      CLONE r r0

      CLONE r r=

      The obsolete ";; RENAME a my_a" can't be used in .pho file anymore, but is correctly parsed in initialization files. Note to EN1 and MRPA users: the consequence of the change above is that you must change the previous call format "mbrola en1 en1mrpa..." into "mbrola -I en1mrpa en1 ...".

       

    4.5 Machine dependent hints for best using Mbrola

      4.5.1 On MSDOS

        With the standalone version, generating wav files is easier:

        synth fr1/fr1 TEST/bonjour.pho bonjour.wav

        Then you can play the RIFF Wav file with your favorite DOS or Windows sound utility. On OS/2 pipes may be used just like below.

      4.5.2 On modern Unix systems such as Solaris or HPUX or Linux

        Type:

        synth fr1 bonjour.pho -.au | audioplay

        where audioplay is your audio file player (the name varies with the platform, e.g. splayer for HPUX).

        If your audio player has problems with Sun .au files, try .wav or .raw. However, never use the .wav format when you pipe the output (mbrola can't rewind the stream to write the audio size in the header). The Wav format was not developed for Unix; the Au format, by contrast, lets you specify in the header "we're on a pipe, read until end of file".

        NOTE FOR LINUX: you can use the GPL rawplay program provided at

        ftp://tcts.fpms.ac.be/pub/mbrola/pclinux/

         

      4.5.3 On Sun4 ( old audio interface )

        Those machines are now quite old and only provide a mulaw 8 kHz output. A hack is:

        synth fr1 input.pho - | sox -t raw -sw -r 16000 - -t raw -Ub -r 8000 - > /dev/audio

        Provided you have the public domain sox utility developed by Ircam, you should hear 'bonjour' without the need to create intermediate files. Note that we strongly recommend that you DON'T use SOX, since its resampling method (linear interpolation) will permanently damage the sound.

        Other solution: The UTILITY.ZIP file available from the MBROLA homepage provides RAW2SUN that does this conversion.

      4.5.4 On VAX or AXP workstations

To make it easier for users to find MBROLA, you should add the following command to your system startup procedure:

$ DEFINE/SYSTEM/EXEC MBROLA_DIR disk:[dir]

where "disk:[dir]" is the name of the directory you created for the MBROLA_DIR files. You could also add the following command to your system login command procedure:

$ MBROLA :== $MBROLA_DIR:MBROLA.EXE

$ RAW2SUN :== $MBROLA_DIR:RAW2SUN.EXE

to use the decsound device:

$ MCR DECSOUND -volume 40 -play sound.au

See also the MBR_OLA.COM batch file in the UTILITY.ZIP file available from the MBROLA Homepage if you cannot play 16 bits sound files on your machine.

  5. Default Parser Manual

    The default parser is the parser that was provided before release 3.01. Implicitly it means that you can replace it with your own, thanks to the setParser_MBR function. Basically the work of the parser is to return to Mbrola a phoneme with its length and its pitch points.

    We provide a default parser that allows you to give optional pitch points, the intonation curve being linearly interpolated between those points.

    5.1 Input file format

      Example of a command line:

      synth fr1/fr1 bonjour.pho bonjour.wav

      For example the phonetic input file bonjour.pho simply contains :

      ; Bonjour

      _ 51 25 114

      b 62

      o~ 127 48 170.42

      Z 110 53.5 116

      u 211

      R 150 50 91

      _ 91

      This shows the format of the input data required by MBROLA. Each line contains a phoneme name, a duration (in ms), and a series (possibly none) of pitch pattern points composed of two float numbers each: the position of the pitch pattern point within the phoneme (in % of its total duration), and the pitch value (in Hz) at this position.

      Hence, the second line of bonjour.pho :

      _ 51 25 114

      tells the synthesizer to produce a silence of 51 ms, and to put a pitch pattern point of 114 Hz at 25% of 51 ms. Pitch pattern points define a piecewise linear pitch curve. Notice that the pitch pattern they define is continuous, since the program automatically drops pitch information when synthesizing unvoiced phones.
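      As a rough illustration of the piecewise linear curve (not the MBROLA implementation; both function names are ours), the pitch between two consecutive pitch pattern points can be computed like this:

```c
/* Hypothetical helper: position of a pitch pattern point in ms, i.e. the
 * start of the phoneme plus the given percentage of its duration. */
double pitch_point_time(double phone_start_ms, double duration_ms,
                        double percent)
{
    return phone_start_ms + duration_ms * percent / 100.0;
}

/* Hypothetical helper: pitch (Hz) at time t on the straight line joining
 * two pitch pattern points (t0, f0) and (t1, f1), with t0 < t1.
 * All times are in ms from a common origin. */
double interpolate_pitch(double t0, double f0, double t1, double f1, double t)
{
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0);
}
```

      For the line "_ 51 25 114" at the start of the file, pitch_point_time(0, 51, 25) places the 114 Hz point 12.75 ms after the beginning.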

      Blank characters or tabs separate the data on each line. Comments can optionally be introduced in command files, starting with a semi-colon ';'. This default can be overridden with the -c option of the command line.

      Another special escape sequence, ';;', allows the user to introduce commands in the middle of .pho files as described below. This escape sequence is also affected by the -c option.
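      To make the line format concrete, here is a minimal sketch of how one line of a .pho file could be split into its fields. It is not the default parser shipped with MBROLA: the PhoLine type, the function name and the MAX_PITCH_POINTS limit are all assumptions made for this example, and lines starting with ';' (comments and commands) are simply skipped.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_PITCH_POINTS 16   /* arbitrary limit, for this sketch only */

typedef struct
{
    char   name[32];                    /* phoneme name, no blanks */
    double length_ms;                   /* duration in ms */
    int    nb_points;                   /* pitch pattern points found */
    double position[MAX_PITCH_POINTS];  /* in % of the duration */
    double pitch[MAX_PITCH_POINTS];     /* in Hz */
} PhoLine;

/* Parse "name duration [position pitch]...". Return 1 on success,
 * 0 for an empty line, a ';' line, or a malformed line. */
int parse_pho_line(const char *line, PhoLine *out)
{
    int used = 0;
    char *end;
    double pos, hz;

    out->nb_points = 0;
    if (sscanf(line, " %31s %lf%n", out->name, &out->length_ms, &used) != 2)
        return 0;
    if (out->name[0] == ';')
        return 0;
    line += used;
    while (out->nb_points < MAX_PITCH_POINTS)
    {
        pos = strtod(line, &end);
        if (end == line)
            break;                 /* no more numbers on the line */
        line = end;
        hz = strtod(line, &end);
        if (end == line)
            return 0;              /* a position without its pitch value */
        line = end;
        out->position[out->nb_points] = pos;
        out->pitch[out->nb_points] = hz;
        out->nb_points++;
    }
    return 1;
}
```

      Given the line "o~ 127 48 170.42", the sketch fills in the name "o~", a length of 127 ms and one pitch point at 48% with 170.42 Hz.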

      5.1.1 Changing the Frequency Ratio or Time Ratio

        A command escape sequence containing a line like "T=x.x" sets the time ratio to x.x; the same is done for the fundamental frequency by replacing T with F, as in:

        ;; T = 1.2

        ;;F=0.8

      5.1.2 Flush the output stream

      Note, finally, that the synthesizer outputs chunks of synthetic speech determined as sections of the piecewise linear pitch curve. Phones inside a section of this curve are synthesized in one go. The last one of each chunk, however, cannot be properly synthesized while the next phone is not known (since the program uses diphones as base speech units). When using mbrola with pipes, this may be a problem. Imagine, for instance, that mbrola is used to create a pipe-based speaking clock on a HP:

      speaking_clock | mbrola fr1 - -.au | splayer

      which tells the time, say, every 30 seconds. The last phone of each time announcement will only be synthesized when the next announcement starts. To bypass this problem, mbrola accepts a special command phone, which flushes the synthesis buffer : "#"

      This default character can be replaced by another symbol thanks to the command:

      ;; FLUSH new_flush_symbol

      Another important issue with piping under UNIX, is the possibility to prematurely end the audio output, if for example the user presses the stop button of your application. Since release 3.01, Mbrola handles signals.

      If in the previous example the user wants to interrupt the speaking clock message, the application just needs to send the USR1 signal. You can send such a signal from the console (USR1 is signal 16 on the HP of this example; the number differs on other systems, so check yours) with:

      kill -16 mbrola_process_number

      Once mbrola catches the signal, it reads its input stream until it gets EOF or a FLUSH command (hence, surrounding sections with flush is a good habit).

    5.2 Limitations of MBROLA

    There is no longer any limitation on the number of pitch points one can assign to a phoneme, or on the number of phonemes without pitch points. There is no longer any limitation on extra low pitch (sometimes used to produce vocal fry).

    Phonemes can be synthesized with a maximum duration that depends on the fundamental frequency with which they are produced. The higher the frequency, the lower the duration. For a frequency of 133 Hz, the maximum duration is 7.5 sec. For a frequency of 66.5 Hz, it is 15 sec. For a frequency of 266 Hz, it is 3.75 sec.
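    The 133 Hz and 266 Hz figures above are consistent with a limit expressed as a fixed number of pitch periods (133 x 7.5 = 266 x 3.75 = 997.5). Under that assumption, which is inferred from the examples and not taken from the sources, the limit for any frequency can be estimated as follows (the function name is ours):

```c
/* Hypothetical helper: maximum phoneme duration in seconds at a given
 * fundamental frequency, ASSUMING the limit is a fixed number of pitch
 * periods. The value 997.5 is inferred from the example 133 Hz x 7.5 s. */
double max_duration_sec(double f0_hz)
{
    const double max_periods = 997.5;
    return max_periods / f0_hz;
}
```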

  6. Programmer's Manual

    First, we describe in this section the object oriented philosophy used since release 3.01.

     

    6.1 Philosophy and architecture

Actually nothing (or nearly nothing) prevents us from programming in standard ANSI C with an object-like convention that allows:

 

      6.1.1 Encapsulation of Object's attributes

        Let's illustrate the programming conventions with the char Fifo found in Parser/fifo.h. First we define a structure describing a Fifo.

        typedef struct

        {

        char* charbuff; /* circular buffer for phonetic input */

        int buffer_pos; /* Current position */

        int buffer_end; /* Last available phoneme */

        int buffer_size; /* number of chars in Phobuffer */

        } Fifo;

        To distinguish between public and private data, the convention is to never directly access the fields of a Fifo outside of its fifo.c implementation file. To reach this goal we exclusively access members through function-like macros.

        #define charbuff(ff) ff->charbuff

        #define buffer_pos(ff) ff->buffer_pos

        #define buffer_end(ff) ff->buffer_end

        #define buffer_size(ff) ff->buffer_size

        It allows the following:

        Fifo* my_fifo;

        ...

        int length= buffer_size(my_fifo);

        The programmer should not cheat to discover whether buffer_size is a function or a macro; the data are thus encapsulated and independent of the Fifo's real implementation (modulo a complete recompilation). C is not C++, and your compiler won't be able to carry out strong type checking on macros as it would with inline functions; that's the reason why attributes don't respect the full convention below (according to our conventions we should have used the name buffer_size_Fifo() ).

        The methods always respect the format functionname_ObjectName, as shown below, and take a pointer to the object as their first argument. Methods beginning with init are always constructors, and those beginning with close are destructors:

        Fifo* init_Fifo(int size);

        /*

        * Constructor with size of the buffer

        */

        void close_Fifo(Fifo* ff);

        /*

        * Release the memory

        */

        void reset_Fifo(Fifo* ff);

        /*

        * Forget previously entered data in the circular buffer

        */

        int write_Fifo(Fifo* ff, char *buffer_in);

        /*

        * Write a string of phoneme in the input buffer

        * Return the number of chars actually written

        */

        int readline_Fifo(Fifo* ff, char *line, int size);

        /*

        * Read a line from the input stream in a circular buffer

        * Return 0 if there's nothing to read

        */
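        The same conventions can be applied to any object. The sketch below uses a made-up Counter type, which does not exist in the MBROLA sources, to show the attribute macros, the init_ constructor, a method and the close_ destructor together in one compilable unit. It calls malloc and free directly only to stay self-contained.

```c
#include <stdlib.h>

/* Hypothetical object, used only to illustrate the conventions. */
typedef struct
{
    int value;   /* current count */
    int step;    /* increment applied by next_Counter */
} Counter;

/* Attributes are reached through function-like macros */
#define value(cc) ((cc)->value)
#define step(cc)  ((cc)->step)

/* Constructor: init_ prefix. Returns NULL if the allocation fails. */
Counter* init_Counter(int increment)
{
    Counter* cc = (Counter*) malloc(sizeof(Counter));
    if (cc == NULL)
        return NULL;
    value(cc) = 0;
    step(cc) = increment;
    return cc;
}

/* Method: functionname_ObjectName, object pointer as first argument */
int next_Counter(Counter* cc)
{
    value(cc) += step(cc);
    return value(cc);
}

/* Destructor: close_ prefix, releases the memory */
void close_Counter(Counter* cc)
{
    free(cc);
}
```

        Client code only ever writes init_Counter(3), next_Counter(cc), value(cc) and close_Counter(cc); the layout of the struct can change without touching it, provided everything is recompiled.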

      6.1.2 Inheritance and Polymorphism

        Inheritance alone can always be simulated through the is_a_client_of relation, the most interesting case being polymorphism. Polymorphism is interesting for multiple format database handling, and live input parser definition inside of the synthesizer.

        The abstract type below specifies an Input object providing the methods close, reset and readline .

        typedef struct Input Input;

        typedef int (*readline_InputFunction)(Input* in, char *line, int size);

        typedef void (*close_InputFunction)(Input* in);

        typedef void (*reset_InputFunction)(Input* in);

        struct Input

        {

        void* self;

        readline_InputFunction readline_Input;

        close_InputFunction close_Input;

        reset_InputFunction reset_Input;

        };

        This type can be derived into Input_File (the input stream is a file) or Input_Fifo (the input stream comes from a Fifo as described above). The part of the object corresponding to the features overloaded on the basic Input type is stored in the self part.

        #include "input.h"

        #include "fifo.h"

        static int readline_InputFifo(Input* in, char *line, int size)

        { return( readline_Fifo((Fifo*) in->self,line,size) ); }

        static void reset_InputFifo(Input* in)

        { reset_Fifo((Fifo*) in->self); }

        static void close_InputFifo(Input* in)

        { MBR_free(in); }

        Input* init_InputFifo(Fifo* my_fifo)

        {

        Input* self= (Input*) MBR_malloc( sizeof(Input) );

        self->self= (void*) my_fifo;

        self->readline_Input= readline_InputFifo;

        self->close_Input= close_InputFifo;

        self->reset_Input= reset_InputFifo;

        return self;

        }
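        To show the mechanism end to end, the sketch below derives a third, made-up variant of Input that reads lines from a string in memory. Input_String, StringState and init_InputString are hypothetical names that do not exist in the MBROLA sources; the Input type is repeated so that the example compiles on its own, and malloc/free stand in for MBR_malloc/MBR_free.

```c
#include <stdlib.h>

typedef struct Input Input;
typedef int  (*readline_InputFunction)(Input* in, char *line, int size);
typedef void (*close_InputFunction)(Input* in);
typedef void (*reset_InputFunction)(Input* in);

struct Input
{
    void* self;
    readline_InputFunction readline_Input;
    close_InputFunction    close_Input;
    reset_InputFunction    reset_Input;
};

/* Private part of the hypothetical derived type */
typedef struct
{
    const char* text;   /* whole input, owned by the caller */
    const char* pos;    /* next char to read */
} StringState;

/* Copy one line (at most size-1 chars, size > 1) and return its length,
 * or 0 if there is nothing left to read. */
static int readline_InputString(Input* in, char *line, int size)
{
    StringState* st = (StringState*) in->self;
    int n = 0;
    while (*st->pos != '\0' && n < size - 1)
    {
        char c = *st->pos++;
        line[n++] = c;
        if (c == '\n')
            break;
    }
    line[n] = '\0';
    return n;
}

static void reset_InputString(Input* in)
{
    StringState* st = (StringState*) in->self;
    st->pos = st->text;           /* start again from the beginning */
}

static void close_InputString(Input* in)
{
    free(in->self);
    free(in);
}

Input* init_InputString(const char* text)
{
    Input* in = (Input*) malloc(sizeof(Input));
    StringState* st = (StringState*) malloc(sizeof(StringState));
    if (in == NULL || st == NULL)
    {
        free(in);
        free(st);
        return NULL;
    }
    st->text = text;
    st->pos = text;
    in->self = (void*) st;
    in->readline_Input = readline_InputString;
    in->close_Input = close_InputString;
    in->reset_Input = reset_InputString;
    return in;
}
```

        A caller holding an Input* never needs to know which variant it has: in->readline_Input(in, line, size) works the same way for a file, a Fifo, or this string.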

         

      5. Inheritance and cross-reference graph

The Database, Input and Parser objects contain deferred (=virtual) methods and thus allow polymorphism.

 

    1. Application Programming Interface
    2. The explanations given in the previous section are mainly useful to users who want to design ad-hoc parsers; one can also simply keep working with the default parser.

      1. One channel mode
      2. You can build a demo by running "make demo1" under Unix, or simply build the library with "make lib1". With Windows and Visual C++ the DLL project builds an equivalent of lib1, and numerous examples are provided in the DLL_USE directory. The complete one channel mode interface is given in section 7.24. The example below illustrates its use.

        First, initialize the engine with a diphone database. All the functions in the API return an error code. A negative value means that something went wrong during the call; in that case, an explicit error message can be obtained from lastErrorStr_MBR().

        err_code= init_MBR("h:/mbrola/database/fr1" );

        if (err_code<0)

        handle_error();

        If the default parser is plugged, one can use the regular syntax in write_MBR to send phonemes to the engine:

        if ( ( write_MBR("_ 51 \n b 62 \n") < 0) ||

        ( write_MBR("o~ 127 50 170 \n Z 110\n") <0) ||

        ( WriteSpeechFile(output)<0) ||

        ( write_MBR("u 211 100 200\n R 150 \n_ 9\n#\n") < 0) ||

        ( WriteSpeechFile(output)<0) )

        handle_error();

        close_MBR();

        Each call to init_MBR() should be matched by a call to close_MBR() to release the allocated memory. Once close_MBR() has been called, one can call init_MBR() again for a brand new database. If one wishes to keep working with the same database but forget previously entered phonemes, use reset_MBR() instead.

        Let's describe how WriteSpeechFile works:

        int WriteSpeechFile(FILE *output)

        {

        int i;

        while ( (i=readtype_MBR(buffer, 16000, LIN16)) == 16000)

        fwrite(buffer, 2, i, output);

        if (i>0)

        { /* write last chunk */

        fwrite(buffer, 2, i, output);

        return 0;

        }

        else

        return i; /* return an error code */

        }

        It reads sample buffers from the engine until no more samples are available (readtype_MBR returns 0) or an error occurs. readtype_MBR can return 0 for two reasons: either a flush has been encountered, or the default parser does not yet have enough data, since it needs to look ahead to interpolate pitch values. This is the case after write_MBR("o~ 127 50 170 \n Z 110\n"): synthesis of the /Z/ cannot be carried out until the pitch point in "u 211 100 200" has been received. This way asynchronous read/write operations are allowed.
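To see why the look-ahead is needed, assume that the pitch between two pitch points is interpolated linearly. The helper below is a sketch under that assumption; the name interpolate_f0 and the time axis in milliseconds are illustrative and not part of the API.

```c
/* Linear interpolation of the pitch (Hz) at time t (ms) between two
 * pitch points (t_a, f_a) and (t_b, f_b), with t_a < t_b. */
double interpolate_f0(double t, double t_a, double f_a, double t_b, double f_b)
{
    return f_a + (f_b - f_a) * (t - t_a) / (t_b - t_a);
}
```

In the phoneme sequence above, the last known point is 170 Hz at 50% of o~ (63.5 ms after the start of o~), and the next one is 200 Hz at 100% of u (127 + 110 + 211 = 448 ms). Every pitch value inside /Z/ depends on that second point, so nothing can be synthesized for /Z/ before the line for u has been written.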

        The small error handling function simply does:

        void handle_error()

        {

        char err[255];

        lastErrorStr_MBR(err,sizeof(err));

        printf("Code %i\n%s\n", lastError_MBR(), err);

        exit(-1);

        }

        At any time, one can use the get_* and set_* functions to modify internal parameters of the synthesizer.

        Important note about the vocal tract length capabilities: one can modify the size of the speaker's throat with setFreq_MBR. The lower this frequency, the deeper the voice. This very simple method takes advantage of the playback sampling rate to shift the formants up and down, just like changing the speed of a tape player. Thus, to be effective, any call to setFreq_MBR must be accompanied by a call to the audio hardware that sets the requested playback sample rate. Otherwise the speed and pitch will sound odd.
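As a rough illustration of the tape-player analogy (an assumption about the arithmetic involved, not a description of the engine internals), playing samples at a rate other than the database rate scales every spectral frequency by the ratio of the two rates. The function name and parameters below are hypothetical.

```c
/* Where a formant ends up when samples produced for db_rate_hz are
 * played back at playback_rate_hz (tape-speed effect). */
double shifted_formant_hz(double formant_hz, double db_rate_hz,
                          double playback_rate_hz)
{
    return formant_hz * playback_rate_hz / db_rate_hz;
}
```

A playback rate below the database rate therefore moves all formants down, which is perceived as a longer vocal tract and a deeper voice.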

      3. Multi channel mode
      4. One can build a demo by running "make demo2" under Unix, or simply build the library with "make lib2". The complete multi channel mode interface is given in section 7.25.

        It closely resembles the one channel mode, except that every function takes a pointer to a synthesizer structure. Another difference is that it no longer hides the parser's details from the user: to use the default parser, one has to build it explicitly.

        The following code builds 3 independent default phoneme parsers:

        /* Input Fifo with a buffer of 100 chars */

        fifo1= init_Fifo(100);

        fifo2= init_Fifo(100);

        fifo3= init_Fifo(100);

        /* Input stream of the synthesizer */

        input1= init_InputFifo(fifo1);

        input2= init_InputFifo(fifo2);

        input3= init_InputFifo(fifo3);

        /* Plug the fifos on the default parsers */

        parser1= init_ParserInput(input1,"_",120.0,";",1.0,1.0);

        parser2= init_ParserInput(input2,"_",120.0,";",1.0,1.0);

        parser3= init_ParserInput(input3,"_",120.0,";",1.0,1.0);

        To use one's own parser, see the next section. Once this is done, as many databases as synthesis channels must be opened (let's say 3 channels in this example).

        Database* main_dba= init_DatabaseMBR2(argv[1],NULL,NULL);

        if (!main_dba)

        handle_error(True);

        Of course, opening the same database 3 or more times would waste a lot of memory, since many internal structures can be shared. Instead of using init_DatabaseMBR2, one can clone an already opened database:

        Database* clone_dba1= copyconstructor_DatabaseMBR2(main_dba);

        Database* clone_dba2= copyconstructor_DatabaseMBR2(main_dba);

        Database* clone_dba3= copyconstructor_DatabaseMBR2(main_dba);

        Cloned databases behave just like regular Database objects, i.e. their destructor must be called before leaving. Once we have a Parser input and a Database, we can open a synthesis channel:

        Mbrola* channel1= init_MBR2(clone_dba1,parser1);

        Mbrola* channel2= init_MBR2(clone_dba2,parser2);

        Mbrola* channel3= init_MBR2(clone_dba3,parser3);

        In this particular example, one can write phonemes in the parser, and read samples from the synthesis engine with instructions such as:

        write_Fifo(fifo1,"_ 51 \n b 62 \n o~ 100\n Z 120");

        while ((i=readtype_MBR2(channel1, buffer, 16000, LIN16))==16000)

        fwrite(buffer,2,i,output);

        Of course the call to write_Fifo only makes sense because this example uses the default phoneme parser. In this particular case, the polymorphic object Parser, which was passed to the constructor of the channel, reads its input data from fifo1.

      5. Designing and plugging your own parser

The user can write his own implementation of a Parser, as long as it follows the definition of Parser/parser.h. The file parser_simple.c below gives an example of a parser that reads phonetic inputs with the format: Phoneme Duration Pitch_At_0% Pitch_At_100%.

In practice this example does not take into account that the Engine synthesizes diphones. As the word implies, a diphone is made of two phonemes, so both parts of a diphone must be known before it can be uttered. Each phoneme file used with parser_simple must therefore end with two silences: the first one reveals the first half of the last phoneme, and the second one reveals its second half (a complete example is provided in VisualC++/DLL_USE/mbrola/parser_simple.cpp). Many people forget the second silence because the result sounds correct without it; the total length of the synthetic message, however, will then not match the requested one.

/*

* FPMs-TCTS SOFTWARE LIBRARY

*

* File: parser_simple.c

* Purpose: parse a simple "pho file" (demonstration of the mbrola DLL)

* Instantiation of parser.h

*

* Author: Vincent Pagel

* Email : mbrola@tcts.fpms.ac.be

*

* Copyright (c) 1995-2018 Faculte Polytechnique de Mons (TCTS lab)

*

* 18/09/98 : Created

*/

#include <stdio.h>

#include "mbrola.h"

#include "parser_simple.h"

static void reset_ParserSimple(Parser* parse)

{

/* rewind the input file */

fseek( (FILE*) parse->self,0,SEEK_SET);

}

static StatePhone nextphone_ParserSimple(Parser* parse, LPPHONE* ph)

{

char phoneme[255]; /* phoneme name */

float length; /* length in milliseconds */

float pitch0; /* pitch at 0% */

float pitch100; /* pitch at 100% */

if ( fscanf( (FILE*)parse->self," %254s %f %f %f ",phoneme,&length,&pitch0,&pitch100 ) ==4 )

{

*ph= init_Phone(phoneme,length);

appendf0_Phone(*ph, 0.0 , pitch0);

appendf0_Phone(*ph, 100.0, pitch100);

return PHO_OK;

}

else

{

return PHO_EOF;

}

}

static void close_ParserSimple(Parser* parse)

/* Destructor */

{

fclose( (FILE*) parse->self);

free(parse);

}

Parser* init_ParserSimple(char* input_name)

/*

* Constructor of the parser. Parse a text file of the form

* PHONEME LENGTH PITCH_AT_BEGINNING PITCH_AT_END

*/

{

FILE* input;

Parser* parse;

/* open the text file */

input=fopen(input_name,"rt");

if (!input)

return NULL;

parse= (Parser*) MBR_malloc( sizeof( struct Parser) );

parse->reset_Parser= reset_ParserSimple;

parse->close_Parser= close_ParserSimple;

parse->nextphone_Parser= nextphone_ParserSimple;

parse->self= (void*) input;

return(parse);

}
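The requirement that a phoneme file for parser_simple ends with two silences can be checked before synthesis. The helper below is a minimal sketch; the function name and the idea of holding the phoneme names in an array are assumptions made for illustration and do not come from the MBROLA sources.

```c
#include <string.h>

/* Return 1 if the last two phoneme names are both the silence symbol,
 * 0 otherwise (including when there are fewer than two phonemes). */
int ends_with_two_silences(const char *const *phonemes, int count,
                           const char *silence)
{
    if (count < 2)
        return 0;
    return strcmp(phonemes[count - 1], silence) == 0
        && strcmp(phonemes[count - 2], silence) == 0;
}
```

A sequence such as _ b o~ _ _ passes the check, whereas _ b o~ _ does not, even though the latter may sound correct.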

 

 

  1. Mbrola architecture
  2. The following sections describe the exported functions and variables of all the source files in the project. After the file descriptions, a symbol index is provided to quickly locate any function, variable or define.

    1. File: Misc/common.h
    2. /*

      * Purpose: common utilities and defines

      * Author: Vincent Pagel

      */

      /******************

      * Definitions *

      ******************/

      /* Release number (automatically changed by "make version") */

      #define SYNTH_VERSION "3.01e2"

      #define WWW_ADDRESS "http://tcts.fpms.ac.be/synthesis"

      /* General trace */

      /* #define DEBUG */

      /* Trace of the hash table -> this debug option makes the program stop

      * and print the access statistics in the hash table (may help to

      * check and tune access time on new databases)

      */

      /* #define DEBUG_HASH */

      /*

      * True and False should be used instead of integer values

      */

      /* Argh ! Depends on the compiler! Comment it if yours is not C/ANSI */

      #define bool int

      #define False 0

      #define True 1

      /*

      * ARCHITECTURE DEPENDENT !!!

      * These definitions should be imposed so that int8, int16 and int32

      * always refer to 8-, 16- and 32-bit integers

      */

      #define uint8 unsigned char

      #define int8 char

      #define int16 short

      #define uint16 unsigned short

      #define int32 long
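The definitions above depend on the compiler: on LP64 platforms long is 64 bits wide, and plain char may be signed or unsigned. On a C99 compiler, one possible alternative (a sketch, not what the MBROLA sources do) is to take the fixed-width types from <stdint.h> and verify the widths explicitly. The function widths_are_exact is a hypothetical helper.

```c
#include <limits.h>
#include <stdint.h>

/* Same names as in common.h, but backed by the C99 fixed-width types */
typedef uint8_t  uint8;
typedef int8_t   int8;
typedef int16_t  int16;
typedef uint16_t uint16;
typedef int32_t  int32;

/* Return 1 if every type has exactly the width its name promises */
int widths_are_exact(void)
{
    return sizeof(uint8)  * CHAR_BIT == 8
        && sizeof(int8)   * CHAR_BIT == 8
        && sizeof(int16)  * CHAR_BIT == 16
        && sizeof(uint16) * CHAR_BIT == 16
        && sizeof(int32)  * CHAR_BIT == 32;
}
```

With these typedefs the check holds by construction; applied to the #define versions, the same check would reveal a platform where int32 is not 32 bits wide.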

    3. File: Misc/incdll.h
    4. /*

      * Purpose: symbols needed outside of the mbrola sources

      * namely to compile the wrapper DLL

      * Author: Vincent Pagel

      */

       

      /*

      * Type of samples we can output with read_MBR

      */

      typedef enum {

      LIN16=0, /* same as internal computation format: 16 bits linear */

      LIN8, /* unsigned linear 8 bits, worse than telephone */

      ULAW, /* MU law -> 8bits, telephone. Roughly equ. to 12bits */

      ALAW /* A law -> 8bits, equivalent to mulaw */

      } AudioType;

       

    5. File: Misc/mbralloc.h
    6. /*

      * Purpose: memory allocation and freeing

      * Author: Vincent Pagel and Alain Ruelle

      */

      #define MBR_free(X) {free(X);X=NULL;}

      /* free a memory block and set the pointer to NULL */

      #define MBR_realloc(X,Y) realloc(X,Y)

      /* dummy reallocation for the moment */

      void *MBR_malloc(size_t size);

      /*

      * Check there's enough memory for the pointer

      */

      char *MBR_strdup( const char *str);

      /* standard strdup would use standard malloc */
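The header only states that MBR_malloc checks that enough memory is available. A checked allocator of that kind might look like the sketch below; this is an assumption, as the real function presumably reports the failure through the error module instead of exiting directly. SAFE_FREE is a hypothetical variant of MBR_free wrapped in do { } while (0), so that it behaves as a single statement inside if/else.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate size bytes or terminate the program with a message */
void *checked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "out of memory (%lu bytes requested)\n",
                (unsigned long) size);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Free a block and set the pointer to NULL, usable as one statement */
#define SAFE_FREE(X) do { free(X); (X) = NULL; } while (0)
```

Resetting the pointer to NULL makes an accidental second free harmless, since free(NULL) does nothing.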

    7. File: Misc/vp_error.h
    8. /*

      * Purpose: Errors management with debugging messages

      * Authors: V. Pagel and A. Ruelle

      */

      /*

      * For the DLL and LIBRARY mode, Error codes returned

      */

      #define ERROR_MEMORYOUT -1

      #define ERROR_UNKNOWNCOMMAND -2

      #define ERROR_SYNTAXERROR -3

      #define ERROR_COMMANDLINE -4

      #define ERROR_OUTFILE -5

      #define ERROR_RENAMING -6

      #define ERROR_PRGWRONGVERSION -10

      #define ERROR_TOOMANYPITCH -20

      #define ERROR_TOOMANYPHOWOPITCH -21

      #define ERROR_PITCHTOOHIGH -22

      #define ERROR_PHOLENGTH -30

      #define ERROR_PHOREADING -31

      #define ERROR_DBNOTFOUND -40

      #define ERROR_DBWRONGVERSION -41

      #define ERROR_DBWRONGARCHITECTURE -42

      #define ERROR_DBNOSILENCE -43

      #define ERROR_INFOSTRING -44

      #define ERROR_BINNUMBERFORMAT -60

      #define ERROR_PERIODTOOLONG -61

      #define ERROR_SMOOTHING -62

      #define ERROR_UNKNOWNSEGMENT -63

      #define ERROR_CANTDUPLICATESEGMENT -64

      #define ERROR_BOOK -70

      #define ERROR_CODE -71

      #define WARNING_UPGRADE -80

      #define WARNING_SATURATION -81

      /* buffer accumulating error messages when in lib or dll mode */

      extern char errbuffer[];

      extern int lasterr_code; /* Code of the last error */

      void fatal_message(const int code, const char *format, /* args */ ...);

      /*

      * Uses the format of a printf function

      * throw an exception when in library mode, or abort the program

      */

      void warning_message(const int code, const char *format, /* args */ ...);

      /*

      * Uses the format of a printf function

      * Just print a warning in the error buffer

      */

      #ifdef DEBUG

      void debug_message(char const *format, /* args */ ...);

      /* What's below is kind of ugly. When we're in C++ it can be replaced

      * by an inline debug_message function

      *

      * unavoidable if I don't want code to be generated in the release

      */

      #define debug_message1(A) debug_message(A)

      #define debug_message2(A,B) debug_message(A,B)

      #define debug_message3(A,B,C) debug_message(A,B,C)

      #define debug_message4(A,B,C,D) debug_message(A,B,C,D)

      #define debug_message5(A,B,C,D,E) debug_message(A,B,C,D,E)

      #define debug_message6(A,B,C,D,E,F) debug_message(A,B,C,D,E,F)

      #define debug_message7(A,B,C,D,E,F,G) debug_message(A,B,C,D,E,F,G)

      #define debug_message8(A,B,C,D,E,F,G,H) debug_message(A,B,C,D,E,F,G,H)

      #else

      /* don't generate anything */

      #define debug_message1(A)

      #define debug_message2(A,B)

      #define debug_message3(A,B,C)

      #define debug_message4(A,B,C,D)

      #define debug_message5(A,B,C,D,E)

      #define debug_message6(A,B,C,D,E,F)

      #define debug_message7(A,B,C,D,E,F,G)

      #define debug_message8(A,B,C,D,E,F,G,H)

      #endif
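With a C99 compiler, the eight numbered macros could be replaced by a single variadic macro. The sketch below shows the idea; debug_trace and the debug_calls counter are hypothetical names added for illustration, and this debug_message is a stand-in for the real one.

```c
#include <stdarg.h>
#include <stdio.h>

int debug_calls = 0;   /* counts emitted traces, for illustration only */

/* Stand-in for the real debug_message: printf-like trace on stderr */
void debug_message(const char *format, ...)
{
    va_list args;
    va_start(args, format);
    vfprintf(stderr, format, args);
    va_end(args);
    debug_calls++;
}

#ifdef DEBUG
#define debug_trace(...) debug_message(__VA_ARGS__)
#else
#define debug_trace(...) ((void) 0)   /* no call is generated in release */
#endif
```

As with the numbered macros, the arguments are not evaluated at all when DEBUG is undefined, so tracing costs nothing in the release build.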

    9. File: Misc/audio.h
    10. /*

      * Purpose: audio files

      * Author: Vincent Pagel

      */

      /*

      * Audio file format

      */

      typedef enum {

      RAW_FORMAT=0, /* same as internal computation format: 16 bits linear */

      WAV_FORMAT ,

      AU_FORMAT ,

      AIF_FORMAT ,

      AIFF_FORMAT

      } WaveType;

      int write_int16s(int16 *buffer,int count,FILE *file);

      /* Write a buffer of int16 */

      void write_header(WaveType file_format, int32 audio_length, uint16 samp_rate, FILE *output_file);

      /* Write the header corresponding to the output audio format */

      WaveType find_file_format(char *name);

      /* Find the file format corresponding to the name's extension

      * raw=none wav=RIFF au=Sun Audio aif or aiff=Macintosh

      */

      /*

      * Sample type conversion routines for read_MBR

      */

      #ifdef LIBRARY

      void* zero_convert(void* buffer_out, int nb_move, AudioType sample_type);

      /*

      * Output zeros in a buffer according to the sample_type

      * Return the next position after the end of the buffer

      *

      * Returning NULL means fatal error

      */

      void* move_convert(void* buffer_out,int16* buffer_in,int nb_move, AudioType sample_type);

      /*

      * Move audio samples and convert them at the same time

      * Return the shifted pointer in buffer_out

      *

      * linear 16bits to linear16 :-) simply move

      * linear 16bits to linear8

      * linear 16bits to mulaw

      * linear 16bits to alaw

      *

      * Returning NULL means fatal error

      */

      #endif
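The conversions listed for move_convert can be illustrated for two of the formats. The functions below are a sketch using the usual G.711-style segment encoding for mu-law; they are not taken from the MBROLA sources, and their names are hypothetical.

```c
#include <stdint.h>

/* 16-bit signed linear -> 8-bit unsigned linear (silence maps to 128) */
uint8_t linear16_to_linear8(int16_t sample)
{
    return (uint8_t) (((int) sample + 32768) >> 8);
}

/* 16-bit signed linear -> 8-bit mu-law (G.711-style segment encoding) */
uint8_t linear16_to_ulaw(int16_t sample)
{
    const int BIAS = 0x84;    /* offset added before the segment search */
    const int CLIP = 32635;   /* largest magnitude that can be encoded */
    int magnitude = sample;
    int sign = 0;
    int exponent = 7;
    int mask;
    int mantissa;

    if (magnitude < 0) {
        magnitude = -magnitude;
        sign = 0x80;
    }
    if (magnitude > CLIP)
        magnitude = CLIP;
    magnitude += BIAS;

    /* Segment number: position of the highest set bit, counted from bit 7 */
    for (mask = 0x4000; exponent > 0 && (magnitude & mask) == 0; mask >>= 1)
        exponent--;

    mantissa = (magnitude >> (exponent + 3)) & 0x0F;
    return (uint8_t) ~(sign | (exponent << 4) | mantissa);
}
```

The logarithmic segments are what allow 8 mu-law bits to cover roughly the dynamic range of 12 linear bits, as noted in the AudioType comments.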

    11. File: Database/database.h
    12. /*

      * Purpose: diphone database management

      * Author: Vincent Pagel

      */

      #define DIPHONE_RAW 1 /* The diphone wave database is raw */

      #define INFO_ESCAPE 0xFF /* Escape code in database information (prevents it from being displayed) */

      #define MAX_INFO 10 /* information strings at the end of the dba */

      /*

      * Frame types in the MBR analysed database

      */

      typedef uint8 FrameType;

      #define VOICING_MASK 2 /* Voiced/Unvoiced mask */

      #define TRANSIT_MASK 1 /* Stationary/Transitory mask */

      #define NV_REG 0 /* unvoiced stable state */

      #define NV_TRA TRANSIT_MASK /* unvoiced transient */

      #define V_REG VOICING_MASK /* voiced stable state */

      #define V_TRA (VOICING_MASK | TRANSIT_MASK) /* voiced transient */

      /*

      * Main type

      */

      typedef struct Database Database;

      typedef bool (*getdiphone_DatabaseFunction)(Database* dba, DiphoneSynthesis *diph);

      typedef void (*close_DatabaseFunction)(Database* dba);

      typedef Database* (*init_DatabaseFunction)(Database* dba);

      struct Database

      {

      void* self; /* Polymorphic depends on Coding */

      char *dbaname; /* name of the diphone file */

      FILE *database; /* diphone wave file */

      int16 nb_diphone; /* Number of diphones in the database */

      long RawOffset; /* Offset for raw samples in database */

      uint8 Coding; /* Type of coding DIPHONE_RAW, or BACON */

      int16 Freq; /* Sampling frequency of the database */

      uint8 MBRPeriod; /* Period of the MBR analysis */

      int32 SizeMrk; /* Size of the pitchmark part */

      int32 SizeRaw; /* Size of the wave part */

      int32 Magic[2]; /* Magic header of the database */

      char Version[6]; /* Version of the database */

      char* sil_phon; /* Silence symbol in the database */

      char *DbaInfo[MAX_INFO]; /* information strings */

      uint8 nb_dbaInfo; /* number of available info strings */

      FrameType *pmrk; /* The whole pitch marks database */

      HashTab *diphone_table; /* Diphone index table */

      /* Virtual function for diphone wave loading */

      getdiphone_DatabaseFunction getdiphone_Database;

      /* Virtual function to release the memory */

      close_DatabaseFunction close_Database;

      };

      /* Convenient macros */

      #define dbaname(PDatabase) PDatabase->dbaname

      #define database(PDatabase) PDatabase->database

      #define nb_diphone(PDatabase) PDatabase->nb_diphone

      #define RawOffset(PDatabase) PDatabase->RawOffset

      #define Coding(PDatabase) PDatabase->Coding

      #define Freq(PDatabase) PDatabase->Freq

      #define MBRPeriod(PDatabase) PDatabase->MBRPeriod

      #define SizeMrk(PDatabase) PDatabase->SizeMrk

      #define SizeRaw(PDatabase) PDatabase->SizeRaw

      #define Magic(PDatabase) PDatabase->Magic

      #define Version(PDatabase) PDatabase->Version

      #define sil_phon(PDatabase) PDatabase->sil_phon

      #define DbaInfo(PDatabase) PDatabase->DbaInfo

      #define nb_dbaInfo(PDatabase) PDatabase->nb_dbaInfo

      #define pmrk(PDatabase) PDatabase->pmrk

      #define diphone_table(PDatabase) PDatabase->diphone_table

      /* convenience: pmrk may be compressed in the future */

      #define pmrkval(PDatabase,X) (PDatabase->pmrk[X])

       

      /*

      * Three parts of the Database header

      */

      bool ReadDatabaseHeader(Database* dba);

      /* Reads the diphone database header , and initialize variables */

      bool ReadDatabaseIndex(Database* dba);

      /*

      * Read the index table of diphones, and put them in the hash table

      */

      bool ReadDatabasePitchMark(Database* dba);

      /* Load pitch markers (Voiced/Unvoiced, Transitory/Stationary) */

      bool ReadDatabaseInfo(Database* dba);

      /*

      * Extract textual information from the database if any

      */

      int getDatabaseInfo(Database* dba, char* msg, int size, int index);

      /*

      * Retrieve the ith info message, NULL means get the size

      */

      void init_real_frame(Database* dba, DiphoneSynthesis *diph);

      /*

      * Make the link between logical and physical frames -> used by loadiphs

      */

      /*

      * Initialisation and loading of Diphones -> depend on database Coding

      * Returning NULL means fail (check LastError)

      */

      Database* init_DatabaseBasic(Database* dba);

      /*

      * Basic version, read raw waves = Check there's no coding

      * Returning NULL means error

      */

      void close_DatabaseBasic(Database* dba);

      /* Release the memory allocated for the in-house BACON decoder */

      bool getdiphone_DatabaseBasic(Database* dba, DiphoneSynthesis *ds);

      /*

      * Basic loading of the diphone specified by diph. Stores the samples

      * Return False in case of error

      */

      Database* init_Database(char* dbaname);

      /* Generic initialization, calls the appropriate constructor

      * Returning NULL means fail (check LastError)

      */

      Database* init_rename_Database(char* dbaname,RenameList* rename,RenameList* clone);

      /*

      * A variant of init_Database allowing phoneme renaming on the fly

      * Returning NULL means fail (check LastError)

      *

      * rename and clone can be NULL to indicate there's nothing to change

      *

      * Renaming is a ONCE consuming operation (the database is changed

      * at loading) -> it involves a complete reconstruction of the hash table

      * but nothing else at run-time

      */

      #ifdef MULTICHANNEL_MODE

      Database* copyconstructor_Database(Database* dba);

      /* Creates a copy of a diphone database so that many synthesis engines

      * can use the same database at the same time (duplicate the file handler)

      *

      * Returning NULL means fail (check LastError)

      *

      * Highly recommended with multichannel mbrola, unless you can guarantee

      * mutually exclusive access to the getdiphone function

      */

      #endif
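The frame type constants of Database/database.h combine two independent bits, one for voicing and one for transience. The helpers below (hypothetical names, not part of the header) sketch how a FrameType value can be decoded; the constants are repeated so that the fragment stands alone.

```c
typedef unsigned char FrameType;

#define VOICING_MASK 2   /* Voiced/Unvoiced mask */
#define TRANSIT_MASK 1   /* Stationary/Transitory mask */

#define NV_REG 0
#define NV_TRA TRANSIT_MASK
#define V_REG  VOICING_MASK
#define V_TRA  (VOICING_MASK | TRANSIT_MASK)

int is_voiced(FrameType frame)    { return (frame & VOICING_MASK) != 0; }
int is_transient(FrameType frame) { return (frame & TRANSIT_MASK) != 0; }

/* Human-readable name of the four possible frame types */
const char *frame_name(FrameType frame)
{
    switch (frame & (VOICING_MASK | TRANSIT_MASK)) {
    case NV_REG: return "unvoiced stable";
    case NV_TRA: return "unvoiced transient";
    case V_REG:  return "voiced stable";
    default:     return "voiced transient";
    }
}
```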

    13. File: Database/database_bacon.h
    14. /*

      * Purpose: Decode BACON coded diphone databases

      *

      * Authors: Nicolas Pierret, Olivier Van der Vrecken, Vincent Pagel

      */

      #define DIPHONE_BACON 2 /* BACON order 2 coding */

      typedef struct

      {

      float *P_COCO; /* Reconstructed excitation of AR filter */

      int16 P_FRAME_SIZE; /* Size of P_COCO */

      float *P_Codebook; /* Codebook of stochastic excitation vectors */

      int16 P_SUBFRAME_SIZE; /* Size of stochastic vectors in P_Codebook */

      int16 P_CODEBOOK_SIZE; /* Number of bits for index of Shape in P_Codebook */

      int16 P_CODEBOOK_SIZE_LOG; /* LOG2 of P_CODEBOOK_SIZE */

      float *P_Lar[2]; /* Table of coefs for AR filter order 2 */

      int16 P_AIVQ_SIZE; /* Number of quantized AR coefficients */

      float *P_GAIN_PITCH; /* Table of quantized gains for pitch excitation */

      int16 P_NBR_GAIN_PITCH; /* Size of P_GAIN_PITCH */

      float *P_GAIN_STOCH; /* Table of quantized gains for stochastic excitation */

      int16 P_NBR_GAIN_STOCH; /* Size of P_GAIN_STOCH */

      } DatabaseBacon;

      #define P_COCO(db) (db->P_COCO)

      #define P_FRAME_SIZE(db) (db->P_FRAME_SIZE)

      #define P_Codebook(db) (db->P_Codebook)

      #define P_SUBFRAME_SIZE(db) (db->P_SUBFRAME_SIZE)

      #define P_CODEBOOK_SIZE(db) (db->P_CODEBOOK_SIZE)

      #define P_CODEBOOK_SIZE_LOG(db) (db->P_CODEBOOK_SIZE_LOG)

      #define P_Lar(db) (db->P_Lar)

      #define P_AIVQ_SIZE(db) (db->P_AIVQ_SIZE)

      #define P_GAIN_PITCH(db) (db->P_GAIN_PITCH)

      #define P_NBR_GAIN_PITCH(db) (db->P_NBR_GAIN_PITCH)

      #define P_GAIN_STOCH(db) (db->P_GAIN_STOCH)

      #define P_NBR_GAIN_STOCH(db) (db->P_NBR_GAIN_STOCH)

      Database* init_DatabaseBacon(Database* dba);

      /*

      * Initializes the in-house BACON order 2 decoder

      */

    15. File: Database/database_old.h
    16. /*

      * Purpose: Decode raw formats before 2.05 release, here for

      * compatibility purpose

      * Use pretty much RAW functions

      * Author: Vincent Pagel

      */

      Database* init_DatabaseOld(Database* dba);

      /*

      * Initializes the old ones!

      */

    17. File: Database/diphone_info.h
    18. /*

      * Purpose: diphone descriptor

      * Authors: Vincent Pagel & Alain Ruelle

      */

      /*

      * Structure of the diphone database (as stored in memory)

      */

      typedef struct

      {

      char *left,*right; /* Name of the diphone */

      int32 pos_wave; /* position in SPEECH_FILE */

      int16 halfseg; /* position of center of diphone */

      int32 pos_pm; /* index in PITCHMARK_FILE */

      uint8 nb_frame; /* Number of pitch markers */

      } DiphoneInfo;

      /* Convenience macros */

      #define left(diphoneinfo) diphoneinfo->left

      #define right(diphoneinfo) diphoneinfo->right

      #define pos_wave(diphoneinfo) diphoneinfo->pos_wave

      #define halfseg(diphoneinfo) diphoneinfo->halfseg

      #define pos_pm(diphoneinfo) diphoneinfo->pos_pm

      #define nb_frame(diphoneinfo) diphoneinfo->nb_frame

      DiphoneInfo* init_DiphoneInfo(char* left, char* right);

      /* Allocate memory */

      DiphoneInfo* initclone_DiphoneInfo(DiphoneInfo* src, char* left, char* right);

      /* Allocate memory, and copy parameters from src, except the left-right name */

      void close_DiphoneInfo(DiphoneInfo *di);

      /* Release the memory of the phoneme names */

      bool equalkey_DiphoneInfo(const DiphoneInfo* di1, const char*left, const char*right);

      /*

      * True if the keys for hashing are equal

      */

      int32 hash_DiphoneInfo(const char* left, const char* right);

      /*

      * Hashing function for searching the diphone name in diphone_table

      */

       

    19. File: Database/hash_tab.h
    20. /*

      * Purpose: coalescent hashing table

      * Author: Vincent Pagel

      */

      /* Returned when the index is not found in the hash table */

      #define NONE -1

      /* Used to mark a hash cell as empty */

      #define EMPTY 255

      /* Wrapper structure */

      typedef struct

      {

      DiphoneInfo* content; /* Hashing information */

      uint8 hit; /* survey value for number of collisions */

      int16 next_one; /* link to the next database cell */

      } HashInfo;

      /* The whole diphone database */

      typedef struct

      {

      HashInfo *hash_tab; /* Hashing information */

      int16 nb_item; /* Number of elements in hash_tab */

      int16 first_free; /* First position free from the end of the table */

      #ifdef DEBUG_HASH

      int16 tot_nb_coup;

      int16 tot_coup;

      #endif

      } HashTab;

      /* Convenient macros */

      #define nb_item(Tab) Tab->nb_item

      #define first_free(Tab) Tab->first_free

      #define next_one(Tab,Index) Tab->hash_tab[Index].next_one

      #define hit(Tab,Index) Tab->hash_tab[Index].hit

      #define content(Tab,Index) Tab->hash_tab[Index].content

      HashTab *init_HashTab(int16 nb_item);

      /* Initialize a void hash_table */

      void close_HashTab(HashTab *hash_tab);

      /* Empty and release the hash_table */

      int16 searchdiph_HashTab(const HashTab *hash_tab, DiphoneInfo* one_cell);

      /*

      * Return the reference number of a diphone in the diphone database

      * Hash table search -> return NONE=-1 if the value is not present

      */

      int16 search_HashTab(const HashTab *hash_tab,const char* left, const char* right);

      /*

      * Return the reference number of a diphone in the diphone database

      * Hash table search -> return NONE=-1 if the value is not present

      */

      void add_HashTab(HashTab *hash_tab, DiphoneInfo* one_cell);

      /* Add a new reference in the diphone table */

      void diphone_rename_HashTab(HashTab *hash_tab,RenameList* rename);

      /*

      * Rename all occurrences of diphones containing the phoneme X to phone Y

      * in the hash table according to rename which contains X-Y pairs

      *

      * WARNING 1: it costs some CPU and memory moves ....

      *

      * WARNING 2: This operation fundamentally changes the rank of elements

      * in the hash table, so

      *

      * FORGET ALL YOUR POINTERS TO ELEMENTS OF THE TABLE

      */

      void diphone_clone_HashTab(HashTab *hash_tab, RenameList* clone);

      /* Make a copy of all occurrences of diphones containing the phoneme

      * X to phoneme Y according to the clone list which contains X-Y pairs

      *

      * The clone list contains only ONE occurrence of each phoneme

      */

      #ifdef DEBUG_HASH

      void tuning_HashTab(HashTab *hash_tab);

      /* Function for debug and tuning purpose */

      #endif
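The fields next_one and first_free suggest a coalesced hashing scheme, where colliding entries are stored in free cells of the same table and linked into a chain. The fragment below is a simplified sketch of that technique under stated assumptions: a fixed table of 8 cells, a toy hash function, and the hypothetical names Cell, Table, add_Table and search_Table. It is not the MBROLA implementation.

```c
#include <string.h>

#define NONE       (-1)
#define TABLE_SIZE 8

typedef struct {
    const char *left, *right;  /* diphone name; left == NULL marks an empty cell */
    int next_one;              /* next cell of the collision chain, or NONE */
} Cell;

typedef struct {
    Cell cells[TABLE_SIZE];
    int first_free;            /* free cells are searched downward from here */
} Table;

static unsigned hash_pair(const char *left, const char *right)
{
    unsigned h = 5381u;
    while (*left)  h = h * 33u + (unsigned char) *left++;
    h = h * 33u + (unsigned char) '-';
    while (*right) h = h * 33u + (unsigned char) *right++;
    return h % TABLE_SIZE;
}

void init_Table(Table *t)
{
    int i;
    for (i = 0; i < TABLE_SIZE; i++) {
        t->cells[i].left = t->cells[i].right = NULL;
        t->cells[i].next_one = NONE;
    }
    t->first_free = TABLE_SIZE - 1;
}

/* Store a name pair (pointers are kept, not copied); return its cell or NONE */
int add_Table(Table *t, const char *left, const char *right)
{
    int i = (int) hash_pair(left, right);

    if (t->cells[i].left != NULL) {          /* home cell taken: collision */
        while (t->cells[i].next_one != NONE) /* walk to the end of the chain */
            i = t->cells[i].next_one;
        while (t->first_free >= 0 && t->cells[t->first_free].left != NULL)
            t->first_free--;
        if (t->first_free < 0)
            return NONE;                     /* table full */
        t->cells[i].next_one = t->first_free;
        i = t->first_free;
    }
    t->cells[i].left = left;
    t->cells[i].right = right;
    t->cells[i].next_one = NONE;
    return i;
}

/* Follow the chain from the home cell; return the cell index or NONE */
int search_Table(const Table *t, const char *left, const char *right)
{
    int i = (int) hash_pair(left, right);

    while (i != NONE && t->cells[i].left != NULL) {
        if (strcmp(t->cells[i].left, left) == 0 &&
            strcmp(t->cells[i].right, right) == 0)
            return i;
        i = t->cells[i].next_one;
    }
    return NONE;
}
```

Because overflow entries live in the table itself, chains belonging to different home cells can merge, which is where the term "coalesced" comes from. It also explains why renaming, which rebuilds the table, invalidates any cell index kept by the caller.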

    21. File: Database/little_big.h
    22. /*

      * Purpose: IO little_endian aware

      * Author: Vincent Pagel

      */

      /*

      * Check that architecture is defined and define reading and writing operations

      * depending on it ( deals with byte swapping)

      */

      #ifdef LITTLE_ENDIAN

      #define MAGIC_HEADER 0x4f52424d

      #define readl_int32(X,Y) read_int32(X,Y)

      #define readl_int16(X,Y) read_int16(X,Y)

      #define readl_int16buffer(X,Y,Z) read_int16buffer(X,Y,Z)

      #define readl_uint16(X,Y) read_uint16(X,Y)

      #define readb_int32(X,Y) read_int32_swapped(X,Y)

      #define readb_int16(X,Y) read_int16_swapped(X,Y)

      #define readb_uint16(X,Y) read_uint16_swapped(X,Y)

      #define writel_int32(X,Y) write_int32(X,Y)

      #define writel_int16(X,Y) write_int16(X,Y)

      #define writeb_int32(X,Y) write_int32_swapped(X,Y)

      #define writeb_int16(X,Y) write_int16_swapped(X,Y)

      #else

      #ifdef BIG_ENDIAN

      #define MAGIC_HEADER 0x4d42524f

      #define readl_int32(X,Y) read_int32_swapped(X,Y)

      #define readl_int16(X,Y) read_int16_swapped(X,Y)

      #define readl_int16buffer(X,Y,Z) read_int16buffer_swapped(X,Y,Z)

      #define readl_uint16(X,Y) read_uint16_swapped(X,Y)

      #define readb_int32(X,Y) read_int32(X,Y)

      #define readb_int16(X,Y) read_int16(X,Y)

      #define readb_uint16(X,Y) read_uint16(X,Y)

      #define writel_int32(X,Y) write_int32_swapped(X,Y)

      #define writel_int16(X,Y) write_int16_swapped(X,Y)

      #define writeb_int32(X,Y) write_int32(X,Y)

      #define writeb_int16(X,Y) write_int16(X,Y)

      #else

      #error You should define BIG_ENDIAN (sun,hp,next..) or LITTLE_ENDIAN (pc,vax)

      #endif

      #endif

      /*

      * Read and write operations with/without byte swapping

      */

      void write_int16(int16 value, FILE *output_file);

      void write_int32(int32 value, FILE *output_file);

      void write_int16_swapped(int16 value, FILE *output_file);

      void write_int32_swapped(int32 value, FILE *output_file);

      void read_int16(int16 *value, FILE *output_file);

      size_t read_int16buffer(int16 *ptr, size_t nitems, FILE *stream);

      void read_uint16(uint16 *value, FILE *output_file);

      void read_int32(int32 *value, FILE *output_file);

      void read_int16_swapped(int16 *value, FILE *output_file);

      size_t read_int16buffer_swapped(int16 *ptr, size_t nitems, FILE *stream);

      void read_uint16_swapped(uint16 *value, FILE *output_file);

      void read_int32_swapped(int32 *value, FILE *output_file);
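The two MAGIC_HEADER values are the same four bytes 'M', 'B', 'R', 'O' read as a 32-bit integer with either byte order. The sketch below shows this relationship and the byte swapping behind the *_swapped functions; the helper names are hypothetical and the fragment assumes a C99 compiler for <stdint.h>.

```c
#include <stdint.h>

/* Reverse the byte order of a 16-bit or 32-bit value */
uint16_t swap_uint16(uint16_t v)
{
    return (uint16_t) ((v >> 8) | (v << 8));
}

uint32_t swap_uint32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00UL)
         | ((v << 8) & 0x00FF0000UL)
         | (v << 24);
}

/* Interpret 4 bytes as a little-endian or big-endian 32-bit integer */
uint32_t load_le32(const unsigned char b[4])
{
    return (uint32_t) b[0] | ((uint32_t) b[1] << 8)
         | ((uint32_t) b[2] << 16) | ((uint32_t) b[3] << 24);
}

uint32_t load_be32(const unsigned char b[4])
{
    return ((uint32_t) b[0] << 24) | ((uint32_t) b[1] << 16)
         | ((uint32_t) b[2] << 8) | (uint32_t) b[3];
}
```

Assembling the value byte by byte, as load_le32 and load_be32 do, gives the same result on any host and would avoid the need for the LITTLE_ENDIAN/BIG_ENDIAN compile-time switch, at the cost of a few shifts per value.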

    23. File: Database/rename_list.h
    24. /*

      * Purpose: list of phoneme renamings

      * Author: Vincent Pagel

      */

      typedef struct

      {

      int nb_elem;

      int nb_available;

      char** rename_list;

      } RenameList;

      #define nb_elem(rl) (rl->nb_elem)

      #define nb_available(rl) (rl->nb_available)

      #define rename_list(rl) (rl->rename_list)

      RenameList* init_RenameList();

      /* Basic constructor, initialize to empty list */

      bool parse_RenameList(RenameList* my_rl, char* rename_string, bool multi);

      /*

      * Parse the renaming pairs from a string.

      * Returns False if the initializer is malformed (check LastErr).

      *

      * multi=True means multiset. If False, introducing a renaming pair

      * with the same key is an error (returns False)

      */

      void close_RenameList(RenameList* rl);

      /* Release the memory */

      bool append_RenameList(RenameList* rl, char* old_name, char* new_name, bool multi);

      /* Add a new renaming pair to the list

      * If multi is True, it's a multiset

      *

      * Return False if the key was already present (if it's not a multiset)

      */

      char* find_RenameList(RenameList* rl, char* str);

      /* Find the translation of 'str'; returns NULL if not found */

      int size_RenameList(RenameList* rl);

      /* return the size of the renaming list */
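The effect of the multi flag can be illustrated with a small stand-in. The sketch below is not the MBROLA implementation: the PairList type, the functions pair_find and pair_append, the flat old/new storage layout and the doubling growth policy are all assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for RenameList: strings stored as old,new,old,new... */
typedef struct {
    int nb_elem;        /* strings in use (2 per renaming pair) */
    int nb_available;   /* allocated slots */
    char **list;
} PairList;

/* Returns the translation of str (first match), or NULL if there is none */
char *pair_find(const PairList *pl, const char *str)
{
    int i;

    for (i = 0; i + 1 < pl->nb_elem; i += 2)
        if (strcmp(pl->list[i], str) == 0)
            return pl->list[i + 1];
    return NULL;
}

static char *dup_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);

    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Returns 0 if the key is already present and multi is 0,
 * or if memory could not be allocated; returns 1 otherwise. */
int pair_append(PairList *pl, const char *old_name, const char *new_name, int multi)
{
    char *old_copy, *new_copy;

    if (!multi && pair_find(pl, old_name) != NULL)
        return 0;
    if (pl->nb_elem + 2 > pl->nb_available) {
        int wanted = pl->nb_available ? 2 * pl->nb_available : 8;
        char **grown = realloc(pl->list, wanted * sizeof *grown);

        if (grown == NULL)
            return 0;
        pl->list = grown;
        pl->nb_available = wanted;
    }
    old_copy = dup_string(old_name);
    new_copy = dup_string(new_name);
    if (old_copy == NULL || new_copy == NULL) {
        free(old_copy);
        free(new_copy);
        return 0;
    }
    pl->list[pl->nb_elem++] = old_copy;
    pl->list[pl->nb_elem++] = new_copy;
    return 1;
}
```

With multi set to 0, a second pair using an existing key is rejected; with multi set to 1 it is stored, and in this sketch the lookup returns the first pair entered.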

    25. File: Engine/diphone.h
      /*

      * Purpose: Phone and diphone objects

      * Authors: Vincent Pagel

      */

      /*

      * PITCH MARKED DIPHONE DATABASE, CONSTANTS

      */

      #define MAX_MBR_PERIOD 200 /* MBR period (samples) -> base of 80Hz */

      #define MAX_LENGTH 10000 /* Longest diphone length in samples */

      #define NBRE_PM_MAX 2000 /* Max nbr of frames in a synth. segment*/

      /*

      * STRUCTURES representing diphone sequences to synthesize

      */

      /* A Diphone is made of 2 phonemes */

      typedef struct

      {

      int Length1; /* Length of first half-phoneme in samples */

      int Length2; /* Length of second half-phoneme in samples */

      Phone *LeftPhone; /* First phoneme */

      Phone *RightPhone;/* Second phoneme */

      } Diphone;

      /*

      * A DiphoneSynthesis is a Diphone equipped with the information necessary to

      * synthesize it

      */

      typedef struct

      {

      Diphone d; /* The diphone to synthesize */

      DiphoneInfo* Descriptor; /* Descriptor in the diphone database */

      int nb_pm; /* Number of pitch markers to synthesize */

      int smoothw[2*MAX_MBR_PERIOD]; /* Difference vector between 2 ola frames*/

      bool smooth; /* True if smoothw has a value */

      int16 buffer[MAX_LENGTH]; /* longest diphone */

      uint8 real_frame[NBRE_PM_MAX]; /* for skipping V - NV transition */

      uint8 physical_frame_type[NBRE_PM_MAX]; /* to get the nature of a given frame */

      uint8 tot_frame; /* physical number of frames of the diphone */

      } DiphoneSynthesis;

      /*

      * Convenience macros to access DiphoneSynthesis structures

      */

      #define Descriptor(X) X->Descriptor

      #define Length1(X) X->d.Length1

      #define Length2(X) X->d.Length2

      #define LeftPhone(X) (X->d.LeftPhone)

      #define RightPhone(X) (X->d.RightPhone)

      #define nb_pm(X) X->nb_pm

      #define smoothw(X) X->smoothw

      #define smooth(X) X->smooth

      #define buffer(X) X->buffer

      #define real_frame(X) X->real_frame

      #define physical_frame_type(X) X->physical_frame_type

      #define tot_frame(X) X->tot_frame

      DiphoneSynthesis* init_DiphoneSynthesis();

      /* Alloc memory. Embedded Diphone */

      void reset_DiphoneSynthesis(DiphoneSynthesis* ds);

      /*

      * Forget the diphone in progress

      */

      void close_DiphoneSynthesis(DiphoneSynthesis* ds);

      /* Release memory and phone */

      int GetPitchPeriod(DiphoneSynthesis *dp, int cur_sample,int Freq);

      /*

      * Returns the pitch period (in samples) at position cur_sample

      * of dp by linear interpolation between pitch pattern points.

      */
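The interpolation described for GetPitchPeriod can be sketched as follows. This is only an illustration under stated assumptions: the Point type and the functions f0_at and pitch_period are hypothetical, the point positions are taken to be already expressed in samples, and the signature differs from the real GetPitchPeriod, which works on a DiphoneSynthesis.

```c
/* Hypothetical pitch point: position in samples (assumption), F0 in Hz */
typedef struct {
    float pos;
    float freq;
} Point;

/* F0 at cur_sample by linear interpolation between the two surrounding
 * points. pts holds n >= 1 points sorted by increasing pos; positions
 * outside the covered range take the value of the nearest point. */
float f0_at(const Point *pts, int n, float cur_sample)
{
    int i;

    if (cur_sample <= pts[0].pos)
        return pts[0].freq;
    for (i = 1; i < n; i++) {
        if (cur_sample <= pts[i].pos) {
            float span = pts[i].pos - pts[i - 1].pos;
            float t = (cur_sample - pts[i - 1].pos) / span;

            return pts[i - 1].freq + t * (pts[i].freq - pts[i - 1].freq);
        }
    }
    return pts[n - 1].freq;
}

/* Pitch period in samples, rounded to nearest: sample_rate / F0.
 * Returns 0 if the interpolated F0 is not positive. */
int pitch_period(const Point *pts, int n, float cur_sample, int sample_rate)
{
    float f0 = f0_at(pts, n, cur_sample);

    if (f0 <= 0.0f)
        return 0;
    return (int)((float)sample_rate / f0 + 0.5f);
}
```

For instance, with a rise from 100 Hz at sample 0 to 200 Hz at sample 1000 and a 16000 Hz sampling rate, the interpolated F0 at sample 500 is 150 Hz, which gives a period of 107 samples. At the 80 Hz base mentioned for MAX_MBR_PERIOD, the same sampling rate gives 200 samples.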

       

    27. File: Engine/mbrola.h
      /*

      * Purpose: Diphone-based MBROLA speech synthesizer.

      * Author: Vincent Pagel

      */

      typedef struct

      {

      Database* diph_dba; /* A synth engine is linked to a database */

      Parser* parser; /* Phonemic command stream */

      /*

      * prev_diph points to the previous diphone synthesis structure

      * and cur_diph points to the current one. The reason is that to

      * synthesize the previous diphone we need information on the next

      * one. While progressing to the next diphone, prev_diph memory is

      * reset and the pointers are swapped between cur and prev diphones

      */

      DiphoneSynthesis *prev_diph, *cur_diph;

      /* last_time_crumb compensates for slow time drift in MatchProsody. The

      * time crumb is the difference, in samples, between the length actually

      * synthesized and the theoretical one

      */

      int last_time_crumb;

      float FirstPitch; /* default first F0 Value (fetched in the database) */

      int32 audio_length; /* File size, used for file formats other than RAW */

      int frame_number[NBRE_PM_MAX]; /* for match_prosody */

      int frame_pos[NBRE_PM_MAX]; /* frame position for match_prosody */

      int nb_begin;

      int nb_end; /* number of voiced frames at the beginning and end of the segment */

      bool saturation; /* Saturation in ola_integer */

      float ola_win[2*MAX_MBR_PERIOD]; /* OLA buffer */

      int16 ola_integer[2*MAX_MBR_PERIOD]; /* OLA buffer for file output */

      float weight[2*MAX_MBR_PERIOD]; /* Hanning weighting window */

      float volume_ratio; /* 1.0 is default */

      /*

      * The following variables are part of the structure for library mode

      * but could be local for standalone mode

      */

      bool odd; /* flip-flop for reversing 1 out of 2 unvoiced OLA frames */

      int frame_counter; /* frame being OLAdded */

      int buffer_shift; /* Shift between 2 Ola = available for output */

      int zero_padding; /* 0's between 2 Ola = available for output */

      bool smoothing; /* True if the smoothing algorithm is on */

      bool no_error; /* True to ignore missing diphones */

      uint16 VoiceFreq; /* Freq of the audio output (vocal tract length) */

      float VoiceRatio; /* Freq ratio of the audio output */

      #ifdef LIBRARY

      bool first_call; /* True if it's the first call to Read_MBR */

      int eaten; /* Samples already consumed in ola_integer */

      #endif

      } Mbrola;

      /* Convenience macros */

      #define diph_dba(mb) mb->diph_dba

      #define parser(mb) mb->parser

      #define prev_diph(mb) mb->prev_diph

      #define cur_diph(mb) mb->cur_diph

      #define last_time_crumb(mb) mb->last_time_crumb

      #define FirstPitch(mb) mb->FirstPitch

      #define audio_length(mb) mb->audio_length

      #define frame_number(mb) mb->frame_number

      #define frame_pos(mb) mb->frame_pos

      #define nb_begin(mb) mb->nb_begin

      #define nb_end(mb) mb->nb_end

      #define saturation(mb) mb->saturation

      #define ola_win(mb) mb->ola_win

      #define ola_integer(mb) mb->ola_integer

      #define weight(mb) mb->weight

      #define volume_ratio(mb) mb->volume_ratio

      #define odd(mb) mb->odd

      #define frame_counter(mb) mb->frame_counter

      #define buffer_shift(mb) mb->buffer_shift

      #define zero_padding(mb) mb->zero_padding

      #define smoothing(mb) mb->smoothing

      #define no_error(mb) mb->no_error

      #define VoiceRatio(pt) (pt->VoiceRatio)

      #define VoiceFreq(pt) (pt->VoiceFreq)

      #define first_call(pt) (pt->first_call)

      #define eaten(pt) (pt->eaten)

      void set_voicefreq_Mbrola(Mbrola* mb, uint16 OutFreq);

      /* Change the Output Freq and VoiceRatio to change the vocal tract */

      uint16 get_voicefreq_Mbrola(Mbrola* mb);

      /* Get output Frequency */

      void set_smoothing_Mbrola(Mbrola* mb, bool smoothing);

      /* Spectral smoothing or not */

      bool get_smoothing_Mbrola(Mbrola* mb);

      /* Spectral smoothing or not */

      void set_no_error_Mbrola(Mbrola* mb, bool no_error);

      /* Tolerance to missing diphones */

      bool get_no_error_Mbrola(Mbrola* mb);

      /* Tolerance to missing diphones */

      void set_volume_ratio_Mbrola(Mbrola* mb, float volume_ratio);

      /* Overall volume */

      float get_volume_ratio_Mbrola(Mbrola* mb);

      /* Overall volume */

      void set_parser_Mbrola(Mbrola* mb, Parser* parser);

      /* drop the current parser for a new one */

      Mbrola* init_Mbrola(Database* dba);

      /*

      * Connect the database to the synthesis engine, then initialize internal

      * variables. Connect the phonemic command stream later with set_parser_Mbrola

      */

      void close_Mbrola(Mbrola* mb);

      /* close related features and free the memory ! */

      bool reset_Mbrola(Mbrola* mb);

      /*

      * Gives initial values to current_diphone (not synthesized anyway)

      * -> it will give a first value for prev_diph when we make the first

      * NextDiphone call so that cur_diph= _-FirstPhon with Length1=0

      * and prev_diph= _-_ with Length2=0

      *

      * return False in case of error

      */

      StatePhone NextDiphone(Mbrola* mb);

      /*

      * Reads a phone from the phonetic command buffer and prepares the next

      * diphone to synthesize ( prev_diph )

      * Return value may be: PHO_EOF, PHO_FLUSH, PHO_OK, PHO_ERROR

      */

      bool MatchProsody(Mbrola* mb);

      /*

      * Selects Duplication or Elimination for each analysis OLA frame of

      * the diphone we must synthesize (prev_diph). Selected frames must fit

      * with wanted pitch pattern and phonemes duration of prev_diph

      *

      * Return False in case of error

      */

      void Concat(Mbrola* mb);

      /*

      * This is a unique feature of MBROLA.

      * Smooths diphones around their concatenation point by making the left

      * part fade into the right one and conversely. This is possible because

      * MBROLA frames have the same length everywhere.

      *

      * output : nb_begin, nb_end -> number of stable voiced frames to be used

      * for interpolation at the end of LeftPhone(prev_diph) and the beginning

      * of RightPhone(prev_diph).

      */

      void OverLapAdd(Mbrola* mb, int frame);

      /*

      * OLA routine

      */

      #ifdef LIBRARY

      /* LIBRARY mode: synthesis driven by the output */

      int readtype_Mbrola(Mbrola* mb, void *buffer_out, int nb_wanted, AudioType sample_type);

      /*

      * Reads nb_wanted samples in an audio buffer

      * Returns the effective number of samples read

      */

      #else

      /* STANDALONE MODE: Synthesis driven by the input */

      StatePhone Synthesis(Mbrola* mb);

      /*

      * Main loop: performs MBROLA synthesis of all diphones

      * Returns a value indicating the reason for the break

      * (a flush request, an end of file, the end of the phone sequence)

      */

      #endif
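The Mbrola structure keeps a floating-point OLA buffer, a weighting window, a 16-bit output buffer and a saturation flag. The sketch below shows one way these pieces can fit together; it is not the OverLapAdd routine itself. The names PERIOD, FRAME, overlap_add and to_int16_clipped are hypothetical, the fixed frame shift is chosen for the example, and the window is supplied by the caller (the structure comment indicates a Hanning window for MBROLA).

```c
#include <stdint.h>

#define PERIOD 8               /* hypothetical frame shift, in samples */
#define FRAME  (2 * PERIOD)    /* each frame spans two periods */

/* Adds nb_frames windowed frames into out, each one shifted by PERIOD
 * samples from the previous one. frames holds nb_frames * FRAME samples,
 * w holds FRAME weights, and out must hold (nb_frames + 1) * PERIOD
 * samples set to zero by the caller. */
void overlap_add(const float *frames, int nb_frames, const float *w, float *out)
{
    int f, n;

    for (f = 0; f < nb_frames; f++)
        for (n = 0; n < FRAME; n++)
            out[f * PERIOD + n] += w[n] * frames[f * FRAME + n];
}

/* Scales n float samples by volume_ratio and stores them as 16-bit
 * integers. Returns 1 if at least one sample was clipped, 0 otherwise. */
int to_int16_clipped(const float *in, int16_t *out, int n, float volume_ratio)
{
    int i, saturated = 0;

    for (i = 0; i < n; i++) {
        float v = in[i] * volume_ratio;

        if (v > 32767.0f) {
            v = 32767.0f;
            saturated = 1;
        } else if (v < -32768.0f) {
            v = -32768.0f;
            saturated = 1;
        }
        out[i] = (int16_t)v;
    }
    return saturated;
}
```

When the two halves of the window sum to one, as they do for a Hanning or a triangular window shifted by half its length, overlapping frames of constant amplitude reproduce that amplitude everywhere except in the first and last half frame.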

       

    29. File: Parser/fifo.h
      /*

      * Purpose: a char fifo

      *

      * Author: Vincent Pagel

      */

      /* Size of the standard phonetic input buffer */

      #define FIFO_SIZE 8192

      #define LINE_FEED 0x0a

      typedef struct

      {

      char* charbuff; /* circular buffer for phonetic input */

      int buffer_pos; /* Current position */

      int buffer_end; /* Last available phoneme */

      int buffer_size; /* number of chars in Phobuffer */

      } Fifo;

      #define charbuff(ff) ff->charbuff

      #define buffer_pos(ff) ff->buffer_pos

      #define buffer_end(ff) ff->buffer_end

      #define buffer_size(ff) ff->buffer_size

      int readline_Fifo(Fifo* ff, char *line, int size);

      /*

      * Read a line from the input stream in a circular buffer

      * Return 0 if there's nothing to read

      */

      int write_Fifo(Fifo* ff, char *buffer_in);

      /*

      * Write a string of phonemes in the input buffer

      * Return the number of chars actually written

      */

      void reset_Fifo(Fifo* ff);

      /*

      * Forget previously entered data in the circular buffer

      */

      void close_Fifo(Fifo* ff);

      /*

      * Release the memory

      */

      Fifo* init_Fifo(int size);

      /*

      * Constructor with size of the buffer

      */
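A circular character buffer with line-oriented reading can be sketched as below. This is an illustration, not the code of fifo.c: the Ring type, the functions ring_write and ring_readline, the small RING_SIZE and the all-or-nothing write policy are assumptions made for the example.

```c
#include <string.h>

#define RING_SIZE 64   /* hypothetical; fifo.h defines FIFO_SIZE as 8192 */

typedef struct {
    char data[RING_SIZE];
    int head;    /* index of the next char to read */
    int count;   /* number of chars currently stored */
} Ring;

/* Stores the whole string or nothing.
 * Returns the number of chars written (0 if there is not enough room). */
int ring_write(Ring *r, const char *s)
{
    int len = (int)strlen(s), i;

    if (len > RING_SIZE - r->count)
        return 0;
    for (i = 0; i < len; i++)
        r->data[(r->head + r->count + i) % RING_SIZE] = s[i];
    r->count += len;
    return len;
}

/* Copies the next complete line, without its line feed, into line.
 * Returns the number of chars consumed, or 0 if no complete line is
 * available (or if the line does not fit in size chars). */
int ring_readline(Ring *r, char *line, int size)
{
    int i, len = -1;

    for (i = 0; i < r->count; i++) {
        if (r->data[(r->head + i) % RING_SIZE] == 0x0a) {
            len = i;
            break;
        }
    }
    if (len < 0 || len >= size)
        return 0;
    for (i = 0; i < len; i++)
        line[i] = r->data[(r->head + i) % RING_SIZE];
    line[len] = '\0';
    r->head = (r->head + len + 1) % RING_SIZE;
    r->count -= len + 1;
    return len + 1;
}
```

A partial line stays in the buffer until the writer supplies the rest of it, so a reader never sees half a phonetic command.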

    31. File: Parser/input.h
      /*

      * Purpose: polymorphic type for input stream

      * Author: Vincent Pagel

      */

      typedef struct Input Input;

      typedef int (*readline_InputFunction)(Input* in, char *line, int size);

      typedef void (*close_InputFunction)(Input* in);

      typedef void (*reset_InputFunction)(Input* in);

      struct Input

      {

      void* self;

      readline_InputFunction readline_Input;

      close_InputFunction close_Input;

      reset_InputFunction reset_Input;

      };
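A new kind of input stream is obtained by filling the function pointers of an Input with one's own routines. The sketch below reads lines from a string held in memory. The declarations of input.h are repeated so that the example compiles on its own; StringSource, init_InputString and the three static functions are hypothetical names, and the choices to keep the line feed in the returned line and to return 0 when nothing is left are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Declarations repeated from Parser/input.h */
typedef struct Input Input;
typedef int (*readline_InputFunction)(Input* in, char *line, int size);
typedef void (*close_InputFunction)(Input* in);
typedef void (*reset_InputFunction)(Input* in);

struct Input {
    void* self;
    readline_InputFunction readline_Input;
    close_InputFunction close_Input;
    reset_InputFunction reset_Input;
};

/* Hypothetical concrete type: a string in memory and a read position */
typedef struct {
    const char *text;
    size_t pos;
} StringSource;

/* Copies at most size-1 chars, up to and including a line feed.
 * Returns the number of chars copied, 0 if nothing is left. */
static int readline_InputString(Input* in, char *line, int size)
{
    StringSource *src = in->self;
    int n = 0;

    if (src->text[src->pos] == '\0')
        return 0;
    while (src->text[src->pos] != '\0' && n < size - 1) {
        char c = src->text[src->pos++];

        line[n++] = c;
        if (c == '\n')
            break;
    }
    line[n] = '\0';
    return n;
}

/* Forget the data not read yet */
static void reset_InputString(Input* in)
{
    StringSource *src = in->self;

    src->pos = strlen(src->text);
}

static void close_InputString(Input* in)
{
    free(in->self);
    free(in);
}

/* The caller keeps ownership of text, which must outlive the Input */
Input* init_InputString(const char *text)
{
    Input *in = malloc(sizeof *in);
    StringSource *src = malloc(sizeof *src);

    if (in == NULL || src == NULL) {
        free(in);
        free(src);
        return NULL;
    }
    src->text = text;
    src->pos = 0;
    in->self = src;
    in->readline_Input = readline_InputString;
    in->close_Input = close_InputString;
    in->reset_Input = reset_InputString;
    return in;
}
```

Client code only goes through the pointers, for example in->readline_Input(in, line, size), and therefore does not need to know which concrete type sits behind self.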

    33. File: Parser/input_fifo.h
      /*

      * Purpose: input stream from a fifo (instantiation of input.h)

      * Author: Vincent Pagel

      */

      Input* init_InputFifo(Fifo* my_fifo);

       

    35. File: Parser/input_file.h
      /*

      * Purpose: input stream from a file handle

      * Author: Vincent Pagel

      */

      Input* init_InputFile(FILE* my_file );

       

    37. File: Parser/parser.h
      /*

      * Purpose: polymorphic type to parse a "pho file"

      * Author: Vincent Pagel

      */

      /* Return values of the nextphone function */

      typedef enum {

      PHO_OK,

      PHO_EOF,

      PHO_FLUSH,

      PHO_ERROR

      } StatePhone;

       

      /* Polymorphic type */

      typedef struct Parser Parser;

      typedef void (*reset_ParserFunction)(Parser* ps);

      typedef void (*close_ParserFunction)(Parser* ps);

      typedef StatePhone (*nextphone_ParserFunction)(Parser* ps,Phone** ph);

      /*

      * Generic parser :

      * reset: forget remaining data in the buffer (when the user stops synthesis, for example)

      *

      * close: release the memory

      *

      * nextphone: return the next Phoneme from input.

      *

      * PRECONDITION: this phoneme MUST have a pitch point at 0 and 100%

      *

      * THE CALLER IS IN CHARGE OF CALLING close_Phone ON THE PHONES HE GETS

      * WITH nextphone

      */

      struct Parser

      {

      void* self; /* Polymorphic on the real type */

      reset_ParserFunction reset_Parser; /* virtual func */

      close_ParserFunction close_Parser; /* virtual func */

      nextphone_ParserFunction nextphone_Parser; /* virtual func */

      };
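The same mechanism applies to parsers. The sketch below builds a parser that replays a fixed table of phones. The declarations of parser.h are repeated so that the example compiles on its own. Everything else is an assumption: the Phone type is a reduced stand-in for the one in Parser/phone.h (real code would build phones with init_Phone and appendf0_Phone, and must honour the pitch point precondition stated above), and TableSource and init_ParserTable are hypothetical names.

```c
#include <stdlib.h>

/* Reduced stand-in for the Phone type of Parser/phone.h (assumption) */
typedef struct {
    const char *name;
    float length;      /* in ms */
} Phone;

/* Declarations repeated from Parser/parser.h */
typedef enum { PHO_OK, PHO_EOF, PHO_FLUSH, PHO_ERROR } StatePhone;
typedef struct Parser Parser;
typedef void (*reset_ParserFunction)(Parser* ps);
typedef void (*close_ParserFunction)(Parser* ps);
typedef StatePhone (*nextphone_ParserFunction)(Parser* ps, Phone** ph);

struct Parser {
    void* self;
    reset_ParserFunction reset_Parser;
    close_ParserFunction close_Parser;
    nextphone_ParserFunction nextphone_Parser;
};

/* Hypothetical concrete parser state: a table and a read index */
typedef struct {
    const Phone *table;
    int nb;
    int next;
} TableSource;

/* Hands out a freshly allocated copy; the caller must release it */
static StatePhone nextphone_ParserTable(Parser* ps, Phone** ph)
{
    TableSource *src = ps->self;

    if (src->next >= src->nb)
        return PHO_EOF;
    *ph = malloc(sizeof **ph);
    if (*ph == NULL)
        return PHO_ERROR;
    **ph = src->table[src->next++];
    return PHO_OK;
}

/* Forget the phones not delivered yet */
static void reset_ParserTable(Parser* ps)
{
    TableSource *src = ps->self;

    src->next = src->nb;
}

static void close_ParserTable(Parser* ps)
{
    free(ps->self);
    free(ps);
}

Parser* init_ParserTable(const Phone *table, int nb)
{
    Parser *ps = malloc(sizeof *ps);
    TableSource *src = malloc(sizeof *src);

    if (ps == NULL || src == NULL) {
        free(ps);
        free(src);
        return NULL;
    }
    src->table = table;
    src->nb = nb;
    src->next = 0;
    ps->self = src;
    ps->reset_Parser = reset_ParserTable;
    ps->close_Parser = close_ParserTable;
    ps->nextphone_Parser = nextphone_ParserTable;
    return ps;
}
```

In this stand-in the caller releases each phone with free; with the real Phone type the release goes through close_Phone, as required by the comment in parser.h.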

    39. File: Parser/parser_input.h
      /*

      * Purpose: parse a "pho file" from a polymorphic input stream

      * Instantiation of parser.h

      *

      * Author: Vincent Pagel

      */

      Parser* init_ParserInput(Input* my_input, char* silence, float pitch, char* comment,float time_ratio, float freq_ratio);

      /*

      * Constructor of the parser. Needs the initial default pitch and the

      * initial default phoneme as well

      */

    41. File: Parser/phonbuff.h
      /*

      * Purpose: Table of phonemes to implement a simple .pho parser

      * Buffer of phonemes for pitch interpolation

      * Author: Vincent Pagel

      */

      #define MAXNPHONESINONESHOT 250 /* Max nbr of phonemes without F0 pattern*/

      /* A phonetic command buffer and its pitch points */

      typedef struct

      {

      Input* input; /* Polymorphic input stream */

      Phone* Buff[MAXNPHONESINONESHOT];/* Phonetic command buffer */

      int NPhones; /* Nbr of phones in the phonetic command buffer */

      int CurPhone; /* Index of current phone in the command buffer */

      StatePhone state_pho; /* State of the last phoneme series: EOF FLUSH OK */

      bool Closed; /* True if the sequence is closed by a pitch point */

      /*

      * Silence is a special phoneme used for initialization (the first diphone

      * in the stream is always SILENCE-SILENCE). Also used for termination

      */

      char *default_phon;

      float default_pitch; /* first pitch point of the sequence */

      char *comment_symbol; /* user defined escape char */

      char *flush_symbol; /* user defined flush command */

      float TimeRatio; /* Ratio for the durations of the phones */

      float FreqRatio; /* Ratio for the pitch applied to the phones */

      } PhoneBuff;

      /* Convenience macros to access PhoneBuff */

      #define input(X) (X->input)

      #define CurPho(X) (X->Buff[X->CurPhone])

      #define NPhones(X) X->NPhones

      #define CurPhone(X) X->CurPhone

      #define Buff(X) X->Buff

      #define val_PhoneBuff(pt,i) (pt->Buff[i])

      #define state_pho(pt) (pt->state_pho)

      #define Closed(pt) (pt->Closed)

      #define default_phon(pt) (pt->default_phon)

      #define default_pitch(pt) (pt->default_pitch)

      #define comment_symbol(pt) (pt->comment_symbol)

      #define flush_symbol(pt) (pt->flush_symbol)

      #define TimeRatio(pt) (pt->TimeRatio)

      #define FreqRatio(pt) (pt->FreqRatio)

      /*

      * Last phone of the list

      */

      #define tail_PhoneBuff(pt) (val_PhoneBuff(pt,NPhones(pt)))

      /*

      * First phone of the list

      */

      #define head_PhoneBuff(pt) (val_PhoneBuff(pt,0))

      PhoneBuff* init_PhoneBuff(Input* my_input, char* default_phon,float default_pitch, float time_ratio, float freq_ratio,char* comment, char* flush);

      /*

      * Constructor, needs a phoneme and pitch for default allocation (begin

      * and end of synthesis)

      */

      void close_PhoneBuff(PhoneBuff *pt);

      /* free allocated strings in the phonetable */

      void reset_PhoneBuff(PhoneBuff *pt);

      /* Before a synthesis sequence initialize the loop with a silence */

      StatePhone next_PhoneBuff(PhoneBuff *pt,Phone** ph);

      /*

      * Reads a phone from the phonetic command buffer and prepares the next

      * diphone to synthesize ( prev_diph )

      * Return value may be: PHO_EOF,PHO_FLUSH,PHO_OK, PHO_ERROR

      *

      * NB : Uses only phones from 1 to ... NPhones-1 in the buffer.

      * Phone 0 = memory from previous buffer.

      */

      void init_FlushSymbol(PhoneBuff *pt, char *flush);

      /*

      * Build a new sscanf target to spot the flush symbol

      */

      void init_CommentSymbol(PhoneBuff *pt, char *comment);

      /*

      * Build a new sscanf target to spot the comment symbol

      */

      void init_SilenceSymbol(PhoneBuff *pt, char *silence);

      /*

      * Build a new sscanf target to spot the silence symbol

      */
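The phoneme buffer exists so that phonemes entered without pitch points can still receive an F0 value at 0% and 100% of their duration. One possible way of doing this, given here only as an assumption about the general idea and not as the code of phonbuff.c, is to interpolate linearly in time between the last known pitch value and the next one. The function interpolate_bounds is a hypothetical name.

```c
/* Fills f0_bounds[0..nb] with F0 at each phone boundary, interpolating
 * linearly in time between f0_start (start of phone 0) and f0_end (end
 * of phone nb-1). lengths holds the nb phone durations in ms;
 * f0_bounds must have room for nb + 1 values. */
void interpolate_bounds(const float *lengths, int nb,
                        float f0_start, float f0_end, float *f0_bounds)
{
    float total = 0.0f, elapsed = 0.0f;
    int i;

    for (i = 0; i < nb; i++)
        total += lengths[i];
    f0_bounds[0] = f0_start;
    for (i = 0; i < nb; i++) {
        elapsed += lengths[i];
        if (total > 0.0f)
            f0_bounds[i + 1] = f0_start + (f0_end - f0_start) * (elapsed / total);
        else
            f0_bounds[i + 1] = f0_end;
    }
}
```

Phone i then starts at f0_bounds[i] and ends at f0_bounds[i + 1]. For three phones of 50, 100 and 50 ms between 100 Hz and 200 Hz, the boundaries fall at 100, 125, 175 and 200 Hz.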

    43. File: Parser/phone.h
      /*

      * Purpose: Phone objects

      * Authors: Vincent Pagel

      */

      /*

      * STRUCTURES representing phones and diphone sequences to synthesize

      */

      /* Pitch pattern point attached to a Phoneme */

      typedef struct

      {

      float pos; /* relative position within phone in milliseconds */

      float freq; /* frequency (Hz)*/

      } PitchPatternPoint;

      #define pos_Pitch(X) X->pos

      #define freq_Pitch(X) X->freq

      /* A Phoneme and its pitch points */

      typedef struct

      {

      char *name; /* Name of the phone */

      float length; /* phoneme length in ms */

      int NPitchPatternPoints; /* Nbr of pattern points */

      int pp_available; /* number of allocatable pitch points */

      PitchPatternPoint* PitchPattern;

      /* PitchPattern[0] gives F0 at 0% of the duration of a phone,

      and the last pattern point (PitchPattern[NPitchPatternPoints-1])

      gives F0 at 100% ( reserve 2 slots for 0% and 100% during interpolation ) */

      } Phone;

      /* Convenient macro to access Phone structure */

      #define tail_PitchPattern(X) (&(X->PitchPattern[X->NPitchPatternPoints-1]))

      #define head_PitchPattern(X) (&(X->PitchPattern[0]))

      #define val_PitchPattern(X,i) (&(X->PitchPattern[i]))

      #define length_Phone(X) (X->length)

      #define name_Phone(X) (X->name)

      #define NPitchPatternPoints(X) (X->NPitchPatternPoints)

      #define pp_available(X) (X->pp_available)

      #define PitchPattern(X) (X->PitchPattern)

      Phone* DLL_EXPORT initSized_Phone(char* name, float length,int nb_pitch);

      /*

      * Initialize a phoneme with its name and length in milliseconds

      * Indicate the planned number of pitch points ( added with appendf0)

      */

      Phone* DLL_EXPORT init_Phone(char* name, float length);

      /*

      * Initialize a phoneme with its name and length in milliseconds

      * 2 pitch points is the default (one at 0 one at 100)

      */

      void DLL_EXPORT reset_Phone(Phone *ph);

      /* Reset the pitch pattern list of a phoneme */

      void DLL_EXPORT close_Phone(Phone *ph);

      /*

      * Release the name in the string

      */

      void DLL_EXPORT appendf0_Phone(Phone *ph, float pos, float f0);

      /*

      * Append a pitch point to a phoneme ( position in % and f0 in Hertz )

      * resize the pitch point vector if too small

      */

      void applyRatio_Phone(Phone* ph, float ratio);

      /*

      * length and freq modified if the vocal tract length is not 1.0

      * internal use only

      */
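The pair NPitchPatternPoints / pp_available suggests a vector that grows on demand when pitch points are appended. The sketch below shows that technique on a reduced structure. PitchList and append_f0 are hypothetical names, the doubling policy is an assumption, and unlike appendf0_Phone the function reports allocation failure through its return value.

```c
#include <stdlib.h>

typedef struct {
    float pos;    /* position of the point */
    float freq;   /* frequency (Hz) */
} PitchPatternPoint;

/* Hypothetical subset of the Phone structure */
typedef struct {
    int NPitchPatternPoints;          /* points in use */
    int pp_available;                 /* points allocated */
    PitchPatternPoint *PitchPattern;
} PitchList;

/* Appends one point, growing the vector when it is full.
 * Returns 1 on success, 0 if memory could not be allocated. */
int append_f0(PitchList *pl, float pos, float f0)
{
    if (pl->NPitchPatternPoints == pl->pp_available) {
        int wanted = pl->pp_available ? 2 * pl->pp_available : 2;
        PitchPatternPoint *grown =
            realloc(pl->PitchPattern, wanted * sizeof *grown);

        if (grown == NULL)
            return 0;
        pl->PitchPattern = grown;
        pl->pp_available = wanted;
    }
    pl->PitchPattern[pl->NPitchPatternPoints].pos = pos;
    pl->PitchPattern[pl->NPitchPatternPoints].freq = f0;
    pl->NPitchPatternPoints++;
    return 1;
}
```

Starting from an empty list, five appends leave room for eight points in this sketch, so most calls do not reallocate.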

       

    45. File: Standalone/synth.h
      /*

      * Purpose: Main function of the MBROLA speech synthesizer.

      * Authors: Vincent Pagel

      */

      /*

      * In standalone mode, input and output through files

      */

      extern FILE *command_file; /* File providing the phonetic input (can be stdin) */

      extern FILE *output_file; /* Audio output file (can be stdout) */

      /* used in standalone compilation mode */

      extern int main(int argc, char **argv);

    47. File: LibOneChannel/onechannel.h
      /*

      * Purpose: Diphone-based MBROLA speech synthesizer.

      * One synthesis channel per database

      *

      * Author: Vincent Pagel

      */

      int DLL_EXPORT init_MBR(char *dbaname);

      /*

      * Reads the diphone database

      * 0 if ok, error code otherwise

      */

      int DLL_EXPORT init_rename_MBR(char *dbaname,char* rename,char* clone);

      /*

      * Reads the diphone database

      * Rename and clone the list parsed from the parameter strings

      *

      * 0 if ok, error code otherwise

      */

      void DLL_EXPORT close_MBR(void);

      /* Free all the allocated memory */

      int DLL_EXPORT reset_MBR();

      /*

      * Reset the pho buffer with residual commands -> may be used as a kind of

      * "panic" flush when a sentence is interrupted

      * 0 means fail

      */

      int DLL_EXPORT readtype_MBR(void *buffer_out, int nb_wanted, AudioType sample_type);

      /*

      * Read nb_wanted samples in an audio buffer

      * return the effective number of samples read

      * or the negative error code we catch

      *

      * The sample_type may be LIN16, LIN8, ULAW, ALAW

      */

      int DLL_EXPORT read_MBR(void *buffer_out, int nb_wanted);

      /*

      * Read nb_wanted samples in an audio buffer

      * return the effective number of samples read

      * or the negative error code we catch

      *

      * Kept for compatibility

      */

      int DLL_EXPORT write_MBR(char *buffer_in);

      /*

      * Write a string of phonemes in the input buffer

      * Return the number of chars actually written

      * 0 means not enough space in the buffer

      */

      int DLL_EXPORT flush_MBR();

      /*

      * Write a flush command in the stream (0 means fail). Used by client

      * applications in case the flush symbol has been renamed

      */

      int DLL_EXPORT getDatabaseInfo_MBR(char *msg,int nb_wanted,int index);

      /* Retrieve the ith info message, NULL means get the size */

      void DLL_EXPORT setFreq_MBR(int freq);

      /* Set the freq and voice ratio */

      int DLL_EXPORT getFreq_MBR();

      /* Return the output frequency */

      void DLL_EXPORT setNoError_MBR(int no_error);

      /* Tolerance to missing diphones */

      int DLL_EXPORT getNoError_MBR();

      /* Tolerance to missing diphones */

      void DLL_EXPORT setVolumeRatio_MBR(float volume_ratio);

      /* Overall volume */

      float DLL_EXPORT getVolumeRatio_MBR();

      /* Overall volume */

      void DLL_EXPORT setParser_MBR(Parser* parser);

      /* drop the current parser for a new one */

      int DLL_EXPORT lastError_MBR();

      /* Return the last error code */

      int DLL_EXPORT lastErrorStr_MBR(char *buffer_err,int nb_wanted);

      /* Return the last error message available */

      void DLL_EXPORT resetError_MBR();

      /* Clear the Mbrola error buffer */

      int DLL_EXPORT getVersion_MBR(char *msg,int nb_wanted);

      /* Return the release number, e.g. "2.05a" */

    49. File: LibMultiChannel/multichannel.h
      /*

      * Purpose: multichannel Mbrola synthesis

      * Authors: Pagel Vincent

      */

      Database* DLL_EXPORT init_DatabaseMBR2(char* dbaname, char* rename, char* clone);

      /*

      * Give the name of the file containing the database, and parameters to

      * rename or clone phoneme names

      *

      * NULL on rename or clone means no modification to the database

      */

      Database* DLL_EXPORT copyconstructor_DatabaseMBR2(Database* dba);

      /* Creates a copy of a diphone database so that many synthesis engines

      * can use the same database at the same time (duplicates the file handle)

      *

      * Highly recommended with multichannel mbrola, unless you can guarantee

      * mutually exclusive access to the getdiphone function

      */

      void DLL_EXPORT close_DatabaseMBR2(Database* dba);

      /* Release the memory */

      void DLL_EXPORT close_ParserMBR2(Parser* pars);

      /*

      * Release the memory of the polymorphic type

      */

      Mbrola* DLL_EXPORT init_MBR2(Database* db, Parser* parse);

      /* Kick start the engine. Returning NULL means error */

      void DLL_EXPORT close_MBR2(Mbrola* mb);

      /* Free everything */

      int DLL_EXPORT reset_MBR2(Mbrola* mb);

      /*

      * Reset the pho buffer with residual commands -> used as a kind of

      * "panic" flush when a sentence is interrupted either with the stop

      * button, or in case of error

      * Return false in case of failure

      */

      int DLL_EXPORT readtype_MBR2(Mbrola* mb, void *buffer_out, int nb_wanted, AudioType sample_type);

      /*

      * Reads nb_wanted samples in an audio buffer

      * Returns the effective number of samples read

      */

      int DLL_EXPORT getDatabaseInfo_MBR2(Mbrola* mb,char *msg,int nb_wanted,int index);

      /* Retrieve the ith info message, NULL means get the size */

      void DLL_EXPORT setFreq_MBR2(Mbrola* mb,int freq);

      /* Set the freq and voice ratio */

      int DLL_EXPORT getFreq_MBR2(Mbrola* mb);

      /* Return the output frequency */

      void DLL_EXPORT setSmoothing_MBR2(Mbrola* mb, int smoothing);

      /* Spectral smoothing or not */

      int DLL_EXPORT getSmoothing_MBR2(Mbrola* mb);

      /* Spectral smoothing or not */

      void DLL_EXPORT setNoError_MBR2(Mbrola* mb, int no_error);

      /* Tolerance to missing diphones */

      int DLL_EXPORT get_no_error_MBR2(Mbrola* mb);

      /* Tolerance to missing diphones */

      void DLL_EXPORT set_volume_ratio_MBR2(Mbrola* mb, float volume_ratio);

      /* Overall volume */

      float DLL_EXPORT get_volume_ratio_MBR2(Mbrola* mb);

      /* Overall volume */

      void DLL_EXPORT set_parser_MBR2(Mbrola* mb, Parser* parser);

      /* drop the current parser for a new one */

      int DLL_EXPORT lastError_MBR2();

      /* Return the last error code */

      int DLL_EXPORT lastErrorStr_MBR2(char *buffer_err,int nb_wanted);

      /* Return the last error message available */

      void DLL_EXPORT reset_error_MBR2();

      /* Clear the Mbrola error buffer */

      int DLL_EXPORT getVersion_MBR2(char *msg,int nb_wanted);

      /* Return the release number, e.g. "2.05a" */

    51. Index of symbols

    D= data segment, initialized global

    C= data segment, non initialized globals

    T= exported function

    t= private function

    Concat T Engine/mbrola.c

    FillCommandBuffer T Parser/phonbuff.c

    FlushFile T Engine/mbrola.c

    GetPitchPeriod T Engine/diphone.c

    LowerCase T Misc/audio.c

    MBR_malloc T Misc/mbralloc.c

    MBR_strdup T Misc/mbralloc.c

    MatchProsody T Engine/mbrola.c

    NextDiphone T Engine/mbrola.c

    OverLapAdd T Engine/mbrola.c

    ReadDatabaseHeader T Database/database.c

    ReadDatabaseIndex T Database/database.c

    ReadDatabaseInfo T Database/database.c

    ReadDatabasePitchMark T Database/database.c

    ReadDatabaseZstring T Database/database.c

    ReadLine T Parser/phonbuff.c

    Synthesis T Engine/mbrola.c

    add_HashTab T Database/hash_tab.c

    append_PhoneBuff T Parser/phonbuff.c

    append_RenameList T Database/rename_list.c

    appendf0_Phone T Parser/phone.c

    applyRatio_Phone T Parser/phone.c

    audio_swapped C Misc/audio.c

    close_DatabaseBacon t Database/database_bacon.c

    close_DatabaseBasic T Database/database.c

    close_DatabaseCebab t Database/database_cebab.c

    close_DatabaseInfo T Database/database.c

    close_DiphoneInfo T Database/diphone_info.c

    close_DiphoneSynthesis T Engine/diphone.c

    close_HashTab T Database/hash_tab.c

    close_InputFile t Parser/input_file.c

    close_Mbrola T Engine/mbrola.c

    close_ParserInput t Parser/parser_input.c

    close_Phone T Parser/phone.c

    close_PhoneBuff T Parser/phonbuff.c

    close_RenameList T Database/rename_list.c

    comment_symbol D Standalone/synth.c

    debug_message T Misc/vp_error.c

    diphone_clone_HashTab T Database/hash_tab.c

    diphone_rename_HashTab T Database/hash_tab.c

    equalkey_DiphoneInfo T Database/diphone_info.c

    errbuffer C Misc/vp_error.c

    fatal_message T Misc/vp_error.c

    find_RenameList T Database/rename_list.c

    find_file_format T Misc/audio.c

    free_residue_PhoneBuff T Parser/phonbuff.c

    freq_ratio D Standalone/synth.c

    getDatabaseInfo T Database/database.c

    get_no_error_Mbrola T Engine/mbrola.c

    get_smoothing_Mbrola T Engine/mbrola.c

    get_voicefreq_Mbrola T Engine/mbrola.c

    get_volume_ratio_Mbrola T Engine/mbrola.c

    getdiphone_DatabaseBacon t Database/database_bacon.c

    getdiphone_DatabaseBasic T Database/database.c

    getdiphone_DatabaseCebab t Database/database_cebab.c

    hash_DiphoneInfo T Database/diphone_info.c

    init_BaconTable t Database/database_bacon.c

    init_CebabTable t Database/database_cebab.c

    init_CommentSymbol T Parser/phonbuff.c

    init_Database T Database/database.c

    init_DatabaseBacon T Database/database_bacon.c

    init_DatabaseBasic T Database/database.c

    init_DatabaseCebab T Database/database_cebab.c

    init_DatabaseOld T Database/database_old.c

    init_DiphoneInfo T Database/diphone_info.c

    init_DiphoneSynthesis T Engine/diphone.c

    init_FlushSymbol T Parser/phonbuff.c

    init_Hanning T Engine/mbrola.c

    init_HashTab T Database/hash_tab.c

    init_InputFile T Parser/input_file.c

    init_Mbrola T Engine/mbrola.c

    init_ParserInput T Parser/parser_input.c

    init_Phone T Parser/phone.c

    init_PhoneBuff T Parser/phonbuff.c

    init_RenameList T Database/rename_list.c

    init_real_frame T Database/database.c

    init_rename_Database T Database/database.c

    init_tab D Database/database.c

    initclone_DiphoneInfo T Database/diphone_info.c

    initdummy_PhoneBuff T Parser/phonbuff.c

    initSized_Phone T Parser/phone.c

    interpolatef0_PhoneBuff T Parser/phonbuff.c

    lasterr_code C Misc/vp_error.c

    main T Standalone/synth.c

    mix T Database/hash_tab.c

    my_brole C Standalone/synth.c

    my_dba C Standalone/synth.c

    my_parse C Standalone/synth.c

    my_pitch C Standalone/synth.c

    next_PhoneBuff T Parser/phonbuff.c

    nextphone_ParserInput t Parser/parser_input.c

    one_diphone_clone_HashTab T Database/hash_tab.c

    oneshot_Mbrola T Engine/mbrola.c

    output_file C Standalone/synth.c

    parse_RenameList T Database/rename_list.c

    process_one_file T Standalone/synth.c

    read_BaconTables t Database/database_bacon.c

    read_CebabTables t Database/database_cebab.c

    read_int16 T Database/little_big.c

    read_int16_swapped T Database/little_big.c

    read_int16buffer T Database/little_big.c

    read_int16buffer_swapped T Database/little_big.c

    read_int32 T Database/little_big.c

    read_int32_swapped T Database/little_big.c

    read_uint16 T Database/little_big.c

    read_uint16_swapped T Database/little_big.c

    readline_InputFile t Parser/input_file.c

    reset_DiphoneSynthesis T Engine/diphone.c

    reset_InputFile t Parser/input_file.c

    reset_Mbrola T Engine/mbrola.c

    reset_ParserInput t Parser/parser_input.c

    reset_Phone T Parser/phone.c

    reset_PhoneBuff T Parser/phonbuff.c

    search_HashTab T Database/hash_tab.c

    searchdiph_HashTab T Database/hash_tab.c

    set_no_error_Mbrola T Engine/mbrola.c

    set_parser_Mbrola T Engine/mbrola.c

    set_smoothing_Mbrola T Engine/mbrola.c

    set_voicefreq_Mbrola T Engine/mbrola.c

    set_volume_ratio_Mbrola T Engine/mbrola.c

    shift_PhoneBuff T Parser/phonbuff.c

    size_RenameList T Database/rename_list.c

    time_ratio D Standalone/synth.c

    warning_message T Misc/vp_error.c

    write_header T Misc/audio.c

    write_int16 T Database/little_big.c

    write_int16_swapped T Database/little_big.c

    write_int16s T Misc/audio.c

    write_int32 T Database/little_big.c

    write_int32_swapped T Database/little_big.c

3. Support

Mbrola Team

Faculté Polytechnique de Mons, TCTS Lab,

31, boulevard Dolez, B-7000 Mons, Belgium.

tel: +32 65 374133

fax: +32 65 374129

e-mail: mbrola@tcts.fpms.ac.be (for general information and questions about installation).