well, I am posting this where I think it may be relevant...

basically, this was part of a misc idea that came up, and I went and beat together the code for it (AKA: I don't expect it to amount to much).

the idea was that I would combine a speech synthesizer/TTS engine and a MIDI synth, and see if I could get anything "interesting" from it (such as combining music and a synth'ed voice, singing TTS, ...).

in general, it was created by mashing together 2 pieces of code I had written before, for which I had noticed some internal similarity: a TTS engine / speech synth (where mostly I had used diphones, but had experimented some with formants); and a MIDI synth, which in my case used wavetable synthesis. the TTS engine had some of the usual front-end machinery, such as text normalization, phonetic dictionary handling/lookup, ..., so I kept this. the MIDI synth is, well, a MIDI synth...

combining them, however, forced a good deal of alteration to the machinery for both. in particular, many pieces of functionality from the TTS engine (such as "voices") were absorbed into the MIDI synth, and wavetables/patches are now essentially relative to the voice, ... however, the MIDI synth still plays MIDI files, as before. as is, the voice patches override GM patches, but I am likely to move the voice patches to bank 2 (banks 0 and 1 being GM and GM2).

the TTS frontend has been reworked mostly so that it produces short MIDI fragments, which basically rework the phonetic information into a stream of MIDI commands (the frontend has control over matters such as voice frequency and timing, ...). these commands mostly work in terms of a voice-derived wavetable, and AFAIK the process is a variant of formant synthesis, although I don't actually simulate the voice signal (mostly I use loops derived from various vowel and consonant sounds, as well as a few non-looping patches). mostly this is because it is a lot easier to get a convincing 'ah' or 'eh' by deriving it from an actual voice, and by using several recorded frequencies in an attempt to cover the vocal range (similar to how multiple recordings of an instrument at different notes are used internally in the wavetable...). nothing prevents me from using purely synthetic voices, only that I don't see much need at present...

diphthongs are currently synthed, but this doesn't sound very good, and I have doubts about using recorded diphthongs (mostly timing/frequency issues...). however, I am not sure of a good mechanism to synthesize them (simply blending between the adjacent sounds is not very good...). at this point, I am mostly still battling basic comprehensibility issues...

otherwise, it may be worth noting that for composing the MIDI, I am using a textual representation of the command-stream, mostly as this is a little easier to compose (via sprintf/...) than a binary representation would be (a rough example of what I mean is given a bit further down).

an issue though is that of how to best represent a combination of text and MIDI information (for the input). one possibility is to just use an odd syntax to sort of "stuff in" MIDI commands, but this seems not very good. another uncertainty is how to best represent commands to the voice (such as "speak in this particular note", "speak at this rate", ...).

I guess another point of uncertainty is the issue commonly seen in singing, where people will sing part of a word at one note, and then sing another part at another note, ... as-is, breaking up a word like this would confuse the dictionary, and to address this would require representing the words in phonetic form, ...
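coming back to the textual command-stream: a rough example of the kind of thing I mean by composing it via sprintf (the command names and patch numbers here are made up purely for illustration; the real frontend's command set and numbering differ):

#include <stdio.h>

/* rough sketch: emit a textual MIDI-ish fragment for one phoneme.
   the command names ("prgc", "non", "dly", "noff") and the patch
   numbers are invented for this example. */
static char *emit_phone(char *buf, int chan, int patch, int note, int ticks)
{
    buf += sprintf(buf, "prgc %d %d\n", chan, patch);    /* select voice patch */
    buf += sprintf(buf, "non %d %d 127\n", chan, note);  /* note on, full velocity */
    buf += sprintf(buf, "dly %d\n", ticks);              /* hold for the phone's duration */
    buf += sprintf(buf, "noff %d %d\n", chan, note);     /* note off */
    return buf;
}

int main(void)
{
    char frag[1024], *p = frag;
    int patches[4] = { 40, 41, 42, 43 };  /* hypothetical patches for 'h','e','l','o' */
    int i;

    for (i = 0; i < 4; i++)
        p = emit_phone(p, 15, patches[i], 60, 48);  /* channel 16, middle C, 48 ticks */
    fputs(frag, stdout);  /* this text is then fed to the MIDI-text parser */
    return 0;
}

the nice part is that the frontend can just sprintf fragments like this as it walks the phonetic transcription, and the same parser that handles hand-written command text handles the generated fragments.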
another issue:
for the phonetic form, is the IPA really necessary?... (internally, I don't use IPA, rather a customized ASCII-based notation, vaguely similar to SAMPA but currently without non-letter chars, and in many places different as I didn't know about SAMPA originally...).

actually, personally I would rather change the notation some (reorganizing some of the letters, ...), but the main issue I guess is that I would have to rework my dictionary (may be a worthwhile tradeoff; in the past it would have been more difficult), ...

I guess a partial issue is what the most "ideal" notation for phonetic transcriptions would be?... (part of my "ideal" I guess is the avoidance of non-ASCII characters, and preferably the avoidance of any special characters as well...).

current thinking:
a-z: typical "base sounds"
A-Z: typical "alternate sounds"
ax-zx (excluding xx): additional alternate sounds, or, a case-insensitive alternate to the upper-case forms (for example, in filenames, ...)
Ax-Zx: yet more alternate sounds
aX-zX: yet more
AX-ZX: yet more

this allows 156 sounds, although... as is I have yet to exceed the prior limit of 56 (lower+upper case), though this is probably because I am generally being far less precise than the IPA?... 56 sounds could be done with: a-z A-Z | ax-zx. (a rough sketch of parsing such symbols is at the end of this post.)

this would be in contrast to my current notation, which uses 'x' as a prefix (for a similar purpose): xa-xz, xA-xZ, ... (and in which xa != A; as is, I have to use an alternate notation in filenames, ...). the other major change would be reorganizing some of the letter assignments (from my current notation) to be more "traditional"... (actually, I may use SAMPA partly as a template, trying mostly to add an alternate notation, AKA: without special symbols and more flexible WRT case, mostly so that it is safer to mix with file names, and with other syntactic elements which may also need to use these non-letter characters...).

it is uncertain whether it should remain a mixed-case notation, or be forced into being a case-insensitive notation. my current bias is to keep it case-sensitive, but allow certain alternate forms, mostly for file-naming (forcing a fully case-insensitive notation is likely to just make things ugly...).

or such...

--
BGB: Hobbyist Programmer (Specialty: 3D, Compilers, VMs)
Site: http://cr88192.dyndns.org/
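PS: roughly how I imagine tokenizing such a notation, just as a sketch (the index assignment is arbitrary here, and 'xx' is not special-cased):

#include <ctype.h>

/* rough sketch: parse one phoneme symbol of the proposed notation.
   a symbol is a letter, optionally followed by an 'x' or 'X' modifier
   (so 'a', 'A', 'ax', 'aX', 'Ax', 'AX' are all distinct).
   returns the number of chars consumed (1 or 2), or 0 on failure;
   the index values are arbitrary, just to show the symbol space. */
static int phone_parse(const char *s, int *index)
{
    int base, mod = 0, n = 1;

    if (!isalpha((unsigned char)s[0]))
        return 0;
    base = islower((unsigned char)s[0]) ? (s[0] - 'a') : 26 + (s[0] - 'A');

    if (s[1] == 'x')      { mod = 1; n = 2; }
    else if (s[1] == 'X') { mod = 2; n = 2; }

    *index = mod * 52 + base;   /* 3 modifier groups * 52 letters */
    return n;
}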
On Jul 4, 12:27 pm, "cr88192" <cr88...@hotmail.com> wrote:
> I guess another point of uncertainty is the issue commonly seen in singing,
> where people will sing part of a word at one note, and then sing another
> part at another note, ...

Could you use the bender for this?

--
lxt
"cr88192" <cr88192@hotmail.com> wrote in message news:h2o3e2$f08$1@news.albasani.net... > another issue: > for the phonetic form, is the IPA really necessary?... (internally, I don't > use IPA, rather a customized ASCII-based notation, vaguely similar to SAMPA > but currently without non-letter chars, and in many places different as I > didn't know about SAMPA originally...). If you ever want to distribute this as a useful application (who knows!), proper IPA support would be nice. SAMPA is a poor man's approximation of the IPA set, and it has a fair set of strange decisions ... however, it /can/ be entered with any keyboard. And, as always, you are free to decide for a scheme for yourself. The con is that you cannot /mix/ these approaches -- your own scheme could suddenly pop up and mess up an IPA phrase. Perhaps you could prefix each phrase with a unique identifier: "=hElo world" where the '=' indicates using your private system. As for needing more than the standard set of a..z/A..Z, SAMPA proves (for me :-) that throwing in even more ASCII characters for each unique sound doesn't really help. Perhaps you can get by with multi-character strings, although you should try to avoid 'ax', 'ex', 'ox' for sounds that have nothing to do with 'echh' -- "th" is easier to parse as a soft theta than "tx". All you need to do is finding a way to incorporate multi-character phonemes /without/ having them pop up unadvertently :-) -- bracket them? (F.e., "[th]eta" wise) Do you have a list of problematic phonemes? Interesting project! [Jw]
On 2009-07-05, [Jongware] <IdontWantSpam@hotmail.com> wrote:
> "cr88192" <cr88192@hotmail.com> wrote:
>> for the phonetic form, is the IPA really necessary?... (internally,
>> I don't use IPA, rather a customized ASCII-based notation, vaguely
>> similar to SAMPA but currently without non-letter chars, and in
>> many places different as I didn't know about SAMPA originally...).
>
> If you ever want to distribute this as a useful application
> (who knows!), proper IPA support would be nice. SAMPA is a poor
> man's approximation of the IPA set, and it has a fair set of
> strange decisions... however, it /can/ be entered with any keyboard.
...
>> an issue though is that of how to best represent a combination
>> of text and MIDI information (for the input).

Whatever else it can handle, surely it must be able to handle the text simply in MIDI-lyric-events?

(If your application can't handle plain old MIDI-lyric-events, then someone is still going to have to write a MIDI-with-lyric-events-to-your-application-input-format translator.)

> breaking up a word like this would confuse the dictionary, and to
> address this would require representing the words in phonetic form,

If the text is in MIDI-lyric-events, it can be reassembled into words ready for your dictionary, can't it? I mean, if the lyric-text ends with a space, that means end-of-word... (a small sketch of what I mean is at the end of this post.)

If the text is in *phonetic* MIDI-lyric-events (I think they have to be in 7-bit ASCII, don't they?) then that means SAMPA, I guess. But if it comes from a MusicXML file (translation not *too* hard, see http://www.pjb.com.au/midi/musicxml2mid.html ) then it could be Unicode.

>> actually, personally I would rather change the notation some
>> (reorganizing some of the letters, ...), but the main issue
>> I guess is that I would have to rework my dictionary

You shouldn't have to rework the dictionary; can't you just translate SAMPA or IPA or English or whatever (if you can do English, you can do any language, English is the most irregular) into your dictionary format?

>> how to best represent commands to the voice (such as "speak in
>> this particular note", "speak at this rate", ...).

If you want plain spoken TTS text within a MIDI file, I guess you can either (1) use something weirder than lyric-events, or (2) define a male-spoken-voice patch and a female-spoken-voice and any other voices you want in a bank somewhere. (I'm not completely sure I understand the question...)

> Interesting project!

Absolutely... synth'ed voice and singing TTS I'd just love to see! Can your MIDI synth use the same kind of soundfonts and stuff that timidity can be configured to use, I mean for the non-voice stuff?

Very interesting project! Could it replace both timidity and festival?

Regards, Peter

--
Peter Billam   www.pjb.com.au   www.pjb.com.au/comp/contact.html
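P.S. By "reassembled into words" I just mean something like this little sketch -- accumulate the lyric-event texts until one ends with a space, then hand the word over (the dictionary lookup is only a stub here):

#include <stdio.h>
#include <string.h>

/* sketch: accumulate lyric-event texts into whole words.
   each call passes the text of one lyric event (often just a syllable);
   a trailing space marks the end of a word. */
static void lookup_word(const char *word) { printf("lookup: '%s'\n", word); }

static void on_lyric_event(const char *text)
{
    static char word[256];
    size_t len = strlen(text);

    strncat(word, text, sizeof(word) - strlen(word) - 1);
    if (len > 0 && text[len - 1] == ' ') {      /* trailing space = end of word */
        word[strlen(word) - 1] = '\0';          /* drop the space */
        lookup_word(word);
        word[0] = '\0';                         /* start the next word */
    }
}

int main(void)
{
    /* e.g. "mer", "ri", "ly " arriving as three lyric events */
    on_lyric_event("mer");
    on_lyric_event("ri");
    on_lyric_event("ly ");
    return 0;
}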
"luserXtrog" <mijoryx@yahoo.com> wrote in message news:cad33091-64c4-4c64-bccc-5f5f4d73ae3e@s31g2000yqs.googlegroups.com... On Jul 4, 12:27 pm, "cr88192" <cr88...@hotmail.com> wrote: >> I guess of uncertainty is the issue commonly seen in singing things, >> where >> people will sing part of a word at one note, and then sing another part >> at >> another note, ... > > Could you use the bender for this? > bender?... not sure which feature this is exactly (not sure of any MIDI command with that name...). part of the problem though is that commands tend to be represented sequentially, and it would be problematic to represent a command in the middle of the word without breaking up the word. I guess potentially a kind of prefix command could be used, but then the question would be "where in the word to change the note?". one idea I partly thought up is this: a word ends with '-', which indicates a word break. ^C4 merr- ^D4 ily ^E4 they ^A3 went ^G4 a- ^F4 long ^C4 their ^E4 way so, then it can join and look up the word, and try to guess where to break it again in the phonetic transcription... ^C4 *mer ^D4 *ily ... this transformation could be a little awkward though, as my TTS frontend is essentially structured around a stack machine...
"[Jongware]" <IdontWantSpam@hotmail.com> wrote in message news:23d84$4a508143$3ec348e5$25955@news.chello.nl... > "cr88192" <cr88192@hotmail.com> wrote in message > news:h2o3e2$f08$1@news.albasani.net... >> another issue: >> for the phonetic form, is the IPA really necessary?... (internally, I >> don't >> use IPA, rather a customized ASCII-based notation, vaguely similar to >> SAMPA >> but currently without non-letter chars, and in many places different as I >> didn't know about SAMPA originally...). > > If you ever want to distribute this as a useful application (who knows!), > proper > IPA support would be nice. SAMPA is a poor man's approximation of the IPA > set, > and it has a fair set of strange decisions ... however, it /can/ be > entered with > any keyboard. > And, as always, you are free to decide for a scheme for yourself. > The con is that you cannot /mix/ these approaches -- your own scheme could > suddenly pop up and mess up an IPA phrase. Perhaps you could prefix each > phrase > with a unique identifier: > "=hElo world" > where the '=' indicates using your private system. > ok. typically I have used '*' for phonetic fragments, maybe I could use '*' for my notation, and '[...]' for SAMPA?... *helowerld *DIsIzqfrexz of course, this would mean either supporting both in my backend (duplicated code/effort), or doing a transcription... (however, a transcription approach could also be made to handle IPA, where it would be transcribed...). note that, without brackets, my TTS engine tends to assume it is a normal word, either looking it up in the dictionary or trying to invoke phonics magic... > As for needing more than the standard set of a..z/A..Z, SAMPA proves (for > me :-) > that throwing in even more ASCII characters for each unique sound doesn't > really > help. Perhaps you can get by with multi-character strings, although you > should > try to avoid 'ax', 'ex', 'ox' for sounds that have nothing to do with > 'echh' -- > "th" is easier to parse as a soft theta than "tx". All you need to do is > finding > a way to incorporate multi-character phonemes /without/ having them pop up > unadvertently :-) -- bracket them? (F.e., "[th]eta" wise) Do you have a > list of > problematic phonemes? > ok, in my newer notation ax/ex/ox/... ended up being assigned to dipthongs (I freed up A/E/I/O/U for use as vowels, which had before contained both vowels and dipthongs). 'Ax'/'Ex'/... could be used for dipthongs instead, but I had used 'ax'/... for this. at first, I figured I could make dipthongs implicit, but then realized a bigger problem: I would need a notation to indicate when not to use dipthongs. q/Q is "redefined" in my notation as a vowel (allowing 12 base vowels, as well as 12 "extended" vowels, several of which are used as dipthongs). as is, I currently have about 10 base vowels (I started with 8, but with thinking came up with 2 more...). 'x' (in SAMPA) has been moved to 'K'. under the current scheme, "soft theta" (I assume 'voiced th' is meant by this) is 'D'. in my case, words are either pure photetic or pure textual. partial bracketing is not done as this would confuse the current processing machinery... I decided on keeping the system proper as case-sensitive, and essentially use a mangling hack to map it to a case-insensitive form. I guess the major alternative is to continue using my prior notation externally (essentially, a variant of the cmudict/Festival notation...). > Interesting project! > maybe, just something random in my case...
"Peter Billam" <peter@www.pjb.com.au> wrote in message news:slrnh514ip.v4h.peter@box8.pjb.com.au... > On 2009-07-05, [Jongware] <IdontWantSpam@hotmail.com> wrote: >> "cr88192" <cr88192@hotmail.com> wrote: >>> for the phonetic form, is the IPA really necessary?... (internally, >>> I don't use IPA, rather a customized ASCII-based notation, vaguely >>> similar to SAMPA but currently without non-letter chars, and in >>> many places different as I didn't know about SAMPA originally...). >> >> If you ever want to distribute this as a useful application >> (who knows!), proper IPA support would be nice. SAMPA is a poor >> man's approximation of the IPA set, and it has a fair set of >> strange decisions... however, it /can/ be entered with any keyboard. > ... >>> an issue though is that of how to best represent a combination >>> of text and MIDI information (for the input). > > Whatever else it can handle, surely it must be able to handle the > text simply in MIDI-lyric-events ? > > (If your application can't handle plain old MIDI-lyric-events, > then someone is still going to have to write a > MIDI-with-lyric-events-to-your-application-input-format translator.) > I had not thought of this... MIDI is probarily used by the synth backend, whereas TTS would require running it though the frontend, which had thus far assumed a sort of annotated text input... I will have to look into lyric events, and see if hopefully there is some good way to key the lyrics to the music (such as to particular MIDI channel or whatever). >> breaking up a word like this would confuse the dictionary, and to >> address this would require representing the words in phonetic form, > > If the text is in MIDI-lyric-events, it can be reassembled into words > ready for your dictionary, can't it? I mean, if the lyric-text ends > with a space, that means end-of-word... > I will have to look more. the issue is how closely the lyric events can be keyed to the notes. unlike a human, the TTS engine is much less capable at figuring these things out from context. > If the text is in *phonetic* MIDI-lyric-events (I think they have > to be in 7-bit ASCII, don't they?) then that means SAMPA, I guess. > But if it comes from a MusicXML file, (translation not *too* hard, > see http://www.pjb.com.au/midi/musicxml2mid.html ) then it could > be Unicode. > will have to look at this... didn't really know about MusicXML... basically, music is not really my strong area, and initially was approached more for pragmatic reasons, so I am not all that familiar with the field in general... >>> actually, personally I would rather change the notation some >>> (reorganizing some of the letters, ...), but the main issue >>> I guess is that I would have to rework my dictionary > > You shouldn't have to rework the dictionary, can't you just translate > SAMPA or IPA or English or whatever (if you can do English, you can > do any language, English is the most irregular) into your dictionary > format ? > well, the issue was mostly one of changed notation... however, since my dictionaries are initially translated from a different notation (the CMU dictionary notation), I modified my conversion tool and re-converted them (me thinking this is both easier and less lossy than it would be to make another tool to convert from my old notation to my new notation). some further tweaking on the phonics machinery, which did not perfectly handle the transition. but, yes, the dictionary approach "should" be able to more-or-less handle whatever language is used. 
however, the text normalization code/... would likely need to be adjusted (for example, providing alternate functions for things like how to read out numbers, alternate phonics rules for unhandled cases, ...). it would mean though providing various per-language dictionaries, ... so, currently, my main focus is English... >>> how to best represent commands to the voice (such as "speak in >>> this particular note", "speak at this rate", ...). > > If you want plain spoken TTS text within a MIDI file, I guess > you can either (1) use something wierder than lyric-events, or > (2) define a male-spoken-voice patch and a female-spoken-voice > and any other voices you want in a bank somewhere. > (I'm not completely sure I understand the question...) > basically, I have been assuming an annotated text input (vs a binary MIDI input). binary MIDI + lyric events is an interesting idea, just I had not thought of it... in the case of annotated text, it would be mostly an issue of deciding the exact notation for the annotations. the issue then becomes with "pure text" input, how to avoid accidentally interpreting unintended things as commands, .... as is though, the current default input is a text+commands format. just I will need to define a few more commands. FWIW, I currently don't really distinguish much between the male voice and the female voice, apart from maybe based on frequency, so am currently internally assuming more of an "androgynous" voice... partly though this is because I am lazy (combining together male and female derived vocal samples), and with this level of DSP, it is kind of hard to tell anyways, apart from based on frequency... if a female were singing at 100Hz, we can just make a simplifying assumption that she would sound like a male, and a male at 200Hz, the assumption could be a more femalish sound (AFAICT, this seems to be the primary difference anyways, as otherwise the accousics don't seem too much different...). >> Interesting project! > > Absolutely... synth'ed voice and singing TTS I'd just love to see! > Can your MIDI synth use the same kind of soundfonts and stuff that > timidity can be configured to use, I mean for the non-voice stuff ? > originally, yes... this is where I grabbed my original MIDI patches from, although I converted them, and since dropped the relevant code for the TTS effort (mostly in favor of wav's and text files). I could re-add the code from my original MIDI synth if needed though... > Very interesting project! > Could it replace both timidity and festival ? > I have not really compared it with either of them, more just something I had did for my own efforts. however, I had ripped off some data from both projects in my 2 source projects. current TTS with this synth is not as good as Festival, mostly for the reason that Festival uses diphone synthesis (as did my older synth), or potentially unit-selection (which has further increases in the "naturalness" of the speech), and for this one I had switched to a wavetable+formant approach (mostly because diphones are not very "flexible", and the DSP required to bend the pitch or tempo with them somewhat drastically reduced quality, as well as the inherent difficulty of having well-controlled pitch, .... whereas with a more synthetic approach, one has more fine-grain control from the start, and how flexible it is depends mostly on the code...). 
dropping diphones is not free though, as doing so suddenly makes it sound a good deal more like "MS Sam" and friends (although, the voice itself is more natural-sounding, the timing/intonation/... is clearly fake, and it has some similar accoustic artifacts to "MS Sam" and similar...). sadly, *lots* of additional and detailed fiddling would be required if the quality is to be any good... (or, even if a good level of comprehensibility is to be achieved...). I guess there is also "Flinger", which I guess uses unit selection + post-filtering, but I have not checked what the output is like... > Regards, Peter > > -- > Peter Billam www.pjb.com.au www.pjb.com.au/comp/contact.html
Not really competent to comment on most of this (except to agree that it's interesting stuff...!), but one point I can correct:

In article <slrnh514ip.v4h.peter@box8.pjb.com.au>,
Peter Billam <contact.html@www.pjb.com.au> wrote:
>> "cr88192" <cr88192@hotmail.com> wrote:
>>> an issue though is that of how to best represent a combination
>>> of text and MIDI information (for the input).
>
> Whatever else it can handle, surely it must be able to handle the
> text simply in MIDI-lyric-events ?

This was my initial thought also.

> [....]
> If the text is in *phonetic* MIDI-lyric-events (I think they have
> to be in 7-bit ASCII, don't they?) then that means SAMPA, I guess.

Actually they don't. The data inside a midifile metaevent (including lyric events) can be *any* sequence of bytes, because the length is specified in the prefix. So you'd be free to use any phonetic scheme you like. (A small sketch of what such an event looks like on the wire is at the end of this post.)

[Oh... responding to another post in the thread, the suggestion to use a "bender" was I'm sure referring to pitch-bend events. Because events in a midifile are timed, you could emit pitchbends to shift the note after the phoneme info had been sent, and still have it happen somewhere in the middle of the sound.]

Cheers,
-- Pete --
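P.S. To be concrete, a lyric metaevent in a midifile is just the bytes FF 05, a variable-length length, and then whatever data bytes you like. A minimal sketch of writing one (the delta-time that precedes every midifile event is omitted, and the helper names are made up for illustration):

#include <stdio.h>
#include <string.h>

/* sketch: append one lyric metaevent (FF 05 <len> <data>) to a buffer. */
static size_t write_vlq(unsigned char *out, unsigned long v)
{
    unsigned char tmp[4];
    size_t n = 0, i;
    do { tmp[n++] = v & 0x7F; v >>= 7; } while (v);
    for (i = 0; i < n; i++)                                 /* most significant group first, */
        out[i] = tmp[n - 1 - i] | (i + 1 < n ? 0x80 : 0);   /* high bit = "more follows"     */
    return n;
}

static size_t write_lyric(unsigned char *out, const char *text)
{
    size_t len = strlen(text), n = 0;
    out[n++] = 0xFF;                        /* metaevent */
    out[n++] = 0x05;                        /* type: lyric */
    n += write_vlq(out + n, (unsigned long)len);
    memcpy(out + n, text, len);             /* arbitrary bytes, not just ASCII */
    return n + len;
}

int main(void)
{
    unsigned char buf[64];
    size_t i, n = write_lyric(buf, "mer");  /* one syllable per event, typically */
    for (i = 0; i < n; i++) printf("%02X ", buf[i]);
    printf("\n");                           /* prints: FF 05 03 6D 65 72 */
    return 0;
}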
In article <h2qmo6$r26$1@news.albasani.net>, cr88192 <cr88192@hotmail.com> wrote:
>
> "Peter Billam" <peter@www.pjb.com.au> wrote in message
> news:slrnh514ip.v4h.peter@box8.pjb.com.au...
>>
>> Whatever else it can handle, surely it must be able to handle the
>> text simply in MIDI-lyric-events ?
>>
> I had not thought of this...
>
> MIDI is primarily used by the synth backend, whereas TTS would require
> running it through the frontend, which had thus far assumed a sort of
> annotated text input...
>
> I will have to look into lyric events, and see if hopefully there is some
> good way to key the lyrics to the music (such as to a particular MIDI
> channel or whatever).

Lyric events are timed just like any other, and can contain as little of the song text as you need (often just a syllable), so you can cue them exactly to the note timing. However, they don't include channel information directly, so you'd have to fudge something for that (maybe just adding the channel number to the text in some defined way). In theory you could put the lyric event in any track of a multitrack midifile (to connect with the notes in that track) but most sequencers and such assume that metaevents will be in track 0, so that might cause trouble.

Cheers,
-- Pete --
"Pete" <neverland@GOODEVEca.net> wrote in message news:3_CdnbSQL_-yuszXnZ2dnUVZ_s6dnZ2d@lmi.net... > In article <h2qmo6$r26$1@news.albasani.net>, > cr88192 <cr88192@hotmail.com> wrote: >> >>"Peter Billam" <peter@www.pjb.com.au> wrote in message >>news:slrnh514ip.v4h.peter@box8.pjb.com.au... >>> >>> Whatever else it can handle, surely it must be able to handle the >>> text simply in MIDI-lyric-events ? >>> >>I had not thought of this... >> >>MIDI is probarily used by the synth backend, whereas TTS would require >>running it though the frontend, which had thus far assumed a sort of >>annotated text input... >> >>I will have to look into lyric events, and see if hopefully there is some >>good way to key the lyrics to the music (such as to particular MIDI >>channel >>or whatever). > > Lyric events are timed just like any other, and can contain as little > of the song text as you need (often just a syllable), so you can cue them > exactly to the note timing. However, they don't include channel > information > directly, so you'd have to fudge something for that (maybe just adding the > channel number to the text in some defined way). In theory you could put > the lyric event in any track of a multitrack midifile (to connect with > the notes in that track) but most sequencers and such assume that > metaevents > will be in track 0, so that might cause trouble. > yep, otherwise it would just be monotone spoken lyrics, which would sort of be pointless... it is worth noting that the way my synth works, 1-3 channels are needed just to do the speech synth (typically only 1 is used, but 1-2 additional channels are needed for some constructions). this would mean a channel layout something like: channels 1-9: free for music channel 10: drum track 11-13: free for music (except with GM2, where 11 may be another drum track) 14-16: reserved for TTS luckily, I don't think MIDI files typically use this many channels... I guess likely it would also involve likely producing a secondary MIDI stream, and then merging it back into the first. I guess, if a channel is to be used for keying the voice to, another question is whether to leave this track in place, or strip the track prior to merging. all this would require re-architecting the process slightly. the another alternative would be to decompose the MIDI stream, and force the entire thing through the TTS engine, with it composing a new stream for the output, which would simply require providing a way of feeding MIDI commands through the TTS engine. I guess, either way, I might need some mechanism to support out-of-order events. both options seem awkward... a slightly less awkward approach could be an async write-joiner (type thinggy...), where code woud produce multiple streams, and then asynchronously write them into the joiner, which would then go about producing a new output stream with the various input streams merged together. potentially, it could also include a "channel allocator" (sort of like a register allocator...) such that the input streams need not worry as much about conflicting channels. (actually, the above is likely to be necessary as well for the construction of "drum machines" and other similar features, which would otherwise be difficult to produce...). it may seem odd, but I primarily use single-track midi streams (multi-track MIDI files are typically merged on read...). > Cheers, > -- Pete -- >
"Pete" <neverland@GOODEVEca.net> wrote in message news:IvydncqMKcCMuczXnZ2dnUVZ_tSdnZ2d@lmi.net... > Not really competent to comment on most of this (except to agree that > it's interesting stuff...!), but one point I can correct: > > In article <slrnh514ip.v4h.peter@box8.pjb.com.au>, > Peter Billam <contact.html@www.pjb.com.au> wrote: >>> "cr88192" <cr88192@hotmail.com> wrote: >>>> an issue though is that of how to best represent a combination >>>> of text and MIDI information (for the input). >> >>Whatever else it can handle, surely it must be able to handle the >>text simply in MIDI-lyric-events ? > This was my initial thought also. >> >> [....] >>If the text is in *phonetic* MIDI-lyric-events (I think they have >>to be in 7-bit ASCII, don't they?) then that means SAMPA, I guess. > Actually they don't. The data inside a midifile metaevent (including > lyric events) can be *any* sequence of bytes, because the length is > specified in the prefix. So you'd be free to use any phonetic scheme > you like. > > [Oh... responding to another post in the thread, the suggestion > to use a "bender" was I'm sure referring to pitch-bend events. > Because events in a midifile are timed, you could emit pitchbends > to shift the note after the phoneme info had been sent, and still > have it happen somewhere in the middle of the sound.] > oh, ok. this would be an odd way to do it though, and would be an ugly post-processing hack IMO. better I think would be to figure out the correct note to sing up-front... > Cheers, > -- Pete -- >
cr88192 said:
> "luserXtrog" <mijoryx@yahoo.com> wrote...
> "cr88192" wrote:
>
>>> I guess another point of uncertainty is the issue commonly seen in
>>> singing, where people will sing part of a word at one note, and then
>>> sing another part at another note, ...
>>
>> Could you use the bender for this?
>
> bender?...
>
> not sure which feature this is exactly (not sure of any MIDI
> command with that name...).

Presumably he is referring to pitch-bend. But I don't think that's your problem. It seems to me that your problem is one of clean syntax design, or at least you hint as much in your OP. Pitch-bend may or may not be helpful as a solution to the problem of representing variable intonation, but it isn't going to solve your syntax problem for you.

<snip>

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Forged article? See http://www.cpax.org.uk/prg/usenet/comp.lang.c/msgauth.php
"Usenet is a strange place" - dmr 29 July 1999
"Richard Heathfield" <rjh@see.sig.invalid> wrote in message news:FcednfaRaruhHczXnZ2dnUVZ8hJi4p2d@bt.com... > cr88192 said: >> "luserXtrog" <mijoryx@yahoo.com> wrote... >> "cr88192" wrote: >> >>>> I guess of uncertainty is the issue commonly seen in singing >>>> things, where >>>> people will sing part of a word at one note, and then sing >>>> another part at >>>> another note, ... >>> >>> Could you use the bender for this? >> >> bender?... >> >> not sure which feature this is exactly (not sure of any MIDI >> command with that name...). > > Presumably he is referring to pitch-bend. But I don't think that's > your problem. It seems to me that your problem is one of clean > syntax design, or at least you hint as much in your OP. Pitch-bend > may or may not be helpful as a solution to the problem of > representing variable intonation, but it isn't going to solve your > syntax problem for you. > yeah, pretty much... there was the idea of using MIDI lyric events (and a binary MIDI input), but the problem here is how to key the lyrics to the music (apart from assuming extra data be included, but this would IMO defeat the point of lyric events...). however, one syntax idea that came to mind is to allow using '-' as a word break, such that a word break could be given, and notes changed. ^C4 merr- ^D4 ily ... another issue I realize now is one of timing: not only does one care the rate and frequency of the words, but also when the words are said. this opens up yet another set of awkward design issues (such as the possible need for timestamps, ...). so, yes, the "combined whole" is starting to look a little more complex than either MIDI or TTS by themselves... some of the issues could be addressed with certain features I had thought up, such as asynchronous MIDI-stream joiners, but timestamps is an issue in its own right. a very simple trick though could be to add explicit breaks along quarter-note boundaries, where a command is given that serves to re-align the TTS engine to the next note. ^| and then ^| I went ^| down the street ^| to find ^| my homies ^| on the beat ^| ... however, this leaves an issue of what to do if/when a synthed fragment goes over a note, where likely having it take 2-notes would not be the intended result (potentially throwing the lyrics out of sync with the beat, ...). (it probably doesn't help much that I don't really know "music theory" either...). and so on... > <snip> > > -- > Richard Heathfield <http://www.cpax.org.uk> > Email: -http://www. +rjh@ > Forged article? See > http://www.cpax.org.uk/prg/usenet/comp.lang.c/msgauth.php > "Usenet is a strange place" - dmr 29 July 1999
In article <h2rgqu$uvf$1@news.albasani.net>, cr88192 <cr88192@hotmail.com> wrote:
>
> "Pete" <neverland@GOODEVEca.net> wrote in message
> news:3_CdnbSQL_-yuszXnZ2dnUVZ_s6dnZ2d@lmi.net...
>> In article <h2qmo6$r26$1@news.albasani.net>,
>> cr88192 <cr88192@hotmail.com> wrote:
>>>
>>> MIDI is primarily used by the synth backend, whereas TTS would require
>>> running it through the frontend, which had thus far assumed a sort of
>>> annotated text input...
>>>
>>> I will have to look into lyric events, and see if hopefully there is some
>>> good way to key the lyrics to the music (such as to a particular MIDI
>>> channel or whatever).
>>
>> Lyric events are timed just like any other, and can contain as little
>> of the song text as you need (often just a syllable), so you can cue them
>> exactly to the note timing. However, they don't include channel
>> information directly, so you'd have to fudge something for that (maybe
>> just adding the channel number to the text in some defined way). In theory
>> you could put the lyric event in any track of a multitrack midifile (to
>> connect with the notes in that track) but most sequencers and such assume
>> that metaevents will be in track 0, so that might cause trouble.
>>
> yep, otherwise it would just be monotone spoken lyrics, which would sort of
> be pointless...

I was assuming (initially at least) that there would only be one 'voice', so lyric events could be associated with a particular melody channel. Only if you wanted vocal harmony would you need some way of tagging them to different channels. (And note the distinction between "tracks" and "channels"...)

>
> it is worth noting that the way my synth works, 1-3 channels are needed just
> to do the speech synth (typically only 1 is used, but 1-2 additional
> channels are needed for some constructions).
>

Thinking a bit more about all this, though, it strikes me that there's been a bit too much concentration on "MIDI". It only comes in because your original synth code happens to use it. MIDI is good for its original purpose -- sending notes and other musical data down a wire -- but gets a bit contorted when you want to do much more. Midifiles can hold a few other things, like lyrics, but the MIDI protocol itself can't transmit these anywhere outside the computer actually processing the file.

The only feature of pure MIDI that could transmit the extra information is the System Exclusive message (which seems to have escaped mention so far!), so I suppose if sticking with MIDI is important for some reason, you could use that.

It would seem much better to adapt the way your synth is driven, as things like pitch and loudness could be supplied in any form (as the text you originally suggested, for example).

If eventually separating data source and sound generation into different machines is desirable, you might want to look at the "OSC" (Open Sound Control) protocol, which is intended as an open-ended successor to MIDI (and other things). It seems to have been -- and is being -- adopted by quite a few projects, both open-source and commercial. Its structure is such that you can transmit any kind of data over as many "channels" as you like.
< http://opensoundcontrol.org/introduction-osc >

Cheers,
-- Pete --
"Pete" <neverland@GOODEVEca.net> wrote in message news:v7WdndW9PdWY48_XnZ2dnUVZ_v2dnZ2d@lmi.net... > In article <h2rgqu$uvf$1@news.albasani.net>, > cr88192 <cr88192@hotmail.com> wrote: >> >>"Pete" <neverland@GOODEVEca.net> wrote in message >>news:3_CdnbSQL_-yuszXnZ2dnUVZ_s6dnZ2d@lmi.net... >>> In article <h2qmo6$r26$1@news.albasani.net>, >>> cr88192 <cr88192@hotmail.com> wrote: >>>> >>>>MIDI is probarily used by the synth backend, whereas TTS would require >>>>running it though the frontend, which had thus far assumed a sort of >>>>annotated text input... >>>> >>>>I will have to look into lyric events, and see if hopefully there is >>>>some >>>>good way to key the lyrics to the music (such as to particular MIDI >>>>channel or whatever). >>> >>> Lyric events are timed just like any other, and can contain as little >>> of the song text as you need (often just a syllable), so you can cue >>> them >>> exactly to the note timing. However, they don't include channel >>> information >>> directly, so you'd have to fudge something for that (maybe just adding >>> the >>> channel number to the text in some defined way). In theory you could >>> put >>> the lyric event in any track of a multitrack midifile (to connect with >>> the notes in that track) but most sequencers and such assume that >>> metaevents will be in track 0, so that might cause trouble. >>> >>yep, otherwise it would just be monotone spoken lyrics, which would sort >>of >>be pointless... > I was assuming (initially at least) that there would only be one 'voice', > so lyric events could be associated with a particular melody channel. > Only if you wanted vocal harmony would you need some way of tagging them > to different channels. (And note the distinction between "tracks" and > "channels"...) ok... I had assumed something more like: music plays in the background; singing goes over the top of the music... >> >>it is worth noting that the way my synth works, 1-3 channels are needed >>just >>to do the speech synth (typically only 1 is used, but 1-2 additional >>channels are needed for some constructions). >> > Thinking a bit more about all this, though, it strikes me that there's > been a bit too much concentration on "MIDI". It only comes in because > your original synth code happens to use it. MIDI is good for its original > purpose -- sending notes and other musical data down a wire -- but gets > a bit contorted when you want to do much more. Midifiles can hold a > few other things, like lyrics, but the MIDI protocol itself can't transmit > these anywhere outside the computer actually processing the file. > yeah... actually, thus far I am only using a basic set of the built in features: turning on and off notes, along with program change... the TTS system basically works by sending program-change messages and turning on and off notes (corresponding to various voice-related patches). however, the synth itself was modified, mostly in terms of somewhat increasing the complexity of the 'wavetable' system (vs what was being used in the original synth). > The only feature of pure MIDI that could transmit the extra information > is the System Exclusive message (which seems to have escaped mention so > far!), so I suppose if sticking with MIDI is important for some reason, > you could use that. > well, my other main option would be to direct-drive the synth, which is mostly a whole bunch of signal processing and mixing code... 
> It would seem much better to adapt the way your synth is driven, as > things like pitch and loudness could be supplied in any form (as the text > you originally suggested for example). > yes, the text is currently the main input... the MIDI stage is essentially internal (what connects the frontend to the backend...). > If eventually separating data source and sound generation into different > machines is desirable, you might want to look at the "OSC" (Open Sound > Control) protocol, which is intended as an open-ended successor to MIDI > (and other things). It seems to have been -- and is being -- adopted by > quite a few projects, both open-source and commercial. Its structure is > such that you can transmit any kind of data over as many "channels" as > you like. > < http://opensoundcontrol.org/introduction-osc > > well, thus far, MIDI is working... I don't really actually need to send any additional info to the synth, just people had suggested playing MIDI files with lyrics (which I had not considered originally). but, supporting MIDI (files) at the same time as TTS is a terrible complexity (vs just using it for the synth stage of the TTS engine). so, it is not a synth which was itself extended to do voice, rather voice is being done via the synth (vs other possible approaches, such as via diphones or unit selection...). I don't actually have to send any additional info at present, FWIW... > Cheers, > -- Pete -- >
>>>"Pete" <neverland@GOODEVEca.net> wrote in message >>>> Lyric events are timed just like any other, and can contain as little >>>> of the song text as you need (often just a syllable), so you can cue >>>> them exactly to the note timing. However, they don't include channel >>>> information directly, so you'd have to fudge something for that >>>> (maybe just adding the channel number to the text in some defined way). It's true: or use sysex as also already suggested. Presumably if MIDI input did contain plain old lyric-events, those words ought to get applied to all voice-like channels, and there is a fair amount of music for which this would be adequate, e.g. http://www.pjb.com.au/muscript/samples/ich_fahr.pdf http://www.pjb.com.au/muscript/samples/ich_fahr.mid >>>> In theory you could put the lyric event in any track of a >>>> multitrack midifile (to connect with the notes in that track) >>>> but most sequencers and such assume that metaevents will be >>>> in track 0, so that might cause trouble. If this process is going to involve a custom input format, which it is (e.g. sysex), then the fact that other synths don't do metaevents on tracks>0 doesn't matter - they don't sing anyway. On 2009-07-07, cr88192 <cr88192@hotmail.com> wrote: > the MIDI stage is essentially internal (what connects the > frontend to the backend...). .... > so, it is not a synth which was itself extended to do voice, > rather voice is being done via the synth (vs other possible > approaches, such as via diphones or unit selection...). OK. OTOH I have a lot of MIDI files, like choir things where the the different voices on their different channels usually don't sing the same word at the same time, or poppier things where there's a lead singer and some "doo-waa" backing vocals, or plain old folk- tune and guitar, and I'd just love to be able to hear these things being *sung* on a synth somehow. We've had TTS for decades, and singing should actually be easier because the frequency is taken care of, and your project is tantalisingly close to doing the job... If some way could be standardised of getting lyrics-per-channel into a MIDI file (sysex or an extended lyric-event or multiple tracks), or some other input format (like MusicXML, which is a pigsty of a format but then MIDI also has its problems) then I've got a lot of music hanging out for some synth-extended-to-do-voice. Timidity doesn't do it, and though Festival with a sufficiently assertive pre-processor might set the frequencies right, it doesn't do rhythms, or other patches... Anyway, all the best with your most interesting project, Regards, Peter -- Peter Billam www.pjb.com.au www.pjb.com.au/comp/contact.html
"Peter Billam" <peter@www.pjb.com.au> wrote in message news:slrnh55ouu.1ni.peter@box8.pjb.com.au... >>>>"Pete" <neverland@GOODEVEca.net> wrote in message >>>>> Lyric events are timed just like any other, and can contain as little >>>>> of the song text as you need (often just a syllable), so you can cue >>>>> them exactly to the note timing. However, they don't include channel >>>>> information directly, so you'd have to fudge something for that >>>>> (maybe just adding the channel number to the text in some defined >>>>> way). > > It's true: or use sysex as also already suggested. > > Presumably if MIDI input did contain plain old lyric-events, those > words ought to get applied to all voice-like channels, and there > is a fair amount of music for which this would be adequate, e.g. > http://www.pjb.com.au/muscript/samples/ich_fahr.pdf > http://www.pjb.com.au/muscript/samples/ich_fahr.mid > yep. I guess, if one were to "customize" the MIDI some (as opposed to stock MIDI files), one possible option could be to use a sort of "magic patch", which would basically cause a channel to not be played (in itself), but would indicate that said channel is used mostly for coordinating the vocals. the issue then would be that of figuring a good way to attach the TTS to this channel. I guess one possible (somewhat different way), would be to, rather than storing the lyrics linearly, they are stored in a table. now, then, a slight additional trickery would be needed to figure out how to key each table entry to the particular notes being played. a tweaky hack could be to actually merge this table with the patches, such that a combination of ProgramChange and NoteOn would indicate to play a particular piece of lyric, with the NoteOn event effectively encoding which note to sing... banks 96-127 could be used for this purpose (or 32-63...). granted though, something like this would not be a backwards-compatible extension (I guess I have little idea what the defined behavior is for missing patches and unknown events). >>>>> In theory you could put the lyric event in any track of a >>>>> multitrack midifile (to connect with the notes in that track) >>>>> but most sequencers and such assume that metaevents will be >>>>> in track 0, so that might cause trouble. > > If this process is going to involve a custom input format, which > it is (e.g. sysex), then the fact that other synths don't do > metaevents on tracks>0 doesn't matter - they don't sing anyway. > yep... meanwhile I tend to always just use single-track MIDI anyways (merging multi-track files into a single track). > On 2009-07-07, cr88192 <cr88192@hotmail.com> wrote: >> the MIDI stage is essentially internal (what connects the >> frontend to the backend...). .... >> so, it is not a synth which was itself extended to do voice, >> rather voice is being done via the synth (vs other possible >> approaches, such as via diphones or unit selection...). > > OK. OTOH I have a lot of MIDI files, like choir things where the > the different voices on their different channels usually don't sing > the same word at the same time, or poppier things where there's a > lead singer and some "doo-waa" backing vocals, or plain old folk- > tune and guitar, and I'd just love to be able to hear these things > being *sung* on a synth somehow. We've had TTS for decades, and > singing should actually be easier because the frequency is taken > care of, and your project is tantalisingly close to doing the job... > partly though, it is about the synthesis as well... 
diphone synth or unit selection are unlikely to really be able to do singing. formant-like approaches are a lot more capable, nevermind the inherently reduced "naturalness"... > If some way could be standardised of getting lyrics-per-channel into > a MIDI file (sysex or an extended lyric-event or multiple tracks), > or some other input format (like MusicXML, which is a pigsty of a > format but then MIDI also has its problems) then I've got a lot of > music hanging out for some synth-extended-to-do-voice. > yep. hacked-over MIDI is one option. I guess another could be to make a somewhat modified/extended file-format, which could change a few things: potentially, a different coding for the command streams; if multi-tracks are used, they actually mean something (just splitting individual channels into their own tracks seems a little pointless IMO...); ability to send-in custom patches (sort of like mod and s3m...); .... but, then the question is whether something like this would actually be worthwhile. example of a revised command stream: all inputs are VLIs (although, I would probably switch to a more matroska-like form); a few of the LSB bits would define the value type (delay, opcode, event, ....); numeric fields are also VLIs. potentially, the stream could be made self-contained, such that an external container format is largely unnecessary (the file begins with a special sync opcode, and the whole rest of the file is a raw command stream). then again, I think such a format already exists, would just have to go look into it. > Timidity doesn't do it, and though Festival with a sufficiently > assertive pre-processor might set the frequencies right, > it doesn't do rhythms, or other patches... > yep. > Anyway, all the best with your most interesting project, > Regards, Peter > at this stage, I guess it is probably more of an "experiment" though... > -- > Peter Billam www.pjb.com.au www.pjb.com.au/comp/contact.html
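PS: a rough sketch of the kind of VLI coding I mean (purely illustrative; the tag assignments and the continuation scheme are made up here, not a finished format):

#include <stdio.h>

/* sketch: encode a value as a VLI where the 2 LSB bits of the decoded
   value are a type tag (0=delay, 1=opcode, 2=event, 3=other), and the
   rest is the payload. bytes are little-endian 7-bit groups, with the
   high bit meaning "more bytes follow". */
static size_t vli_encode(unsigned char *out, unsigned long payload, int tag)
{
    unsigned long v = (payload << 2) | (unsigned long)(tag & 3);
    size_t n = 0;
    do {
        out[n] = (unsigned char)(v & 0x7F);
        v >>= 7;
        if (v) out[n] |= 0x80;      /* continuation bit */
        n++;
    } while (v);
    return n;
}

int main(void)
{
    unsigned char buf[16];
    size_t i, n;

    n = vli_encode(buf, 96, 0);     /* e.g. a 96-tick delay */
    for (i = 0; i < n; i++) printf("%02X ", buf[i]);
    printf("\n");                   /* prints: 80 03 (96<<2 = 384 = 0x180) */
    return 0;
}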
"cr88192" <cr88192@hotmail.com> wrote in message news:h2o3e2$f08$1@news.albasani.net... > well, I am posting this where I think it may be relevant... > > basically, this was part of a misc idea that came up, and I went and beat > together the code for it (AKA: I don't expect it to amount to much). > > > the idea was that I would combine together a speech synthesizer/TTS engine > and a MIDI synth, and see if I could get much "interesting" from it (such as > combining music and a synth'ed voice, singing TTS, ...). > > It may put you off perhaps, but you could model your MIDI plus text input to what these guys are doing commercially. http://www.soundsonline-europe.com/Symphonic-Choirs-PLAY-Edition-pr-EW-182.h tml. This is MIDI notes plus a special text input program. It works well. There are also three tutorials on you tube that show how to use it. SysExJohn.