fractallyte 2 days ago [-]
I went into this imagining something like Synfig Studio (https://www.synfig.org/) or Moho (https://moho.lostmarble.com/). "Studio" here is quite far from what it actually is: lip-syncing in static characters.
I get that you're using AI to boost capability with less effort, but at the moment, I think the more specialized tools are still a better avenue for this.
Lastly, I followed the link to Jellypod (https://www.jellypod.com/). It's pretty good, but falls into a vocal "uncanny valley". Even a human reading from a script wouldn't sound that perfect; the enunciations immediately come across as artificial.
Now, if this was an extension to Synfig (also open source!), it would be a much more interesting venture...
vaporaviatorlab 2 days ago [-]
This looks great. Curious about the lip-sync: viseme set or just open/closed mouths? The South Park style is super forgiving but HyperFrames quality seems like it'd need more.
amd92 2 days ago [-]
The LLM/deterministic split is the smart call here. You can iterate on a script without the rest of the pipeline drifting under you. Curious how far the vowel-per-word heuristic holds before you wish you had Rhubarb, but "regenerates instantly" sounds like the right tradeoff for a studio loop.
comicink 2 days ago [-]
Very cool! I will definitely try this out. Cartoons are something I have been interested in for a while.
mdrzn 2 days ago [-]
static video with text2speech audio and two circles moving representing the mouths: "OMG I might have a show on my hands"
Also, Moho offers far more comprehensive (and comprehensible!) lip-sync: https://lostmarble.com/papagayo/