Generating video from a still image + a prompt

Input is the first frame, plus the prompt.

This 2-minute story was generated using a long sequence of prompts, on an older version of the model:

An alien spaceship arrives to the futuristic city. The camera gets inside the alien spaceship. The camera moves forward until showing an astronaut in the blue room. The camera moves away from the astronaut. The astronaut leaves the keyboard and walks to the left. The astronaut leaves the keyboard and walks away. The camera moves beyond the astronaut and looks at the screen. The screen behind the astronaut displays fish swimming in the sea. We follow the blue fish as it swims in the dark ocean. The camera points up to the sky through the water. The ocean and the coastline of a futuristic city. Crash zoom towards a futuristic skyscraper. The camera zooms into one of the many windows. We are in an office room with empty desks. The camera zooms into the lion's face, inside the office. Zoom out to the lion wearing a dark suit in an office room. The lion wearing a suit looks at the camera and smiles. The camera zooms out slowly to the skyscraper exterior.

We present Phenaki, a model capable of realistic video synthesis given a sequence of textual prompts. Generating videos from text is particularly challenging due to the computational cost, limited quantities of high-quality text-video data, and the variable length of videos. To address these issues, we introduce a new causal model for learning video representation which compresses the video to a small representation of discrete tokens. This tokenizer uses causal attention in time, which allows it to work with variable-length videos. To generate video tokens from text we use a bidirectional masked transformer conditioned on pre-computed text tokens. The generated video tokens are subsequently de-tokenized to create the actual video. To address data issues, we demonstrate how joint training on a large corpus of image-text pairs as well as a smaller number of video-text examples can result in generalization beyond what is available in the video datasets. Compared to previous video generation methods, Phenaki can generate arbitrarily long videos conditioned on a sequence of prompts (i.e. time-variable text, or a story) in open domain. To the best of our knowledge, this is the first time a paper studies generating videos from time-variable prompts. In addition, the proposed video encoder-decoder outperforms all per-frame baselines currently used in the literature in terms of spatio-temporal quality and number of tokens per video.

A toy sketch of the two ideas doing most of the work in that abstract, causal attention in time and masked parallel decoding of video tokens, appears at the end of this post.

DesignerBot can generate editable timelines for presentations. I've never used an entire DesignerBot presentation as-is, having edited, removed, and added slides. But the templates provide graphs and even some data. I can now create solid presentations in 15 minutes versus an hour or more previously. DesignerBot costs $12 per month, billed annually, with a 14-day free trial (credit card required).

Fliki

Fliki is an AI-powered voiceover generator. It creates professional voiceovers for videos and blog posts, for example. Fliki is not the first automated voiceover platform. But it's much more realistic than earlier competitor versions, and it takes seconds.
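Circling back to the Phenaki abstract quoted above: the snippet below is a purely illustrative sketch, not Phenaki's actual code. All names, shapes, and the random stand-in for the model are invented here; it only mirrors two ideas the abstract describes. First, an attention mask that is causal in time, where each frame's tokens may attend to their own frame and to earlier frames but never to later ones, which is what lets a tokenizer handle videos of different lengths. Second, a masked, parallel decoding loop that starts from fully masked video tokens and commits the most confident predictions over a few steps, conditioned (in the real model) on the text tokens, before the tokens are de-tokenized into frames.

```python
# Illustrative sketch only: toy shapes, invented names, and a random stand-in
# for the model. It mirrors ideas described in the Phenaki abstract, not its code.
import numpy as np

def block_causal_time_mask(num_frames: int, tokens_per_frame: int) -> np.ndarray:
    """True where attention is allowed: tokens of frame t may attend to all
    tokens of frames <= t (full attention within a frame, causal across frames)."""
    n = num_frames * tokens_per_frame
    frame_of = np.arange(n) // tokens_per_frame      # frame index of each token position
    return frame_of[None, :] <= frame_of[:, None]    # (n, n) boolean mask, rows = queries

def maskgit_style_decode(num_video_tokens: int, vocab_size: int,
                         steps: int = 4, seed: int = 0) -> np.ndarray:
    """Toy stand-in for masked parallel decoding: start fully masked, then at each
    step commit the most 'confident' predictions until every position is filled."""
    rng = np.random.default_rng(seed)
    MASK = -1
    tokens = np.full(num_video_tokens, MASK)
    for step in range(1, steps + 1):
        # A real model would predict logits for masked positions conditioned on the
        # text tokens and the already-committed video tokens; random scores stand in.
        logits = rng.random((num_video_tokens, vocab_size))
        confidence = logits.max(axis=1)
        confidence[tokens != MASK] = -np.inf          # never overwrite committed tokens
        target_filled = int(np.ceil(num_video_tokens * step / steps))
        to_commit = max(target_filled - int((tokens != MASK).sum()), 0)
        for pos in np.argsort(-confidence)[:to_commit]:
            tokens[pos] = int(logits[pos].argmax())
    return tokens                                     # would be de-tokenized into frames

if __name__ == "__main__":
    print(block_causal_time_mask(num_frames=3, tokens_per_frame=2).astype(int))
    print(maskgit_style_decode(num_video_tokens=8, vocab_size=16))
```

Because the mask is causal in time, tokens for a new frame can be appended and decoded without touching the representation of earlier frames, which is the property the abstract leans on when it claims arbitrarily long, prompt-by-prompt generation.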