The Core Formula
HappyHorse 1.0 reads your prompt front-to-back as a unified instruction, so the first sentence anchors everything that follows.
Every HappyHorse output in this guide follows the same pattern: subject and motion first, then camera, lighting, mood, and style. The model uses early information to set the scene. Once it has a clear subject and motion, later instructions (lighting, mood, style) tune the result rather than rebuild it.
Real example — Bohemian Amber Smile
The bohemian girl's pout gently comes alive in warm amber light — a subtle playful smile tugs at the corner of her lips, her large gold hoop earrings swaying softly with a slight head movement, scattering dancing amber light flecks. Her layered gold chain necklaces rise and fall with her breath. The camera executes a slow subtle arc around her, the warm bokeh in the background morphing from rounded circles to flowing light halos. Vintage film warm tone, gold jewelry macro texture, slice-of-life cinematic aesthetic.
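The ordering described in this section can be sketched as a small text helper. This is only an illustration in Python and not part of HappyHorse: the function name `build_prompt` and its parameter names are assumptions made for this example. It simply joins the parts so that subject and motion come first and the style reference comes last.

```python
def build_prompt(subject_motion, camera="", lighting="", style=""):
    """Join prompt parts in front-to-back priority order.

    Hypothetical helper: the parameter names are illustrative and are
    not HappyHorse settings. Empty parts are skipped.
    """
    if not subject_motion.strip():
        raise ValueError("subject_motion is required: it anchors the prompt")
    sentences = []
    for part in (subject_motion, camera, lighting, style):
        text = part.strip()
        if not text:
            continue
        # Make sure each part ends as its own sentence.
        if text[-1] not in ".!?":
            text += "."
        sentences.append(text)
    return " ".join(sentences)


prompt = build_prompt(
    subject_motion="A subtle playful smile tugs at the corner of her lips, "
                   "her gold hoop earrings swaying softly",
    camera="The camera executes a slow subtle arc around her",
    lighting="Warm amber light",
    style="Vintage film warm tone, slice-of-life cinematic aesthetic",
)
print(prompt)
```

Keeping the parts as separate arguments makes it easy to swap the lighting or style while leaving the opening sentence untouched.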
Timestamp Structure
For complex, multi-phase scenes, use time-bracketed segments. This is the single most powerful technique in HappyHorse prompting.
The timestamp format tells HappyHorse 1.0 how to pace the motion and when to shift camera, subject, or energy. It works like a shot list. Four segments is the sweet spot — enough to build a scene arc, not so many that the model loses coherence.
Real example — Red Motorcycle Glass Tunnel
【0–3s】 First-person POV at the entrance of a glass tunnel — motorcycle dashboard at the bottom, tachometer needle twitching, circular steel tunnel receding into infinite distance. 【3–7s】 Sudden acceleration — cyan neon guide rails streak backward in motion blur. The red motorcycle body catches dynamic reflections against an orange sunset horizon. 【7–10s】 Camera phases through the glass wall to an exterior view — the red motorcycle moves like a bullet through a transparent pipe suspended high above the city, trailing a cyan light trail. 【10–12s】 Snap back to POV — tunnel exit light explodes to fill the frame in a blinding white cut.
Real example — Epic Dragon Covenant
【0–4s】 Low-angle upshot — grey storm clouds churn, dragon's scaled hide dominates the frame with faint embers of fire. A girl in an embroidered lace gown stands in the mountain wind. 【4–9s】 Profile view — she presses her palm against the dragon's jaw; silver light filaments radiate outward like a covenant. The dragon's eye dilates with ancient orange fire. 【9–13s】 Extreme close-up of dragon's pupil containing a micro-world, then a rapid pull-out as it roars; cloud ceiling shatters and sunlight pours through in cathedral columns. 【13–15s】 Overhead wide shot freeze — silhouettes stand like mythological sculptures in light columns. Overexposes to pure white.
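The time-bracketed format can also be generated from a shot list. The sketch below is a hypothetical Python helper (the name `format_segments` and the tuple layout are assumptions, not a HappyHorse API). It renders each segment in the 【start–ends】 style used in the examples above and rejects gaps or overlaps in the timeline.

```python
def format_segments(segments):
    """Render (start, end, description) tuples as time-bracketed segments.

    Hypothetical helper for illustration. It checks that the timeline
    starts at 0 and that each segment begins where the previous one ended.
    """
    if not segments:
        raise ValueError("at least one segment is required")
    expected_start = 0
    rendered = []
    for start, end, description in segments:
        if start != expected_start:
            raise ValueError(
                f"segment starts at {start}s, expected {expected_start}s"
            )
        if end <= start:
            raise ValueError(f"segment {start}-{end}s has no duration")
        rendered.append(f"【{start}–{end}s】 {description.strip()}")
        expected_start = end
    return " ".join(rendered)


shots = [
    (0, 3, "First-person POV at the entrance of a glass tunnel."),
    (3, 7, "Sudden acceleration, cyan neon guide rails streak backward in motion blur."),
    (7, 10, "Camera phases through the glass wall to an exterior view."),
    (10, 12, "Snap back to POV, tunnel exit light fills the frame in a blinding white cut."),
]
timeline = format_segments(shots)
print(timeline)
```

Validating contiguity before generating is a cheap way to catch a dropped segment when a long prompt is shortened.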
Camera Language
HappyHorse 1.0 responds directly to cinematography language. These exact terms, extracted from top outputs, produce reliable results.
Shot Types
Camera Movement
Camera Transitions (from real outputs)
Lighting & Color Palette
Lighting is the single most impactful variable after subject. The best prompts also name a color palette — a direct contrast pair creates immediate visual richness.
Lighting from real outputs
Color palette contrast pairs
- Color palette: molten rose-gold vs electric teal
- Cyan neon guide rails against orange sunset horizon
- Silver light filaments, ancient orange dragon fire
- Cyan luminous text on skin — warm portrait depth
Dialogue & Lip-Sync
HappyHorse 1.0 generates audio natively and supports multilingual lip-sync in 7 languages. Write dialogue directly into the prompt in single quotation marks.
- 01
Write the line in single quotes inside the prompt
Put the character's spoken words in single quotes with a description of how they say it.
...the bear's mouth moves in perfect lip-sync as it hurriedly says: 'There's no time left, I'm going to be late!'
- 02
Describe the voice quality and expression
HappyHorse uses voice descriptors to match the audio tone to the visual character.
...it exclaims in a cute, precious voice: 'What is this? So delicious!'
- 03
Describe the facial expression separately from the line
Describing the emotion and facial expression separately from the spoken line reinforces how the character delivers it.
Her facial expression remains sharp, focused, and determined. The camera should experience a slight jolt or impact shake as her fist nears the screen...
Full lip-sync example — Fluffy Monster
The fluffy little monster takes a hesitant but big, curious bite of the juicy burger; as its cheeks move rhythmically while chewing, its large eyes widen and light up with pure amazement; it exclaims in a cute, precious voice: 'What is this? So delicious!'; the monster quickly finishes the rest of the burger in a flash; then, the camera swiftly cuts to a wide shot of the monster running away; we see its back as it dashes through a sunlit meadow, jumping with joy and waving its arms while shouting enthusiastically, 'I need more!!!'
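The dialogue pattern (action verb, voice descriptor, then the line in single quotes) can be sketched as a formatting helper. This is an illustrative Python example. The function `dialogue_clause` and its parameters are hypothetical names, and it only builds prompt text.

```python
def dialogue_clause(speaker_action, voice, line):
    """Build a clause such as: it exclaims in a cute voice: 'Hello!'

    Hypothetical helper. `voice` is a free-text voice descriptor and
    `line` is the spoken text. Any quote characters wrapped around
    `line` are removed so the result has exactly one pair of single
    quotes around it.
    """
    spoken = line.strip().strip("'\"")
    if not spoken:
        raise ValueError("dialogue line is empty")
    return f"{speaker_action.strip()} in {voice.strip()}: '{spoken}'"


clause = dialogue_clause(
    "it exclaims", "a cute, precious voice", "What is this? So delicious!"
)
print(clause)
```

Apostrophes inside the line (as in 'There's no time left') are left alone. Only quotes at the very start or end of the line are stripped.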
Style Reference Keywords
Style references anchor the aesthetic direction of the entire clip. These are the ones that actually appear in HappyHorse's best outputs.
Cinematic / Film
Fantasy / Epic
Texture & Detail descriptors
Text-to-Video vs Image-to-Video
Text-to-Video
Create a scene from words alone
Best for
- Cinematic and fantasy scenes from scratch
- Character-based clips with dialogue
- Action and motion-heavy sequences
- Multi-phase scenes with timestamps
Image-to-Video
Animate any still photo or product shot
Best for
- Product photography brought to life
- Brand imagery with consistent identity
- Portrait animation from a photo
- Real-estate and travel visuals
Weak vs Strong Prompts
The quality of your output is directly tied to prompt specificity. Same subject — completely different results.
Weak
a woman looking at the camera
Strong
The bohemian girl's pout gently comes alive in warm amber light — a subtle playful smile tugs at the corner of her lips, her large gold hoop earrings swaying softly. Vintage film warm tone, gold jewelry macro texture, slice-of-life cinematic aesthetic.
Weak
a woman punching toward the camera
Strong
The woman in the black leather suit forcefully thrusts her right fist directly toward the camera lens in a high-speed, explosive punch. Her long black hair whips and flows backward dynamically. Her expression remains sharp, focused, and determined. The camera should experience a slight jolt or impact shake as her fist nears the screen.
Weak
a motorcycle going fast through a tunnel
Strong
【0–3s】 First-person POV at the entrance of a glass tunnel — motorcycle dashboard at the bottom, tachometer needle twitching. 【3–7s】 Sudden acceleration — cyan neon guide rails streak backward in motion blur against an orange sunset horizon. 【10–12s】 Tunnel exit light explodes to fill the frame in a blinding white cut.
What to Avoid
- 01
No lighting instruction
Every strong HappyHorse output explicitly names a light source or quality. Without it, the model defaults to flat, inconsistent exposure.
- 02
Subject with no motion detail
HappyHorse generates video — static descriptions produce static-looking results. Every subject needs a motion or behaviour cue: "her lashes trembling," "tachometer needle twitching," "necklaces rise and fall with her breath."
- 03
Conflicting style references
Pick one aesthetic direction and commit. Mixing "documentary style, luxury commercial, and anime" in a single prompt produces visual noise rather than any of the three.
- 04
Forgetting aspect ratio for vertical content
For TikTok, Reels, or Shorts content, set the ratio to 9:16 before generating, not after. The default is 16:9.
- 05
Overlong prompts
Past ~100 words, additional detail shows diminishing returns. Prioritise your strongest 5–6 elements: subject motion, camera, lighting, color contrast, mood, style reference.
Before You Hit Generate
Run through this in 30 seconds before every generation.
- Subject includes at least one motion or behaviour cue
- Camera shot type and movement are named
- Lighting source or quality is specified
- A color contrast pair is named (or a dominant palette)
- One style reference is at the end
- Multi-phase scenes use 【X–Ys】 timestamp structure (for example 【0–3s】)
- Dialogue lines are in single quotes with a voice descriptor
- Aspect ratio is set before generating (9:16 for social)
- Prompt is under 100 words — trim if needed
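Parts of this checklist can be automated with rough text checks. The sketch below is a hypothetical Python linter, not a HappyHorse tool. The name `check_prompt`, the `LIGHTING_HINTS` keyword list, and the warning messages are all assumptions chosen for illustration, and keyword matching is only a crude stand-in for actually reading the prompt.

```python
import re

WORD_LIMIT = 100  # the guide suggests returns diminish past ~100 words
TIMESTAMP = re.compile(r"【\d+–\d+s】")
# Assumed keyword list: a prompt containing none of these probably
# names no light source or quality. Extend it to suit your own prompts.
LIGHTING_HINTS = ("light", "glow", "sun", "neon", "shadow", "bokeh")


def check_prompt(prompt):
    """Return a list of warnings for a prompt. Rough heuristics only."""
    warnings = []

    # Count words, ignoring the timestamp markers themselves.
    word_count = len(TIMESTAMP.sub(" ", prompt).split())
    if word_count > WORD_LIMIT:
        warnings.append(f"{word_count} words: trim toward {WORD_LIMIT}")

    lowered = prompt.lower()
    if not any(hint in lowered for hint in LIGHTING_HINTS):
        warnings.append("no lighting term found")

    # The guide asks for dialogue in single quotes.
    if any(mark in prompt for mark in ('"', "\u201c", "\u201d")):
        warnings.append("double quotes found: use single quotes for dialogue")

    return warnings


weak = "a woman looking at the camera"
strong = (
    "The bohemian girl's pout gently comes alive in warm amber light, "
    "a subtle playful smile tugs at the corner of her lips. "
    "Vintage film warm tone."
)
print(check_prompt(weak))
print(check_prompt(strong))
```

An empty list means none of these heuristics fired. It does not guarantee a good prompt, so the manual checklist still applies.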