You Are Not a Writer. You Are a Director of Photography.

When beginners use Midjourney, they type nouns: "A cat."

When experts use Midjourney, they type physics: "Subsurface scattering, volumetric lighting, f/1.8 aperture, shot on Kodak Portra 400."

We are witnessing a fundamental shift in the definition of "Creativity." For the last century, creativity in the visual arts was inextricably linked to manual dexterity: the ability to hold a brush, wield a chisel, or operate a camera. Today, creativity is decoupling from execution. The "How" is becoming automated, leaving only the "What."

Diffusion models (Midjourney, Stable Diffusion, Flux.1) don't just "paint pixels." They simulate light paths. To master them, you don't need to know how to code; you need to know Cinematography and Art History.

This is the difference between an AI image that looks like "AI Slop" (shiny, generic, plastic) and an image that looks like a frame from a Christopher Nolan movie. The difference isn't the model; it's the engineer behind the prompt.

The "Plastic Skin" Problem:

Most default AI generations have perfect lighting and perfect skin. It looks fake because reality is imperfect. The models are trained on curated stock photography and ArtStation portfolios, pushing them toward an "aesthetic mean" of perfection.

To fix this, we must force the model to render "imperfections."

Prompt Keywords: "Textured skin, film grain, chromatic aberration, minor blemishes, unsharp mask, motion blur."

Part 1: The Physics of Latent Space

To control the output, you must understand the engine. Generative AI doesn't have a database of images it "collages" together. It has a mathematical understanding of concepts in a high-dimensional vector space called "Latent Space."

Imagine a 500-dimensional map. In one corner, you have the mathematical concept of "Red." In another, "Apple." In another, "Gravity." When you prompt "Red Apple falling," the model navigates the vectors between these concepts to synthesize a new data point.
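As a toy illustration, here is a minimal sketch of that navigation: linear interpolation between two concept vectors. The embeddings below are random stand-ins (an assumption for demonstration); in a real pipeline they would come from a text encoder such as CLIP.

```python
# Toy sketch: "navigating" latent space by walking a straight line
# between two concept embeddings. The vectors are random stand-ins;
# real models get them from a text encoder such as CLIP.
import numpy as np

rng = np.random.default_rng(42)
dims = 512                     # dimensionality of our hypothetical latent space
red = rng.normal(size=dims)    # stand-in embedding for "red"
apple = rng.normal(size=dims)  # stand-in embedding for "apple"

def lerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Linear interpolation: t=0 returns a, t=1 returns b."""
    return (1 - t) * a + t * b

# Five waypoints between the two concepts; each would decode to a
# slightly different image in a real diffusion pipeline.
for t in np.linspace(0, 1, 5):
    w = lerp(red, apple, t)
    print(f"t={t:.2f}  distance to 'apple' = {np.linalg.norm(w - apple):.1f}")
```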

Diffusion vs. GANs

In the early days (GANs debuted in 2014 and peaked with StyleGAN around 2018), we used Generative Adversarial Networks (GANs). One AI tried to fake an image, and another tried to detect the fake. They fought until the result was indistinguishable from reality. This worked for faces (ThisPersonDoesNotExist.com) but failed for complex scenes.

Today, we use Diffusion Models. Diffusion works by destroying data. It takes a clear image and slowly adds static (Gaussian noise) until it is pure random snow. The AI learns the reverse process: how to take pure static and hallucinate a structured image out of it. When you prompt, you are guiding this "denoising" process. You are whispering to the AI, "I know this looks like static, but if you squint, it looks like a cyberpunk city."
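The forward "destruction" step has a simple closed form: x_t = √(ᾱ_t)·x₀ + √(1−ᾱ_t)·ε, where ε is Gaussian noise. Here is a minimal sketch of it; the linear noise schedule is an illustrative assumption, and real models train a network to run this process in reverse.

```python
# Minimal sketch of the forward diffusion ("destruction") process:
#   x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
# The schedule is illustrative; real models learn the reverse (denoising).
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.uniform(size=(64, 64))        # stand-in for a clear image
betas = np.linspace(1e-4, 0.02, 1000)  # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)    # cumulative signal retention

def add_noise(x0: np.ndarray, t: int) -> np.ndarray:
    """Jump straight to noising step t using the closed form."""
    eps = rng.normal(size=x0.shape)    # the Gaussian "static"
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

for t in (0, 250, 999):
    xt = add_noise(x0, t)
    print(f"t={t}: signal weight {np.sqrt(alpha_bar[t]):.3f}")  # 1.0 -> ~0.0
```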

Part 2: The Vocabulary of Light

Light defines the mood. If you don't describe the light, the AI defaults to "Global Illumination" (Video game lighting), which is flat and uninteresting. You must act as the Gaffer (Chief Lighting Technician).

1. Volumetric Lighting (God Rays):

Light beams visible in the air due to dust or fog. Adds epic scale and atmosphere. It implies a medium that light has to travel through, giving the scene depth.

Prompt: "Dusty atmosphere, god rays entering window, Tyndall effect."

2. Subsurface Scattering (SSS):

The way light penetrates translucent objects (like skin, wax, marble, or grapes). Without this, skin looks like plastic. With it, skin looks fleshy and warm. It tells the viewer the object is organic.

Prompt: "Backlit ear, subsurface scattering, translucent skin texture."

3. Chiaroscuro:

High contrast between light and dark. Think Rembrandt or Caravaggio. It directs the viewer's eye and adds drama, hiding details in the shadows.

Prompt: "Deep shadows, single directional light source, noir, low key lighting, high contrast."

4. Rim Lighting:

A light source placed behind the subject to highlight the edges. This separates the subject from the background, preventing them from blending into the darkness.

Prompt: "Rim light, backlighting, halo effect, silhouette."

Part 3: The Camera Body and Lens Choices

The AI simulates a physical camera. It understands focal length, aperture, and sensor size. The "Lens" you choose dictates the psychological feeling of the image.

| Lens Length | Field of View | Psychological Effect | Best For |
| --- | --- | --- | --- |
| 16mm / Ultrawide | 100°+ | Distorts edges. Shows the whole room. Creates a sense of epic scale or uneasy distortion. | Landscapes, cyberpunk cities, horror scenes. |
| 35mm / 50mm | 40-60° | The "Human Eye" view. Natural perspective. Feels honest, documentary, and grounded. | Street photography, photojournalism, environmental portraits. |
| 85mm / 100mm | 20-28° | Compresses the background. Flattering for faces. Isolates the subject from the world. | Headshots, beauty portraits, product photography. |
| 200mm+ / Telephoto | <10° | Extreme compression. Background objects look huge relative to the foreground. Sniper-like focus. | Sports, wildlife, spying. |

Aperture and Bokeh

Don't just say "blurry background." Use f-stops. The lower the f-number, the shallower the depth of field.

  • f/1.2 to f/2.8: Creamy bokeh. Only the eyes are in focus. Dreamy.

  • f/8 to f/11: Everything is crisp. Architectural photography.

  • f/22: Deep focus. Ansel Adams landscapes.
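Applied to a portrait, an illustrative prompt built around aperture language:

Prompt: "85mm headshot, f/1.8, shallow depth of field, creamy bokeh, eyes tack sharp."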

Part 4: Midjourney Parameters (The Control Panel)

Midjourney allows precise control via flags. These are the sliders on your synthesizer.

--stylize (0-1000): How much "opinion" the AI adds.
Low (--s 50): Literal adherence to prompt. Photorealistic. Use this when you want exactly what you asked for.
Default (--s 100): The standard Midjourney look.
High (--s 750): AI adds artistic flourishes, composition tweaks, and "beauty." Good for illustrations and logos.

--chaos (0-100): The variety in the initial grid.
High (--c 80): You want wild exploration. The 4 images will look totally different from each other.
Low (--c 0): You know exactly what you want. The 4 images will be subtle variations of the same composition.

--weird (0-3000): Adds bizarre, surreal elements.
Unlike chaos, weird introduces subject matter you didn't ask for to make the image "edgy."

--tile: Creates seamless textures for 3D mapping or wallpapers.
--no: Negative prompting (e.g. --no text, --no blur).
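Putting the flags together, an illustrative (not canonical) command:

/imagine prompt: gritty portrait of a night-shift mechanic, textured skin, film grain, rim light --s 50 --c 0 --no cartoon --v 6.0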

The Permutation Prompt

Pro users don't run one prompt. They run 40. Midjourney allows "Permutations" using curly braces.

/imagine prompt: A portrait of a cyberpunk hacker, {red, blue, green} neon lighting, shot on {35mm, 85mm} lens --v 6.0

This single command will generate 6 jobs (3 colors × 2 lenses), allowing you to A/B test visual settings rapidly.

Part 5: Flux.1 and The Return of Textures

For years, AI couldn't spell; text inside generated images came out as garbled glyphs. Midjourney v6 and the new open-source king Flux.1 (by Black Forest Labs) have largely solved this. But Flux brings something else: Raw Realism.

Midjourney has an "aesthetic bias"—it wants things to look pretty. Flux is more neutral. It is willing to make things look "ugly" if the prompt demands it. This makes it superior for photorealism where "grit" is required.

Prompting for Text in Flux:

"A neon sign that says 'HELLO WORLD' in distinct serif font."

Flux.1 is particularly good at following complex spatial instructions ("A red ball on the left of a blue cube, and a green pyramid floating above them"). Older models would blend these concepts into "Purple shapes."

Part 6: Advanced Workflow: Image-to-Image and ControlNet

Prompting is just step one. The real professionals use a multi-stage workflow.

1. Image-to-Image (Img2Img)

You sketch a stick figure on a napkin, photograph it, and feed it into the AI: "Turn this into a hyper-realistic war scene." The AI uses your sketch as structural scaffolding, solving the "Random Composition" problem: you dictate the layout instead of leaving it to the dice.
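A minimal Img2Img sketch using the Hugging Face diffusers library; the model ID, file names, and strength value are assumptions, not a canonical recipe.

```python
# Minimal Img2Img sketch with Hugging Face diffusers. Model ID, file
# names, and the strength value are illustrative assumptions.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

sketch = load_image("napkin_sketch.jpg")  # hypothetical photo of your drawing
result = pipe(
    prompt="hyper-realistic war scene, volumetric smoke, 35mm photojournalism",
    image=sketch,
    strength=0.65,  # 0 = keep the sketch intact, 1 = ignore it entirely
).images[0]
result.save("war_scene.png")
```

The strength parameter is the key dial: it controls how much of your original composition survives the denoising.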

2. ControlNet (Stable Diffusion)

ControlNet is the holy grail of consistency. It allows you to extract specific features from a reference image (a minimal code sketch follows the list below):

  • OpenPose: Extracts the skeleton/pose of a human. You can force the AI to generate a character in that exact pose.

  • Canny / Lineart: Extracts the edges. Useful for coloring in drawings or architectural renders.

  • Depth: Extracts the 3D depth map. Useful for keeping the lighting accurate.
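As a concrete illustration, here is a minimal OpenPose ControlNet sketch with the diffusers library. The model IDs and file names are assumptions, and the pose image is expected to already be an extracted skeleton map (e.g., produced with the controlnet_aux OpenposeDetector).

```python
# Minimal ControlNet (OpenPose) sketch with diffusers. Model IDs and
# file names are assumptions; pose_map should already be an OpenPose
# skeleton image, not a raw photo.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

pose_map = load_image("pose_skeleton.png")  # hypothetical skeleton image
image = pipe(
    prompt="cyberpunk hacker, rim light, 85mm portrait, textured skin",
    image=pose_map,
    num_inference_steps=30,
).images[0]
image.save("posed_hacker.png")
```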

3. Inpainting / Outpainting

Never accept a "good enough" image. If the hands are messed up (a classic AI trope), don't re-roll. Use Inpainting. Mask the hands and prompt "Perfectly detailed hands, 5 fingers."
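A minimal inpainting sketch with diffusers; the model ID and file names are assumptions. White pixels in the mask mark the region to regenerate.

```python
# Minimal inpainting sketch with diffusers. Model ID and file names are
# assumptions; white pixels in the mask mark the area to regenerate.
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = load_image("portrait.png")   # hypothetical: the image with bad hands
mask = load_image("hands_mask.png")  # hypothetical: white over the hands only
fixed = pipe(
    prompt="perfectly detailed hands, 5 fingers",
    image=image,
    mask_image=mask,
).images[0]
fixed.save("portrait_fixed.png")
```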

Outpainting allows you to "Zoom Out" from an image, generating new context around the borders. This is crucial for fixing aspect ratios for different platforms (Instagram Stories vs. YouTube Thumbnails).

Part 7: The Future: Video and World Building

We are currently in the "static image" phase, but the "video phase" (Sora, Runway Gen-3, Luma Dream Machine) plays by the same rules. The prompt engineering skills you learn today for static images are directly transferable to video.

The only difference is Temporal Consistency. In video, the "Latent Space" includes time. The challenge is ensuring that the "Red Apple" doesn't morph into a "Red Ball" as it falls. We are solving this with high-context windows that can "remember" the first frame while generating the last frame.

Part 8: Glossary of Visual Terms

  • Inpainting: Editing a specific part of an image (e.g., changing a shirt) while keeping the rest constant.

  • Outpainting: Extending the canvas boundaries (Zooming out/Panning).

  • Seed: The random number that generates the initial noise pattern. Keeping the seed constant allows for reproducible results.

  • Latent Space: The multi-dimensional mathematical void where the AI's concepts exist.

  • LoRA (Low-Rank Adaptation): A mini-model trained on a specific face or style that can be plugged into a base model. This is how you train the AI on your face.

  • Negative Prompt: Telling the AI what you don't want (e.g. "--no cartoon, --no 3d render").

Conclusion

Visual Prompt Engineering is a dying art. Eventually, we will just talk to the AI, and it will infer our intent perfectly. But for now, and for the next 5 years, understanding f-stops, film grain, and lighting ratios gives you a superpower.

It allows you to escape the "Average." The Average output of AI is impressive but soulless. The "Exceptional" output requires a human with taste, vision, and the technical vocabulary to guide the machine into the unmapped corners of latent space.
