Following up on my YouTube channel creation from the previous article, on Mon. 25 May 2026 14:02:06 CEST, I wanted to write a program to automate the generation of my thumbnails.
Since my face will be shown on my YouTube channel, I've studied thumbnails from YouTubers who show their face, such as LowLevel.
Anatomy of YouTube Thumbnails
And I came to the following conclusion:
The composition is simple -> around 3 or 4 elements.
So per thumbnail we get:
- The background -> minimalistic enough, just one color tone
- The upper body (shoulders + face) -> The viewer sees the creator’s face before clicking, so the video feels less surprising once it starts. In theory, that can help retention.
  Note that we do not want the PNG of the YouTuber's upper body + face to be horizontally cropped -> bad integration.
  - Exaggerated facial expression: shock, confusion, desperation, or curiosity.
  Note that the YouTuber's PNG must not take more than 1/3 of the space (especially horizontal space).
  The PNG is often placed on the right, with the YouTuber looking left, towards the title.
- The title (monochromatic)
- When the topic is about a company, we add its logo above the title. If we can match the title font / color to the logo and its theme, that's great!
That's basically all, but I also wanted to add the logo of my YouTube channel, in small, at the top left of each thumbnail.
Novelty and Identity
First, the composition of thumbnails on a YouTube channel should not differ too much from one video to another.
We can keep the same composition and/or color scheme.
Some creators change the color identity, which is fine: it helps the viewer immediately understand that this is a new video.
But in my opinion, changing the title and the YouTuber PNG is already enough to express novelty.
That's what I'll do.
I'll prepare a dozen PNGs of myself with different expressions, adapted to the subject.
And I'll rotate through these PNGs to preserve novelty.
And I’ll keep the same composition, background, title font, and title color to preserve the channel identity.
You know what?
I'll even wear the exact same T-shirt for the PNGs and for the video recordings.
Uploading Selfies onto my computer with a tiny Go server
Alright, let's create a Go HTTP server that handles file upload :)
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"path/filepath"
	"time"
)

const uploadDir = "uploads"

// 2 GB max upload size
const maxUploadSize = 2 << 30

func main() {
	if err := os.MkdirAll(uploadDir, 0755); err != nil {
		log.Fatal(err)
	}
	http.HandleFunc("/", uploadPage)
	http.HandleFunc("/upload", uploadHandler)
	log.Println("Server running on http://0.0.0.0:8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}

func uploadPage(w http.ResponseWriter, r *http.Request) {
	html := `
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Upload Video</title>
</head>
<body>
<h1>Upload a file</h1>
<form method="POST" action="http://192.168.1.20:8080/upload" enctype="multipart/form-data">
<input type="file" name="file" accept="video/*,image/*" required>
<button type="submit">Upload</button>
</form>
</body>
</html>
`
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	fmt.Fprint(w, html)
}

func uploadHandler(w http.ResponseWriter, r *http.Request) {
	log.Println("Upload request received")
	if r.Method != http.MethodPost {
		http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
		return
	}
	r.Body = http.MaxBytesReader(w, r.Body, maxUploadSize)
	if err := r.ParseMultipartForm(maxUploadSize); err != nil {
		log.Println("ParseMultipartForm error:", err)
		http.Error(w, "File too large or invalid upload", http.StatusBadRequest)
		return
	}
	file, header, err := r.FormFile("file")
	if err != nil {
		log.Println("FormFile error:", err)
		http.Error(w, "Could not read uploaded file", http.StatusBadRequest)
		return
	}
	defer file.Close()
	log.Printf("Receiving file: %s, size: %d bytes\n", header.Filename, header.Size)
	filename := filepath.Base(header.Filename)
	// Add timestamp to avoid overwriting files with same name
	finalName := fmt.Sprintf("%d_%s", time.Now().Unix(), filename)
	dstPath := filepath.Join(uploadDir, finalName)
	dst, err := os.Create(dstPath)
	if err != nil {
		log.Println("os.Create error:", err)
		http.Error(w, "Could not create file on server", http.StatusInternalServerError)
		return
	}
	defer dst.Close()
	written, err := io.Copy(dst, file)
	if err != nil {
		log.Println("io.Copy error:", err)
		http.Error(w, "Could not save file", http.StatusInternalServerError)
		return
	}
	log.Printf("Saved file: %s (%d bytes)\n", dstPath, written)
	fmt.Fprintf(w, "File uploaded successfully: %s\nSize: %d bytes\n", finalName, written)
}
The steps for receiving a file in Go are:
- Limiting how much of the raw data of the HTTP POST request (method="POST") can be read, with:
r.Body = http.MaxBytesReader(w, r.Body, maxUploadSize)
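- Parsing the multipart form so Go can distinguish the uploaded resources. Note that the argument of ParseMultipartForm is the amount of file data kept in memory, not an upload limit: anything beyond it is stored in temporary files on disk. Passing maxUploadSize works, but a much smaller value (32 << 20 for example) is usually enough: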
if err := r.ParseMultipartForm(maxUploadSize); err != nil {
log.Println("ParseMultipartForm error:", err)
http.Error(w, "File too large or invalid upload", http.StatusBadRequest)
return
}
- And now we are able to use the r.FormFile() method to select the "file" content by its name, which is defined here in the client:
<input type="file" name="file" accept="video/*,image/*" required>
We do it here:
file, header, err := r.FormFile("file")
At this point file is an object behaving like a file stream containing the raw data of the file, and header is a little struct containing metadata, like the filename, size, content-type...
Example:
fmt.Println(header.Filename)
fmt.Println(header.Size)
fmt.Println(header.Header.Get("Content-Type"))
Because file is an open stream (possibly backed by a temporary file), we close it when the handler returns with defer file.Close().
- Extract the filename
Simple enough, we just use filepath.Base(header.Filename).
Example:
filepath.Base("AA/BB.txt")
-> "BB.txt"
- Ensuring uniqueness of the filename by prefixing it with the current Unix timestamp.
finalName := fmt.Sprintf("%d_%s", time.Now().Unix(), filename)
dstPath := filepath.Join(uploadDir, finalName)
-> "uploads/SECONDS_SINCE_1ST_JANUARY_1970_filename"
- Creating the destination file.
dst, err := os.Create(dstPath)
if err != nil {
log.Println("os.Create error:", err)
http.Error(w, "Could not create file on server", http.StatusInternalServerError)
return
}
defer dst.Close()
- Finally copying the content to the permanent file.
We use io.Copy(dst, src), its signature is:
func Copy(dst Writer, src Reader) (written int64, err error)
We use it here:
written, err := io.Copy(dst, file)
if err != nil {
log.Println("io.Copy error:", err)
http.Error(w, "Could not save file", http.StatusInternalServerError)
return
}
And forward a success response to the client:
fmt.Fprintf(w, "File uploaded successfully: %s\nSize: %d bytes\n", finalName, written)
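To see these steps in isolation, here is a minimal sketch that builds a multipart request in memory and reads it back with the same calls (ParseMultipartForm, FormFile, filepath.Base). The helper names buildUpload and readUpload, the sample filename and the 32 MiB memory limit are assumptions made for the illustration, they are not part of the server above.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
	"path/filepath"
)

// buildUpload creates an in-memory POST request carrying one file part,
// the same shape a browser sends for <input type="file" name="file">.
// (Hypothetical helper, for illustration only.)
func buildUpload(field, filename string, content []byte) (*http.Request, error) {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	part, err := mw.CreateFormFile(field, filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(content); err != nil {
		return nil, err
	}
	// Close writes the final boundary; without it the body is incomplete.
	if err := mw.Close(); err != nil {
		return nil, err
	}
	req := httptest.NewRequest(http.MethodPost, "/upload", &body)
	req.Header.Set("Content-Type", mw.FormDataContentType())
	return req, nil
}

// readUpload parses the form and returns the base filename and the file bytes.
// (Hypothetical helper, for illustration only.)
func readUpload(r *http.Request) (string, []byte, error) {
	// Assumed limit: 32 MiB of file data kept in memory, the rest goes to temp files.
	if err := r.ParseMultipartForm(32 << 20); err != nil {
		return "", nil, err
	}
	file, header, err := r.FormFile("file")
	if err != nil {
		return "", nil, err
	}
	defer file.Close()
	data, err := io.ReadAll(file)
	if err != nil {
		return "", nil, err
	}
	// filepath.Base drops any directory part sent by the client.
	return filepath.Base(header.Filename), data, nil
}

func main() {
	req, err := buildUpload("file", "AA/BB.txt", []byte("hello"))
	if err != nil {
		panic(err)
	}
	name, data, err := readUpload(req)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %d bytes\n", name, len(data))
}
```

This kind of round trip is handy to test an upload handler without a phone or a browser.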
So now I just have to run:
> go run main.go
2026/05/25 15:51:51 Server running on http://0.0.0.0:8080
And now I connect to my computer's local network IP with my smartphone.
We can find my computer's IP on the local network with ip addr:
❯ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host noprefixroute
valid_lft forever preferred_lft forever
2: enp7s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
link/ether fc:34:97:67:a8:93 brd ff:ff:ff:ff:ff:ff
3: wlp6s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 70:9c:d1:62:92:3d brd ff:ff:ff:ff:ff:ff
inet 192.168.1.20/24 brd 192.168.1.255 scope global dynamic noprefixroute wlp6s0
valid_lft 65629sec preferred_lft 65629sec
inet6 2a01:cb00:125c:ef00:f791:46ce:28f3:a51e/64 scope global temporary dynamic
valid_lft 86259sec preferred_lft 459sec
inet6 2a01:cb00:125c:ef00:1e87:5f78:e039:8e92/64 scope global dynamic mngtmpaddr noprefixroute
valid_lft 86259sec preferred_lft 459sec
inet6 fe80::15e1:d82f:2d5e:b8a0/64 scope link noprefixroute
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 3a:b3:bf:71:b7:3d brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
5: br-83d249e18dae: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 5a:9e:c1:43:a3:1e brd ff:ff:ff:ff:ff:ff
inet 172.18.0.1/16 brd 172.18.255.255 scope global br-83d249e18dae
valid_lft forever preferred_lft forever
These are all the network interfaces.
They show the address my computer has on each network.
For example, on the Docker network, my computer has the address 172.17.0.1.
What interests us here is the wireless interface wlp6s0, where my address is 192.168.1.20.
But what is the /24?
It is the prefix length of the network: the subnet mask has its first 24 bits set to 1.
An IPv4 address is 4 bytes, and one byte can represent 256 states -> 0 to 255, which is why IPv4 has this format:
[0, 255].[0, 255].[0, 255].[0, 255]
In this case, /24 means the first 24 bits are the network part and the remaining 8 bits identify the host.
In dotted decimal notation, the subnet mask is 255.255.255.0 (in bits: 11111111.11111111.11111111.00000000).
So we get only 256 addresses on the local network, 254 of which are usable by computers, phones... since the first one (192.168.1.0) is the network address and the last one (192.168.1.255) is the broadcast address.
Also, a prefix length can be anywhere between 0 and 32, meaning that in IPv4 the network / host boundary does not have to fall on a byte boundary.
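The mask arithmetic can be checked with Go's standard library. This is a small sketch: the function names maskString, usableHosts and networkOf are made up for the example, and the /20 case is only there to show a prefix that does not stop on a byte boundary.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
)

// maskString returns the dotted-decimal subnet mask for an IPv4 prefix length.
func maskString(bits int) string {
	return net.IP(net.CIDRMask(bits, 32)).String()
}

// usableHosts returns how many addresses can be given to devices.
// The network and broadcast addresses are removed, except for /31 and /32
// where that convention does not apply.
func usableHosts(bits int) int {
	total := 1 << (32 - bits)
	if bits >= 31 {
		return total
	}
	return total - 2
}

// networkOf zeroes the host bits of an address written in CIDR notation.
func networkOf(cidr string) (netip.Prefix, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return netip.Prefix{}, err
	}
	return p.Masked(), nil
}

func main() {
	fmt.Println(maskString(24))  // 255.255.255.0
	fmt.Println(maskString(20))  // 255.255.240.0
	fmt.Println(usableHosts(24)) // 254

	network, err := networkOf("192.168.1.20/24")
	if err != nil {
		panic(err)
	}
	fmt.Println(network) // 192.168.1.0/24
}
```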
So here on my phone I just type the address http://192.168.1.20:8080 and upload a bunch of pictures of me with a questioning face.
Our good friend imagemagick
In ImageMagick 6, the command is usually convert. In ImageMagick 7, the recommended command is magick, but on my system I’m using the legacy convert command.
We install it:
> sudo apt install imagemagick
In this article I use this version:
❯ convert -version
Version: ImageMagick 6.9.12-98 Q16 x86_64 18038 https://legacy.imagemagick.org
Copyright: (C) 1999 ImageMagick Studio LLC
License: https://imagemagick.org/script/license.php
Features: Cipher DPC Modules OpenMP(4.5)
Delegates (built-in): bzlib djvu fftw fontconfig freetype heic jbig jng jp2 jpeg lcms lqr ltdl lzma openexr pangocairo png raw tiff webp wmf x xml zlib
Now, convert can work for cutting yourself out only if the background is uniform enough and you stand out clearly from it.
Example with this cat picture:

> convert cat1.jpg -fuzz 10% -transparent white cat2.png
And we got:

Now let's dissect the command.
- -transparent white means: set the alpha channel (the channel that controls the transparency of the pixel) to 0 (transparent) for all the pixels considered white
- -fuzz 10% means: with a 10% tolerance
At first I thought the distance between the pixel and the reference color, which is white (R: 255, G: 255, B: 255) here, was computed as follows:
(255 - Rpix) + (255 - Gpix) + (255 - Bpix)
Then for example:
(255 - 251) + (255 - 249) + (255 - 254) = 11
And because the maximum distance would be 255 * 3 = 765:
11 / 765 = 0.0144
Which is below the 10% tolerance, hence the pixel is considered white, so its alpha channel is set to 0 -> it appears transparent.
This is a normalized Manhattan distance.
But in fact it is computed as a Euclidean distance.
That is a straight-line distance in a space.
In this case a 3 dimensional space, where each dimension is a color (R, G or B).
So we do:
sqrt((255 - 251) ^ 2 + (255 - 249) ^ 2 + (255 - 254) ^ 2)
That is just the Pythagorean theorem extended to 3 dimensions.
Take a cube for example: you know the sides (A, B, C) are 1 meter long and you want to compute the length of the diagonal (E) between the 2 most distant vertices.

So you first compute the length of a diagonal that is shared between 2 dimensions (D), which has in fact the same length as (D').
sqrt(A^2 + C^2)
Then you use this diagonal with (A) (equivalent to (A')) to determine the length of the diagonal (E).
So in fact we got:
Length of E = sqrt( sqrt(A^2 + C^2)^2 + B^2)
=>
Length of E = sqrt( A^2 + C^2 + B^2)
But in the general case we are not in a cube, because the side lengths differ. The idea is the same, but we must use all three sides and cannot simplify as in a cube to:
Length of E = sqrt( A^2 + A^2 + A^2)
=>
Length of E = sqrt( 3 * A^2)
What I find funny about that is that the order of diagonal computation does not matter, because in the end it is just the square root of side A^2 + side B^2 + side C^2.
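A quick way to convince yourself of this is to compute the diagonal through different faces and compare. The sketch below uses the side lengths 4, 6 and 1 (the color differences of the pixel example); the function names are made up for the illustration.

```go
package main

import (
	"fmt"
	"math"
)

// diagonalViaFace computes the space diagonal in two steps: first the
// diagonal of the face spanned by the first two sides, then that face
// diagonal combined with the third side.
func diagonalViaFace(first, second, third float64) float64 {
	face := math.Hypot(first, second)
	return math.Hypot(face, third)
}

// diagonalDirect applies the 3-dimensional formula in one go.
func diagonalDirect(a, b, c float64) float64 {
	return math.Sqrt(a*a + b*b + c*c)
}

func main() {
	a, b, c := 4.0, 6.0, 1.0
	fmt.Printf("%.4f\n", diagonalViaFace(a, c, b)) // through the A-C face
	fmt.Printf("%.4f\n", diagonalViaFace(a, b, c)) // through the A-B face
	fmt.Printf("%.4f\n", diagonalViaFace(b, c, a)) // through the B-C face
	fmt.Printf("%.4f\n", diagonalDirect(a, b, c))  // sqrt(53)
}
```

All four lines print the same value, up to floating point rounding.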
So here:

We can say that (GA)^2 is (FE)^2 + DELTA_1^2, which is not intuitive at all,
instead of stating sqrt((FB)^2 + (FE)^2)^2 + (FG)^2, which is more intuitive.
Going back to the example.
sqrt((255 - 251) ^ 2 + (255 - 249) ^ 2 + (255 - 254) ^ 2) = 7.28
And we divide by the maximum.
7.28 / sqrt(3 * 255^2)
<=>
7.28 / 441.67 = .016 -> 1.6 %
We are below 10%, so the pixel is considered white, hence its alpha channel is set to 0 -> transparent.
Adding halo
Now, we want to add a halo around the PNG.
For that we will use:
> convert cat2.png \( +clone -background "#00eaff" -shadow 100x12+0+0 \) +swap -background none -layers merge +repage cat_halo.png
It produces something like:

Here is what happens: the command takes the initial PNG cat2.png and enters a sub-pipeline that we define between parentheses ( and ) -> to have them interpreted by ImageMagick and not by bash, we escape them with \.
In this pipeline we bring in cat2.png by cloning it with +clone.
And we define the background color, which is just the color that will be used by the next operation -> cyan (halo).
Speaking of halo, just after we have -shadow 100x12+0+0.
This creates a shadow from the shape of the pixels that are not transparent.
We define the offset of the shadow as ...+0+0 -> meaning no x and y shift respectively from the shape we make the shadow from.
The opacity is 100, which stands for 100%.
And 12 is the blur sigma used to spread the shadow (shadow diffusion).
If we increase it to 100 for example:

Now when we go outside of the sub-pipeline, the image that is generated is just a cyan shadow:

And in ImageMagick we reason in terms of a stack of images, so at this point we have this image stack:
shadow_image
cat2.png
But because we want a halo, we want the original PNG to be placed above the shadow, so we swap the two images of the stack with +swap.
After that, we make explicit that the background must stay transparent during the merge step with -background none (equivalent to -background transparent).
Finally we merge the images of the stack with -layers merge (the later images of the stack are drawn onto the earlier ones).
Ah, and what about the +repage?
In fact, a PNG can have an optional chunk containing the x and y offsets of the image.
Here, in the shadow step, the shadow is a little larger than the shape it is derived from (to create the halo effect), so ImageMagick can attach x and/or y offsets to the shadow image to keep it aligned with the original shape.
So it is still good practice to put +repage if we want the canvas to perfectly fit the output image dimensions.
Hmm, and it appears that the merge step adjusts the dimensions of the output image (because of the halo width, the width and height of the output image slightly increase) and takes those offsets into account.
So +repage is there to reset whatever offset is left after the merge -> virtual canvas and visible image dimensions are equal.
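Since the goal is to automate the thumbnails, the same halo command can be launched from Go. This is a sketch under assumptions: haloArgs is a made-up helper, and it expects the convert binary of ImageMagick 6 to be on the PATH when the command is actually run.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// haloArgs builds the argument list of the halo command shown above.
// The opacity stays at 100 and the offsets at +0+0, only the sigma varies.
func haloArgs(in, out, haloColor string, sigma int) []string {
	shadow := "100x" + strconv.Itoa(sigma) + "+0+0"
	return []string{
		in,
		"(", "+clone", "-background", haloColor, "-shadow", shadow, ")",
		"+swap", "-background", "none", "-layers", "merge", "+repage",
		out,
	}
}

func main() {
	args := haloArgs("cat2.png", "cat_halo.png", "#00eaff", 12)
	cmd := exec.Command("convert", args...)
	// Only print the command here; call cmd.Run() to really execute it.
	fmt.Println(cmd.String())
}
```

Note that the parentheses are passed as plain "(" and ")" arguments: there is no shell between Go and ImageMagick, so nothing has to be escaped with \.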
FFmpeg is all we need
So now we are going to combine the images.
Here is the FFmpeg version I use for this article:
ffmpeg version 6.1.1-3ubuntu5 Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 13 (Ubuntu 13.2.0-23ubuntu3)
configuration: --prefix=/usr --extra-version=3ubuntu5 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --disable-omx --enable-gnutls --enable-libaom --enable-libass --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libharfbuzz --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-openal --enable-opencl --enable-opengl --disable-sndio --enable-libvpl --disable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-ladspa --enable-libbluray --enable-libjack --enable-libpulse --enable-librabbitmq --enable-librist --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libx264 --enable-libzmq --enable-libzvbi --enable-lv2 --enable-sdl2 --enable-libplacebo --enable-librav1e --enable-pocketsphinx --enable-librsvg --enable-libjxl --enable-shared
WARNING: library configuration mismatch
avcodec configuration: --prefix=/usr --extra-version=3ubuntu5 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --disable-omx --enable-gnutls --enable-libaom --enable-libass --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libharfbuzz --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-openal --enable-opencl --enable-opengl --disable-sndio --enable-libvpl --disable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-ladspa --enable-libbluray --enable-libjack --enable-libpulse --enable-librabbitmq --enable-librist --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libx264 --enable-libzmq --enable-libzvbi --enable-lv2 --enable-sdl2 --enable-libplacebo --enable-librav1e --enable-pocketsphinx --enable-librsvg --enable-libjxl --enable-shared --enable-version3 --disable-doc --disable-programs --disable-static --enable-libaribb24 --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libtesseract --enable-libvo_amrwbenc --enable-libsmbclient
libavutil 58. 29.100 / 58. 29.100
libavcodec 60. 31.102 / 60. 31.102
libavformat 60. 16.100 / 60. 16.100
libavdevice 60. 3.100 / 60. 3.100
libavfilter 9. 12.100 / 9. 12.100
libswscale 7. 5.100 / 7. 5.100
libswresample 4. 12.100 / 4. 12.100
libpostproc 57. 3.100 / 57. 3.100
All we need is this FFmpeg command:
#!/usr/bin/env bash
ffmpeg -i background/background.png -i me/me4.png -i logo_blue.png \
-filter_complex "
[0:v]scale=1920:1080[bg];
[1:v]scale=-1:1080[me];
[2:v]scale=-1:144[logo];
[bg][me]overlay=W-w+80:H-h+140[withme];
[withme]drawtext=
fontfile=/usr/share/fonts/truetype/jetbrains-mono/JetBrainsMono-Bold.ttf:
textfile=title.txt:
fontcolor=blue:
fontsize=108:
line_spacing=12:
x=120:
y=(h-text_h)/2:
shadowcolor=black@0.65:
shadowx=7:
shadowy=7[prefinal];
[prefinal][logo]overlay=40:40;
" \
-frames:v 1 thumbnail.png
Technically the flow is simple enough.
First, you can see that we take multiple inputs: the background, the PNG (me) and the logo.
Then, in the filter graph, we can grab them by their position in the list of inputs: [0:v] -> takes background/background.png, for example.
And on each one of these pictures we perform a scaling operation.
For example, we force the background to be 1920x1080.
And when it comes to the other images, we scale their height to 1080 and 144 pixels respectively while preserving their aspect ratio, thanks to the -1 given as the width.
This means that their width is recomputed from their scaled height and their ratio -> ratio conserved.
We give them a name after each operation: bg, me and logo respectively.
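To make the -1 concrete, here is the width computation as a small sketch. The source dimensions (a 3000x4000 selfie, a 512x512 logo) are hypothetical, and the rounding to the nearest pixel is an assumption: FFmpeg's own rounding may differ by one pixel.

```go
package main

import "fmt"

// scaledWidth returns the width that keeps the aspect ratio of a
// srcW x srcH image once its height is forced to targetH.
// The result is rounded to the nearest pixel (assumed rounding).
func scaledWidth(srcW, srcH, targetH int) int {
	return (srcW*targetH + srcH/2) / srcH
}

func main() {
	// Hypothetical 3000x4000 selfie scaled with scale=-1:1080.
	fmt.Println(scaledWidth(3000, 4000, 1080)) // 810
	// Hypothetical 512x512 logo scaled with scale=-1:144.
	fmt.Println(scaledWidth(512, 512, 144)) // 144
}
```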
Here:
[bg][me]overlay=W-w+80:H-h+140[withme];
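We put me onto bg.
The engine begins to draw it at x = W-w+80.
- W = background width -> 1920
- w = PNG width
W-w alone would align the PNG with the right edge of the background, the +80 shifts it 80 pixels further to the right.
And we have the same concept for the height.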
We can approximately visualize the x coordinates from which it begins to draw the PNG here:
              w
              |
              |
        X     V
--------------------->
|       |           |
|       |           | <---- h
|       X-----------|Y
|                   |
|                   |
|                   |
|-------------------|
V
Now, when it comes to the title, we put it on the left, vertically centered.
x=120 pixels from the left.
And y is the frame height minus the text height, divided by 2 -> the text block is vertically centered.
The shadow is attached to the text, so it is drawn at the same position as the text but with an x and y offset of 7 pixels, so it really looks like a shadow.
The shadow is black and has an opacity of 0.65 (65%).
Finally, we do a last overlay of the logo onto the image that we named in the previous operation -> prefinal.
The logo starts at 40 pixels from the left and 40 pixels from the top.
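The position expressions of the filter graph are plain arithmetic, so they can be sketched in Go. The sizes below are hypothetical (an 810x1080 PNG and a 228 pixel high text block) and the function names are made up for the example.

```go
package main

import "fmt"

// overlayXY mirrors overlay=W-w+dx:H-h+dy, where W x H is the background
// size and w x h is the size of the image drawn on top of it.
func overlayXY(W, H, w, h, dx, dy int) (x, y int) {
	return W - w + dx, H - h + dy
}

// centeredTextY mirrors y=(h-text_h)/2 from the drawtext filter.
func centeredTextY(frameH, textH int) int {
	return (frameH - textH) / 2
}

func main() {
	// Hypothetical 810x1080 PNG on the 1920x1080 background.
	x, y := overlayXY(1920, 1080, 810, 1080, 80, 140)
	fmt.Println(x, y) // 1190 140

	// Hypothetical text block of 228 pixels in height.
	fmt.Println(centeredTextY(1080, 228)) // 426
}
```

With these numbers the PNG spans x = 1190 to 2000 and y = 140 to 1220, so its right 80 pixels and bottom 140 pixels fall outside the 1920x1080 frame.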
For example, this outputs:
![]()
With this title:
Cartesian product
Deep Dive
OR
Cartesian product\n Deep Dive
All we need is imagemagick
But we can also translate this exact operation with imagemagick to stay in its ecosystem.
#!/usr/bin/env bash
set -e # exits the script immediately when a command fails
FONT="/usr/share/fonts/truetype/jetbrains-mono/JetBrainsMono-Bold.ttf"
TEXT="$(< title.txt)"
convert background/background.png -resize 1920x1080\! /tmp/bg.png
convert me/me4.png -resize x1080 /tmp/me.png
convert logo_blue.png -resize x144 /tmp/logo.png
convert \
-background none \
-font "$FONT" \
-pointsize 108 \
-fill blue \
-interline-spacing 12 \
label:"$TEXT" \
text.png
convert -size 1920x1080 xc:none \
\( text.png -fill black -colorize 100 \) \
-gravity west -geometry +122+4 -composite \
text.png \
-gravity west -geometry +120+0 -composite \
/tmp/text_layer.png
convert /tmp/bg.png \
/tmp/me.png -gravity southeast -geometry -80-140 -composite \
/tmp/text_layer.png -composite \
/tmp/logo.png -gravity northwest -geometry +40+40 -composite \
thumbnail2.png
Here, we got the EXACT same resize procedure:
convert background/background.png -resize 1920x1080\! /tmp/bg.png
convert me/me4.png -resize x1080 /tmp/me.png
convert logo_blue.png -resize x144 /tmp/logo.png
\! forces resizing to the exact dimensions. The backslash escapes ! so the shell does not interpret it before ImageMagick sees it.
Here, we got the text PNG creation:
convert -size 1920x1080 xc:none \
\( text.png -fill black -colorize 100 \) \
-gravity west -geometry +122+4 -composite \
text.png \
-gravity west -geometry +120+0 -composite \
/tmp/text_layer.png
It outputs that image:

This creates a transparent image of dimensions 1920x1080:
convert -size 1920x1080 xc:none \
After that, inside the sub-pipeline, we fill the pixels of text.png that are not transparent with black (-colorize 100 means the fill color is applied at 100%). That is the text shadow.
\( text.png -fill black -colorize 100 \) \
Then we leave the sub-pipeline and tell ImageMagick where to place the shadow.
\( text.png -fill black -colorize 100 \) \
-gravity west -geometry +122+4
We take the center point of the left / west edge of the image as a reference for the coordinate system, and define x=122 and y=4.
Note, just a small summary about the gravity coordinates:
northwest---------north---------northeast
|                                       |
|                                       |
|                                       |
|                                       |
|                                       |
west              center             east
|                                       |
|                                       |
|                                       |
|                                       |
|                                       |
southwest---------south---------southeast
Then we overlay the last image of the stack (the text shadow image) onto the first one (the transparent background) with -composite.
Then we load text.png (the initial text image) and do the same.
For this final overlay, the text sits 2 pixels to the left of and 4 pixels above the shadow (+120+0 versus +122+4).
If the text.png image were larger than the canvas and/or placed near an edge, its content might be cropped in the final result, since -composite keeps the dimensions of the first image and stores no position offset.
And the final command is just the overlays:
convert /tmp/bg.png \
/tmp/me.png -gravity southeast -geometry -80-140 -composite \
/tmp/text_layer.png -composite \
/tmp/logo.png -gravity northwest -geometry +40+40 -composite \
thumbnail2.png
Corresponding to this composition:
logo.png
text_layer.png
me.png
bg.png
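To compare the -gravity / -geometry pairs with the FFmpeg coordinates, here is a sketch that converts them into a top-left pixel position. It only covers the three gravities used in this article, the sizes are hypothetical (an 810x1080 PNG, a 144x144 logo, a 900x228 text image), and the integer division for the vertical centering is an assumption about rounding.

```go
package main

import "fmt"

// Gravity names the reference point chosen with -gravity.
type Gravity int

const (
	NorthWest Gravity = iota
	West
	SouthEast
)

// topLeft converts a gravity and a -geometry offset (ox, oy) into the
// top-left position of a w x h overlay on a W x H canvas.
// With SouthEast, positive offsets move the overlay towards the inside of
// the canvas, so negative offsets push it past the right / bottom edges.
func topLeft(g Gravity, W, H, w, h, ox, oy int) (x, y int) {
	switch g {
	case West:
		return ox, (H-h)/2 + oy
	case SouthEast:
		return W - w - ox, H - h - oy
	default: // NorthWest
		return ox, oy
	}
}

func main() {
	// me.png: -gravity southeast -geometry -80-140
	fmt.Println(topLeft(SouthEast, 1920, 1080, 810, 1080, -80, -140)) // 1190 140
	// logo.png: -gravity northwest -geometry +40+40
	fmt.Println(topLeft(NorthWest, 1920, 1080, 144, 144, 40, 40)) // 40 40
	// text.png: -gravity west -geometry +120+0
	fmt.Println(topLeft(West, 1920, 1080, 900, 228, 120, 0)) // 120 426
}
```

The first result is the same position as the FFmpeg expression overlay=W-w+80:H-h+140 for a PNG of that size, which is why -80-140 with southeast gravity matches the FFmpeg version.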
Speed comparisons
On my machine, the convert variant took:
❯ time bash script_convert.bash
real 0m2,024s
user 0m2,853s
sys 0m0,158s
While the FFmpeg version took:
❯ time bash script_ffmpeg.bash
...
...
...
real 0m0,384s
user 0m0,332s
sys 0m0,075s
Conclusion
A good thumbnail balances channel identity and video novelty.
The composition is easy to reproduce and standardize from the command line, either with ImageMagick alone or with a mix of ImageMagick and FFmpeg.