Merge branch 'pgo'

Closes #107
Daniel Eklöf 2020-11-22 19:31:25 +01:00
commit 10eda1faa5
GPG key ID: 5BBD4992C116573F
8 changed files with 541 additions and 78 deletions

@ -18,6 +18,15 @@
## Unreleased
### For packagers
Starting with this release, foot can be PGO:d (compiled using profile
guided optimizations) **without** a running Wayland session. This
means foot can be PGO:d in e.g. sandboxed build scripts. See
[INSTALL.md](INSTALL.md#user-content-performance-optimized-pgo).
### Added
* Implement reverse auto-wrap (_auto\_left\_margin_, _bw_, in

@ -8,6 +8,12 @@
1. [Other](#other)
1. [Setup](#setup)
1. [Release build](#release-build)
1. [Size optimized](#size-optimized)
1. [Performance optimized, non-PGO](#performance-optimized-non-pgo)
1. [Performance optimized, PGO](#performance-optimized-pgo)
1. [Partial PGO](#partial-pgo)
1. [Full PGO](#full-pgo)
1. [Use the generated PGO data](#use-the-generated-pgo-data)
1. [Profile Guided Optimization](#profile-guided-optimization)
1. [Debug build](#debug-build)
1. [Running the new build](#running-the-new-build)
@ -126,72 +132,152 @@ mkdir -p bld/release && cd bld/release
### Release build
Below are instructions for building foot either [size
optimized](#size-optimized), [performance
optimized](#performance-optimized-non-pgo), or performance
optimized using [PGO](#performance-optimized-pgo).
PGO - _Profile Guided Optimization_ - is a way to optimize a program
better than `-O3` can, and is done by compiling foot twice: first to
generate an instrumented version which is used to run a payload that
exercises the performance critical parts of foot, and then a second
time to rebuild foot using the generated profiling data to guide
optimization.
In addition to being faster, PGO builds also tend to be smaller than
regular `-O3` builds.
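The two-pass flow described above can be sketched as a plain list of commands. This is only an illustration: the helper name `pgo_build_steps` and its `payload` argument are hypothetical, and nothing is executed. The meson and ninja invocations are the ones used elsewhere in this document.

```python
from typing import List


def pgo_build_steps(payload: List[str]) -> List[List[str]]:
    """Return the commands of a two-pass PGO build, in order.

    `payload` is a hypothetical placeholder for whatever command
    exercises the instrumented build.
    """
    return [
        # Pass 1: build an instrumented binary that records profiling data.
        ["meson", "configure", "-Db_pgo=generate"],
        ["ninja"],
        # Run the workload so the instrumented binary writes its profile.
        list(payload),
        # Pass 2: rebuild, letting the compiler use the recorded profile.
        ["meson", "configure", "-Db_pgo=use"],
        ["ninja"],
    ]


if __name__ == "__main__":
    for step in pgo_build_steps(["./pgo", "stimuli-file"]):
        print(" ".join(step))
```

Keeping the steps as data makes the ordering explicit: the payload must run between the two builds, otherwise the second pass has no profile to use.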
#### Size optimized
To optimize for size (i.e. produce a small binary):
```sh
export CFLAGS="$CFLAGS -Os -march=native"
meson --buildtype=release --prefix=/usr -Db_lto=true ../..
ninja
ninja test
ninja install
```
#### Performance optimized, non-PGO
To do a regular, non-PGO build optimized for performance:
```sh
export CFLAGS="$CFLAGS -O3 -march=native"
meson --buildtype=release --prefix=/usr -Db_lto=true ../..
ninja
ninja test
ninja install
```
Both `-O3` and `-Db_lto=true` are **highly** recommended.
For performance reasons, I strongly recommend doing a
[PGO](#profile-guided-optimization) (Profile Guided Optimization)
build. This requires a running Wayland session since we will be
executing an intermediate build of foot.
Use `-O2` instead of `-O3` if you prefer a slightly smaller (and
slower!) binary.
If you do not want this, just build:
#### Performance optimized, PGO
First, configure the build directory:
```sh
ninja
export CFLAGS="$CFLAGS -O3 -march=native -Wno-missing-profile"
meson --buildtype=release --prefix=/usr -Db_lto=true ../..
```
and then skip to [Running the new build](#running-the-new-build).
It is **very** important that `-O3` is used here, as GCC-10.1.x and
later have a regression where PGO with `-O2` is **much** slower.
**For packagers**: normally, you would configure compiler flags using
`-Dc_args`. This however "overwrites" `CFLAGS`. `makepkg` from Arch,
for example, uses `CFLAGS` to specify the default set of flags.
If you are using Clang instead of GCC, use the following `CFLAGS` instead:
Thus, we do `export CFLAGS+="..."` to at least not throw away those
flags.
```sh
export CFLAGS="$CFLAGS -O3 -march=native -Wno-ignored-optimization-argument -Wno-profile-instr-out-of-date"
```
When packaging, you may want to use the default `CFLAGS` only, but
note this: foot is a performance critical application that relies on
compiler optimizations to perform well.
In particular, with GCC 10.1, it is **very** important `-O3` is used
(and not e.g. `-O2`) when doing a [PGO](#profile-guided-optimization)
build.
#### Profile Guided Optimization
First, make sure you have configured a [release](#release-build) build
directory, but:
If using Clang, make sure to add `-Wno-ignored-optimization-argument
-Wno-profile-instr-out-of-date` to `CFLAGS`.
If using GCC, make sure to add `-Wno-missing-profile` to `CFLAGS`.
Then, tell meson we want to _generate_ profile data, and build:
Then, tell meson we want to _generate_ profiling data, and build:
```sh
meson configure -Db_pgo=generate
ninja
```
Next, we need to execute the intermediate build of foot, and run a
payload inside it that will exercise the performance critical code
paths. To do this, we will use the script
Next, we need to actually generate the profiling data.
There are two ways to do this: a [partial PGO build using a PGO
helper](#partial-pgo) binary, or a [full PGO build](#full-pgo) by
running the real foot binary. The latter has slightly better results
(i.e. results in a faster binary), but must be run in a Wayland
session.
A full PGO build also tends to be smaller than a partial build.
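A build script could pick between the two methods automatically. The sketch below assumes that a set `WAYLAND_DISPLAY` environment variable means a Wayland session is available, which is the same check the packaging script changed in this commit performs. The function name `choose_pgo_method` is hypothetical.

```python
import os
from typing import Mapping


def choose_pgo_method(env: Mapping[str, str]) -> str:
    """Return "full" when a Wayland session appears available, else "partial"."""
    # Assumption: WAYLAND_DISPLAY being set (even if empty) indicates a session.
    return "full" if "WAYLAND_DISPLAY" in env else "partial"


if __name__ == "__main__":
    print(choose_pgo_method(os.environ))
```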
##### Partial PGO
This method uses a PGO helper binary that links against the VT parser
only. It is similar to a mock test; it instantiates a dummy terminal
instance and then directly calls the VT parser with stimuli.
It explicitly does **not** include the Wayland backend and as such, it
does not require a running Wayland session. The downside is that not
all code paths in foot are exercised. In particular, the **rendering**
code is not. As a result, the final binary built using this method is
slightly slower than when doing a [full PGO](#full-pgo) build.
We will use the `pgo` binary along with an input corpus generated by
`scripts/generate-alt-random-writes.py`:
```sh
tmp_file=$(mktemp)
../../scripts/generate-alt-random-writes.py \
--rows=67 \
--cols=135 \
--scroll \
--scroll-region \
--colors-regular \
--colors-bright \
--colors-256 \
--colors-rgb \
--attr-bold \
--attr-italic \
--attr-underline \
${tmp_file}
./pgo ${tmp_file} ${tmp_file} ${tmp_file}
rm ${tmp_file}
```
The snippet above first creates an (empty) temporary file. Then, it
runs a script that generates random escape sequences (if you cat
`${tmp_file}` in a terminal, you'll see random colored characters all
over the screen). Finally, we feed the randomly generated escape
sequences to the PGO helper. This is what generates the profiling data
used in the next step.
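To give a feel for what such a corpus contains, here is a much simplified stand-in for the generator. It is not the real `generate-alt-random-writes.py`: the function name `make_stimuli`, the tiny alphabet and the fixed seed are assumptions made for illustration. It only emits cursor moves, regular foreground colors and occasional resets, wrapped in the alternate-screen enter and leave sequences.

```python
import random


def make_stimuli(count: int, rows: int, cols: int, seed: int = 0) -> str:
    """Build a small, reproducible string of random escape sequences."""
    rng = random.Random(seed)
    out = ["\033[?1049h"]  # enter the alternate screen
    for _ in range(count):
        row = rng.randint(1, rows)
        col = rng.randint(1, cols)
        out.append(f"\033[{row};{col}H")  # move the cursor
        out.append(f"\033[{30 + rng.randrange(8)}m")  # regular foreground color
        out.append(rng.choice("abcxyz0123456789 "))  # one random character
        if rng.randrange(2):
            out.append("\033[m")  # reset attributes
    # Reset attributes and scroll region, then leave the alternate screen.
    out.append("\033[m\033[r\033[?1049l")
    return "".join(out)


if __name__ == "__main__":
    data = make_stimuli(count=1000, rows=67, cols=135)
    print(f"{len(data)} bytes of stimuli generated")
```

Writing the returned string to a file would give something the `pgo` helper could read, although the real script covers far more of the VT parser.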
You are now ready to [use the generated PGO
data](#use-the-generated-pgo-data).
##### Full PGO
This method requires a running Wayland session.
We will use the script `scripts/generate-alt-random-writes.py`:
```sh
foot_tmp_file=$(mktemp)
./foot --config=/dev/null --term=xterm sh -c "<path-to-generate-alt-random-writes.py> --scroll --scroll-region --colors-regular --colors-bright --colors-rgb ${foot_tmp_file} && cat ${foot_tmp_file}"
./foot --config=/dev/null --term=xterm sh -c "<path-to-generate-alt-random-writes.py> --scroll --scroll-region --colors-regular --colors-bright --colors-256 --colors-rgb --attr-bold --attr-italic --attr-underline ${foot_tmp_file} && cat ${foot_tmp_file}"
rm ${foot_tmp_file}
```
You should see a foot window open up, with random colored text. The
window should close after ~1-2s.
##### Use the generated PGO data
Now that we have _generated_ PGO data, we need to rebuild foot. This
time telling meson (and ultimately gcc/clang) to _use_ the PGO data.
If using Clang, now do (this requires _llvm_ to have been installed):
```sh
@ -207,6 +293,7 @@ ninja
Continue reading in [Running the new build](#running-the-new-build)
### Debug build
```sh
@ -214,6 +301,7 @@ meson --buildtype=debug ../..
ninja
```
### Running the new build
You can now run it directly from the build directory:

@ -20,18 +20,31 @@ build() {
meson --prefix=/usr --buildtype=release --wrap-mode=nofallback -Db_lto=true ..
find -name "*.gcda" -delete
meson configure -Db_pgo=generate
ninja
script_options="--scroll --scroll-region --colors-regular --colors-bright --colors-256 --colors-rgb --attr-bold --attr-italic --attr-underline"
tmp_file=$(mktemp)
if [[ -v WAYLAND_DISPLAY ]]; then
meson configure -Db_pgo=generate
find -name "*.gcda" -delete
ninja
tmp_file=$(mktemp)
./foot --config /dev/null --term=xterm -- sh -c "../scripts/generate-alt-random-writes.py --scroll --scroll-region --colors-regular --colors-bright --colors-rgb ${tmp_file} && cat ${tmp_file}"
rm "${tmp_file}"
meson configure -Db_pgo=use
./foot \
--config /dev/null \
--term=xterm \
sh -c "../scripts/generate-alt-random-writes.py ${script_options} ${tmp_file} && cat ${tmp_file}"
else
../scripts/generate-alt-random-writes.py \
--rows=67 \
--cols=135 \
${script_options} \
${tmp_file}
./pgo ${tmp_file} ${tmp_file} ${tmp_file}
fi
rm "${tmp_file}"
meson configure -Db_pgo=use
ninja
}

@ -100,44 +100,70 @@ version = custom_target(
output: 'version.h',
command: [generate_version_sh, meson.project_version(), '@SOURCE_DIR@', '@OUTPUT@'])
misc = static_library(
'misc',
'hsl.c', 'hsl.h',
'log.c', 'log.h',
'macros.h',
'misc.c', 'misc.h',
'uri.c', 'uri.h',
'xmalloc.c', 'xmalloc.h',
)
vtlib = static_library(
'vtlib',
'base64.c', 'base64.h',
'csi.c', 'csi.h',
'dcs.c', 'dcs.h',
'osc.c', 'osc.h',
'sixel.c', 'sixel.h',
'vt.c', 'vt.h',
version,
dependencies: [pixman, fcft, tllist],
link_with: misc,
)
pgolib = static_library(
'pgolib',
'grid.c', 'grid.h',
'selection.c', 'selection.h',
'terminal.c', 'terminal.h',
dependencies: [pixman, fcft, tllist],
link_with: vtlib,
)
executable(
'pgo',
'pgo/pgo.c',
wl_proto_src,
dependencies: [math, threads, pixman, wayland_client, fcft, tllist],
link_with: pgolib,
)
executable(
'foot',
'async.c', 'async.h',
'base64.c', 'base64.h',
'config.c', 'config.h',
'commands.c', 'commands.h',
'csi.c', 'csi.h',
'dcs.c', 'dcs.h',
'extract.c', 'extract.h',
'fdm.c', 'fdm.h',
'grid.c', 'grid.h',
'hsl.c', 'hsl.h',
'input.c', 'input.h',
'log.c', 'log.h',
'macros.h',
'main.c',
'misc.c', 'misc.h',
'osc.c', 'osc.h',
'quirks.c', 'quirks.h',
'reaper.c', 'reaper.h',
'render.c', 'render.h',
'search.c', 'search.h',
'selection.c', 'selection.h',
'server.c', 'server.h', 'client-protocol.h',
'shm.c', 'shm.h',
'sixel.c', 'sixel.h',
'slave.c', 'slave.h',
'spawn.c', 'spawn.h',
'terminal.c', 'terminal.h',
'tokenize.c', 'tokenize.h',
'uri.c', 'uri.h',
'user-notification.h',
'vt.c', 'vt.h',
'wayland.c', 'wayland.h',
'xmalloc.c', 'xmalloc.h',
wl_proto_src + wl_proto_headers, version,
dependencies: [math, threads, pixman, wayland_client, wayland_cursor, xkb, fontconfig,
tllist, fcft],
link_with: pgolib,
install: true)
executable(

pgo/pgo.c

@ -0,0 +1,285 @@
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <sys/mman.h>
#include <fcntl.h>
#include "async.h"
#include "config.h"
#include "user-notification.h"
#include "vt.h"
extern bool fdm_ptmx(struct fdm *fdm, int fd, int events, void *data);
static void
usage(const char *prog_name)
{
printf(
"Usage: %s stimuli-file1 stimuli-file2 ... stimuli-fileN\n",
prog_name);
}
enum async_write_status
async_write(int fd, const void *data, size_t len, size_t *idx)
{
return ASYNC_WRITE_DONE;
}
bool
fdm_add(struct fdm *fdm, int fd, int events, fdm_handler_t handler, void *data)
{
return true;
}
bool
fdm_del(struct fdm *fdm, int fd)
{
return true;
}
bool
fdm_event_add(struct fdm *fdm, int fd, int events)
{
return true;
}
bool
fdm_event_del(struct fdm *fdm, int fd, int events)
{
return true;
}
bool
render_resize_force(struct terminal *term, int width, int height)
{
return true;
}
void render_refresh(struct terminal *term) {}
void render_refresh_csd(struct terminal *term) {}
void render_refresh_title(struct terminal *term) {}
bool
render_xcursor_set(struct seat *seat, struct terminal *term, const char *xcursor)
{
return true;
}
struct wl_window *
wayl_win_init(struct terminal *term)
{
return NULL;
}
void wayl_win_destroy(struct wl_window *win) {}
bool
spawn(struct reaper *reaper, const char *cwd, char *const argv[],
int stdin_fd, int stdout_fd, int stderr_fd)
{
return true;
}
pid_t
slave_spawn(
int ptmx, int argc, const char *cwd, char *const *argv, const char *term_env,
const char *conf_shell, bool login_shell,
const user_notifications_t *notifications)
{
return 0;
}
int
render_worker_thread(void *_ctx)
{
return 0;
}
struct extraction_context *
extract_begin(enum selection_kind kind)
{
return NULL;
}
bool
extract_one(
const struct terminal *term, const struct row *row, const struct cell *cell,
int col, void *context)
{
return true;
}
bool
extract_finish(struct extraction_context *context, char **text, size_t *len)
{
return true;
}
void cmd_scrollback_up(struct terminal *term, int rows) {}
void cmd_scrollback_down(struct terminal *term, int rows) {}
int
main(int argc, const char *const *argv)
{
if (argc < 2) {
usage(argv[0]);
return EXIT_FAILURE;
}
const int row_count = 67;
const int col_count = 135;
const int grid_row_count = 16384;
int lower_fd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC | TFD_NONBLOCK);
if (lower_fd < 0)
return EXIT_FAILURE;
int upper_fd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC | TFD_NONBLOCK);
if (upper_fd < 0) {
close(lower_fd);
return EXIT_FAILURE;
}
struct row **rows = calloc(grid_row_count, sizeof(rows[0]));
for (int i = 0; i < grid_row_count; i++) {
rows[i] = calloc(1, sizeof(*rows[i]));
rows[i]->cells = calloc(col_count, sizeof(rows[i]->cells[0]));
}
struct config conf = {
.tweak = {
.delayed_render_lower_ns = 500000, /* 0.5ms */
.delayed_render_upper_ns = 16666666 / 2, /* half a frame period (60Hz) */
},
};
struct wayland wayl = {
.seats = tll_init(),
.monitors = tll_init(),
.terms = tll_init(),
};
struct terminal term = {
.conf = &conf,
.wl = &wayl,
.grid = &term.normal,
.normal = {
.num_rows = grid_row_count,
.num_cols = col_count,
.rows = rows,
.cur_row = rows[0],
},
.alt = {
.num_rows = grid_row_count,
.num_cols = col_count,
.rows = rows,
.cur_row = rows[0],
},
.scale = 1,
.width = col_count * 8,
.height = row_count * 15,
.cols = col_count,
.rows = row_count,
.cell_width = 8,
.cell_height = 15,
.scroll_region = {
.start = 0,
.end = row_count,
},
.selection = {
.start = {-1, -1},
.end = {-1, -1},
},
.delayed_render_timer = {
.lower_fd = lower_fd,
.upper_fd = upper_fd
}
};
tll_push_back(wayl.terms, &term);
int ret = EXIT_FAILURE;
for (int i = 1; i < argc; i++) {
struct stat st;
if (stat(argv[i], &st) < 0) {
fprintf(stderr, "error: %s: failed to stat: %s\n",
argv[i], strerror(errno));
goto out;
}
uint8_t *data = malloc(st.st_size);
if (data == NULL) {
fprintf(stderr, "error: %s: failed to allocate buffer: %s\n",
argv[i], strerror(errno));
goto out;
}
int fd = open(argv[i], O_RDONLY);
if (fd < 0) {
fprintf(stderr, "error: %s: failed to open: %s\n",
argv[i], strerror(errno));
free(data);
goto out;
}
ssize_t amount = read(fd, data, st.st_size);
if (amount != st.st_size) {
fprintf(stderr, "error: %s: failed to read: %s\n",
argv[i], strerror(errno));
close(fd);
free(data);
goto out;
}
close(fd);
int mem_fd = memfd_create("foot-pgo-ptmx", MFD_CLOEXEC);
if (mem_fd < 0) {
fprintf(stderr, "error: failed to create memory FD\n");
goto out;
}
if (write(mem_fd, data, st.st_size) < 0) {
fprintf(stderr, "error: failed to write memory FD\n");
close(mem_fd);
goto out;
}
free(data);
term.ptmx = mem_fd;
lseek(mem_fd, 0, SEEK_SET);
printf("Feeding VT parser with %s (%lld bytes)\n",
argv[i], (long long)st.st_size);
while (lseek(mem_fd, 0, SEEK_CUR) < st.st_size) {
if (!fdm_ptmx(NULL, -1, EPOLLIN, &term)) {
fprintf(stderr, "error: fdm_ptmx() failed\n");
close(mem_fd);
goto out;
}
}
close(mem_fd);
}
ret = EXIT_SUCCESS;
out:
tll_free(wayl.terms);
for (int i = 0; i < grid_row_count; i++) {
free(rows[i]->cells);
free(rows[i]);
}
free(rows);
close(lower_fd);
close(upper_fd);
return ret;
}

@ -539,15 +539,11 @@ render_cell(struct terminal *term, pixman_image_t *pix,
pixman_image_unref(clr_pix);
/* Underline */
if (cell->attrs.underline) {
draw_underline(term, pix, attrs_to_font(term, &cell->attrs),
&fg, x, y, cell_cols);
}
if (cell->attrs.underline)
draw_underline(term, pix, font, &fg, x, y, cell_cols);
if (cell->attrs.strikethrough) {
draw_strikeout(term, pix, attrs_to_font(term, &cell->attrs),
&fg, x, y, cell_cols);
}
if (cell->attrs.strikethrough)
draw_strikeout(term, pix, font, &fg, x, y, cell_cols);
draw_cursor:
if (has_cursor && (term->cursor_style != CURSOR_BLOCK || !term->kbd_focus))

@ -11,6 +11,7 @@ class ColorVariant(enum.IntEnum):
NONE = enum.auto()
REGULAR = enum.auto()
BRIGHT = enum.auto()
CUBE = enum.auto()
RGB = enum.auto()
@ -18,11 +19,17 @@ def main():
parser = argparse.ArgumentParser()
parser.add_argument(
'out', type=argparse.FileType(mode='w'), nargs='?', help='name of output file')
parser.add_argument('--cols', type=int)
parser.add_argument('--rows', type=int)
parser.add_argument('--colors-regular', action='store_true')
parser.add_argument('--colors-bright', action='store_true')
parser.add_argument('--colors-256', action='store_true')
parser.add_argument('--colors-rgb', action='store_true')
parser.add_argument('--scroll', action='store_true')
parser.add_argument('--scroll-region', action='store_true')
parser.add_argument('--attr-bold', action='store_true')
parser.add_argument('--attr-italic', action='store_true')
parser.add_argument('--attr-underline', action='store_true')
opts = parser.parse_args()
out = opts.out if opts.out is not None else sys.stdout
@ -33,15 +40,21 @@ def main():
termios.TIOCGWINSZ,
struct.pack('HHHH', 0, 0, 0, 0)))
if opts.rows is not None:
lines = opts.rows
if opts.cols is not None:
cols = opts.cols
# Number of characters to write to screen
count = 256 * 1024**1
# Characters to choose from
alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRTSTUVWXYZ0123456789 '
alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRTSTUVWXYZ0123456789 öäå 👨👩🧒'
color_variants = ([ColorVariant.NONE] +
([ColorVariant.REGULAR] if opts.colors_regular else []) +
([ColorVariant.BRIGHT] if opts.colors_bright else []) +
([ColorVariant.CUBE] if opts.colors_256 else []) +
([ColorVariant.RGB] if opts.colors_rgb else []))
# Enter alt screen
@ -57,9 +70,13 @@ def main():
bottom = rand.read(1)[0] % 3
out.write(f'\033[{top};{lines - bottom}r')
count = rand.read(1)[0] % (lines - 1)
lines_to_scroll = rand.read(1)[0] % (lines - 1)
rev = rand.read(1)[0] % 2
out.write(f'\033[{count + 1}{"T" if rev == 1 else "S"}')
if not rev and rand.read(1)[0] % 2:
out.write(f'\033[{lines};{cols}H')
out.write('\n' * lines_to_scroll)
else:
out.write(f'\033[{lines_to_scroll + 1}{"T" if rev == 1 else "S"}')
continue
# Generate a random location and a random character
@ -83,17 +100,44 @@ def main():
idx = rand.read(1)[0] % 8
out.write(f'\033[{base + idx}m')
elif color_variant == ColorVariant.CUBE:
do_bg = rand.read(1)[0] % 2
base = 48 if do_bg else 38
idx = rand.read(1)[0] % 256
if rand.read(1)[0] % 2:
# Old-style
out.write(f'\033[{base};5;{idx}m')
else:
# New-style (sub-parameter based)
out.write(f'\033[{base}:2:5:{idx}m')
elif color_variant == ColorVariant.RGB:
do_bg = rand.read(1)[0] % 2
base = 48 if do_bg else 38
rgb = rand.read(3)
out.write(f'\033[{48 if do_bg else 38}:2::{rgb[0]}:{rgb[1]}:{rgb[2]}m')
if rand.read(1)[0] % 2:
# Old-style
out.write(f'\033[{base};2;{rgb[0]};{rgb[1]};{rgb[2]}m')
else:
# New-style (sub-parameter based)
out.write(f'\033[{base}:2::{rgb[0]}:{rgb[1]}:{rgb[2]}m')
if opts.attr_bold and rand.read(1)[0] % 5 == 0:
out.write('\033[1m')
if opts.attr_italic and rand.read(1)[0] % 5 == 0:
out.write('\033[3m')
if opts.attr_underline and rand.read(1)[0] % 5 == 0:
out.write('\033[4m')
out.write(c * repeat)
if color_variant != ColorVariant.NONE:
do_sgr_reset = rand.read(1)[0] % 2
if do_sgr_reset:
out.write('\033[m')
do_sgr_reset = rand.read(1)[0] % 2
if do_sgr_reset:
reset_actions = ['\033[m', '\033[39m', '\033[49m']
idx = rand.read(1)[0] % len(reset_actions)
out.write(reset_actions[idx])
# Leave alt screen
out.write('\033[m\033[r\033[?1049l')
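The RGB branch above now emits two spellings of the same color request. A minimal sketch of the difference follows. The helper `sgr_rgb` is hypothetical and not part of the script; it only reproduces the two format strings used in the hunk.

```python
def sgr_rgb(r: int, g: int, b: int, background: bool = False,
            new_style: bool = True) -> str:
    """Return an SGR sequence that sets a 24-bit foreground or background color."""
    base = 48 if background else 38
    if new_style:
        # Colon-separated sub-parameters; the empty field between "2" and
        # the red component is the (unused) color space identifier.
        return f"\033[{base}:2::{r}:{g}:{b}m"
    # Old style: plain semicolon-separated parameters.
    return f"\033[{base};2;{r};{g};{b}m"


if __name__ == "__main__":
    print(repr(sgr_rgb(255, 128, 0)))
    print(repr(sgr_rgb(255, 128, 0, background=True, new_style=False)))
```

Emitting both forms at random means the profiling run exercises both parameter-parsing paths in the CSI handler.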

@ -198,7 +198,9 @@ fdm_ptmx_out(struct fdm *fdm, int fd, int events, void *data)
static struct timespec last = {0};
#endif
static bool
/* Externally visible, but not declared in terminal.h, to enable pgo
* to call this function directly */
bool
fdm_ptmx(struct fdm *fdm, int fd, int events, void *data)
{
struct terminal *term = data;