diff --git a/CHANGELOG.md b/CHANGELOG.md index e3d47051..6b9e33f0 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -18,6 +18,15 @@ ## Unreleased + +### For packagers + +Starting with this release, foot can be PGO:d (compiled using profile +guided optimizations) **without** a running Wayland session. This +means foot can be PGO:d in e.g. sandboxed build scripts. See +[INSTALL.md](INSTALL.md#user-content-performance-optimized-pgo). + + ### Added * Implement reverse auto-wrap (_auto\_left\_margin_, _bw_, in diff --git a/INSTALL.md b/INSTALL.md index 71649014..189ae192 100644 --- a/INSTALL.md +++ b/INSTALL.md @@ -8,6 +8,12 @@ 1. [Other](#other) 1. [Setup](#setup) 1. [Release build](#release-build) + 1. [Size optimized](#size-optimized) + 1. [Performance optimized, non-PGO](#performance-optimized-non-pgo) + 1. [Performance optimized, PGO](#performance-optimized-pgo) + 1. [Partial PGO](#partial-pgo) + 1. [Full PGO](#full-pgo) + 1. [Use the generated PGO data](#use-the-generated-pgo-data) 1. [Profile Guided Optimization](#profile-guided-optimization) 1. [Debug build](#debug-build) 1. [Running the new build](#running-the-new-build) @@ -126,72 +132,152 @@ mkdir -p bld/release && cd bld/release ### Release build +Below are instructions for building foot either [size +optimized](#size-optimized), [performance +optimized](#performance-optimized-non-pgo), or performance +optimized using [PGO](#performance-optimized-pgo). + +PGO - _Profile Guided Optimization_ - is a way to optimize a program +better than `-O3` can, and is done by compiling foot twice: first to +generate an instrumented version which is used to run a payload that +exercises the performance critical parts of foot, and then a second +time to rebuild foot using the generated profiling data to guide +optimization. + +In addition to being faster, PGO builds also tend to be smaller than +regular `-O3` builds. + + +#### Size optimized + +To optimize for size (i.e. 
produce a small binary): + +```sh +export CFLAGS="$CFLAGS -Os -march=native" +meson --buildtype=release --prefix=/usr -Db_lto=true ../.. +ninja +ninja test +ninja install +``` + +#### Performance optimized, non-PGO + +To do a regular, non-PGO build optimized for performance: + ```sh export CFLAGS="$CFLAGS -O3 -march=native" meson --buildtype=release --prefix=/usr -Db_lto=true ../.. +ninja +ninja test +ninja install ``` -Both `-O3` and `-Db_lto=true` are **highly** recommended. -For performance reasons, I strongly recommend doing a -[PGO](#profile-guided-optimization) (Profile Guided Optimization) -build. This requires a running Wayland session since we will be -executing an intermediate build of foot. +Use `-O2` instead of `-O3` if you prefer a slightly smaller (and +slower!) binary. -If you do not want this, just build: + +#### Performance optimized, PGO + +First, configure the build directory: ```sh -ninja +export CFLAGS="$CFLAGS -O3 -march=native -Wno-missing-profile" +meson --buildtype=release --prefix=/usr -Db_lto=true ../.. ``` -and then skip to [Running the new build](#running-the-new-build). +It is **very** important that `-O3` is used here, as GCC-10.1.x and +later have a regression where PGO with `-O2` is **much** slower. -**For packagers**: normally, you would configure compiler flags using -`-Dc_args`. This however "overwrites" `CFLAGS`. `makepkg` from Arch, -for example, uses `CFLAGS` to specify the default set of flags. +If you are using Clang instead of GCC, use the following `CFLAGS` instead: -Thus, we do `export CFLAGS+="..."` to at least not throw away those -flags. +```sh +export CFLAGS="$CFLAGS -O3 -march=native -Wno-ignored-optimization-argument -Wno-profile-instr-out-of-date" +``` -When packaging, you may want to use the default `CFLAGS` only, but -note this: foot is a performance critical application that relies on -compiler optimizations to perform well. - -In particular, with GCC 10.1, it is **very** important `-O3` is used -(and not e.g. 
`-O2`) when doing a [PGO](#profile-guided-optimization) -build. - - -#### Profile Guided Optimization - -First, make sure you have configured a [release](#release-build) build -directory, but: - -If using Clang, make sure to add `-Wno-ignored-optimization-argument --Wno-profile-instr-out-of-date` to `CFLAGS`. - -If using GCC, make sure to add `-Wno-missing-profile` to `CFLAGS`. - -Then, tell meson we want to _generate_ profile data, and build: +Then, tell meson we want to _generate_ profiling data, and build: ```sh meson configure -Db_pgo=generate ninja ``` -Next, we need to execute the intermediate build of foot, and run a -payload inside it that will exercise the performance critical code -paths. To do this, we will use the script +Next, we need to actually generate the profiling data. + +There are two ways to do this: a [partial PGO build using a PGO +helper](#partial-pgo) binary, or a [full PGO build](#full-pgo) by +running the real foot binary. The latter has slightly better results +(i.e. results in a faster binary), but must be run in a Wayland +session. + +A full PGO build also tends to be smaller than a partial build. + + +##### Partial PGO + +This method uses a PGO helper binary that links against the VT parser +only. It is similar to a mock test; it instantiates a dummy terminal +instance and then directly calls the VT parser with stimuli. + +It explicitly does **not** include the Wayland backend and as such, it +does not require a running Wayland session. The downside is that not +all code paths in foot are exercised. In particular, the **rendering** +code is not. As a result, the final binary built using this method is +slightly slower than when doing a [full PGO](#full-pgo) build. 
+ +We will use the `pgo` binary along with an input corpus generated by `scripts/generate-alt-random-writes.py`: +```sh +tmp_file=$(mktemp) +../../scripts/generate-alt-random-writes.py \ + --rows=67 \ + --cols=135 \ + --scroll \ + --scroll-region \ + --colors-regular \ + --colors-bright \ + --colors-256 \ + --colors-rgb \ + --attr-bold \ + --attr-italic \ + --attr-underline \ + ${tmp_file} +./pgo ${tmp_file} ${tmp_file} ${tmp_file} +rm ${tmp_file} +``` + +The snippet above first creates an (empty) temporary file. Then, it +runs a script that generates random escape sequences (if you cat +`${tmp_file}` in a terminal, you’ll see random colored characters all +over the screen). Finally, we feed the randomly generated escape +sequences to the PGO helper. This is what generates the profiling data +used in the next step. + +You are now ready to [use the generated PGO +data](#use-the-generated-pgo-data). + + +##### Full PGO + +This method requires a running Wayland session. + +We will use the script `scripts/generate-alt-random-writes.py`: + ```sh foot_tmp_file=$(mktemp) -./foot --config=/dev/null --term=xterm sh -c " --scroll --scroll-region --colors-regular --colors-bright --colors-rgb ${foot_tmp_file} && cat ${foot_tmp_file}" +./foot --config=/dev/null --term=xterm sh -c " --scroll --scroll-region --colors-regular --colors-bright --colors-256 --colors-rgb --attr-bold --attr-italic --attr-underline ${foot_tmp_file} && cat ${foot_tmp_file}" rm ${foot_tmp_file} ``` You should see a foot window open up, with random colored text. The window should close after ~1-2s. + +##### Use the generated PGO data + +Now that we have _generated_ PGO data, we need to rebuild foot, this +time telling meson (and ultimately gcc/clang) to _use_ the PGO data. 
+ If using Clang, now do (this requires _llvm_ to have been installed): ```sh @@ -207,6 +293,7 @@ ninja Continue reading in [Running the new build](#running-the-new-build) + ### Debug build ```sh @@ -214,6 +301,7 @@ meson --buildtype=debug ../.. ninja ``` + ### Running the new build You can now run it directly from the build directory: diff --git a/PKGBUILD b/PKGBUILD index e8dfca09..9fea5bd6 100644 --- a/PKGBUILD +++ b/PKGBUILD @@ -20,18 +20,31 @@ build() { meson --prefix=/usr --buildtype=release --wrap-mode=nofallback -Db_lto=true .. + find -name "*.gcda" -delete + meson configure -Db_pgo=generate + ninja + + script_options="--scroll --scroll-region --colors-regular --colors-bright --colors-256 --colors-rgb --attr-bold --attr-italic --attr-underline" + + tmp_file=$(mktemp) + if [[ -v WAYLAND_DISPLAY ]]; then - meson configure -Db_pgo=generate - find -name "*.gcda" -delete - ninja - - tmp_file=$(mktemp) - ./foot --config /dev/null --term=xterm -- sh -c "../scripts/generate-alt-random-writes.py --scroll --scroll-region --colors-regular --colors-bright --colors-rgb ${tmp_file} && cat ${tmp_file}" - rm "${tmp_file}" - - meson configure -Db_pgo=use + ./foot \ + --config /dev/null \ + --term=xterm \ + sh -c "../scripts/generate-alt-random-writes.py ${script_options} ${tmp_file} && cat ${tmp_file}" + else + ../scripts/generate-alt-random-writes.py \ + --rows=67 \ + --cols=135 \ + ${script_options} \ + ${tmp_file} + ./pgo ${tmp_file} ${tmp_file} ${tmp_file} fi + rm "${tmp_file}" + + meson configure -Db_pgo=use ninja } diff --git a/meson.build b/meson.build index c9308a74..f93ca732 100644 --- a/meson.build +++ b/meson.build @@ -100,44 +100,70 @@ version = custom_target( output: 'version.h', command: [generate_version_sh, meson.project_version(), '@SOURCE_DIR@', '@OUTPUT@']) +misc = static_library( + 'misc', + 'hsl.c', 'hsl.h', + 'log.c', 'log.h', + 'macros.h', + 'misc.c', 'misc.h', + 'uri.c', 'uri.h', + 'xmalloc.c', 'xmalloc.h', +) + +vtlib = static_library( + 'vtlib', + 
'base64.c', 'base64.h', + 'csi.c', 'csi.h', + 'dcs.c', 'dcs.h', + 'osc.c', 'osc.h', + 'sixel.c', 'sixel.h', + 'vt.c', 'vt.h', + version, + dependencies: [pixman, fcft, tllist], + link_with: misc, +) + +pgolib = static_library( + 'pgolib', + 'grid.c', 'grid.h', + 'selection.c', 'selection.h', + 'terminal.c', 'terminal.h', + dependencies: [pixman, fcft, tllist], + link_with: vtlib, +) + +executable( + 'pgo', + 'pgo/pgo.c', + wl_proto_src, + dependencies: [math, threads, pixman, wayland_client, fcft, tllist], + link_with: pgolib, +) + executable( 'foot', 'async.c', 'async.h', - 'base64.c', 'base64.h', 'config.c', 'config.h', 'commands.c', 'commands.h', - 'csi.c', 'csi.h', - 'dcs.c', 'dcs.h', 'extract.c', 'extract.h', 'fdm.c', 'fdm.h', - 'grid.c', 'grid.h', - 'hsl.c', 'hsl.h', 'input.c', 'input.h', - 'log.c', 'log.h', - 'macros.h', 'main.c', - 'misc.c', 'misc.h', - 'osc.c', 'osc.h', 'quirks.c', 'quirks.h', 'reaper.c', 'reaper.h', 'render.c', 'render.h', 'search.c', 'search.h', - 'selection.c', 'selection.h', 'server.c', 'server.h', 'client-protocol.h', 'shm.c', 'shm.h', - 'sixel.c', 'sixel.h', 'slave.c', 'slave.h', 'spawn.c', 'spawn.h', - 'terminal.c', 'terminal.h', 'tokenize.c', 'tokenize.h', - 'uri.c', 'uri.h', 'user-notification.h', - 'vt.c', 'vt.h', 'wayland.c', 'wayland.h', - 'xmalloc.c', 'xmalloc.h', wl_proto_src + wl_proto_headers, version, dependencies: [math, threads, pixman, wayland_client, wayland_cursor, xkb, fontconfig, tllist, fcft], + link_with: pgolib, install: true) executable( diff --git a/pgo/pgo.c b/pgo/pgo.c new file mode 100644 index 00000000..aa73187e --- /dev/null +++ b/pgo/pgo.c @@ -0,0 +1,285 @@ +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include + +#include "async.h" +#include "config.h" +#include "user-notification.h" +#include "vt.h" + +extern bool fdm_ptmx(struct fdm *fdm, int fd, int events, void *data); + +static void +usage(const char *prog_name) +{ + printf( + "Usage: %s 
stimuli-file1 stimuli-file2 ... stimuli-fileN\n", + prog_name); +} + +enum async_write_status +async_write(int fd, const void *data, size_t len, size_t *idx) +{ + return ASYNC_WRITE_DONE; +} + +bool +fdm_add(struct fdm *fdm, int fd, int events, fdm_handler_t handler, void *data) +{ + return true; +} + +bool +fdm_del(struct fdm *fdm, int fd) +{ + return true; +} + +bool +fdm_event_add(struct fdm *fdm, int fd, int events) +{ + return true; +} + +bool +fdm_event_del(struct fdm *fdm, int fd, int events) +{ + return true; +} + +bool +render_resize_force(struct terminal *term, int width, int height) +{ + return true; +} + +void render_refresh(struct terminal *term) {} +void render_refresh_csd(struct terminal *term) {} +void render_refresh_title(struct terminal *term) {} + +bool +render_xcursor_set(struct seat *seat, struct terminal *term, const char *xcursor) +{ + return true; +} + +struct wl_window * +wayl_win_init(struct terminal *term) +{ + return NULL; +} + +void wayl_win_destroy(struct wl_window *win) {} + +bool +spawn(struct reaper *reaper, const char *cwd, char *const argv[], + int stdin_fd, int stdout_fd, int stderr_fd) +{ + return true; +} + +pid_t +slave_spawn( + int ptmx, int argc, const char *cwd, char *const *argv, const char *term_env, + const char *conf_shell, bool login_shell, + const user_notifications_t *notifications) +{ + return 0; +} + +int +render_worker_thread(void *_ctx) +{ + return 0; +} + +struct extraction_context * +extract_begin(enum selection_kind kind) +{ + return NULL; +} + +bool +extract_one( + const struct terminal *term, const struct row *row, const struct cell *cell, + int col, void *context) +{ + return true; +} + +bool +extract_finish(struct extraction_context *context, char **text, size_t *len) +{ + return true; +} + +void cmd_scrollback_up(struct terminal *term, int rows) {} +void cmd_scrollback_down(struct terminal *term, int rows) {} + +int +main(int argc, const char *const *argv) +{ + if (argc < 2) { + usage(argv[0]); + return 
EXIT_FAILURE; + } + + const int row_count = 67; + const int col_count = 135; + const int grid_row_count = 16384; + + int lower_fd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC | TFD_NONBLOCK); + if (lower_fd < 0) + return EXIT_FAILURE; + + int upper_fd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC | TFD_NONBLOCK); + if (upper_fd < 0) { + close(lower_fd); + return EXIT_FAILURE; + } + + struct row **rows = calloc(grid_row_count, sizeof(rows[0])); + for (int i = 0; i < grid_row_count; i++) { + rows[i] = calloc(1, sizeof(*rows[i])); + rows[i]->cells = calloc(col_count, sizeof(rows[i]->cells[0])); + } + + struct config conf = { + .tweak = { + .delayed_render_lower_ns = 500000, /* 0.5ms */ + .delayed_render_upper_ns = 16666666 / 2, /* half a frame period (60Hz) */ + }, + }; + + struct wayland wayl = { + .seats = tll_init(), + .monitors = tll_init(), + .terms = tll_init(), + }; + + struct terminal term = { + .conf = &conf, + .wl = &wayl, + .grid = &term.normal, + .normal = { + .num_rows = grid_row_count, + .num_cols = col_count, + .rows = rows, + .cur_row = rows[0], + }, + .alt = { + .num_rows = grid_row_count, + .num_cols = col_count, + .rows = rows, + .cur_row = rows[0], + }, + .scale = 1, + .width = col_count * 8, + .height = row_count * 15, + .cols = col_count, + .rows = row_count, + .cell_width = 8, + .cell_height = 15, + .scroll_region = { + .start = 0, + .end = row_count, + }, + .selection = { + .start = {-1, -1}, + .end = {-1, -1}, + }, + .delayed_render_timer = { + .lower_fd = lower_fd, + .upper_fd = upper_fd + } + }; + + tll_push_back(wayl.terms, &term); + + int ret = EXIT_FAILURE; + + for (int i = 1; i < argc; i++) { + struct stat st; + if (stat(argv[i], &st) < 0) { + fprintf(stderr, "error: %s: failed to stat: %s\n", + argv[i], strerror(errno)); + goto out; + } + + uint8_t *data = malloc(st.st_size); + if (data == NULL) { + fprintf(stderr, "error: %s: failed to allocate buffer: %s\n", + argv[i], strerror(errno)); + goto out; + } + + int fd = open(argv[i], 
O_RDONLY); + if (fd < 0) { + fprintf(stderr, "error: %s: failed to open: %s\n", + argv[i], strerror(errno)); + goto out; + } + + ssize_t amount = read(fd, data, st.st_size); + if (amount != st.st_size) { + fprintf(stderr, "error: %s: failed to read: %s\n", + argv[i], strerror(errno)); + goto out; + } + + close(fd); + + int mem_fd = memfd_create("foot-pgo-ptmx", MFD_CLOEXEC); + if (mem_fd < 0) { + fprintf(stderr, "error: failed to create memory FD\n"); + goto out; + } + + if (write(mem_fd, data, st.st_size) < 0) { + fprintf(stderr, "error: failed to write memory FD\n"); + close(mem_fd); + goto out; + } + + free(data); + + term.ptmx = mem_fd; + lseek(mem_fd, 0, SEEK_SET); + + printf("Feeding VT parser with %s (%lld bytes)\n", + argv[i], (long long)st.st_size); + + while (lseek(mem_fd, 0, SEEK_CUR) < st.st_size) { + if (!fdm_ptmx(NULL, -1, EPOLLIN, &term)) { + fprintf(stderr, "error: fdm_ptmx() failed\n"); + close(mem_fd); + goto out; + } + } + close(mem_fd); + } + + ret = EXIT_SUCCESS; + +out: + tll_free(wayl.terms); + + for (int i = 0; i < grid_row_count; i++) { + free(rows[i]->cells); + free(rows[i]); + } + + free(rows); + close(lower_fd); + close(upper_fd); + return ret; +} diff --git a/render.c b/render.c index 12601ac2..9a3fef0f 100644 --- a/render.c +++ b/render.c @@ -539,15 +539,11 @@ render_cell(struct terminal *term, pixman_image_t *pix, pixman_image_unref(clr_pix); /* Underline */ - if (cell->attrs.underline) { - draw_underline(term, pix, attrs_to_font(term, &cell->attrs), - &fg, x, y, cell_cols); - } + if (cell->attrs.underline) + draw_underline(term, pix, font, &fg, x, y, cell_cols); - if (cell->attrs.strikethrough) { - draw_strikeout(term, pix, attrs_to_font(term, &cell->attrs), - &fg, x, y, cell_cols); - } + if (cell->attrs.strikethrough) + draw_strikeout(term, pix, font, &fg, x, y, cell_cols); draw_cursor: if (has_cursor && (term->cursor_style != CURSOR_BLOCK || !term->kbd_focus)) diff --git a/scripts/generate-alt-random-writes.py 
b/scripts/generate-alt-random-writes.py index 9af73cfd..f9601595 100755 --- a/scripts/generate-alt-random-writes.py +++ b/scripts/generate-alt-random-writes.py @@ -11,6 +11,7 @@ class ColorVariant(enum.IntEnum): NONE = enum.auto() REGULAR = enum.auto() BRIGHT = enum.auto() + CUBE = enum.auto() RGB = enum.auto() @@ -18,11 +19,17 @@ def main(): parser = argparse.ArgumentParser() parser.add_argument( 'out', type=argparse.FileType(mode='w'), nargs='?', help='name of output file') + parser.add_argument('--cols', type=int) + parser.add_argument('--rows', type=int) parser.add_argument('--colors-regular', action='store_true') parser.add_argument('--colors-bright', action='store_true') + parser.add_argument('--colors-256', action='store_true') parser.add_argument('--colors-rgb', action='store_true') parser.add_argument('--scroll', action='store_true') parser.add_argument('--scroll-region', action='store_true') + parser.add_argument('--attr-bold', action='store_true') + parser.add_argument('--attr-italic', action='store_true') + parser.add_argument('--attr-underline', action='store_true') opts = parser.parse_args() out = opts.out if opts.out is not None else sys.stdout @@ -33,15 +40,21 @@ def main(): termios.TIOCGWINSZ, struct.pack('HHHH', 0, 0, 0, 0))) + if opts.rows is not None: + lines = opts.rows + if opts.cols is not None: + cols = opts.cols + # Number of characters to write to screen count = 256 * 1024**1 # Characters to choose from - alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRTSTUVWXYZ0123456789 ' + alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRTSTUVWXYZ0123456789 öäå 👨👩🧒' color_variants = ([ColorVariant.NONE] + ([ColorVariant.REGULAR] if opts.colors_regular else []) + ([ColorVariant.BRIGHT] if opts.colors_bright else []) + + ([ColorVariant.CUBE] if opts.colors_256 else []) + ([ColorVariant.RGB] if opts.colors_rgb else [])) # Enter alt screen @@ -57,9 +70,13 @@ def main(): bottom = rand.read(1)[0] % 3 out.write(f'\033[{top};{lines - bottom}r') - 
count = rand.read(1)[0] % (lines - 1) + lines_to_scroll = rand.read(1)[0] % (lines - 1) rev = rand.read(1)[0] % 2 - out.write(f'\033[{count + 1}{"T" if rev == 1 else "S"}') + if not rev and rand.read(1)[0] % 2: + out.write(f'\033[{lines};{cols}H') + out.write('\n' * lines_to_scroll) + else: + out.write(f'\033[{lines_to_scroll + 1}{"T" if rev == 1 else "S"}') continue # Generate a random location and a random character @@ -83,17 +100,44 @@ def main(): idx = rand.read(1)[0] % 8 out.write(f'\033[{base + idx}m') + elif color_variant == ColorVariant.CUBE: + do_bg = rand.read(1)[0] % 2 + base = 48 if do_bg else 38 + + idx = rand.read(1)[0] % 256 + if rand.read(1)[0] % 2: + # Old-style + out.write(f'\033[{base};5;{idx}m') + else: + # New-style (sub-parameter based) + out.write(f'\033[{base}:2:5:{idx}m') + elif color_variant == ColorVariant.RGB: do_bg = rand.read(1)[0] % 2 + base = 48 if do_bg else 38 rgb = rand.read(3) - out.write(f'\033[{48 if do_bg else 38}:2::{rgb[0]}:{rgb[1]}:{rgb[2]}m') + + if rand.read(1)[0] % 2: + # Old-style + out.write(f'\033[{base};2;{rgb[0]};{rgb[1]};{rgb[2]}m') + else: + # New-style (sub-parameter based) + out.write(f'\033[{base}:2::{rgb[0]}:{rgb[1]}:{rgb[2]}m') + + if opts.attr_bold and rand.read(1)[0] % 5 == 0: + out.write('\033[1m') + if opts.attr_italic and rand.read(1)[0] % 5 == 0: + out.write('\033[3m') + if opts.attr_underline and rand.read(1)[0] % 5 == 0: + out.write('\033[4m') out.write(c * repeat) - if color_variant != ColorVariant.NONE: - do_sgr_reset = rand.read(1)[0] % 2 - if do_sgr_reset: - out.write('\033[m') + do_sgr_reset = rand.read(1)[0] % 2 + if do_sgr_reset: + reset_actions = ['\033[m', '\033[39m', '\033[49m'] + idx = rand.read(1)[0] % len(reset_actions) + out.write(reset_actions[idx]) # Leave alt screen out.write('\033[m\033[r\033[?1049l') diff --git a/terminal.c b/terminal.c index 2c5e0a85..f78e103c 100644 --- a/terminal.c +++ b/terminal.c @@ -198,7 +198,9 @@ fdm_ptmx_out(struct fdm *fdm, int fd, int events, void *data) 
static struct timespec last = {0}; #endif -static bool +/* Externally visible, but not declared in terminal.h, to enable pgo + * to call this function directly */ +bool fdm_ptmx(struct fdm *fdm, int fd, int events, void *data) { struct terminal *term = data;