foot/composed.c

150 lines
3.2 KiB
C
Raw Normal View History

composed: store compose chains in a binary search tree The previous implementation stored compose chains in a dynamically allocated array. Adding a chain was easy: resize the array and append the new chain at the end. Looking up a compose chain given a compose chain key/index was also easy: just index into the array. However, searching for a pre-existing chain given a codepoint sequence was very slow. Since the array wasn’t sorted, we typically had to scan through the entire array, just to realize that there is no pre-existing chain, and that we need to add a new one. Since this happens for *each* codepoint in a grapheme cluster, things quickly became really slow. Things were ok:ish as long as the compose chain struct was small, as that made it possible to hold all the chains in the cache. Once the number of chains reached a certain point, or when we were forced to bump maximum number of allowed codepoints in a chain, we started thrashing the cache and things got much much worse. So what can we do? We can’t sort the array, because a) that would invalidate all existing chain keys in the grid (and iterating the entire scrollback and updating compose keys is *not* an option). b) inserting a chain becomes slow as we need to first find _where_ to insert it, and then memmove() the rest of the array. This patch uses a binary search tree to store the chains instead of a simple array. The tree is sorted on a “key”, which is the XOR of all codepoints, truncated to the CELL_COMB_CHARS_HI-CELL_COMB_CHARS_LO range. The grid now stores CELL_COMB_CHARS_LO+key, instead of CELL_COMB_CHARS_LO+index. Since the key is truncated, collisions may occur. This is handled by incrementing the key by 1. Lookup is of course slower than before, O(log n) instead of O(1). Insertion is slightly slower as well: technically it’s O(log n) instead of O(1). However, we also need to take into account the re-allocating the array will occasionally force a full copy of the array when it cannot simply be growed. But finding a pre-existing chain is now *much* faster: O(log n) instead of O(n). In most cases, the first lookup will either succeed (return a true match), or fail (return NULL). However, since key collisions are possible, it may also return false matches. This means we need to verify the contents of the chain before deciding to use it instead of inserting a new chain. But remember that this comparison was being done for each and every chain in the previous implementation. With lookups being much faster, and in particular, no longer requiring us to check the chain contents for every singlec chain, we can now use a dynamically allocated ‘chars’ array in the chain. This was previously a hardcoded array of 10 chars. Using a dynamic allocated array means looking in the array is slower, since we now need two loads: one to load the pointer, and a second to load _from_ the pointer. As a result, the base size of a compose chain (i.e. an “empty” chain) has now been reduced from 48 bytes to 32. A chain with two codepoints is 40 bytes. This means we have up to 4 codepoints while still using less, or the same amount, of memory as before. Furthermore, the Unicode random test (i.e. write random “unicode” chars) is now **faster** than current master (i.e. before text-shaping support was added), **with** test-shaping enabled. With text-shaping disabled, we’re _even_ faster.
2021-06-24 13:17:07 +02:00
#include "composed.h"
#include <stdlib.h>
#include <stdbool.h>
#include "debug.h"
#include "terminal.h"
uint32_t
composed_key_from_chars(const uint32_t chars[], size_t count)
{
if (count == 0)
return 0;
uint32_t key = chars[0];
for (size_t i = 1; i < count; i++)
key = composed_key_from_key(key, chars[i]);
return key;
}
uint32_t
composed_key_from_key(uint32_t prev_key, uint32_t next_char)
{
unsigned bits = 32 - __builtin_clz(CELL_COMB_CHARS_HI - CELL_COMB_CHARS_LO);
/* Rotate old key 8 bits */
uint32_t new_key = (prev_key << 8) | (prev_key >> (bits - 8));
/* xor with new char */
new_key ^= next_char;
/* Multiply with magic hash constant */
new_key *= 2654435761ul;
/* And mask, to ensure the new value is within range */
new_key &= CELL_COMB_CHARS_HI - CELL_COMB_CHARS_LO;
return new_key;
}
UNITTEST
{
const char32_t chars[] = U"abcdef";
uint32_t k1 = composed_key_from_key(chars[0], chars[1]);
uint32_t k2 = composed_key_from_chars(chars, 2);
xassert(k1 == k2);
uint32_t k3 = composed_key_from_key(k2, chars[2]);
uint32_t k4 = composed_key_from_chars(chars, 3);
xassert(k3 == k4);
}
composed: store compose chains in a binary search tree The previous implementation stored compose chains in a dynamically allocated array. Adding a chain was easy: resize the array and append the new chain at the end. Looking up a compose chain given a compose chain key/index was also easy: just index into the array. However, searching for a pre-existing chain given a codepoint sequence was very slow. Since the array wasn’t sorted, we typically had to scan through the entire array, just to realize that there is no pre-existing chain, and that we need to add a new one. Since this happens for *each* codepoint in a grapheme cluster, things quickly became really slow. Things were ok:ish as long as the compose chain struct was small, as that made it possible to hold all the chains in the cache. Once the number of chains reached a certain point, or when we were forced to bump maximum number of allowed codepoints in a chain, we started thrashing the cache and things got much much worse. So what can we do? We can’t sort the array, because a) that would invalidate all existing chain keys in the grid (and iterating the entire scrollback and updating compose keys is *not* an option). b) inserting a chain becomes slow as we need to first find _where_ to insert it, and then memmove() the rest of the array. This patch uses a binary search tree to store the chains instead of a simple array. The tree is sorted on a “key”, which is the XOR of all codepoints, truncated to the CELL_COMB_CHARS_HI-CELL_COMB_CHARS_LO range. The grid now stores CELL_COMB_CHARS_LO+key, instead of CELL_COMB_CHARS_LO+index. Since the key is truncated, collisions may occur. This is handled by incrementing the key by 1. Lookup is of course slower than before, O(log n) instead of O(1). Insertion is slightly slower as well: technically it’s O(log n) instead of O(1). However, we also need to take into account the re-allocating the array will occasionally force a full copy of the array when it cannot simply be growed. But finding a pre-existing chain is now *much* faster: O(log n) instead of O(n). In most cases, the first lookup will either succeed (return a true match), or fail (return NULL). However, since key collisions are possible, it may also return false matches. This means we need to verify the contents of the chain before deciding to use it instead of inserting a new chain. But remember that this comparison was being done for each and every chain in the previous implementation. With lookups being much faster, and in particular, no longer requiring us to check the chain contents for every singlec chain, we can now use a dynamically allocated ‘chars’ array in the chain. This was previously a hardcoded array of 10 chars. Using a dynamic allocated array means looking in the array is slower, since we now need two loads: one to load the pointer, and a second to load _from_ the pointer. As a result, the base size of a compose chain (i.e. an “empty” chain) has now been reduced from 48 bytes to 32. A chain with two codepoints is 40 bytes. This means we have up to 4 codepoints while still using less, or the same amount, of memory as before. Furthermore, the Unicode random test (i.e. write random “unicode” chars) is now **faster** than current master (i.e. before text-shaping support was added), **with** test-shaping enabled. With text-shaping disabled, we’re _even_ faster.
2021-06-24 13:17:07 +02:00
const struct composed *
composed: store compose chains in a binary search tree The previous implementation stored compose chains in a dynamically allocated array. Adding a chain was easy: resize the array and append the new chain at the end. Looking up a compose chain given a compose chain key/index was also easy: just index into the array. However, searching for a pre-existing chain given a codepoint sequence was very slow. Since the array wasn’t sorted, we typically had to scan through the entire array, just to realize that there is no pre-existing chain, and that we need to add a new one. Since this happens for *each* codepoint in a grapheme cluster, things quickly became really slow. Things were ok:ish as long as the compose chain struct was small, as that made it possible to hold all the chains in the cache. Once the number of chains reached a certain point, or when we were forced to bump maximum number of allowed codepoints in a chain, we started thrashing the cache and things got much much worse. So what can we do? We can’t sort the array, because a) that would invalidate all existing chain keys in the grid (and iterating the entire scrollback and updating compose keys is *not* an option). b) inserting a chain becomes slow as we need to first find _where_ to insert it, and then memmove() the rest of the array. This patch uses a binary search tree to store the chains instead of a simple array. The tree is sorted on a “key”, which is the XOR of all codepoints, truncated to the CELL_COMB_CHARS_HI-CELL_COMB_CHARS_LO range. The grid now stores CELL_COMB_CHARS_LO+key, instead of CELL_COMB_CHARS_LO+index. Since the key is truncated, collisions may occur. This is handled by incrementing the key by 1. Lookup is of course slower than before, O(log n) instead of O(1). Insertion is slightly slower as well: technically it’s O(log n) instead of O(1). However, we also need to take into account the re-allocating the array will occasionally force a full copy of the array when it cannot simply be growed. But finding a pre-existing chain is now *much* faster: O(log n) instead of O(n). In most cases, the first lookup will either succeed (return a true match), or fail (return NULL). However, since key collisions are possible, it may also return false matches. This means we need to verify the contents of the chain before deciding to use it instead of inserting a new chain. But remember that this comparison was being done for each and every chain in the previous implementation. With lookups being much faster, and in particular, no longer requiring us to check the chain contents for every singlec chain, we can now use a dynamically allocated ‘chars’ array in the chain. This was previously a hardcoded array of 10 chars. Using a dynamic allocated array means looking in the array is slower, since we now need two loads: one to load the pointer, and a second to load _from_ the pointer. As a result, the base size of a compose chain (i.e. an “empty” chain) has now been reduced from 48 bytes to 32. A chain with two codepoints is 40 bytes. This means we have up to 4 codepoints while still using less, or the same amount, of memory as before. Furthermore, the Unicode random test (i.e. write random “unicode” chars) is now **faster** than current master (i.e. before text-shaping support was added), **with** test-shaping enabled. With text-shaping disabled, we’re _even_ faster.
2021-06-24 13:17:07 +02:00
composed_lookup(struct composed *root, uint32_t key)
{
struct composed *node = root;
while (node != NULL) {
if (key == node->key)
return node;
node = key < node->key ? node->left : node->right;
}
return NULL;
}
const struct composed *
composed_lookup_without_collision(struct composed *root, uint32_t *key,
const char32_t *prefix_text, size_t prefix_len,
char32_t wc, int forced_width)
{
while (true) {
const struct composed *cc = composed_lookup(root, *key);
if (cc == NULL)
return NULL;
bool match = cc->count == prefix_len + 1 &&
cc->forced_width == forced_width &&
cc->chars[prefix_len] == wc;
if (match) {
for (size_t i = 0; i < prefix_len; i++) {
if (cc->chars[i] != prefix_text[i]) {
match = false;
break;
}
}
}
if (match)
return cc;
(*key)++;
*key &= CELL_COMB_CHARS_HI - CELL_COMB_CHARS_LO;
2025-01-27 07:35:10 +01:00
/* TODO: this will loop infinitely if the composed table is full */
}
return NULL;
}
void
composed: store compose chains in a binary search tree The previous implementation stored compose chains in a dynamically allocated array. Adding a chain was easy: resize the array and append the new chain at the end. Looking up a compose chain given a compose chain key/index was also easy: just index into the array. However, searching for a pre-existing chain given a codepoint sequence was very slow. Since the array wasn’t sorted, we typically had to scan through the entire array, just to realize that there is no pre-existing chain, and that we need to add a new one. Since this happens for *each* codepoint in a grapheme cluster, things quickly became really slow. Things were ok:ish as long as the compose chain struct was small, as that made it possible to hold all the chains in the cache. Once the number of chains reached a certain point, or when we were forced to bump maximum number of allowed codepoints in a chain, we started thrashing the cache and things got much much worse. So what can we do? We can’t sort the array, because a) that would invalidate all existing chain keys in the grid (and iterating the entire scrollback and updating compose keys is *not* an option). b) inserting a chain becomes slow as we need to first find _where_ to insert it, and then memmove() the rest of the array. This patch uses a binary search tree to store the chains instead of a simple array. The tree is sorted on a “key”, which is the XOR of all codepoints, truncated to the CELL_COMB_CHARS_HI-CELL_COMB_CHARS_LO range. The grid now stores CELL_COMB_CHARS_LO+key, instead of CELL_COMB_CHARS_LO+index. Since the key is truncated, collisions may occur. This is handled by incrementing the key by 1. Lookup is of course slower than before, O(log n) instead of O(1). Insertion is slightly slower as well: technically it’s O(log n) instead of O(1). However, we also need to take into account the re-allocating the array will occasionally force a full copy of the array when it cannot simply be growed. But finding a pre-existing chain is now *much* faster: O(log n) instead of O(n). In most cases, the first lookup will either succeed (return a true match), or fail (return NULL). However, since key collisions are possible, it may also return false matches. This means we need to verify the contents of the chain before deciding to use it instead of inserting a new chain. But remember that this comparison was being done for each and every chain in the previous implementation. With lookups being much faster, and in particular, no longer requiring us to check the chain contents for every singlec chain, we can now use a dynamically allocated ‘chars’ array in the chain. This was previously a hardcoded array of 10 chars. Using a dynamic allocated array means looking in the array is slower, since we now need two loads: one to load the pointer, and a second to load _from_ the pointer. As a result, the base size of a compose chain (i.e. an “empty” chain) has now been reduced from 48 bytes to 32. A chain with two codepoints is 40 bytes. This means we have up to 4 codepoints while still using less, or the same amount, of memory as before. Furthermore, the Unicode random test (i.e. write random “unicode” chars) is now **faster** than current master (i.e. before text-shaping support was added), **with** test-shaping enabled. With text-shaping disabled, we’re _even_ faster.
2021-06-24 13:17:07 +02:00
composed_insert(struct composed **root, struct composed *node)
{
node->left = node->right = NULL;
if (*root == NULL) {
*root = node;
return;
composed: store compose chains in a binary search tree The previous implementation stored compose chains in a dynamically allocated array. Adding a chain was easy: resize the array and append the new chain at the end. Looking up a compose chain given a compose chain key/index was also easy: just index into the array. However, searching for a pre-existing chain given a codepoint sequence was very slow. Since the array wasn’t sorted, we typically had to scan through the entire array, just to realize that there is no pre-existing chain, and that we need to add a new one. Since this happens for *each* codepoint in a grapheme cluster, things quickly became really slow. Things were ok:ish as long as the compose chain struct was small, as that made it possible to hold all the chains in the cache. Once the number of chains reached a certain point, or when we were forced to bump maximum number of allowed codepoints in a chain, we started thrashing the cache and things got much much worse. So what can we do? We can’t sort the array, because a) that would invalidate all existing chain keys in the grid (and iterating the entire scrollback and updating compose keys is *not* an option). b) inserting a chain becomes slow as we need to first find _where_ to insert it, and then memmove() the rest of the array. This patch uses a binary search tree to store the chains instead of a simple array. The tree is sorted on a “key”, which is the XOR of all codepoints, truncated to the CELL_COMB_CHARS_HI-CELL_COMB_CHARS_LO range. The grid now stores CELL_COMB_CHARS_LO+key, instead of CELL_COMB_CHARS_LO+index. Since the key is truncated, collisions may occur. This is handled by incrementing the key by 1. Lookup is of course slower than before, O(log n) instead of O(1). Insertion is slightly slower as well: technically it’s O(log n) instead of O(1). However, we also need to take into account the re-allocating the array will occasionally force a full copy of the array when it cannot simply be growed. But finding a pre-existing chain is now *much* faster: O(log n) instead of O(n). In most cases, the first lookup will either succeed (return a true match), or fail (return NULL). However, since key collisions are possible, it may also return false matches. This means we need to verify the contents of the chain before deciding to use it instead of inserting a new chain. But remember that this comparison was being done for each and every chain in the previous implementation. With lookups being much faster, and in particular, no longer requiring us to check the chain contents for every singlec chain, we can now use a dynamically allocated ‘chars’ array in the chain. This was previously a hardcoded array of 10 chars. Using a dynamic allocated array means looking in the array is slower, since we now need two loads: one to load the pointer, and a second to load _from_ the pointer. As a result, the base size of a compose chain (i.e. an “empty” chain) has now been reduced from 48 bytes to 32. A chain with two codepoints is 40 bytes. This means we have up to 4 codepoints while still using less, or the same amount, of memory as before. Furthermore, the Unicode random test (i.e. write random “unicode” chars) is now **faster** than current master (i.e. before text-shaping support was added), **with** test-shaping enabled. With text-shaping disabled, we’re _even_ faster.
2021-06-24 13:17:07 +02:00
}
uint32_t key = node->key;
struct composed *prev = NULL;
struct composed *n = *root;
while (n != NULL) {
xassert(n->key != node->key);
composed: store compose chains in a binary search tree The previous implementation stored compose chains in a dynamically allocated array. Adding a chain was easy: resize the array and append the new chain at the end. Looking up a compose chain given a compose chain key/index was also easy: just index into the array. However, searching for a pre-existing chain given a codepoint sequence was very slow. Since the array wasn’t sorted, we typically had to scan through the entire array, just to realize that there is no pre-existing chain, and that we need to add a new one. Since this happens for *each* codepoint in a grapheme cluster, things quickly became really slow. Things were ok:ish as long as the compose chain struct was small, as that made it possible to hold all the chains in the cache. Once the number of chains reached a certain point, or when we were forced to bump maximum number of allowed codepoints in a chain, we started thrashing the cache and things got much much worse. So what can we do? We can’t sort the array, because a) that would invalidate all existing chain keys in the grid (and iterating the entire scrollback and updating compose keys is *not* an option). b) inserting a chain becomes slow as we need to first find _where_ to insert it, and then memmove() the rest of the array. This patch uses a binary search tree to store the chains instead of a simple array. The tree is sorted on a “key”, which is the XOR of all codepoints, truncated to the CELL_COMB_CHARS_HI-CELL_COMB_CHARS_LO range. The grid now stores CELL_COMB_CHARS_LO+key, instead of CELL_COMB_CHARS_LO+index. Since the key is truncated, collisions may occur. This is handled by incrementing the key by 1. Lookup is of course slower than before, O(log n) instead of O(1). Insertion is slightly slower as well: technically it’s O(log n) instead of O(1). However, we also need to take into account the re-allocating the array will occasionally force a full copy of the array when it cannot simply be growed. But finding a pre-existing chain is now *much* faster: O(log n) instead of O(n). In most cases, the first lookup will either succeed (return a true match), or fail (return NULL). However, since key collisions are possible, it may also return false matches. This means we need to verify the contents of the chain before deciding to use it instead of inserting a new chain. But remember that this comparison was being done for each and every chain in the previous implementation. With lookups being much faster, and in particular, no longer requiring us to check the chain contents for every singlec chain, we can now use a dynamically allocated ‘chars’ array in the chain. This was previously a hardcoded array of 10 chars. Using a dynamic allocated array means looking in the array is slower, since we now need two loads: one to load the pointer, and a second to load _from_ the pointer. As a result, the base size of a compose chain (i.e. an “empty” chain) has now been reduced from 48 bytes to 32. A chain with two codepoints is 40 bytes. This means we have up to 4 codepoints while still using less, or the same amount, of memory as before. Furthermore, the Unicode random test (i.e. write random “unicode” chars) is now **faster** than current master (i.e. before text-shaping support was added), **with** test-shaping enabled. With text-shaping disabled, we’re _even_ faster.
2021-06-24 13:17:07 +02:00
prev = n;
n = key < n->key ? n->left : n->right;
}
xassert(prev != NULL);
xassert(n == NULL);
if (key < prev->key) {
xassert(prev->left == NULL);
prev->left = node;
} else {
xassert(prev->right == NULL);
prev->right = node;
}
}
void
composed_free(struct composed *root)
{
if (root == NULL)
return;
composed_free(root->left);
composed_free(root->right);
free(root->chars);
free(root);
}