Ubuntu 22.04.3: Chinese Search in Flarum, and How I Fixed the 'PHP Notice: fwrite(): send failed with errno=32 Broken pipe' Error

This article shows how to use the Sonic search engine so that Flarum can search Chinese text as well.

In my Flarum forum, almost all discussions are written in Chinese, and the default search module does not handle them. In this article I focus on using the Sonic extension to implement Chinese text search. My server ran into some errors along the way, so I also describe how I debugged and fixed them.

Extensions

Here is the plugin link.

https://discuss.flarum.org/d/28826-flarum-sonic

Install the Sonic env on Ubuntu 22.04.3

My server is a micro tier instance, so I chose Docker to install the Sonic env. If you are unfamiliar with Docker, you can think of a container as a standalone server. Inside the container, you can install everything just as you would on a server.

Install Docker

To install Docker on Ubuntu, you can follow these steps:

  1. Update your system’s package index
  2. Install the required dependencies
  3. Add the official Docker GPG key
  4. Set up the stable Docker repository (Replace $(lsb_release -cs) with the codename of your Ubuntu release if it’s not automatically detected.)
  5. Install Docker
  6. Verify that Docker is installed
 
sudo apt update

sudo apt install apt-transport-https ca-certificates curl software-properties-common

curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

echo "deb [signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# If the codename is not detected, replace $(lsb_release -cs) by hand.
# For example, Ubuntu 22.04 is "jammy":
# echo "deb [signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu jammy stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Refresh the package index so apt can see the new Docker repository
sudo apt update

sudo apt install docker-ce docker-ce-cli containerd.io

sudo docker --version

When you run sudo docker --version, the installed version is printed in the terminal.
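Step 4 depends on the release codename, which for Ubuntu 22.04 is jammy. As a minimal sketch (the helper names ubuntu_codename and docker_repo_line are hypothetical, not part of Docker's instructions), the codename can also be read from /etc/os-release when lsb_release is not available, and the repository line printed for inspection before it is written:

```shell
# Sketch only: ubuntu_codename and docker_repo_line are illustrative helper names.

# Print the release codename from an os-release style file.
# On Ubuntu 22.04 the file contains the line VERSION_CODENAME=jammy.
ubuntu_codename() {
    local file="${1:-/etc/os-release}"
    sed -n 's/^VERSION_CODENAME=//p' "$file" | tr -d '"'
}

# Print the apt source line for a given codename.
docker_repo_line() {
    printf 'deb [signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu %s stable\n' "$1"
}

# Example: show the line that would go into /etc/apt/sources.list.d/docker.list
# docker_repo_line "$(ubuntu_codename)"
```

If the printed line shows an empty codename, the detection failed and the codename has to be typed in by hand.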

Install Sonic in docker

v1.4.8 is the latest version at the time of writing; you can substitute any version you want.

docker pull valeriansaliou/sonic:v1.4.8

Now create a config file for Sonic. I put config.cfg in /home/sonic, but the path is up to you.

You need root permission.

cd /home

# create the sonic folder under /home (needs root)
sudo mkdir sonic

sudo vim /home/sonic/config.cfg

# or
cd sonic
sudo vim config.cfg

In config.cfg, paste the default content provided by the developer:

[server]

log_level = "error"

[channel]

inet = "0.0.0.0:1491"
tcp_timeout = 30

auth_password = "SecretPassword"

[channel.search]

query_limit_default = 10
query_limit_maximum = 100
query_alternates_try = 4

suggest_limit_default = 5
suggest_limit_maximum = 20

[store]

[store.kv]

path = "/var/lib/sonic/store/kv/"

retain_word_objects = 1000

[store.kv.pool]

inactive_after = 1800

[store.kv.database]

flush_after = 900

compress = true
parallelism = 2
max_files = 100
max_compactions = 1
max_flushes = 1
write_buffer = 16384
write_ahead_log = true

[store.fst]

path = "/var/lib/sonic/store/fst/"

[store.fst.pool]

inactive_after = 300

[store.fst.graph]

consolidate_after = 180

max_size = 2048
max_words = 250000

Then, save the file and exit.

Install Sonic extension

composer require ganuonglachanh/sonic

Navigate to your admin panel and switch on the extension. There is no need to change anything; just save it.

Running Sonic

On the server, switch to root and run:

docker run -p 1491:1491 -v /home/sonic/config.cfg:/etc/sonic.cfg -v ~/sonic/store/:/var/lib/sonic/store/ valeriansaliou/sonic:v1.4.8
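
Before indexing, it can help to confirm that the container is really answering on port 1491. The sketch below assumes bash (for its /dev/tcp feature) and assumes that Sonic greets a new connection with a line beginning "CONNECTED <sonic-server", as described in the Sonic Channel protocol. The function names are hypothetical.

```shell
# Sketch only (requires bash for /dev/tcp); function names are illustrative.

# Succeed if the line looks like Sonic's greeting,
# e.g. "CONNECTED <sonic-server v1.4.8>".
is_sonic_greeting() {
    case "$1" in
        "CONNECTED <sonic-server "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Connect to host:port, read the first line the server sends, and check it.
check_sonic() {
    local host="${1:-127.0.0.1}" port="${2:-1491}" greeting
    # Run in a subshell so a refused connection cannot abort the caller.
    greeting="$( (
        exec 3<>"/dev/tcp/${host}/${port}" || exit 1
        IFS= read -r -t 5 line <&3 || exit 1
        printf '%s' "$line"
    ) 2>/dev/null )" || return 1
    # Drop a trailing carriage return, if any, before matching.
    greeting="$(printf '%s' "$greeting" | tr -d '\r')"
    is_sonic_greeting "$greeting"
}

# Example: check_sonic 127.0.0.1 1491 && echo "Sonic is up"
```

Since the docker run command above stays in the foreground, this check has to be run from a second terminal.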

Then, in a second terminal (the docker run command above stays in the foreground), navigate to the Flarum folder and run:

php flarum sonic:addtoindex

If it prints “xxx successfully!”, you are lucky: all the steps are done! You can also highlight the matched keyword in the search results with this CSS:

.DiscussionListItem-main mark {
    background: #ffff80 !important;
}

Error: PHP Notice: fwrite(): send failed with errno=32 Broken pipe

If tons of errors like this are printed instead, well, let’s debug it.

✨ Copy the same env

It’s quite complex, to be honest. I created another instance and RDS to test whether the issue is due to my server environment or something else. Additionally, I don’t want to impact my current production environment.

Oh, I also needed to create a new security group and open ports 80 and 22. (If you don’t mind the security risk in a test env, you can also allow all traffic.)

Just make sure the versions are the same as in the production env.

Following my previous blog post, I installed the PHP env, Composer, and Flarum.

Don’t forget to edit the composer.json file and delete composer.lock, then install again. The command is:

composer install

After that, in the RDS section, set up the EC2 connection.

I also exported my SQL data from the production RDS and imported it into the test MySQL env.

Debug in Test env

Now, follow the Sonic installation steps above. In the source code of the Sonic extension, we can find the entry point of the indexing command. I don’t really understand PHP, but I can see where the error is printed. The logic is to retrieve all the public posts and push them to Sonic. In this code, I need to find where the first error occurs.

I changed the code directly: I added a break inside the loop and printed every piece of content retrieved.
foreach ($posts as $post) {
    $start = microtime(true);
    $content = strip_tags($post->content);
    if (trim($content) !== '') {
        try {
            $this->info("Post ID: {$post->id}, Content: $content");
            $ingest->push('postCollection', 'flarumBucket', $post->id, $content, $locale);
        } catch (\Throwable $e) {
            $this->info(PHP_EOL);
            $this->error("Post id {$post->id} with " . strlen($content) . ' bytes of content failed after ' . round((microtime(true) - $start) * 1000, 2) . 'ms');
            break; // Break out of the loop on exception
        }
    }
    $progress->advance();
}

700 posts were pushed normally, and then it broke out of the loop. The post that failed turned out to be very long, because one of the users had typed a lot of content into a single post.
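
The failure message reports the size in bytes, and Chinese characters typically take three bytes each in UTF-8, so a post grows in bytes much faster than its character count suggests. The sketch below is a rough aid for spotting such posts without editing the extension: byte_length measures a string in bytes, and largest_posts_sql prints a MySQL query that lists the biggest comment posts in Flarum's posts table. The helper names, the optional table prefix argument, and the idea that a byte limit is what triggers the broken pipe are assumptions.

```shell
# Sketch only: helper names are illustrative.

# Print the size of a string in bytes (not characters).
byte_length() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Print a query that lists the largest comment posts by stored size.
# $1 = how many rows to show (default 10), $2 = optional table prefix.
largest_posts_sql() {
    local limit="${1:-10}" prefix="${2:-}"
    printf "SELECT id, LENGTH(content) AS bytes FROM %sposts WHERE type = 'comment' ORDER BY bytes DESC LIMIT %s;\n" "$prefix" "$limit"
}

# Example (user and database names are placeholders):
# largest_posts_sql 10 | mysql -u flarum_user -p flarum_db
```

The stored content includes formatting markup, so these sizes are somewhat larger than the stripped text that is actually pushed, but the ranking still points at the longest posts.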

I had two options: split the content, or simply skip the long content. I chose the latter because it is easier, albeit at the cost of some search accuracy.

$posts = Post::select('id', 'content')
    ->where('type', '=', 'comment')
    ->where('is_approved', 1)
    ->where('is_private', 0)
    ->whereNull('hidden_at')
    // Group the two ID ranges so the OR cannot bypass the filters above
    ->where(function ($query) {
        $query->whereBetween('id', [1, 1130])
            ->orWhereBetween('id', [1134, 1172]);
    })
    ->get();

I already knew the post IDs of the long content, so I skipped those. Then I ran the command in the Flarum path:

php flarum sonic:addtoindex

All the tests passed!

I also tried splitting the content, but it didn’t work 😭😭😭. Can anyone tell me the reason?

$chunkSize = 10000; // Set your desired chunk size

foreach ($posts as $post) {
    $this->info("Post ID: {$post->id}, Content: {$post->content}");
    $start = microtime(true);
    $content = strip_tags($post->content);

    // Split the content into chunks
    $contentChunks = str_split($content, $chunkSize);

    foreach ($contentChunks as $chunk) {
        if (trim($chunk) !== '') {
            try {
                $ingest->push('postCollection', 'flarumBucket', $post->id, $chunk, $locale);
            } catch (\Throwable $e) {
                $this->info(PHP_EOL);
                $this->error("Post id {$post->id} with " . strlen($chunk) . ' bytes of content failed after ' . round((microtime(true) - $start) * 1000, 2) . 'ms');
                break 2; // Break out of both inner and outer loops on exception
            }
        }
    }

    $progress->advance();
}
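
One possible reason, not verified against this setup: PHP's str_split cuts by bytes, so a fixed chunk size can land in the middle of a multi-byte Chinese character and leave a chunk with invalid UTF-8 at its edge. PHP 7.4 and later provide mb_str_split, which cuts by characters instead. The shell sketch below only illustrates the byte-boundary problem; byte_at and cuts_inside_char are hypothetical helper names.

```shell
# Sketch only: helper names are illustrative.

# Print the decimal value of the byte at 1-based position $2 of string $1
# (prints nothing if the position is past the end).
byte_at() {
    printf '%s' "$1" | tail -c +"$2" | head -c 1 | od -An -tu1 | tr -d ' \n'
}

# Succeed if cutting string $1 after $2 bytes would split a UTF-8 character,
# i.e. if the next byte is a continuation byte (0x80..0xBF, decimal 128..191).
cuts_inside_char() {
    local next
    next="$(byte_at "$1" "$(( $2 + 1 ))")"
    [ -n "$next" ] && [ "$next" -ge 128 ] && [ "$next" -le 191 ]
}

# "中文" is 6 bytes: cutting after 3 bytes is safe, cutting after 4 is not.
# cuts_inside_char "中文" 4 && echo "chunk boundary falls inside a character"
```

If the split is switched to characters, a chunk size of 10000 would mean up to roughly 30000 bytes of Chinese text per chunk, so the chunk size may need to be lowered as well.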