The best grep tool in the world; ripgrep

19 June 2018   3 comments   Linux, Web development, MacOSX

https://github.com/BurntSushi/ripgrep

tl;dr: ripgrep (aka rg) is the best tool to grep today.

ripgrep is a tool for searching files. Its killer feature is that it's fast. Like, really really fast. Faster than sift, git grep, ack, regular grep etc.

If you don't believe me, either read this detailed blog post from its author or just jump straight to the conclusion:

Benchmark

I used to use git grep whenever I was inside a git repo and sift for everything else. That alone was a huge step up from regular grep. Granted, almost all my git repos are small enough that git grep often finishes faster than I can perceive. But with ripgrep I can just add --no-ignore-vcs and it also searches the files that .gitignore excludes. That's useful when you want to search your own source as well as the files in node_modules.
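
To make the effect of --no-ignore-vcs concrete, here is a minimal sketch meant to be saved and run as a script. It assumes rg and git are on your PATH; the sandbox layout (src/app.js, node_modules/leftpad) and the search word needle are made up for illustration.

```shell
# Prerequisite check: bail out quietly if rg or git is missing.
for tool in rg git; do
  command -v "$tool" >/dev/null 2>&1 || { echo "$tool not found on PATH" >&2; exit 0; }
done

# Throwaway sandbox; every file and directory name here is invented.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
mkdir -p src node_modules/leftpad
echo 'node_modules/' > .gitignore
echo 'const needle = 1;' > src/app.js
echo 'exports.needle = 2;' > node_modules/leftpad/index.js

# The explicit "." keeps rg from reading stdin when run non-interactively.
# Default: paths excluded by .gitignore are skipped.
default_hits=$(rg --files-with-matches needle . | sort)

# With --no-ignore-vcs the ignored node_modules tree is searched too.
all_hits=$(rg --no-ignore-vcs --files-with-matches needle . | sort)

printf 'default:\n%s\n\n--no-ignore-vcs:\n%s\n' "$default_hits" "$all_hits"
```

The first list should contain only the file under src, while the second should also include the one under node_modules.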

The installation instructions are easy. I installed it with brew install ripgrep and the best way to learn how to use it is rg --help. Remember that it has a lot of cool features that are well worth learning. It's written in Rust and so far I haven't had a single crash, ever. Searching by file type takes some getting used to (tip: use rg --type-list), and remember that you can pipe rg output to another rg. For example, to find all lines that contain both query and string you can use rg query | rg string.
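
A small sketch of those two tips, again meant to be run as a script and assuming rg is on your PATH. The files notes.txt and script.py are invented for the demo; the flags used (--type-list, --type, --files-with-matches, --no-filename) are regular rg options.

```shell
command -v rg >/dev/null 2>&1 || { echo "rg not found on PATH" >&2; exit 0; }

# Throwaway sandbox with two made-up files.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'query only\nquery and string\nstring only\n' > notes.txt
printf 'query = "string"\n' > script.py

# Which globs does the built-in "py" type cover?
rg --type-list | rg '^py:'

# Search Python files only (the explicit "." avoids reading stdin in scripts).
py_hits=$(rg --type py --files-with-matches query .)
echo "$py_hits"

# Lines containing both words: feed the first search into a second rg.
both=$(rg --no-filename query . | rg string)
echo "$both"
```

The second rg simply filters the lines printed by the first, so the order of the two words within a line does not matter.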

Comments

ROGER DELOY PACK

Nooo...you can't say "to grep with..." noooo...

Georgi

>For both searching single files ..., no other tool obviously stands above ripgrep in either performance or correctness.

No, searching literals (i.e. exact matching) with Kazahana is superior. Why? Because the tool has the fastest memmem() function.
A quick example, on a laptop with an i7-3630QM:

```
Testfile: 13,113,340,782 OpenSubtitle_corpus_en_2018_(441,450,449_lines_FROM_446,612_files).txt

Benchmarking literal (Exact Matching) "Now, Hercules and his little friends won't stand a chance" ...

F:\grep_vs_Kazahana>timer32.exe "Kazahana_Monad_GCC_472_SSE41_32bit.exe" "Now, Hercules and his little friends won't stand a chance" "OpenSubtitle_corpus_en_2018_(441,450,449_lines_FROM_446,612_files).txt" 520123

Kernel Time = 3.656 = 66%
User Time = 1.796 = 32%
Process Time = 5.453 = 99% Virtual Memory = 512 MB
Global Time = 5.475 = 100% Physical Memory = 513 MB

F:\grep_vs_Kazahana>timer32.exe "Kazahana_Monad_GCC_730_SSE41_64bit.exe" "Now, Hercules and his little friends won't stand a chance" "OpenSubtitle_corpus_en_2018_(441,450,449_lines_FROM_446,612_files).txt" 520123

Kernel Time = 3.718 = 75%
User Time = 1.203 = 24%
Process Time = 4.921 = 99% Virtual Memory = 511 MB
Global Time = 4.940 = 100% Physical Memory = 512 MB

F:\grep_vs_Kazahana>timer32.exe "Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+EX+CS_fix_DEFINE_Trolldom_MONAD-Thread_IntelV15_SSE41_64bit.exe" "Now, Hercules and his little friends won't stand a chance" "OpenSubtitle_corpus_en_2018_(441,450,449_lines_FROM_446,612_files).txt" 520123

Kernel Time = 3.671 = 74%
User Time = 1.265 = 25%
Process Time = 4.937 = 100% Virtual Memory = 511 MB
Global Time = 4.911 = 100% Physical Memory = 512 MB

F:\grep_vs_Kazahana>timer32.exe "Kazahana_Hexadecad_GCC_730_SSE41_64bit.exe" "Now, Hercules and his little friends won't stand a chance" "OpenSubtitle_corpus_en_2018_(441,450,449_lines_FROM_446,612_files).txt" 520123

Kernel Time = 3.718 = 80%
User Time = 5.015 = 108%
Process Time = 8.734 = 188% Virtual Memory = 513 MB
Global Time = 4.627 = 100% Physical Memory = 514 MB

F:\grep_vs_Kazahana>timer32.exe "Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+EX+CS_fix_DEFINE_Trolldom_HEXADECAD-Threads_IntelV15_SSE41_64bit.exe" "Now, Hercules and his little friends won't stand a chance" "OpenSubtitle_corpus_en_2018_(441,450,449_lines_FROM_446,612_files).txt" 520123

Kernel Time = 7.671 = 137%
User Time = 30.140 = 539%
Process Time = 37.812 = 677% Virtual Memory = 515 MB
Global Time = 5.582 = 100% Physical Memory = 514 MB

F:\grep_vs_Kazahana>set LC_ALL=C

F:\grep_vs_Kazahana>timer32.exe grep.exe -F -c "Now, Hercules and his little friends won't stand a chance" "OpenSubtitle_corpus_en_2018_(441,450,449_lines_FROM_446,612_files).txt"
1

Kernel Time = 3.375 = 19%
User Time = 14.375 = 80%
Process Time = 17.750 = 99% Virtual Memory = 2 MB
Global Time = 17.762 = 100% Physical Memory = 5 MB

F:\grep_vs_Kazahana>timer32.exe "ripgrep-11.0.1-x86_64-pc-windows-gnu.exe" -c "Now, Hercules and his little friends won't stand a chance" "OpenSubtitle_corpus_en_2018_(441,450,449_lines_FROM_446,612_files).txt"
1

Kernel Time = 2.953 = 42%
User Time = 4.000 = 57%
Process Time = 6.953 = 100% Virtual Memory = 26 MB
Global Time = 6.948 = 100% Physical Memory = 4096 MB
```

Anonymous

I wonder if this varies by OS... I've just tried every one of the windows releases (they have both gnu and msvc compiled versions) on the ripgrep site and I can't find a single one which isn't at least 60% slower than standard GNU grep build for MinGW when it comes to searching our codebase of CPP files for a literal string.

```
$ time rg -F -cpp wsregex_iterator
PGApplicationInformation\PGHostedFileHelper.cpp:1

real 0m3.690s
user 0m0.015s
sys 0m0.031s

ben.staniford@L-8N7F4M2 MINGW64 /z/WindowsClient (bs/92539/CrashServiceNowDiscoverScan)
$ time grep -r --include=*.{cpp,hpp,h} wsregex_iterator *
PGApplicationInformation/PGHostedFileHelper.cpp: auto itPs1Files = std::wsregex_iterator(tmp.begin(), tmp.end(), rgxPs1File);

real 0m2.286s
user 0m0.171s
sys 0m1.921s
```

So ripgrep seems to kinda suck on Windows, although it's not as bad as silver searcher, which takes more than 5x longer than grep to do this search.

