
I used to revise a manuscript of several hundred pages dozens of times until my eyes turned red.
I had to revise and re-revise things countless times—when guidelines changed, when I found typos, or when I needed to standardize terminology.
Yes, that was me. Haha.
After learning regular expressions, I finished work that would have taken a week in just 1-2 hours!
For researchers who need to write papers or organize and process data,
For journal coordinators who need to manage documents of dozens or hundreds of pages,
For publishers who need to edit books year-round,
And for translators who work with texts written in various languages—
After learning regex, your world will be divided into “before regex” and “after regex.”
How to Use
There are various terms and syntax used in regular expressions.
It's not too late to learn the terms and syntax after you have actually used regex a few times.
After all, our goal is to use it, not just to study it!
I’ll explain based on Notepad++, which is great for beginners.
If you haven’t installed it yet, please refer to the Notepad++ post.
Once you get the examples below to work, you'll probably be itching to try more.
You’ll be thinking, “What else can I do with this?”
A Taste of Regular Expressions
Replacing Multiple Spaces with a Single Space
Let’s start with an easy practice.
Here’s the first verse of the South Korean national anthem.
동해물과 백두산이 마르고 닳도록
하느님이 보우하사 우리나라 만세
무궁화 삼천리 화려 강산
대한 사람 대한으로 길이 보전하세
- How can we replace all these irregular runs of spaces with a single space?
- The first method that comes to mind is to replace 2 spaces with 1 space and repeat that about 10 times.
- With regular expressions, you can do it in one go. (In Notepad++'s Replace dialog, select "Regular expression" under Search Mode.)
- Find: [ ]{1,20}
- Replace: (a single space)
- Let's break down the regular expression.
  - [ ] matches exactly 1 character from the set inside the brackets; here the set contains only a space.
  - {1,20} sets the minimum and maximum number of repetitions of the preceding item.
  - Therefore, [ ]{1,20} finds any run of 1 to 20 spaces.
- When working with documents, people often indent or position text by typing spaces.
- This is a simple way to clean that up.
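If you ever want to run the same cleanup outside Notepad++, here is a minimal Python sketch of the pattern above using the standard `re` module. The sample string with extra spaces is made up for illustration, and the `[ ]+` variant is added only for comparison; it is not part of the steps above.

```python
import re

def collapse_spaces(text: str) -> str:
    """Replace every run of 1-20 spaces with a single space (the pattern above)."""
    return re.sub(r"[ ]{1,20}", " ", text)

def collapse_spaces_unbounded(text: str) -> str:
    """Comparison variant without an upper limit: any run of spaces becomes one."""
    return re.sub(r"[ ]+", " ", text)

# Made-up sample: the lyrics with extra spaces inserted for illustration.
sample = "동해물과   백두산이  마르고 닳도록\n하느님이      보우하사 우리나라 만세"
print(collapse_spaces(sample))

# A run of 25 spaces: {1,20} matches 20, then the remaining 5, leaving 2 spaces.
long_gap = "a" + " " * 25 + "b"
print(repr(collapse_spaces(long_gap)))
print(repr(collapse_spaces_unbounded(long_gap)))
```

Because of the upper limit of 20, a run of 25 spaces ends up as two spaces rather than one; dropping the limit with `[ ]+` avoids that. Note also that `[ ]` matches only the space character, so tabs and line breaks are left alone.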
Finding Text in Brackets
Let’s raise the difficulty level a bit.
Sometimes you get requests like this:
I need to extract the index terms from the book. Please extract the parts marked with <<…>>.
1. <<동해물>>과 <<백두산>>이 마르고 닳도록
하느님이 보우하사 우리나라 만세
2. <<남산>> 위에 저 소나무 철갑을 두른 듯
바람 서리 불변함은 우리 기상일세
3. 가을 하늘 공활한데 높고 구름 없이
밝은 달은 우리 가슴 일편단심일세
4. 이 기상과 이 맘으로 충성을 다하여
괴로우나 즐거우나 <<나라>> 사랑하세
<후렴>
무궁화 삼천리 화려 <<강산>>
대한 사람 대한으로 길이 보전하세
- If it's short like this example, you can do it by hand, but for a 500-page book, it's a completely different story.
- Even without knowing regex, let's think about a pattern that removes everything else in one go.
- How about deleting everything between >> and << and inserting a line break in its place? Let's try it.
- Find: >>[\s|\S]+?<<
- Replace: >>\r\n<< (for files with Linux or macOS line endings: >>\n<<; for old classic Mac OS files: >>\r<<)
- Then you get this:
1. <<동해물>>
<<백두산>>
<<남산>>
<<나라>>
<<강산>>
대한 사람 대한으로 길이 보전하세
- You can delete the leftover text on the first and last lines by hand. If you need to remove duplicates or sort the terms alphabetically, see the Notepad++ post.
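- Let's break down the regular expression.
  - We saw above that [ ] matches 1 character, right? [\s|\S] matches 1 character that is either \s or \S.
  - \ gives the letter after it a special meaning in regex.
  - \s means any whitespace character: line breaks, spaces, tabs, etc.
  - \S means any character that is not whitespace.
  - Together, [\s|\S] matches any character at all, including line breaks. (Inside square brackets, | is just a literal pipe character, not "or." It does no harm here because \S already covers it, and [\s\S] works the same.)
  - + means one or more of the preceding item.
  - ? placed right after + makes the match lazy: it takes the shortest possible match, stopping at the first << after each >>. Without it, the match would run all the way to the last << in the file.
  - \r\n (Windows), \n (Linux and macOS), or \r (classic Mac OS) means a line break.
  - In summary, the pattern finds the shortest stretch of text between >> and <<, whether it contains line breaks, spaces, or any other characters.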
- If my explanation isn't clear enough, run the Find & Replace yourself once more and watch what changes.
- If you look at the Find & Replace results and think, “This could make my grunt work a bit easier,” then you’re someone who needs regular expressions.
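To see why the ? matters, here is a small Python sketch (standard `re` module) that runs the lazy pattern from the steps above and a greedy version without ? on the first lines of the sample. The sketch writes the character class as [\s\S]; Python's `re`, like Notepad++, treats a | inside brackets as a literal pipe, so [\s|\S] gives the same result here. The last step is a different route than the replace-then-clean-up approach above: it collects the marked terms directly with `re.findall`, then removes duplicates and sorts them.

```python
import re

# First lines of the sample above, with index terms marked as <<...>>.
sample = (
    "1. <<동해물>>과 <<백두산>>이 마르고 닳도록\n"
    "하느님이 보우하사 우리나라 만세\n"
    "2. <<남산>> 위에 저 소나무 철갑을 두른 듯\n"
)

# Lazy (+?): each match stops at the FIRST << after a >>,
# so every gap between two terms is replaced separately.
lazy = re.sub(r">>[\s\S]+?<<", ">>\n<<", sample)

# Greedy (+): the match runs to the LAST << in the text,
# so the term in the middle (<<백두산>>) is swallowed.
greedy = re.sub(r">>[\s\S]+<<", ">>\n<<", sample)

# Alternative route (not the method above): collect the marked
# terms directly, drop duplicates with set(), and sort them.
terms = sorted(set(re.findall(r"<<[\s\S]+?>>", sample)))

print(lazy)
print(greedy)
print(terms)
```

The lazy result keeps every term on its own line, while the greedy one loses <<백두산>>. Sorting with `sorted()` follows Unicode code-point order, which for precomposed Hangul syllables matches the usual dictionary (ga-na-da) order.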
Important Notes
- Never edit the original file directly; make a habit of working on a copy.
- You will learn this the hard way at least once. Yes, losing data is a rite of passage. Welcome to the world of regex!
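The same habit applies if you ever script your replacements. Below is a minimal Python sketch, assuming UTF-8 plain-text files; `replace_in_copy` and the `.edited` file-name suffix are hypothetical names chosen for this example. It copies the file first and applies the regex only to the copy, so the original stays untouched.

```python
import re
import shutil
import tempfile
from pathlib import Path

def replace_in_copy(src: Path, pattern: str, repl: str, suffix: str = ".edited") -> Path:
    """Copy src, apply the regex replacement to the copy only, and return the copy's path.

    Hypothetical helper for illustration; assumes UTF-8 text. The original is never written to.
    """
    dst = src.with_name(src.stem + suffix + src.suffix)  # e.g. book.txt -> book.edited.txt
    shutil.copy2(src, dst)  # keep an untouched original alongside the working copy
    text = dst.read_text(encoding="utf-8")
    dst.write_text(re.sub(pattern, repl, text), encoding="utf-8")
    return dst

if __name__ == "__main__":
    # Demo in a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "manuscript.txt"
        original.write_text("a   b", encoding="utf-8")
        edited = replace_in_copy(original, r"[ ]{1,20}", " ")
        print(original.read_text(encoding="utf-8"))  # a   b
        print(edited.read_text(encoding="utf-8"))    # a b
```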