What Are Regular Expressions (Regex)?
Regular expressions, often shortened to regex, are sequences of characters that define a search pattern. They are used to find, validate, extract, or replace text that matches specific rules — from checking an email address format to parsing logs or sanitizing input data.
Regex is supported in almost every programming language, including JavaScript, Python, Java, PHP, C#, and Perl.
Why Use Regex?
Regular expressions allow developers to:
- Validate input formats (emails, phone numbers, postal codes).
- Find and replace patterns in text.
- Extract structured data from unstructured text.
- Simplify complex string operations with concise syntax.
Basic Regex Components
| Symbol | Meaning | Example | Matches |
|---|---|---|---|
| . | Any single character except newline | c.t | cat, cut, cot |
| ^ | Start of a line or string | ^Hello | Matches only if the line starts with Hello |
| $ | End of a line or string | world$ | Matches only if the line ends with world |
| * | 0 or more repetitions | lo*l | ll, lol, loool |
| + | 1 or more repetitions | lo+l | lol, loool (not ll) |
| ? | 0 or 1 occurrence | colou?r | color or colour |
| {n} | Exactly n occurrences | \d{4} | Matches 4 digits (e.g. 2025) |
| {n,} | n or more occurrences | \d{2,} | Matches at least two digits |
| {n,m} | Between n and m occurrences | a{2,4} | aa, aaa, aaaa |
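The quantifiers in the table can be checked with a minimal Python sketch using the standard `re` module. The helper name `fully_matches` and the extra sample words (`colouur`, `aaaaa`) are illustrative assumptions, not part of the original examples.

```python
import re

def fully_matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# * allows zero or more "o" characters, so "ll" matches.
star_results = [fully_matches(r"lo*l", w) for w in ("ll", "lol", "loool")]

# + requires at least one "o", so "ll" is rejected.
plus_results = [fully_matches(r"lo+l", w) for w in ("ll", "lol", "loool")]

# ? makes the preceding "u" optional, but does not allow two of them.
optional_results = [fully_matches(r"colou?r", w) for w in ("color", "colour", "colouur")]

# {2,4} accepts between two and four repetitions of "a".
bounded_results = [fully_matches(r"a{2,4}", w) for w in ("a", "aa", "aaaa", "aaaaa")]

print(star_results)      # [True, True, True]
print(plus_results)      # [False, True, True]
print(optional_results)  # [True, True, False]
print(bounded_results)   # [False, True, True, False]
```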
Character Classes
| Syntax | Meaning | Example | Matches |
|---|---|---|---|
| [abc] | Any one of a, b, or c | b[aiu]t | bat, bit, but |
| [^abc] | Any character except a, b, or c | [^0-9] | Any non-digit |
| [a-z] | Any lowercase letter | [a-z]+ | regex, test, word |
| [A-Z] | Any uppercase letter | [A-Z]+ | HELLO, WORLD |
| [0-9] or \d | Any digit | \d{3} | 123, 007 |
| \D | Any non-digit | \D+ | abc!, test |
| \w | Word characters (letters, digits, underscore) | \w+ | hello_123 |
| \W | Non-word characters | \W+ | spaces, punctuation, etc. |
| \s | Whitespace (space, tab, newline) | \s+ | space or tab |
| \S | Non-whitespace | \S+ | word, test |
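To see the character classes in action, here is a short Python sketch. The sample sentence is made up for illustration. Note that for `str` patterns Python's `\d` and `\w` also match non-ASCII digits and letters unless the `re.ASCII` flag is passed.

```python
import re

# Made-up sample text, used only for illustration.
sample = "Order_42 shipped to Box 7, Main St."

digits = re.findall(r"\d+", sample)       # runs of digits
words = re.findall(r"\w+", sample)        # runs of letters, digits, underscore
capitals = re.findall(r"[A-Z]", sample)   # single uppercase ASCII letters
chunks = re.findall(r"\S+", sample)       # runs of non-whitespace characters
b_words = re.findall(r"b[aiu]t", "bat bet bit but")  # "bet" is excluded

print(digits)    # ['42', '7']
print(words)     # ['Order_42', 'shipped', 'to', 'Box', '7', 'Main', 'St']
print(capitals)  # ['O', 'B', 'M', 'S']
print(chunks)    # ['Order_42', 'shipped', 'to', 'Box', '7,', 'Main', 'St.']
print(b_words)   # ['bat', 'bit', 'but']
```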
Regex Examples for Common Use Cases
1. Validate an Email Address
^[\w.-]+@[\w.-]+\.\w{2,}$
Explanation:
- ^ → start of string
- [\w.-]+ → username (letters, digits, underscore, dot, dash)
- @ → literal @
- [\w.-]+ → domain name
- \.\w{2,} → dot followed by at least two word characters (the top-level domain)
- $ → end of string
✅ Matches:
hello@example.com
john.doe@my-domain.org
🚫 Does not match:
hello@.com
@domain.com
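The email pattern can be wrapped in a small Python check, sketched below. The function name `looks_like_email` is an assumption for illustration. The pattern is a format check only: it still accepts some malformed addresses such as `a@b..com`, so it should not be treated as full address validation.

```python
import re

# Same pattern as above, compiled once for reuse.
EMAIL_PATTERN = re.compile(r"^[\w.-]+@[\w.-]+\.\w{2,}$")

def looks_like_email(address: str) -> bool:
    """Hypothetical helper: True when `address` has the expected shape."""
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    return EMAIL_PATTERN.fullmatch(address) is not None

for candidate in ("hello@example.com", "john.doe@my-domain.org",
                  "hello@.com", "@domain.com"):
    print(candidate, looks_like_email(candidate))
```

The first two candidates print `True` and the last two print `False`.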
2. Validate a Phone Number
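^\+?(?:\d{1,3})?[-.\s]?\(?\d{1,4}\)?[-.\s]?\d{3,5}[-.\s]?\d{4,6}$
Covers several common phone formats. The country code is written as an optional group, because a bare \d{1,3}? is a lazy quantifier that still requires at least one digit:
+1 202 555 0198
(202) 555-0198
It does not match numbers whose final group has fewer than four digits, such as 0040-721-999-888.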
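A hedged Python sketch for trying a phone pattern against sample numbers. The pattern below writes the country code as an optional group, `(?:\d{1,3})?`, so that numbers without a country code are accepted. The function name `looks_like_phone` and the negative sample `12345` are assumptions for illustration.

```python
import re

PHONE_PATTERN = re.compile(
    r"\+?(?:\d{1,3})?[-.\s]?"   # optional "+", optional country code, separator
    r"\(?\d{1,4}\)?[-.\s]?"     # area code, optionally in parentheses
    r"\d{3,5}[-.\s]?"           # first part of the subscriber number
    r"\d{4,6}"                  # last part of the subscriber number
)

def looks_like_phone(number: str) -> bool:
    """Hypothetical helper: True when the whole string fits the pattern."""
    return PHONE_PATTERN.fullmatch(number) is not None

print(looks_like_phone("+1 202 555 0198"))  # True
print(looks_like_phone("(202) 555-0198"))   # True
print(looks_like_phone("12345"))            # False (too few digits)
```

A shape check like this says nothing about whether the number exists; a dedicated phone-number library is usually a better fit for real validation.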
3. Extract Hashtags from Text
#\w+
In JavaScript:
const text = "Learning #regex is #fun and #powerful!";
const hashtags = text.match(/#\w+/g);
console.log(hashtags); // ["#regex", "#fun", "#powerful"]
4. Remove Special Characters
/[\W_]+/g
Used to replace non-alphanumeric characters with spaces or an empty string.
Example:
const clean = "Hello, World!".replace(/[\W_]+/g, ' ').trim(); // trim the space left by the trailing "!"
console.log(clean); // "Hello World"
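The same cleanup can be sketched in Python with `re.sub`. The function name `strip_special` and its `replacement` parameter are assumptions for illustration. Trailing punctuation is turned into a trailing replacement character, so the sketch trims surrounding whitespace afterwards.

```python
import re

def strip_special(text: str, replacement: str = " ") -> str:
    """Hypothetical helper: replace runs of non-alphanumeric characters."""
    # [\W_]+ matches one or more characters that are not letters or digits.
    cleaned = re.sub(r"[\W_]+", replacement, text)
    # Remove whitespace left at the ends by leading or trailing punctuation.
    return cleaned.strip()

print(strip_special("Hello, World!"))         # Hello World
print(strip_special("snake_case--name", ""))  # snakecasename
```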
5. Match Dates (DD/MM/YYYY)
^(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[0-2])/\d{4}$
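Covers day-first (DD/MM/YYYY) dates with or without leading zeros. It checks the shape only, so impossible dates such as 31/02/2025 still match.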
✅ Matches:
01/01/2025
9/11/2025
31/12/1999
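One way to pair the date pattern with a real calendar check is sketched below in Python: the regex filters on shape, and `datetime.strptime` from the standard library rejects dates that do not exist. The function name `is_valid_date` is an assumption for illustration.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[0-2])/\d{4}")

def is_valid_date(text: str) -> bool:
    """Hypothetical helper: shape check first, then a calendar check."""
    if DATE_PATTERN.fullmatch(text) is None:
        return False
    try:
        # Raises ValueError for dates like 31/02/2025.
        datetime.strptime(text, "%d/%m/%Y")
    except ValueError:
        return False
    return True

print(is_valid_date("9/11/2025"))   # True
print(is_valid_date("31/02/2025"))  # False
print(is_valid_date("2025-01-01"))  # False
```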
6. Find Duplicate Words
\b(\w+)\s+\1\b
In Python:
import re
text = "This is is a test test line"
duplicates = re.findall(r'\b(\w+)\s+\1\b', text)
print(duplicates) # ['is', 'test']
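Building on the same backreference, a possible follow-up is to remove the repeats rather than just list them. The sketch below uses `re.sub`; the function name `collapse_repeats` is an assumption for illustration, and the match is case-sensitive, so "The the" is left alone.

```python
import re

def collapse_repeats(text: str) -> str:
    """Hypothetical helper: keep one copy of each immediately repeated word."""
    # (\s+\1\b)+ matches one or more repeats of the captured word,
    # so a word repeated three times is collapsed as well.
    return re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text)

print(collapse_repeats("This is is a test test line"))  # This is a test line
print(collapse_repeats("go go go now"))                 # go now
```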
Lookahead and Lookbehind Assertions
Advanced regex uses lookaheads and lookbehinds to match context without consuming text.
| Type | Syntax | Description |
|---|---|---|
| Positive Lookahead | X(?=Y) | Match X only if followed by Y |
| Negative Lookahead | X(?!Y) | Match X only if not followed by Y |
| Positive Lookbehind | (?<=Y)X | Match X only if preceded by Y |
| Negative Lookbehind | (?<!Y)X | Match X only if not preceded by Y |
Example:
\d+(?= euros)
Matches digits followed by the word “euros”.
In the text Price: 120 euros, it matches 120.
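All four assertion types from the table can be tried in a short Python sketch. The sample sentence is made up for illustration. Python's `re` module requires lookbehind patterns to have a fixed width, which `\$` satisfies.

```python
import re

# Made-up sample text, used only for illustration.
text = "Price: 120 euros, shipping $15, total 135 euros"

# Positive lookahead: digits followed by " euros".
euro_amounts = re.findall(r"\d+(?= euros)", text)

# Positive lookbehind: digits preceded by a literal dollar sign.
dollar_amounts = re.findall(r"(?<=\$)\d+", text)

# Negative lookahead: whole numbers NOT followed by " euros".
# The \b anchors stop the engine from settling for a shorter number like "12".
not_euros = re.findall(r"\b\d+\b(?! euros)", text)

# Negative lookbehind: whole numbers NOT preceded by a dollar sign.
not_dollars = re.findall(r"(?<!\$)\b\d+\b", text)

print(euro_amounts)    # ['120', '135']
print(dollar_amounts)  # ['15']
print(not_euros)       # ['15']
print(not_dollars)     # ['120', '135']
```

In every case the surrounding context (" euros" or "$") is checked but not included in the match.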
Regex Examples by Language
🟦 JavaScript Example
const regex = /\d{4}-\d{2}-\d{2}/g;
const dates = "2025-01-01 and 2025-12-31";
console.log(dates.match(regex)); // ["2025-01-01", "2025-12-31"]
🐍 Python Example
import re
pattern = r"\b[A-Z][a-z]+"
text = "John met Alice and Bob at the park."
print(re.findall(pattern, text))
# Output: ['John', 'Alice', 'Bob']
☕ Java Example
import java.util.regex.*;

public class RegexDemo {
    public static void main(String[] args) {
        String text = "Order ID: 12345, Date: 2025-11-01";
        Pattern p = Pattern.compile("\\d+");
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println("Found number: " + m.group());
        }
    }
}
Performance Tips
- ✅ Use anchors (^ and $) when validating entire strings; this prevents unnecessary partial matches.
- 🧠 Avoid excessive backtracking by simplifying groups and quantifiers.
- ⚡ Precompile regex patterns (especially in Java or C#) for reuse in loops.
- 🔍 Test patterns on tools like regex101.com or RegExr to visualize matches and get debugging hints.
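The anchoring and precompiling tips can be illustrated with a brief Python sketch. The names `ISO_DATE` and `filter_iso_dates` and the sample list are assumptions for illustration. Python's `re` module also caches recently used patterns internally, so the gain from `re.compile` is often modest there, but an explicit pattern object keeps the intent clear.

```python
import re

# Compile once, outside any loop, and reuse the pattern object.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def filter_iso_dates(values):
    """Hypothetical helper: keep strings that are entirely an ISO-style date."""
    # fullmatch requires the whole string to match, so partial hits are rejected.
    return [v for v in values if ISO_DATE.fullmatch(v)]

candidates = ["2025-01-01", "on 2025-01-01", "2025-12-31", "2025-1-1"]
print(filter_iso_dates(candidates))  # ['2025-01-01', '2025-12-31']
```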
Common Regex Pitfalls
- Forgetting to escape special characters like ., ?, +, or ( when you mean to match them literally.
- Using .* too broadly: it is greedy and matches as much as it can, and it spans multiple lines when dot-all mode is enabled.
- Ignoring performance when applying regex to very large files or logs.
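The first two pitfalls are easy to reproduce. The Python sketch below uses made-up sample strings: it contrasts an unescaped dot with an escaped one, uses `re.escape` to build a literal pattern, and compares a greedy `.*` with a lazy `.*?`.

```python
import re

# Pitfall 1: an unescaped "." matches any character, not just a dot.
loose = re.findall(r"3.14", "3.14 3x14")
strict = re.findall(r"3\.14", "3.14 3x14")

# re.escape turns arbitrary text into a pattern that matches it literally.
literal = re.findall(re.escape("a+b (c)"), "a+b (c) and aab (c)")

# Pitfall 2: greedy .* runs to the last possible closing tag on the line.
html = "<b>one</b> and <b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)
# Lazy .*? stops at the first closing tag instead.
lazy = re.findall(r"<b>(.*?)</b>", html)

print(loose)    # ['3.14', '3x14']
print(strict)   # ['3.14']
print(literal)  # ['a+b (c)']
print(greedy)   # ['<b>one</b> and <b>two</b>']
print(lazy)     # ['one', 'two']
```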
Conclusion
Regular expressions are a powerful tool for text manipulation and data validation, available in nearly every language with only minor differences in syntax.
From cleaning user input to extracting structured data, regular expressions can dramatically simplify your code once you get comfortable with their syntax.
Whether you’re writing JavaScript, Python, or Java, regex is a must-have skill for developers who work with text data.


