CODE TUNING: Chapter 6.4: Java Regex Examples - Phone number, Check for a certain number range, Building a link checker, Finding elements which start on a new line

The following sections list typical examples of regular expression usage. I hope you find similarities to your real-world problems.

Task: Write a regular expression which matches a text line if this text line contains either the word "jim" or the word "joe" or both. The match is case-sensitive, so only the lowercase spelling is accepted.

Create a project de.vogella.regex.eitheror and the following class.

package de.vogella.regex.eitheror;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class EitherOrCheck {

  @Test
  public void testSimpleTrue() {
    // String.matches() must cover the whole string, hence the leading and trailing .*
    String s = "humbapumpa jim";
    assertTrue(s.matches(".*(jim|joe).*"));
    s = "humbapumpa jom";
    assertFalse(s.matches(".*(jim|joe).*"));
    s = "humbaPumpa joe";
    assertTrue(s.matches(".*(jim|joe).*"));
    s = "humbapumpa joe jim";
    assertTrue(s.matches(".*(jim|joe).*"));
  }
}
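The test above only accepts the lowercase spelling and needs `.*` on both sides because `String.matches()` has to cover the whole line. As a minimal sketch of an alternative, assuming you also want to accept "Jim" or "JOE" and to reject longer words such as "joey", the following uses `Matcher.find()` with a case-insensitive pattern. The class and method names are hypothetical and not part of the original example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original tutorial project.
class EitherOrSketch {

    // \b keeps "joey" or "jimmy" from counting as a hit;
    // CASE_INSENSITIVE accepts "Jim", "JOE" and so on.
    private static final Pattern NAME =
            Pattern.compile("\\b(jim|joe)\\b", Pattern.CASE_INSENSITIVE);

    static boolean containsJimOrJoe(String line) {
        // find() searches inside the line, so no surrounding .* is needed
        return NAME.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(containsJimOrJoe("humbapumpa Jim"));
        System.out.println(containsJimOrJoe("humbapumpa jom"));
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the expression on every call, which is what `String.matches()` does internally.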

Task: Write a regular expression which matches any phone number.

A phone number in this example consists either of 7 digits in a row or of 3 digits, a whitespace character or a dash, and then 4 digits.

package de.vogella.regex.phonenumber;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class CheckPhone {

  @Test
  public void testSimpleTrue() {
    // three digits, an optional dash or whitespace character, four digits
    String pattern = "\\d\\d\\d([-\\s])?\\d\\d\\d\\d";
    String s = "1233323322";
    assertFalse(s.matches(pattern));
    s = "1233323";
    assertTrue(s.matches(pattern));
    s = "123 3323";
    assertTrue(s.matches(pattern));
    s = "123-3323";
    assertTrue(s.matches(pattern));
  }
}
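The format described above (seven digits, or three digits, a whitespace character or a dash, then four digits) can also be written more compactly with counted quantifiers. The following is a minimal sketch under that reading of the format. `PhoneSketch` and `isPhoneNumber` are hypothetical names introduced for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original tutorial project.
class PhoneSketch {

    // \d{3} and \d{4} replace the repeated \d; [-\s]? is the optional
    // separator (a dash or one whitespace character).
    private static final Pattern PHONE = Pattern.compile("\\d{3}[-\\s]?\\d{4}");

    static boolean isPhoneNumber(String s) {
        // matches() requires the whole input to fit the pattern
        return PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPhoneNumber("123-3323"));
        System.out.println(isPhoneNumber("1233323322"));
    }
}
```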

The following example checks whether a text contains at least three consecutive digits.

Create the Java project de.vogella.regex.numbermatch and the following class.

package de.vogella.regex.numbermatch;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class CheckNumber {

  @Test
  public void testSimpleTrue() {
    String s = "1233";
    assertTrue(test(s));
    s = "0";
    assertFalse(test(s));
    s = "29 Kasdkf 2300 Kdsdf";
    assertTrue(test(s));
    s = "99900234";
    assertTrue(test(s));
  }

  public static boolean test(String s) {
    Pattern pattern = Pattern.compile("\\d{3}");
    Matcher matcher = pattern.matcher(s);
    // find() succeeds as soon as three digits in a row occur anywhere in s
    return matcher.find();
  }
}
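Note that `\d{3}` together with `find()` also accepts longer digit runs such as "1233". If the goal is instead a number with exactly three digits, one option is to add lookarounds that forbid a digit on either side. This is a sketch under that assumption, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original tutorial project.
class ThreeDigitSketch {

    // (?<!\d) : no digit directly before the match
    // (?!\d)  : no digit directly after the match
    private static final Pattern EXACTLY_THREE =
            Pattern.compile("(?<!\\d)\\d{3}(?!\\d)");

    static boolean containsThreeDigitNumber(String s) {
        return EXACTLY_THREE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsThreeDigitNumber("29 Kasdkf 230 Kdsdf"));
        System.out.println(containsThreeDigitNumber("1233"));
    }
}
```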

The following example allows you to extract all valid links from a webpage. It does not consider links which start with "javascript:" or "mailto:".

Create a Java project called de.vogella.regex.weblinks and the following class:

package de.vogella.regex.weblinks;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkGetter {
  private Pattern htmltag;
  private Pattern link;

  public LinkGetter() {
    // a complete anchor element that carries an href attribute
    htmltag = Pattern.compile("<a\\b[^>]*href=\"[^>]*>(.*?)</a>");
    // the href attribute up to the end of the opening tag
    link = Pattern.compile("href=\"[^>]*\">");
  }

  public List<String> getLinks(String url) {
    List<String> links = new ArrayList<String>();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader bufferedReader = new BufferedReader(
        new InputStreamReader(new URL(url).openStream()))) {
      String s;
      StringBuilder builder = new StringBuilder();
      while ((s = bufferedReader.readLine()) != null) {
        builder.append(s);
      }

      Matcher tagmatch = htmltag.matcher(builder.toString());
      while (tagmatch.find()) {
        Matcher matcher = link.matcher(tagmatch.group());
        if (!matcher.find()) {
          continue; // opening tag does not end in a quote followed by >
        }
        // strip the attribute syntax and an optional target attribute
        String href = matcher.group().replaceFirst("href=\"", "")
            .replaceFirst("\">", "")
            .replaceFirst("\"[\\s]?target=\"[a-zA-Z_0-9]*", "");
        if (valid(href)) {
          links.add(makeAbsolute(url, href));
        }
      }
    } catch (MalformedURLException e) {
      e.printStackTrace();
    } catch (IOException e) {
      e.printStackTrace();
    }
    return links;
  }

  private boolean valid(String s) {
    return !s.matches("javascript:.*|mailto:.*");
  }

  // Simple string concatenation of page URL and link
  private String makeAbsolute(String url, String link) {
    if (link.matches("https?://.*")) {
      return link;
    }
    if (link.matches("/.*") && url.matches(".*/")) {
      // both sides provide a slash, drop one to avoid "//"
      return url + link.substring(1);
    }
    if (link.matches("[^/].*") && url.matches(".*[^/]")) {
      // neither side provides the separating slash
      return url + "/" + link;
    }
    if (link.matches("/.*") || (link.matches("[^/].*") && url.matches(".*/"))) {
      // exactly one side provides the slash
      return url + link;
    }
    throw new RuntimeException("Cannot make the link absolute. Url: " + url
        + " Link " + link);
  }
}
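The class above isolates each link by matching the whole `href` attribute and then stripping the surrounding syntax with `replaceFirst`. As a minimal sketch of an alternative, a capturing group can return the attribute value directly. The example works on an HTML string that is already in memory, so it needs no network access. `LinkSketch` and `extractLinks` are hypothetical names, and like the original this remains a regex approximation rather than a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original tutorial project.
class LinkSketch {

    // Group 1 captures everything between the quotes of the href attribute.
    // [^>]*? lazily skips other attributes inside the same opening tag.
    private static final Pattern HREF =
            Pattern.compile("<a\\b[^>]*?\\bhref=\"([^\"]*)\"");

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            String href = matcher.group(1);
            // same filter idea as valid(): skip javascript: and mailto: links
            if (!href.matches("(?i)(javascript|mailto):.*")) {
                links.add(href);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://a.com/x\">A</a> "
                + "<a href=\"/rel\" target=\"_blank\">R</a>";
        System.out.println(extractLinks(html));
    }
}
```

Because the group excludes the quotes, a trailing `target` attribute never ends up in the extracted value, so no cleanup step is required.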

The following regular expression matches duplicated words.

\b(\w+)\s+\1\b

\b is a word boundary and \1 refers to the captured match of the first group, i.e., the first word.

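The variant (?!-in)\b(\w+) \1\b adds the negative lookahead (?!-in), which rejects a match at any position where the text continues with "-in". Since \w never matches "-", the lookahead has no effect when it is placed directly before \b(\w+).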

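Add (?s) (DOTALL mode) to an expression if . should also match line terminators, so that it can match across multiple lines. The \s+ in the expression above already matches line breaks.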
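To show the duplicated-word expression in Java, here is a minimal sketch that lists the repeated words and collapses each repetition into a single word. The class and method names are hypothetical. In a Java string literal every backslash of the expression has to be doubled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original tutorial.
class DuplicateWordsSketch {

    // Same expression as above: a word, whitespace, then the same word again.
    private static final Pattern DUPLICATE =
            Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Returns each word that is immediately repeated.
    static List<String> findDuplicates(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = DUPLICATE.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group(1));
        }
        return words;
    }

    // Replaces every "word word" pair by the single captured word.
    static String removeDuplicates(String text) {
        return DUPLICATE.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = "this is is a test test";
        System.out.println(findDuplicates(text));
        System.out.println(removeDuplicates(text));
    }
}
```

The leading \b matters here: without it, the "is" at the end of "this" followed by the word "is" would be reported as a duplicate.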

The following regular expression allows you to find the word "title" when it starts on a new line, potentially preceded by whitespace.

(\n\s*)title
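Because the expression above requires a preceding \n, it cannot match "title" on the very first line of the input. As a sketch of one way around that, assuming the first line should count as well, the MULTILINE flag lets ^ match at the start of every line. `TitleSketch` and `startsLineWithTitle` are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original tutorial.
class TitleSketch {

    // With MULTILINE, ^ matches at the start of the input and after each
    // line terminator; [ \t]* allows leading spaces or tabs on that line.
    private static final Pattern TITLE_AT_LINE_START =
            Pattern.compile("^[ \\t]*title", Pattern.MULTILINE);

    static boolean startsLineWithTitle(String text) {
        return TITLE_AT_LINE_START.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(startsLineWithTitle("intro\n   title here"));
        System.out.println(startsLineWithTitle("a title in the middle"));
    }
}
```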

Monday, 5 December 2016