Regular Expression to Find Extra-Long Strings?

7
NewbieNewbie
7

    Jul 21, 2007#1

    I'm importing some big data tables into my HTML pages. Apparently, some of the table cells contain long strings without spaces, which are breaking my layout. That is, I can't set the table width to say, 500 pixels because these long, unbroken strings are stretching out the cell widths in at least one of the columns. And since there are thousands of rows, I can't find them manually.

    Naturally, I don't know in advance how long the strings actually are or what they contain, only that they are much longer than a typical word. They may occur at the beginning of a sentence, mid-paragraph (bounded by spaces), or at the end of a sentence (terminated with a period). They are probably 30 or more characters long, but could be much longer in the case of a long and nasty URL.

    How can I construct a regular expression (or other Find / Replace method) to find these long strings? Thanks for your help!

    262
    MasterMaster
    262

      Jul 21, 2007#2

      First, you didn't tell us your version of UE and preferred regex style, so let's assume that you use version 12.00 or above.

      Switch to Perl-style regex as described in the announcement and use this regex:

      [^\s]{25,}

      which matches any run of 25 or more characters that contains no whitespace (spaces, tabs, and so on): long words, URLs, anything unbroken. (Remember to check "regular expression" in the find dialog.)
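For anyone who wants to run the same search outside the editor, here is a minimal Python sketch. It assumes Python's `re` module treats this pattern closely enough to the Perl-style engine for this purpose; the function name `find_long_strings` and the sample text are made up for illustration.

```python
import re

# Same pattern as in the post: 25 or more consecutive non-whitespace characters.
LONG_RUN = re.compile(r"[^\s]{25,}")


def find_long_strings(text):
    """Return (line_number, matched_text) pairs for every long unbroken run."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in LONG_RUN.finditer(line):
            hits.append((line_no, match.group(0)))
    return hits


# Hypothetical sample: only the URL on line 2 is long enough to be reported.
sample = "short words here\nsee http://example.com/a/very/long/path/index.html now\n"
for line_no, run in find_long_strings(sample):
    print(line_no, len(run), run)
```

Reporting the line number along with the match makes it easier to jump to the offending table row when there are thousands of them.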

      If you want to experiment with some very primitive hyphenation, you could search for

      ([^\s]{25})([^\s])

      and replace

      \1- \2

      which inserts a "hyphen" (dash) in words etc. after character no. 25.
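The same search and replace can be tried as a rough Python sketch. The helper name `hyphenate` and its `width` parameter are assumptions for illustration; the pattern and replacement are the ones given above.

```python
import re


def hyphenate(text, width=25):
    """Insert "- " after every `width` non-whitespace characters
    that are directly followed by another non-whitespace character."""
    pattern = re.compile(r"([^\s]{%d})([^\s])" % width)
    # \1 is the first `width` characters, \2 is the character that follows them.
    return pattern.sub(r"\1- \2", text)


# A 26-character run is split after character 25; shorter runs are untouched.
print(hyphenate("a" * 26))
print(hyphenate("short text stays as it is"))
```

Because each match consumes 26 characters and the search then resumes after the match, later break points in a very long run fall 26 characters apart rather than exactly 25, which is in keeping with how primitive this hyphenation is.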

      7
      NewbieNewbie
      7

        Jul 22, 2007#3

        Thanks, that worked!

        Sorry I missed the sticky . . . I was using an older 11.2 version of UE, so I downloaded the current version for this task.

        It turns out there were some long unbroken strings as I expected, mostly runs of non-breaking space characters that some bozo had in their webpage, and that were subsequently imported into my page.

        But that's not what was breaking the layout. I found a line of keywords separated by commas in an unusual and grammatically incorrect way:

        Instead of this pattern . . .

        "word, word, word, etc."

        they used

        "word ,word ,word ,etc."

        that is, with the space preceding the comma instead of the comma preceding the space. After I did a search / replace to fix that, everything snapped back into shape. Hard to believe that would break the layout in the IE7 browser, but it did. Firefox was not affected.
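The post does not say which search / replace was used for the commas, so the following Python sketch is only one possible way to do it. The pattern and the name `fix_commas` are assumptions, not the poster's actual expression.

```python
import re


def fix_commas(text):
    """Turn "word ,word" into "word, word".

    Matches one or more spaces before a comma, but only when the comma is
    immediately followed by a non-whitespace character.
    """
    return re.sub(r" +,(?=\S)", ", ", text)


print(fix_commas("word ,word ,word ,etc."))
```

The lookahead keeps the replacement from touching commas that are already followed by a space.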

        Would you recommend standardizing on the PERL regex, or does one just choose a regex standard depending on the need at hand?

        262
        MasterMaster
        262

          Jul 22, 2007#4

          Tuco wrote: Would you recommend standardizing on the PERL regex, or does one just choose a regex standard depending on the need at hand?
          I have used the UE regex style "all my UE life", so I find myself using it for all the simple, straightforward stuff. I switch to the Perl regex style only for the more complex regex tasks. (I also have the regex tool RegexBuddy in my toolbox for this.) So it is really a matter of which style you like and/or how complex your regex needs are.

          The only thing I struggle with is remembering which regex engine I last switched to :?

          119
          Power UserPower User
          119

            Jul 23, 2007#5

            Tuco wrote: Would you recommend standardizing on the PERL regex, or does one just choose a regex standard depending on the need at hand?
            I'd recommend standardizing on the Perl syntax:
            1) Regular expressions are tricky enough without fussing with different dialects.
            2) Perl's regexes are much more powerful than the legacy UE style.
            3) A lot of tools and programming languages use a regex syntax that is essentially the same as Perl's. If you learn the Perl syntax, you'll be able to reuse the knowledge elsewhere.

            7
            NewbieNewbie
            7

              Jul 24, 2007#6

              That's excellent . . . thank you very much, I appreciate the help.