How can a script detect Base64?


    Jan 29, 2010 #1

    I have text files that are a combination of plaintext and base64. I'm looking to write a script that will decode the base64 sections.

    What I need to know is whether there is any way to programmatically detect them, so they can be decoded without touching the plaintext.


    Does anyone know how I can do this?


      Jan 30, 2010 #2

      Base64 encoding is mainly used by email applications, which normally store a header above each embedded file encoded with Base64. So you can search for those headers and evaluate them in the script to correctly decode the Base64-encoded block below each header.

      If your file does not contain such headers, I suggest using a regular expression find that searches for strings containing no space or tab character. Any string found that is longer than X bytes (say, 100) is typically not normal text and is therefore probably a Base64-encoded block.
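
The length heuristic described above can be sketched in plain JavaScript. This is only an illustration that runs outside UltraEdit; the helper name `findLongRuns` and the sample text are made up for the example and are not from the thread.

```javascript
// Hypothetical helper: return whitespace-free runs of at least minLength characters.
function findLongRuns(text, minLength) {
  // \S matches any non-whitespace character, so spaces, tabs and line breaks end a run.
  const pattern = new RegExp("\\S{" + minLength + ",}", "g");
  return text.match(pattern) || [];
}

// Made-up sample: two short lines of prose around one 120-character run.
const sample = "Plain words here\n" + "QUJD".repeat(30) + "\nmore plain words";
const runs = findLongRuns(sample, 100);

console.log(runs.length);    // 1
console.log(runs[0].length); // 120
```

One caveat: MIME-encoded mail normally wraps Base64 at 76 characters per line, so a per-line threshold of 100 would never match a wrapped block. The threshold is a parameter here for that reason.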


        Jan 30, 2010 #3

        Unfortunately, not all the strings are that long. And shorter strings can be mistaken for email addresses.

        I was able to get a serviceable result by limiting the search to the characters that Base64 uses:

        (?:[A-Za-z0-9+/=]{20,})


        So, I'll use that. Thanks.
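
A character-class search like the one above still matches long runs that merely happen to use the Base64 alphabet, such as a path with slashes. One possible refinement, sketched here in plain JavaScript, is a second check on the shape of each candidate. The helper names `looksLikeBase64` and `findBase64Candidates` and the sample text are assumptions for illustration, not part of the original script.

```javascript
// Candidate pattern quoted from the post above: 20 or more Base64 alphabet characters.
const candidatePattern = /[A-Za-z0-9+\/=]{20,}/g;

// Hypothetical helper: stricter check that a candidate has a valid Base64 shape.
function looksLikeBase64(candidate) {
  // Length must be a multiple of 4, and "=" may appear only as
  // one or two padding characters at the very end.
  return candidate.length % 4 === 0 &&
         /^[A-Za-z0-9+\/]+={0,2}$/.test(candidate);
}

// Hypothetical helper: collect candidates, then keep only well-formed ones.
function findBase64Candidates(text) {
  return (text.match(candidatePattern) || []).filter(looksLikeBase64);
}

// Made-up sample: a 37-character path (rejected) and a padded Base64 string (kept).
const text = "See docs/reference/configuration/settings for details.\n" +
             "VGhpcyBpcyBhIHRlc3Qgc3RyaW5nLg==\n";
const found = findBase64Candidates(text);
console.log(found); // [ 'VGhpcyBpcyBhIHRlc3Qgc3RyaW5nLg==' ]
```

The extra check cannot rule out every false positive, since any alphanumeric run whose length is a multiple of 4 still passes, so the result should be treated as a candidate list rather than proof.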


          May 25, 2012 #4

          A long shot, but Bracket, did you end up writing a script to do this?

          Our document management system has cataloged a bunch of text files with Base64 encoding. I need to write a script that can go through them all and convert only the blocks that are Base64 encoded, leaving the plain text alone.


            May 25, 2012 #5

            rev0luci0n wrote: A long shot but Bracket did you end up writing a script to do this?
            Actually, I did. It serves my needs perfectly. It's possible that you might need to tweak the RegEx strings for your own documents. But see how this works for you:

            Code:

            var WorkingFile = UltraEdit.activeDocument;
            
            // Function to execute a RegEx Find. Set CaseFlag to 1 for Case Sensitive, or 0 for not.
            function findRegEx(SearchString, CaseFlag)
            {
            	UltraEdit.perlReOn();
            	
            	if (CaseFlag == 1)
            	{
            		WorkingFile.findReplace.matchCase = true;
            	}
            	else
            	{
            		WorkingFile.findReplace.matchCase = false;
            	}
            	
            	WorkingFile.findReplace.regExp = true;
            	WorkingFile.findReplace.find(SearchString);
            }
            
            
            function Base64Decode(SearchString)
            {
            
            	// Flag to indicate if the loop should be broken
            	var BreakFlag = 0;
            	
            	WorkingFile.top();
            	
            	// Run the search and decode every found string
            	do
            	{
            		findRegEx (SearchString, 0);
            		
            		// If it found a match, decode it.
            		if (WorkingFile.isFound() == true)
            		{
            			WorkingFile.decodeBase64();
            		}
            		
            		else
            		{
            			BreakFlag = 1;
            		}
            	
            	} while (BreakFlag == 0);
            
            	WorkingFile.top();
            }
            
            
            // ---------------------------------------------------------------------
            // Main Execution
            // ---------------------------------------------------------------------
            
            
            Base64Decode("(?<=>>> 334 ).*");
            Base64Decode("(?:[A-Za-z0-9+/=]{20,})");


              May 25, 2012 #6

              Good script, Bracket. It can be made faster, though. As a first step, some lines can be removed by using boolean variables instead of integer variables:

              Code:

              var WorkingFile = UltraEdit.activeDocument;
              
              // Function to execute a RegEx Find. Set CaseFlag to true for Case Sensitive, or false for not.
              function findRegEx(SearchString, CaseFlag)
              {
                 UltraEdit.perlReOn();
                 WorkingFile.findReplace.matchCase = CaseFlag;
                 WorkingFile.findReplace.regExp = true;
                 WorkingFile.findReplace.find(SearchString);
              }
              
              function Base64Decode(SearchString)
              {
              
                 // Flag to indicate if the loop should be broken
                 var BreakFlag = false;
              
                 WorkingFile.top();
              
                 // Run the search and decode every found string
                 do
                 {
                    findRegEx (SearchString, false);
              
                    // If it found a match, decode it.
                    if (WorkingFile.isFound() == true)
                    {
                       WorkingFile.decodeBase64();
                    }
              
                    else
                    {
                       BreakFlag = true;
                    }
              
                 } while (BreakFlag == false);
              
                 WorkingFile.top();
              }
              
              // ---------------------------------------------------------------------
              // Main Execution
              // ---------------------------------------------------------------------
              
              Base64Decode("(?<=>>> 334 ).*");
              Base64Decode("(?:[A-Za-z0-9+/=]{20,})");

              As a second step, it can be sped up once more by removing even more lines (the function calls), finally resulting in:

              Code:

              var WorkingFile = UltraEdit.activeDocument;
              
              function Base64Decode(SearchString)
              {
                 WorkingFile.top();
                 UltraEdit.perlReOn();
                 WorkingFile.findReplace.matchCase = false;
                 WorkingFile.findReplace.regExp = true;
                 while(WorkingFile.findReplace.find(SearchString))
                 {
                    WorkingFile.decodeBase64();
                 }
                 WorkingFile.top();
              }
              
              // ---------------------------------------------------------------------
              // Main Execution
              // ---------------------------------------------------------------------
              
              Base64Decode("(?<=>>> 334 ).*");
              Base64Decode("(?:[A-Za-z0-9+/=]{20,})");

              Please note that I have executed neither the script written by Bracket nor the modified scripts above, so I don't know whether they really work.
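
Since the scripts above were posted untested, the find-and-decode logic can be tried outside the editor first. The following is a minimal sketch in plain JavaScript, not a port of the UltraEdit script: `decodeBase64Runs` is a hypothetical helper, the sample text is made up, and `Buffer` is Node.js-specific rather than part of the UltraEdit scripting API.

```javascript
// Hypothetical helper: decode every Base64-looking run in a string in a single pass.
function decodeBase64Runs(text) {
  // Same character-class pattern as the second Base64Decode() call in the scripts.
  const pattern = /[A-Za-z0-9+\/=]{20,}/g;
  // replace() with a global regex scans the original text once,
  // so decoded output is never searched again.
  return text.replace(pattern, (match) =>
    // Buffer is Node.js-specific; UltraEdit uses decodeBase64() on the selection.
    Buffer.from(match, "base64").toString("utf8")
  );
}

// Made-up sample: plain text lines around one Base64 line.
const mixed = "Subject: test\nVGhpcyBpcyBhIHRlc3Qgc3RyaW5nLg==\nEnd of message";
const result = decodeBase64Runs(mixed);

console.log(result);
// Subject: test
// This is a test string.
// End of message
```

The single pass is a deliberate difference from the loop in the scripts. Whether the UltraEdit loop can re-match text it has just decoded depends on where the caret sits after `decodeBase64()`, which this thread does not establish.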


                May 25, 2012 #7

                Thanks for the suggestions. I can't imagine the boolean vs. integer flags would make that much of a difference (or rather, they shouldn't). As for the second set of modifications, it's true that it can be optimized. I've written function libraries that I use all the time, so I end up putting my scripts together in a more modular format for rapid creation, rather than for the absolute most streamlined execution. :)