Cleaning OCR'd Text Files

  • Download Strawberry Perl with the Padre IDE: http://padre.perlide.org/download.html
  • Open Padre, click Search on the main menu bar, and select Replace in Files
  • Enter the search term and the replacement term
  • Choose the directory where the files are located (subdirectories are searched as well)
  • Check the regular expression and case-sensitive options as needed

(Create a duplicate directory for cleaning so the master .txt files are not altered.)
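For batch runs outside Padre, the copy-then-replace workflow above can be sketched in Python. This is a minimal sketch, not how Padre itself works; the function name copy_and_replace, the *.txt filter, and UTF-8 encoding are assumptions. It copies the master directory first so the originals stay untouched, then applies one regular expression to every .txt file in the copy, including subdirectories.

```python
import re
import shutil
from pathlib import Path


def copy_and_replace(master_dir, work_dir, pattern, replacement, flags=0):
    """Copy master_dir to work_dir, then run one regex replacement over
    every .txt file in the copy (subdirectories included).

    Returns {relative file path: number of replacements} for changed files.
    """
    # copytree raises FileExistsError if work_dir already exists,
    # so an earlier cleaned copy is never silently overwritten.
    shutil.copytree(master_dir, work_dir)
    regex = re.compile(pattern, flags)
    changed = {}
    # rglob walks work_dir recursively, like Replace in Files in Padre.
    for path in Path(work_dir).rglob("*.txt"):
        # Assumption: the OCR output is UTF-8 encoded.
        text = path.read_text(encoding="utf-8")
        new_text, count = regex.subn(replacement, text)
        if count:
            path.write_text(new_text, encoding="utf-8")
            changed[str(path.relative_to(work_dir))] = count
    return changed
```

For example, copy_and_replace("master", "clean", r"Spider-Man", "spiderman") (hypothetical directory names) reports how many replacements were made per file. Because copytree refuses to overwrite an existing directory, each call needs a fresh work_dir, which is the point: the master files can only ever be read.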


Text to Clean:

  • Spider-Man to spiderman
  • Mary Jane to MaryJane
  • Spidey to spiderman? (Any other synonyms?)
    ------------------------------
  • What about other names to crunch together? Doc Ock, Fantastic Four, etc.
  • Other text cleaning suggestions?
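One way to handle the open question about crunching other multi-word names (Doc Ock, Fantastic Four, etc.) is to generate the regular expression from the name itself instead of typing each variant by hand. This is a minimal sketch; the helper name crunch_names is hypothetical, and it assumes the words of a name may be separated in the OCR text by spaces, line breaks, or hyphens. Matching is case-sensitive and the joined form keeps the original capitalization.

```python
import re


def crunch_names(text, names):
    """Join each multi-word name into one token, e.g. 'Doc Ock' -> 'DocOck'."""
    for name in names:
        words = re.split(r"[\s-]+", name.strip())
        joined = "".join(words)
        # Allow any run of whitespace (including OCR line breaks) or hyphens
        # between the words; \b keeps the match from starting or ending
        # inside a longer word.
        pattern = r"\b" + r"[\s-]+".join(re.escape(w) for w in words) + r"\b"
        # A function replacement inserts `joined` literally (no escape parsing).
        text = re.sub(pattern, lambda m: joined, text)
    return text


print(crunch_names("Doc Ock fought the Fantastic\nFour.", ["Doc Ock", "Fantastic Four"]))
```

This prints "DocOck fought the FantasticFour." Each generated pattern can also be pasted into the Padre search box with the regular expression option checked.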


1960s

  • Removed addresses and salutations, per the last conversation with John
  • Spider-man to Spiderman
  • Spidey to Spiderman
  • S-M to Spiderman
  • Mary Jane to MaryJane
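The 1960s substitutions listed above can also be written down as an ordered list of regular expressions, so the exact patterns are recorded and can be rerun on the next batch. This is a minimal sketch: the rule table RULES_1960S and the function clean_text are hypothetical, and the patterns are assumptions rather than the exact terms used in Padre (case-insensitive matching for Spider-man, \s+ so a Mary Jane split across an OCR line break is still caught, and \b word boundaries so possessives like "Spidey's" become "Spiderman's"). The text to clean list above uses lowercase "spiderman"; only the replacement strings would need to change.

```python
import re

# Hypothetical rule table mirroring the 1960s list; the patterns are
# assumptions, not the exact search terms used for the cleaned files.
RULES_1960S = [
    (re.compile(r"\bSpider-Man\b", re.IGNORECASE), "Spiderman"),
    (re.compile(r"\bSpidey\b"), "Spiderman"),
    (re.compile(r"\bS-M\b"), "Spiderman"),
    # \s+ also matches a newline, catching "Mary" and "Jane" on separate lines.
    (re.compile(r"\bMary\s+Jane\b"), "MaryJane"),
]


def clean_text(text, rules=RULES_1960S):
    """Apply each (regex, replacement) rule in order and return the result."""
    for regex, replacement in rules:
        text = regex.sub(replacement, text)
    return text


print(clean_text("Spidey met Mary\nJane; S-M and Spider-man left."))
```

This prints "Spiderman met MaryJane; Spiderman and Spiderman left." Keeping the rules in one list makes it easy to see what was changed when a cleaned file is later compared with its master copy.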

File named "1960s -Clean B" on Box