Version 8.5 (2270)
- Improved compatibility with macOS Catalina
- Performance improvements for app integration, Services in macOS Mojave and Catalina.
- Custom Services support removed. The mechanism used is unavailable when supporting Catalina.
- TextSoap is for people who work with text. TextSoap effortlessly cleans up text from endlessly different formats. Wash away unwanted characters, spaces, tabs. Fix paragraphs with hard returns at the end of each line, as well as a myriad of other formatting issues that come your way and do it all while retaining desired font styles.
Version 8.4.10 (2250)
Fixed
- App could strip off style information from Rich Text files (rtf, docx).
- Services inspector text did not display correctly in Dark Mode.
- Descriptions between Capitalize Sentences and Capitalize Sentences (Alt) were reversed.
- Show Group shortcut option did not display colors associated with group items.
- App would not always respect preference for Appearance: Use Gray Clipboard Workspace Window.
- Mojave behavior change caused left margin of text to slide under line ruler when toggling 'Show Line Numbers'.
Improved
- Capitalize Sentences now works better with sentences that include quoted text.
Version 8.4.9 (2232)
Fixed
- Updated 'Capitalize Common Tech Names' with new iPhone model names.
- Fixed TextSoap Menu items not displaying correctly in Dark Mode on Mojave.
- Fixed delays when using TextSoap Menu, global keyboard shortcuts on Mojave.
- Note: If you are still seeing long delays, you may need to prompt Mojave to notice the updated helper app:
- Go to TextSoap app, Preferences > TextSoap Menu.
- Unselect, then re-select the 'Install TextSoap Menu' option.
- Go to System Preferences > Security & Privacy > Privacy > Accessibility.
- Find the textsoapMenu app in the list, unselect it, then re-select it.
Version 8.4.8 (2227)
Fixed
- Default document type (plain or rich) now accurately reflects the option and behaves consistently.
- Returned preference option to disable clipboard workspace.
- Addressed display issues in custom cleaners under macOS Mojave.
- Additional improvements when running under macOS Mojave.
Version 8.4.7 (2220)
Added
- Additional changes for General Data Protection Regulation (GDPR).
- With each new version, we now ask you to verify whether you wish to share diagnostics/crash reports.
- New General Preference options to control whether you share diagnostics/crash reports.
Fixed
- Extract Text action: 'Match Case' option was not reliably tracked.
- Hyperlinks To Text action: 'Hyperlink URL' option was not reliably tracked.
- Define Macro, Apply Macro: Reworked macro implementation to improve reliability.
- Additional internal improvements.
Version 8.4.6 (2211)
Added
- [SetApp] Added macOS Services menu items.
- New Action (Advanced feature): Perform Named Service. Will perform named OS X Service on the given text.
- Do not call a TextSoap specified service item (just apply the cleaner directly)
- Service name must include any submenus as specified in their definition. Submenus are no longer displayed in macOS, but to use the Service, the submenu has to be specified. You may need to examine the NSServices / NSMenuItem entry within the app's Info.plist to get the exact value. For example, to use BBEdit's 'New BBEdit Document with Selection' Service, you need to specify it based on its NSMenuItem entry: 'BBEdit/New BBEdit Document with Selection'. We did say it was an advanced feature.
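The lookup described above can be sketched in Python. This is a minimal illustration, not part of TextSoap: it assumes the usual Info.plist layout, where NSServices is a list of dictionaries and each NSMenuItem dictionary stores the menu title under the 'default' key. The function names are hypothetical.

```python
import plistlib


def load_info_plist(path):
    """Read an Info.plist file (XML or binary) into a dictionary."""
    with open(path, "rb") as fh:
        return plistlib.load(fh)


def service_menu_names(info):
    """Return the NSMenuItem 'default' title of every declared service.

    `info` is the dictionary parsed from an app's Info.plist. The title
    includes any submenu prefix, e.g. 'BBEdit/New BBEdit Document with Selection'.
    """
    names = []
    for service in info.get("NSServices", []):
        title = service.get("NSMenuItem", {}).get("default")
        if title:
            names.append(title)
    return names
```

To try it, pass the path of an app's Contents/Info.plist to load_info_plist and feed the result to service_menu_names. The strings it returns are the values to enter in the Perform Named Service action.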
Improved
- [Direct] New License menu item in App menu to easily view current license info.
- [Direct] New Recover option to retrieve existing TextSoap 8 license codes. Select new License menu item to access.
- [Direct] License tab now shows current license key.
Fixed
- Fixed issue that caused false error to be reported in Console when using the cleanFile command in AppleScript on a rich text document.
- Updated frameworks
Version 8.4.5 (2200)
Fixed
- Fixed issue with actions that set color. Checkbox value was not always recognized.
- Fixed issue where custom cleaner preview was sometimes caching text and not always reflecting changes made by user to that text.
Version 8.4.4 (2197)
Improved
- Updated documentation
- Updated frameworks
Fixed
- Fixed problem with custom cleaner preview not displaying correctly the first time preview is clicked.
- Fixed an issue found with layouts in custom cleaner.
Version 8.4.3 (2193)
Improved
- Internal improvements
- Updated frameworks
Version 8.4.2 (2190)
Additions
- New cleaner added - Paragraph Ruler: Extra Spacing After
Fixes
- Fixed issue that could prevent actions from displaying in custom cleaner editor
- Accessibility improvements
- Additional internal improvements
- Updated internal frameworks
Version 8.4.1 (2179)
Fixes
- Fixed an issue that could potentially cause a crash at launch.
- Additional changes to improve stability.
Version 8.4 (2176)
Additions
- Under macOS Sierra (10.12), Customize Navigator, Custom Cleaner Editors and Custom Group Editors are grouped together as tabs.
- Support for literal characters (e.g. \t, \n, \x{20}) added to the following actions:
- Add Prefix
- Add Suffix
- Remove Prefix
- Remove Suffix
- Insert Text
- New DateNow Expansion cleaner : Converts ${DATENOW} into today's date ('April 21, 2017'). Additional options available to specify various date formatting. See 'Date-related Actions' help topic for complete details under Help > TextSoap Help.
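The two additions above can be approximated in Python. This is a rough sketch, not TextSoap's implementation. It assumes the literal characters are backslash escapes such as \t, \n and \x{20}, and it covers only the default DateNow format shown in the example. The escape table and function names are hypothetical.

```python
import re
from datetime import date

# Hypothetical escape table; the set TextSoap actually supports may differ.
_SIMPLE_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\"}

# Matches either \x{HEX} (group 1) or a backslash followed by one character (group 2).
_ESCAPE_RE = re.compile(r"\\(?:x\{([0-9A-Fa-f]+)\}|(.))", re.DOTALL)


def expand_literals(text):
    r"""Expand \t, \n, \r, \\ and \x{HEX}; leave unknown escapes untouched."""
    def repl(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        return _SIMPLE_ESCAPES.get(match.group(2), match.group(0))
    return _ESCAPE_RE.sub(repl, text)


def expand_datenow(text, today=None):
    """Replace each ${DATENOW} token with a date such as 'April 21, 2017'."""
    today = today or date.today()
    # %B is locale dependent; this sketch assumes the default (English) locale.
    stamp = f"{today.strftime('%B')} {today.day}, {today.year}"
    return text.replace("${DATENOW}", stamp)
```

For example, an Add Prefix value of `>\x{20}` would expand to a greater-than sign followed by a space under these assumptions.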
Fixes
- Updated internal frameworks to improve stability
- Fixed layout issues in editor for some line actions.
- Fixed issue where setting font would not work if you had previously converted it to plain text (with no text attributes).
- Fixed crashing issue & record corruption if user deleted name of custom cleaner in editor and triggered a save.
Version 8.3.4 (2163)
Additions
- Added iPhone 7, iPhone 7 Plus, macOS Sierra to tech names list.
Fixes
- Fixed layout issues with tag text, hyperlinks actions in custom cleaner editor.
- Fixed issue that prevented using some function keys as cleaner action shortcuts.
Version 8.3.3 (2160)
Fixes
- Addressed issues related to crashes when viewing license info.
Version 8.3.2 (2157)
Fixes
- Fixed issue introduced in 8.3.1 that could cause crash at startup on some systems.
- When converting clipboard workspace/document to plain text, now uses plain text font specified in preferences.
Version 8.3.1 (2154)
Fixes
- Fixed issue which required two clicks of TextSoap Menu before previously set group would display.
- Fixed issue that could cause a crash with some text when using Title Case action and cleaner.
Changes
- TextSoap Menu shortcuts now only activate cleaner actions.
- Supplemental windows (Regex Ref, Release Notes, Regex Tutorial) now prefer single tabbed window in macOS Sierra.
- If automatic update checking is disabled, now displays reminder if user has not checked for updates in past 90 days.
Version 8.3 (2143)
Additions
- Added new option in Appearance Preferences for a Gray Clipboard Workspace Window.
Changes
- Improved case transformation cleaners offer improved international language support. This affects Uppercase, Lowercase, Capitalized Words related cleaners.
- Title Case cleaner rewritten to follow common suggestions and handle edge cases, as well as offering better international language support.
- Removed 'Add Section Marker' button from Custom Cleaner List Editor. Existing Section markers continue to work (for now), but are considered deprecated and will be completely removed in the future. This should not affect most users.
- Additional internal changes to improve stability.
Fixes
- Fixed issue that caused layout problems when inserting special characters and selecting either the ICU or POSIX property categories.
Version 8.2.1 (2129)
Fixes
- Fixed potential crash when selecting button to bring up Find options in window's Find tab.
- Fixed several syntax highlighting issues with regular expressions.
- Additional compatibility improvements for macOS Sierra.
- Updated components to improve stability.
- Editing text in custom cleaner preview popup is now disabled, but you can still select (copy) text.
- Cleaner statistics are correctly updated when about window/tab is activated.
Version 8.2 (2127)
Note
- In order to simplify Lists within custom cleaners, we are looking to remove Section Markers and instead support a single list within a custom cleaner. If you are using List section markers to create multiple lists, please contact support(at)unmarked.com so we can better understand your solution and look at ways to better address your needs. If you don't know what List section markers are, then you are likely not affected by this change.
Additions
- Added Insert Text category to add specific column indicators in Batch Find and Replace action.
- New in-app help file. New topics added, including several custom cleaner tutorials.
- New in-app help viewer.
- Added Copy icon to toolbar.
- New expanded help file. New tutorials added. See Help > TextSoap Help for updated docs.
- Clean clipboard contents using TextSoap Menu. Hold down option key and select TextSoap Menu icon in menubar to clean clipboard contents.
- Leave contents on the clipboard. Hold down Shift key when using TextSoap Menu to copy text from the current app, apply the specified cleaner and then leave it on the clipboard.
Fixes
- Fixed extraneous clipboard contents logging when launching.
- Fixed issue that prevented Run Automator from functioning properly.
- Fixed issue that kept text editor font preferences from working correctly.
- Fixed issues related to drawing disabled actions.
- Fixed filtering issues for actions in custom cleaner.
- Fixed issue that could cause an exception when dragging in new actions into custom cleaner workspace.
Version 8.1 (2116)
Additions
QuickClean Menu - QuickClean lets you use cmd-1 thru cmd-9 to quickly access your favorite cleaners. Select a group and cmd-1 thru cmd-9 will be mapped to the first nine cleaners within that group. Change the group and instantly change the cleaners mapped to each shortcut. To see the current mappings, select the Text > QuickClean menu. Create a custom group for complete control over cmd-1 thru cmd-9 shortcuts within TextSoap.
Search History keeps a history of your manual searches.
- Access your history in interactive finds.
- Access items in find/replace actions within custom cleaners.
Insert Special Text Characters option. Insert tabs, returns, regex metacharacters, etc.
- Now has popup menu for various categories.
- Added properties categories for ICU \p{Property} and POSIX [:Property:] formats.
- Negate or Use Long Names (\p{Letter} vs \p{L}) when specifying properties.
- Optionally capture given sequence.
- Specify count (1, 0+, 1+, N, N+, N-M) for given sequence.
Toolbar icons added to clipboard Workspace and documents. Customize cleaners/groups action moved to the top.
Highlight Current Line option now available for text editor.
New Cleaners
- Fixup macOS name -- converts 'Mac OS X', 'OS X' to 'macOS'
- Reverse Line Order
- Remove All Spaces
- Remove All Whitespace
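To make the behavior of these four cleaners concrete, here is a minimal Python sketch of what each one does, based only on the names and the one-line description above. The function names are hypothetical, and TextSoap's own cleaners may handle edge cases differently, such as other line-ending styles or non-breaking spaces.

```python
import re


def fixup_macos_name(text):
    """Replace 'Mac OS X' and 'OS X' with 'macOS'."""
    return re.sub(r"\b(?:Mac )?OS X\b", "macOS", text)


def reverse_line_order(text):
    """Reverse the order of newline-separated lines."""
    return "\n".join(reversed(text.split("\n")))


def remove_all_spaces(text):
    """Remove plain space characters only; tabs and returns are kept."""
    return text.replace(" ", "")


def remove_all_whitespace(text):
    """Remove every whitespace character, including tabs and returns."""
    return re.sub(r"\s+", "", text)
```

The last two functions differ only in scope: the first keeps tabs and line breaks, the second strips them as well.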
Changes
Preferences have been reworked.
- New Text Editor preferences controls default settings for clipboard workspace and new text documents.
- Added preference option to set showing current line on new text editors.
- Appearance pane controls color for invisibles and current line, and action color style as well as toolbar display options.
- New default document type is now in General preferences.
TextSoap Menu Shortcut Actions improvements
- Replaced popup menu with a popover.
- Added a filter option for cleaners to apply.
Custom Cleaner Editor:
- Text Fields in actions now auto-size based on the text entered.
- The delete (X) icon was removed from each action. To remove an action, select it and select Edit > Delete in menu, or press Delete key.
- User Note actions no longer have an enable button (there is no action performed for notes).
- You can now enter returns inside a User Note action.
- Find/Replace actions now have a color associated with them (purple).
- Macro List definition is now a variant of Cyan.
- New icons for custom cleaner toolbar.
Fixes
- 'Capitalize Common Tech Names' now correctly fixes up multi-word tech names like 'OS X El Capitan', 'iPhone 6s Plus' and 'MacBook Pro'.
- Current text field changes were not committed when clicking preview in custom cleaner.
Version 8.0.9 (2097)
Additions
- New: Interactive Find - text is now grayed out and matched text is highlighted in blue. Also added a 'Done' button to remove any match highlights from the text when user is finished searching.
- New: Added Clean with TextSoap 8 MyScrub Service menu item. This will apply the 'MyScrub' cleaner, which is specified in Preferences > General.
- New: Support for opening text files with unknown extensions.
New: Scripting commands for the main application. textsoap8Agent acts as a go-between with the main TextSoap app, running the main app in 'agent mode' (which allows it to run without a user interface). These three new scripting commands allow AppleScripts to connect directly to the main app and enable 'agent mode'.
These commands will allow an AppleScript to test and control the agent mode state of an app.
- enableAgentMode -- transform app to use agent mode. Command is ignored if app is already in agent mode.
- disableAgentMode -- turn off agent mode. Returns app to standard app mode if needed
- isAgentMode -- indicates whether app is currently in agent mode
- New: Helper app to inspect OS X Service definitions. Available via OS X Services preference. Click 'Launch Inspector'.
- New: Under Preferences > Advanced, there are now two buttons that allow you to launch and quit both textsoap8Agent and TextSoap Menu. For the latter, it is recommended that you use the checkbox in Preferences > TextSoap Menu to install and launch the app.
Fixes
- Fixed: Calling AppleScript pickCleaner command on textsoap8Agent could cause a crash.
- Fixed: TextSoap Menu Palette did not remember its position and size between launches of TextSoap Menu.
Changes
- Changed: OS X Service standard menu item renamed to 'Clean with TextSoap 8'.
Changed: TextSoap's launch behavior has changed. Earlier versions of TextSoap launched as an accessory and transformed itself into a standard app. This created some subtle, but noticeable inconsistencies between a normal app launch and TextSoap. Starting with TextSoap 8.0.9, TextSoap will launch as a standard app and transform itself into an accessory when asked to launch in agent mode (by TextSoap Menu, textsoap8Agent, etc). This provides a more standard user experience launching the app when opening files, clicking on the icon, etc.
When TextSoap is launched in agent mode (to support TextSoap Menu, textsoap8Agent, etc), it may briefly show up in the dock and then disappear. This is expected. To minimize this display, when TextSoap is launched in agent mode, it will stick around in the background for a while, to avoid a re-launch. If TextSoap is already running as a standard app, TextSoap Menu and textsoap8Agent will leave it alone.
Known Issues
- Using TextSoap's OS X Service actions within the TextSoap app can cause the OS X Services action to time out and not work. In general, I recommend using TextSoap Menu over OS X Services. TextSoap Menu provides more customization options, is richer and more dynamic (allowing access to all cleaners, allowing changing groups, etc) and avoids many of the issues with OS X Services.
Version 8.0.8 (2085)
Fixes
- Fixed: 'If Text Matches' action does not correctly process regex options.
- Fixed: Markdown cleaner was treating some lines with colons as meta-data, creating incorrect results. (#581)
- Fixed: Choosing color from Color Panel in custom group editor for a group item did not work if that item was not selected. Now, when the color panel is brought up, the group item for that color well will be selected (to receive color changes).
- Fixed: Scripting definition changes were believed to be causing issues with some integration options. Reverted back to original 8.0.6 definitions.
Version 8.0.7 (2076)
Fixes
- Fixed: Clicking 'Remind Me Later' for updates did not postpone next check as expected.
- Fixed: Default automatic check update interval was too low.
- Fixed: New Doc Word Count preference setting was not being honored for documents.
- Fixed: Clipboard Workspace would reload (overwriting existing contents) if app was hidden and then re-activated by clicking on icon in dock.
- Fixed: Clicking 'Credits' tab in About window might cause a crash, or not present credits.
Version 8.0.6 (2070)
Fixes
- Fixed: Framework was causing regular crashes.
Version 8.0.5 (2067)
Additions
- New: Clipboard Workspace show ruler, show line numbers, show word count, and show invisibles settings are now 'sticky' between launches.
- New: Text in Clipboard Workspace can now be saved to disk.
- New: Contents in Clipboard Workspace can now be printed as a text document.
Fixes
- Fixed: Updates to custom cleaners & groups did not always properly propagate. Changes should now propagate more reliably to the various parts of the app and its helpers.
- Fixed: Labels and separators weren't stripped out when requesting cleaners within a group via AppleScript (asTokens=NO).
- Fixed: Cleaner list search finds any matches in the current group and then also any matches in the 'Library' group. However, it could create duplicate entries for custom cleaners if they were in the current group (one for the match in the group, one for the match in the library). This is clearly demonstrated with the 'Standard' group which automatically includes custom cleaners.
- Fixed: When creating a batch find & replace action, a group named '-ALL-' was created by default. Group specifying all items is now called 'All'. Both '-ALL-' and 'All Items' will automatically be converted to 'All'.
- Fixed: Custom cleaner editor would not expand beyond a certain width on some monitors. Limited access to full screen and/or created void areas.
Changes
- Changed: #SCRUB2 cleaner, a custom remnant from long ago, has been removed. If you used this, you can create a custom cleaner as an alternative.
- Changed: About TextSoap window updated.
Version 8.0.4 (2060)
Additions
- New: Added Customize button to all editor windows (lower right).
- New: Clipboard Workspace supports converting between plain and rich text.
Fixes
- Fixed: In some situations, the Library group would not show all custom cleaners.
- Fixed: Avoid potential issue with batch find action layout.
- Fixed: $l (lowercase-L) was not colorized as a token in regex replacement strings.
- Fixed: (Regression in 2054) Selecting custom groups in cleaner list leaves Library group active.
- Fixed: Converting between plain and rich text documents did not adhere to warning preference.
- Fixed: Capitalize with Title Case (and related actions) would delete some separators (i.e. colons).
Version 8.0.3 (2052)
Additions
- New: Added support to drag cleaner and library files onto TextSoap icon (in dock) to import them. Will bring up the importer window and add the files.
- New: If clipboard workspace window was closed, clicking on app icon will re-load the clipboard workspace.
Fixes
- Fixed: Regular expressions could find a zero length match, which caused extra, and incorrect, replace actions.
- Fixed: Regular expressions did not always recognize when options (multiline, dot matches) were set.
- Fixed: New documents do not remember last size set.
- Fixed: Quitting with open custom cleaner editor could lose changes made.
- Fixed: Launching TextSoap app via 3rd party launcher after it was used by TextSoap Menu now works correctly.
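The zero-length match fix in the list above concerns a general regex pitfall. The sketch below shows it with Python's re module, which is not the engine TextSoap uses, so it illustrates the class of problem rather than the app's behavior. A pattern such as `\d*` can match the empty string between characters, and a naive replace-all then fires at each of those positions. The helper names are hypothetical.

```python
import re


def count_matches(pattern, text):
    """Return (total matches, zero-length matches) for pattern in text."""
    spans = [m.span() for m in re.finditer(pattern, text)]
    empty = sum(1 for start, end in spans if start == end)
    return len(spans), empty


def replace_nonempty(pattern, replacement, text):
    """Replace only matches that consume at least one character."""
    return re.sub(pattern, lambda m: replacement if m.group(0) else "", text)
```

With the text `a1b`, the pattern `\d+` yields one match and no empty ones, while `\d*` also reports empty matches around the letters. replace_nonempty ignores those, so only the digit is replaced.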
Changes
- Added: New Advanced preference option added to enable larger toolbar buttons. When enabled (relaunch required), app will use regular size action buttons and revert to the older style ruler (with slightly bigger icons).
- Changed: Added note to TextSoap Menu preferences to remind users that TextSoap Menu is required for global keyboard shortcuts to be enabled.
- Changed: Keyboard shortcut for accessing Custom Cleaners and Groups is now cmd-0.
- Changed: TextSoap will now restart TextSoap Menu at launch if it was 'installed' but the user quit it manually.
- Changed: Find fields can be grown with new slider on Find & other actions.
- Changed: Replace fields will automatically grow (2x) if Find field is made larger.
- Changed: Added resizing sliders to Text Extract, Note, Insert Text actions.
- Changed: Added an 'Edit List Data' button to the batch actions, renamed toolbar button 'Lists' to 'List Data'.
Version 8.0.2 (2045)
Fixes
- Fixed: Cleaner list item names sometimes remained truncated after sidebar resize.
- Fixed: Closing custom cleaner editor could cause crash in some cases if no author was specified.
- Fixed: Printing dialog causes ambiguous layout error.
- Fixed: Documents would not print correctly.
- Fixed: Clipboard Workspace print option was disabled.
- Fixed: Issues with custom cleaner names within custom groups.
- Fixed: TextSoap Menu palette would not properly display.
- Fixed: TextSoap Menu would not dynamically reflect changes in custom cleaners or groups.
- Fixed: Custom cleaners in search matches were not sorted.
- Fixed: Custom cleaners did not display correctly on re-launch.
- Fixed: Custom cleaner editor window position and size are now saved.
- Fixed: Batch Processor window did not accept individual text files via drag-n-drop.
- Fixed: A crash when batch processing files & folders.
- Fixed: Launching TextSoap from TextSoap Menu would not always show app menu.
- Fixed: TextSoap Menu was not automatically updated if it was launched.
Changes
- Change: Action note icon updated. Now indicates if there is a note attached.
- Change: User note for action will display as a tooltip for the note icon.
- Change: When displaying Standard group within TextSoap Menu, if there are a large number of custom cleaners, they are displayed in a sub-menu named 'Custom Cleaners'.
Version 8.0.1 (2034)
Additions
- New: File > Import. menu item to access cleaner importer.
- New: File > Export. menu item to access cleaner exporter.
- New: File > New > Custom Cleaner menu item to create a new custom cleaner.
- New: File > New > Custom Group menu item to create a new custom group.
Fixes
- Fixed: App would not always properly activate on launch (stayed in background, menu issues).
- Fixed: Clipboard Workspace did not remember its size and position.
- Fixed: Clipboard Workspace did not honor default font settings for new docs.
- Fixed: Clipboard Workspace did not honor default zoom preference.
- Fixed: New Documents were not honoring default zoom preference.
- Fixed: New Documents were not honoring default font settings for new docs (based on type of document).
- Fixed: Importing previous cleaners didn't correctly convert Ignore Case option to new Match Case for Find actions. Please re-import your cleaners to correct this issue.
- Fixed: Corrected typo for 'Capitalize with Title Case' cleaner.
- Fixed: Imported custom groups were not properly saved.
- Fixed: Custom Text Wrap action was not using the value specified.
- Fixed: Hyperlinks to Text action wipes out any hyperlink used.
- Fixed: Extract middle characters action did not use correct start position.
- Fixed: Multiple uses of 'Markdown Text' cleaner could result in stray characters.
- Fixed: Potential crash under certain conditions with 'Markdown Text' cleaner (when appending text).
- Fixed: Importing a previous database library (TS6 or TS7) would not properly pick up the custom cleaner names, leaving UUIDs instead. To fix, remove your previous custom groups and re-import them.
- Fixed: Custom cleaners were not always showing up in some cleaner lists (like TextSoap Menu prefs).
Changes
- Changed: If previous database (v6,v7) found, will prompt to import previous cleaners at launch.
- Changed: Window > 'Customize Navigator' renamed to Window > 'Custom Cleaners & Groups.' to make functionality more explicit. The name change is also visible in the Groups popup menu in the cleaner list sidebar.
Fixes
- Fixed issue which required two clicks of TextSoap Menu before previously set group would display.
- Fixed issue that could cause crash issue with some text when using Title Case action and cleaner.
Changes
- TextSoap Menu shortcuts now only activate cleaner actions.
- Supplemental windows (Regex Ref, Release Notes, Regex Tutorial) now prefer single tabbed window in macOS Sierra.
- If automatic update checking is disabled, now displays reminder if user has not checked for updates in past 90 days.
Version 8.3 (2143)
Additions
- Added new option in Appearance Preferences for a Gray Clipboard Workspace Window.
Changes
- Improved case transformation cleaners offer improved international language support. This affects Uppercase, Lowercase, Capitalized Words related cleaners.
- Title Case cleaner rewritten with to follow common suggestions, edge cases, as well as offering better international language support.
- Removed 'Add Section Marker' button from Custom Cleaner List Editor. Existing Section markers continue to work (for now), but are considered deprecated and will be completely removed in the future. This should not affect most users.
- Additional internal changes to improve stability.
Fixes
- Fixed issue that cause layout issues when inserting special characters and selecting either the ICU or POSIX property categories.
Version 8.2.1 (2129)
Fixes
- Fixed potential crash when selecting button to bring up Find options in window's Find tab.
- Fixed several syntax highlighting issues with regular expressions.
- Additional compatibility improvements for macOS Sierra.
- Updated components to improve stability.
- Editing text in custom cleaner preview popup is now disabled, but you can still select (copy) text.
- Cleaner statistics are correctly updated when about window/tab is activated.
Version 8.2 (2127)
Note
- In order to simplify Lists within custom cleaners, we are looking to remove Section Markers and instead supporting a single list within a custom cleaner. If you are using List section markers to create multiple lists, please contact support(at)unmarked.com so we can better understand your solution and look at ways to better address your needs. If you don't know what List section markers are, then you are likely not affected by this change.
Additions
- Added Insert Text category to add specific column indicators in Batch Find and Replace action.
- New in-app help file. New topics added, including several custom cleaner tutorials.
- New in-app help viewer.
- Added Copy icon to toolbar.
- New expanded help file. New tutorials added. See Help > TextSoap Help for updated docs.
- Clean clipboard contents using TextSoap Menu. Hold down option key and select TextSoap Menu icon in menubar to clean clipboard contents.
- Leave contents on the clipboard. Hold down Shift key when using TextSoap Menu to copy text from the current app, apply the specified cleaner and then leave it on the clipboard.
Fixes
- Fixed extraneous clipboard contents logging when launching.
- Fixed issue that prevented Run Automator from functioning properly.
- Fixed issue that kept text editor font preferences from working correctly.
- Fixed issues related to drawing disabled actions.
- Fixed filtering issues for actions in custom cleaner.
- Fixed issue that could cause an exception when dragging in new actions into custom cleaner workspace.
Version 8.1 (2116)
Additions
QuickClean Menu - QuickClean lets you use cmd-1 thru cmd-9 to quickly access your favorite cleaners. Select a group and cmd-1 thru cmd-9 will be mapped to the first nine cleaners within that group. Change the group and instantly change the cleaners mapped to each shortcut. See which mappings, select Text > QuickClean menu. Create custom group for complete control over cmd-1 thru cmd-9 shortcuts within TextSoap.
Search History keeps a history of your manual searches.
- Access your history in interactive finds.
- Access items in find/replace actions within custom cleaners.
Insert Special Text Characters option. Insert tabs, returns, regex metacharacters, etc.
- Now has popup menu for various categories.
- Added properties categories for ICU p{Property} and POSIX [:Property:] formats.
- Negate or Use Long Names ({Letter} vs p{L}) when specifying properties.
- Optionally capture given sequence,
- Specify count (1, 0+, 1+, N, N+, N-M) for given sequence.
Toolbar icons added to clipboard Workspace and documents. Customize cleaners/groups action moved to the top.
Fileward 1 74. Highlight Current Line option now available for text editor.
New Cleaners
- Fixup macOS name -- converts 'Mac OS X', 'OS X' to 'macOS'
- Reverse Line Order
- Remove All Spaces
- Remove All Whitespace
Changes
Preferences have been reworked.
- New Text Editor preferences controls default settings for clipboard workspace and new text documents.
- Added preference option to set showing current line on new text editors.
- Appearance pane controls color for invisibles and current line, and action color style as well as toolbar display options.
- New default document type is now in General preferences.
TextSoap Menu Shortcut Actions improvements
- Replaced popup menu with a popover.
- Added a filter option for cleaners to apply.
Custom Cleaner Editor:
- Text Fields in actions now auto-size based on the text entered.
- The delete (X) icon was removed from each action. To remove an action, select it and select Edit > Delete in menu, or press Delete key.
- User Note actions no longer have an enable button (there is no action performed for notes).
- You can now enter returns inside a User Note action.
- Find/Replace actions now have a color associated with them (purple).
- Macro List definition is now a variant of Cyan.
- New icons for custom cleaner toolbar.
Fixes
- 'Capitialize Common Tech Names' now correctly fixes up multi-word tech names like 'OS X El Capitan', 'iPhone 6s Plus' and 'MacBook Pro'.
- Current text field changes were not committed when clicking preview in custom cleaner.
Version 8.0.9 (2097)
Additions
- New: Interactive Find - text is now grayed out and matched text is highlighted in blue. Also added a 'Done' button to remove any match highlights from the text when user is finished searching.
- New: Added Clean with TextSoap 8 MyScrub Service menu item. This will apply the 'MyScrub' cleaner, which is specified in Preferences > General.
- New: Support for opening text files with unknown extensions.
New: Scripting commands for the main application. When using textsoap8Agent, it acts a go-between with the main TextSoap app, running the main in 'agent mode' (which allows it to run w/o a user interface). These three new scripting commands allow AppleScripts to directly connect to the main app and enable 'agent mode'.
These commands will allow an AppleScript to test and control the agent mode state of an app.
- enableAgentMode -- transform app to use agent mode. Command is ignored if app is already in agent mode.
- disableAgentMode -- turn off agent mode. Returns app to standard app mode if needed
- isAgentMode -- indicates whether app is currently in agent mode
- New: Helper app to inspect OS X Service definitions. Available via OS X Services preference. Click 'Launch Inspector'.
- New: Under Preferences > Advanced, there are now two buttons that allow you to launch and quit both textsoap8Agent and TextSoap Menu. For the latter, it is recommended that you use the checkbox in Preferences > TextSoap Menu to install and launch the app.
Fixes
- Fixed: Calling AppleScript pickCleaner command on textsoap8Agent could cause a crash.
- Fixed: TextSoap Menu Palette did not remember its position and size between launches of TextSoap Menu.
Changes
- Changed: OS X Service standard menu item renamed to 'Clean with TextSoap 8'.
- Changed: TextSoap's launch behavior has changed. Earlier versions of TextSoap launched as an accessory and transformed itself into a standard app. This created some subtle but noticeable inconsistencies between a normal app launch and TextSoap. Starting with TextSoap 8.0.9, TextSoap will launch as a standard app and transform itself into an accessory when asked to launch in agent mode (by TextSoap Menu, textsoap8Agent, etc). This provides a more standard user experience launching the app when opening files, clicking on the icon, etc.
When TextSoap is launched in agent mode (to support TextSoap Menu, textsoap8Agent, etc), it may briefly show up in the dock and then disappear. This is expected. To minimize this display, when TextSoap is launched in agent mode, it will stick around in the background for a while, to avoid a re-launch. If TextSoap is already running as a standard app, TextSoap Menu and textsoap8Agent will leave it alone.
Known Issues
- Using TextSoap's OS X Service actions within the TextSoap app can cause the OS X Services action to time out and not work. In general, I recommend using TextSoap Menu over OS X Services. TextSoap Menu provides more customization options, is richer and more dynamic (allowing access to all cleaners, allowing group changes, etc) and avoids many of the issues with OS X Services.
Version 8.0.8 (2085)
Fixes
- Fixed: 'If Text Matches' action does not correctly process regex options.
- Fixed: Markdown cleaner was treating some lines with colons as meta-data, creating incorrect results. (#581)
- Fixed: Choosing color from Color Panel in custom group editor for a group item did not work if that item was not selected. Now, when the color panel is brought up, the group item for that color well will be selected (to receive color changes).
- Fixed: Scripting definition changes believed to be causing issues with some integration options. Reverted back to original 8.0.6 definitions.
Version 8.0.7 (2076)
Fixes
- Fixed: Clicking 'Remind Me Later' for updates did not postpone next check as expected.
- Fixed: Default automatic check update interval was too low.
- Fixed: New Doc Word Count preference setting was not being honored for documents.
- Fixed: Clipboard Workspace would reload (overwriting existing contents) if app was hidden and then re-activated by clicking on icon in dock.
- Fixed: Clicking 'Credits' tab in About window might cause a crash, or not present credits.
Version 8.0.6 (2070)
Fixes
- Fixed: Framework was causing regular crashes.
Version 8.0.5 (2067)
Additions
- New: Clipboard Workspace show ruler, show line numbers, show word count, show invisibles settings are now 'sticky' between launches.
- New: Text in Clipboard Workspace can now be saved to disk.
- New: Contents in Clipboard Workspace can now be printed as a text document.
Fixes
- Fixed: Updates to custom cleaners & groups did not always properly propagate. Changes should now propagate more robustly into the various parts of the app or helpers.
- Fixed: Labels and separators weren't stripped out when requesting cleaners within a group via AppleScript (asTokens=NO).
- Fixed: Cleaner list search finds any matches in the current group and then also any matches in the 'Library' group. However, it could create duplicate entries for custom cleaners if they were in the current group (one for the match in the group, one for the match in the library). This is clearly demonstrated with the 'Standard' group which automatically includes custom cleaners.
- Fixed: When creating a batch find & replace action, a group named '-ALL-' was created by default. Group specifying all items is now called 'All'. Both '-ALL-' and 'All Items' will automatically be converted to 'All'.
- Fixed: Custom cleaner editor would not expand beyond a certain width on some monitors. Limited access to full screen and/or created void areas.
Changes
- Changed: #SCRUB2 cleaner, a custom remnant from long ago, has been removed. If you used this, you can create a custom cleaner as an alternative.
- Changed: About TextSoap window updated.
Version 8.0.4 (2060)
Additions
- New: Added Customize button to all editor windows (lower right).
- New: Clipboard Workspace supports converting between plain and rich text.
Fixes
- Fixed: In some situations, the Library group would not show all custom cleaners.
- Fixed: Avoid potential issue with batch find action layout.
- Fixed: $l (lowercase-L) was not colorized as a token in regex replacement strings.
- Fixed: (Regression in 2054) Selecting custom groups in cleaner list leaves Library group active.
- Fixed: Converting between plain and rich text documents did not adhere to warning preference.
- Fixed: Capitalize with Title Case (and related actions) would delete some separators (i.e. colons).
Version 8.0.3 (2052)
Additions
- New: Added support to drag cleaner and library files onto TextSoap icon (in dock) to import them. Will bring up the importer window and add the files.
- New: If clipboard workspace window was closed, clicking on app icon will re-load the clipboard workspace.
Fixes
- Fixed: Regular expressions could find a zero length match, which caused extra, and incorrect, replace actions.
- Fixed: Regular expressions did not always recognize when options (multiline, dot matches) were set.
- Fixed: New documents do not remember last size set.
- Fixed: Quitting with open custom cleaner editor could lose changes made.
- Fixed: Launching TextSoap app via 3rd party launcher after it was used by TextSoap Menu now works correctly.
Changes
- Added: New Advanced preference option added to enable larger toolbar buttons. When enabled (relaunch required), app will use regular size action buttons and revert to the older style ruler (with slightly bigger icons).
- Changed: Added note to TextSoap Menu preferences to remind users that TextSoap Menu is required for global keyboard shortcuts to be enabled.
- Changed: Keyboard shortcut for accessing Custom Cleaners and Groups is now cmd-0.
- Changed: TextSoap will now restart TextSoap Menu at launch if it was 'installed' but the user quit it manually.
- Changed: Find fields can be grown with new slider on Find & other actions.
- Changed: Replace fields will automatically grow (2x) if Find field is made larger.
- Changed: Added resizing sliders to Text Extract, Note, Insert Text actions.
- Changed: Added an 'Edit List Data' button to the batch actions, renamed toolbar button 'Lists' to 'List Data'.
Version 8.0.2 (2045)
Fixes
- Fixed: Cleaner list item names sometimes remained truncated after sidebar resize.
- Fixed: Closing custom cleaner editor could cause crash in some cases if no author was specified.
- Fixed: Printing dialog causes ambiguous layout error.
- Fixed: Documents would not print correctly.
- Fixed: Clipboard Workspace print option was disabled.
- Fixed: Issues with custom cleaner names within custom groups.
- Fixed: TextSoap Menu palette would not properly display.
- Fixed: TextSoap Menu would not dynamically reflect changes in custom cleaners or groups.
- Fixed: Custom cleaners in search matches were not sorted.
- Fixed: Custom cleaners did not display correctly on re-launch.
- Fixed: Custom cleaner editor window position and size are now saved.
- Fixed: Batch Processor window did not accept individual text files via drag-n-drop.
- Fixed: A crash when batch processing files & folders.
- Fixed: Launching TextSoap from TextSoap Menu would not always show app menu.
- Fixed: TextSoap Menu was not automatically updated if it was launched.
Changes
- Change: Action note icon updated. Now indicates if there is a note attached.
- Change: User note for action will display as a tooltip for the note icon.
- Change: When displaying Standard group within TextSoap Menu, if there are a large number of custom cleaners, they are displayed in a sub-menu named 'Custom Cleaners'.
Version 8.0.1 (2034)
Additions
- New: File > Import. menu item to access cleaner importer.
- New: File > Export. menu item to access cleaner exporter.
- New: File > New > Custom Cleaner menu item to create a new custom cleaner.
- New: File > New > Custom Group menu item to create a new custom group.
Fixes
- Fixed: App would not always properly activate on launch (stayed in background, menu issues).
- Fixed: Clipboard Workspace did not remember its size and position.
- Fixed: Clipboard Workspace did not honor default font settings for new docs.
- Fixed: Clipboard Workspace did not honor default zoom preference.
- Fixed: New Documents were not honoring default zoom preference.
- Fixed: New Documents were not honoring default font settings for new docs (based on type of document).
- Fixed: Importing previous cleaners didn't correctly convert Ignore Case option to new Match Case for Find actions. Please re-import your cleaners to correct this issue.
- Fixed: Corrected typo for 'Capitalize with Title Case' cleaner.
- Fixed: Imported custom groups were not properly saved.
- Fixed: Custom Text Wrap action was not using the value specified.
- Fixed: Hyperlinks to Text action wipes out any hyperlink used.
- Fixed: Extract middle characters action did not use correct start position.
- Fixed: Multiple uses of 'Markdown Text' cleaner could result in stray characters.
- Fixed: Potential crash under certain conditions with 'Markdown Text' cleaner (when appending text).
- Fixed: Importing a previous database library (TS6 or TS7) would not properly pick up the custom cleaner names, leaving UUIDs instead. To fix, remove your previous custom groups and re-import them.
- Fixed: Custom cleaners were not always showing up in some cleaner lists (like TextSoap Menu prefs).
Changes
- Changed: If previous database (v6,v7) found, will prompt to import previous cleaners at launch.
- Changed: Window > 'Customize Navigator' renamed to Window > 'Custom Cleaners & Groups.' to make functionality more explicit. The name change is also visible in the Groups popup menu in the cleaner list sidebar.
Steps to re-import older TextSoap database
If you need to re-import your older TextSoap 6 or 7 database, here is how you can do it.
- File > Import.
- Select your textsoap7.textsoapdata file (it's at ~/Library/Application Support/TextSoap/) and click 'Review Items'
- You will see all the cleaners and groups in the file. Select them all.
- Check the 'Replace existing without prompts' option (otherwise it will ask to replace every cleaner/group).
- Click 'Import Selected'
Version 8.0 (2020)
General
- New: New content-focused interface. Stripped away a lot of the interface chrome.
- New: Regular expression syntax coloring and validation to quickly spot common errors.
- New: Customize Navigator provides simple access to user's customized data.
- New: Easily import/export multiple cleaners with libraries.
- New: Many of the customer requested cleaners added.
- Improved: Rewritten custom cleaner editor.
- Improved: Custom group editor.
- Improved: TextSoap Menu options now integrated into preferences.
Text Editor
- New: Option to show line numbers.
- New: Option to toggle whether text wraps to window.
- New: Supports opening/saving these file types:
- Microsoft Word 97 (.doc) Document
- Microsoft Word 2003 (.xml) Document
- Microsoft Word 2007 (.docx) Document
- OpenDocument Text (.odt) Document
- Note: Conversion of these file types is limited to same level of functionality supported by TextEdit.
- New: Commands to move selected line(s) up & down (#513).
- New: Command to select line/paragraph at cursor (#516).
- New: Paste Over Command (#484).
- New: Live preview shows interactive matches, including regular expressions.
- New: Captured group results are now available for each match (when using regular expressions).
- Improved: Word count handling, esp. in very large documents.
- Improved: Find/Cleaner palette sidebar is now resizable.
Sidebar (Assistant)
- Cleaner List
- Filter applies to current group and also appends any matches from library.
- Group names are now sorted within each category (built-in, user).
- Library cleaners are now sorted.
- Find
- Live Preview shows you matches as you type (integrated Regex Lab from TextSoap 7).
- Syntax highlighter for regular expressions and replacement strings.
- Highlights common errors found in expressions (such as unmatched parentheses, incomplete properties, character classes).
- Option to display captured values of expression matches.
Customize Navigator
- New: Navigator provides simple point access to:
- New: Import/Export of Cleaners & Libraries
- Libraries
- New: Save multiple cleaners into a single Library file.
- New: Import multiple cleaners from a single Library file.
- Edit custom cleaners
- Edit custom groups
New Cleaners Added
- Normalize Dates to MM-DD-YYYY Format cleaner.
- Normalize Dates to DD-MM-YYYY Format cleaner.
- Normalize Dates to YYYY-MM-DD Format cleaner.
- Capitalize Lines : Useful for song lyrics or various lists, takes each line and capitalizes it.
- Fix Jammed Words : attempts to fix up words that are jammed together. It uses the spell check to look at words marked as misspelled that could be fixed by simply adding a space between them.
- Reverse All Characters
- Reverse Word Order
- Reverse Characters in Each Word
- Word Count - Notification : uses OS X User Notification to display word count of selected text.
- Word Count - Remove All Notifications : removes all TextSoap word count notifications.
- Strip Diacritic Marks : strip off any diacritic marks from text. ü becomes u, é becomes e, etc.
- Make Unicode Names : convert unicode characters to their unicode name. 😀 becomes \N{GRINNING FACE}.
Misc Changes
- New Dates Group with date related cleaners.
- Capitalize Sentences no longer converts sentence to lowercase first. Use Capitalize Sentences (Alt) to continue this behavior.
- Added additional capitalized tech names.
New and Improved Actions
- Actions are now color-coded.
- Find and Replace fields are now syntax colored for special characters and regular expressions.
- A new Copy Text to Clipboard action
- Capitalize Common Tech Names updated with additional names.
New Custom Cleaner Editor
- A complete new user experience.
- Each cleaner receives its own window to work in.
- Larger text fields make everything easier to read.
- Syntax coloring for regular expressions.
- Actions are now truly hierarchical, allowing conditionals and others to embed actions.
- Embedded actions are included when you drag-n-drop top-level action.
- Named Group action allows for arbitrary embedding of actions.
- When you disable a Named Group, all the actions embedded within are disabled.
- Color-coded actions based on category of action.
- Many titles now show additional information about settings defined in the action.
- Titles can be customized by user.
- Macros offer a simpler approach to repeated actions (vs. subroutines).
- Action list can now be categorized into: All, Actions, Cleaners.
- New batch list editor makes it easier to work with large lists of data.
- Supports copy/paste using tab-delimited format
- Supports find and replace within the list
- List editor supports option to find using only specified columns
Do you need to extract the right data from a list of PDF files but right now you're stuck?
If yes, you've come to the right place.
Note: This article deals with PDF documents that are machine-readable. If that's not your case, I recommend you use Adobe Acrobat Pro, which will convert them automatically for you. Then, come back here.
In this article, you will learn:
- How to extract the content of a PDF file in R (two techniques)
- How to clean the raw document so that you can isolate the data you want
After explaining the tools I'm using, I will show you a couple of examples so that you can easily replicate them on your own problem.
Why PDF files?
When I started to work as a freelance data scientist, I did several jobs that consisted only of extracting data from PDF files.
My clients usually had two options: either do it manually (or hire someone to do it), or try to find a way to automate it.
The first option becomes really tedious and costly as the number of files increases, so they turned to the second one, and that is where I helped them.
For example, a client had thousands of invoices that all had the same structure and wanted to get important data from them:
- the number of sold items,
- the profits made at each transaction,
- the data from his customers
Having everything in PDF files isn't handy at all. Instead, he wanted a clean spreadsheet where he could easily find who bought what and when and make calculations from it.
Another classic example is when you want to do data analysis from reports or official documents. You will usually find those saved as PDF files rather than freely accessible on webpages.
Similarly, I needed to extract thousands of speeches made at the U.N. General Assembly.
So, how do you even get started?
Two techniques to extract raw text from PDF files
Use pdftools::pdf_text
The first technique requires you to install the pdftools package from CRAN, with install.packages("pdftools").
A quick glance at the documentation will show you the few functions of the package, the most important of which is pdf_text.
For this article, I will use an official record from the UN that you can find on this link.
The pdf_text function imports the raw text directly into a character vector, one element per page, with spaces preserving the white space and \n marking the line breaks.
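Here is a minimal sketch of that import step. The file name un_record.pdf is hypothetical (use the path of your own downloaded copy), and the helper read_pdf_pages is just a small wrapper I introduce for illustration around pdftools::pdf_text.

```r
# Hypothetical file name: point this at your own local copy of the PDF.
pdf_path <- "un_record.pdf"

# Illustrative wrapper (not part of pdftools) that fails early with a clear message.
read_pdf_pages <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  if (!requireNamespace("pdftools", quietly = TRUE)) {
    stop("Install pdftools first with install.packages(\"pdftools\")")
  }
  # pdf_text() returns a character vector with one element per page
  pdftools::pdf_text(path)
}

if (file.exists(pdf_path)) {
  pages <- read_pdf_pages(pdf_path)
  print(length(pages))                 # number of pages
  cat(substr(pages[1], 1, 300), "\n")  # first characters of page 1
}
```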
Having a full page in one element of a vector is not the most practical. Using strsplit will help you separate lines from each other:
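As a small sketch, here is strsplit applied to a made-up page string (a stand-in for one element of the pdf_text output, not the real UN record):

```r
# Made-up page content standing in for one element returned by pdf_text()
page <- "First line of the page\nSecond line   with extra spaces\nThird line"

# strsplit() returns a list with one element per input string,
# so [[1]] picks the lines of our single page
page_lines <- strsplit(page, "\n", fixed = TRUE)[[1]]
print(page_lines)

# With several pages, you get a list with one character vector per page
pages <- c(page, "Only line of page two")
lines_per_page <- strsplit(pages, "\n", fixed = TRUE)
print(lengths(lines_per_page))
```

Using fixed = TRUE treats "\n" as a plain character rather than a regular expression, which is all we need here.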
If you want to know more about the functions of the pdftools package, I recommend you read Introducing pdftools - A fast and portable PDF extractor, written by the author himself.
Use the tm package
tm is the go-to package when it comes to doing text mining/analysis in R.
For our problem, it will help us import a PDF document in R while keeping its structure intact. Plus, it makes it ready for any text analysis you want to do later.
The readPDF function from the tm package doesn't actually read a PDF file like pdf_text did in the previous example. Instead, it helps you create your own reading function, the benefit being that you can choose whatever PDF extraction engine you want.
By default, it will use xpdf, available at http://www.xpdfreader.com/download.html
You have to:
- Download the archive from the website (under the Xpdf tools section).
- Unzip it.
- Make sure it is in the PATH of your computer.
Then, you can create your PDF extracting function:
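A sketch of what this could look like with tm::readPDF. The wrapper name make_layout_reader is mine, and I pass engine = "xpdf" explicitly because the default engine may differ between tm versions.

```r
# Illustrative helper (the name is an assumption, not part of tm):
# builds a reader function that extracts text with the xpdf engine
# and passes the -layout option to the text extraction step.
make_layout_reader <- function() {
  if (!requireNamespace("tm", quietly = TRUE)) {
    stop("Install tm first with install.packages(\"tm\")")
  }
  tm::readPDF(engine = "xpdf", control = list(text = "-layout"))
}

# Only build the reader when tm is available
if (requireNamespace("tm", quietly = TRUE)) {
  read_layout <- make_layout_reader()
  print(class(read_layout))
}
```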
The control argument enables you to set up parameters as you would write them in the command line. Think of the above function as writing pdftotext -layout in the shell (pdftotext is the command-line extractor that ships with the Xpdf tools).
Then, you're ready to import the PDF document:
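One possible way to run the import, sketched under a few assumptions: import_pdf is a name I made up, un_record.pdf is a hypothetical local file, and the reader is called directly on a list(uri = ...) element, which is the calling form shown in the tm documentation. The content accessor comes from NLP, a package that tm depends on.

```r
# Hypothetical file name: replace with the path to your own PDF.
pdf_path <- "un_record.pdf"

# Illustrative helper (not part of tm): returns the extracted lines of the PDF.
import_pdf <- function(path, engine = "xpdf") {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  if (!requireNamespace("tm", quietly = TRUE)) {
    stop("Install tm first with install.packages(\"tm\")")
  }
  reader <- tm::readPDF(engine = engine, control = list(text = "-layout"))
  doc <- reader(elem = list(uri = path), language = "en", id = basename(path))
  # content() gives back the text stored in the document object
  NLP::content(doc)
}

if (file.exists(pdf_path)) {
  doc_lines <- import_pdf(pdf_path)
  print(head(doc_lines, 20))
}
```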
Notice the difference with the excerpt from the first method. New empty lines appeared, corresponding more closely to the document. This can help to identify where the header stops in this case.
Another difference is how pages are managed. With the second method, you get the whole text at once, with page breaks symbolized with the \f symbol. With the first method, you simply had a list where 1 page = 1 element.
The first line of the second page, for instance, comes with an added \f in front of it.
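If you want to get back to one group of lines per page, the \f markers are enough. A minimal sketch on made-up lines (not the real document):

```r
# Made-up lines; "\f" (form feed) marks the first line of each new page
doc_lines <- c(
  "Last lines of page 1",
  "More text on page 1",
  "\fFirst line of page 2",
  "More text on page 2",
  "\fFirst line of page 3"
)

# TRUE for every line that opens a new page
starts_page <- grepl("^\f", doc_lines)

# Running count of page breaks gives the page number of each line
page_id <- cumsum(starts_page) + 1

# Drop the marker itself and group the lines by page
clean_lines <- sub("^\f", "", doc_lines)
pages <- split(clean_lines, page_id)
print(pages)
```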
Extract the right information
Naturally, you don't want to stop there. Once you have the PDF document in R, you want to extract the actual pieces of text that interest you, and get rid of the rest.
That's what this part is about.
I will use a few common tools for string manipulation in R:
- The grep and grepl functions.
- Base string manipulation functions (such as strsplit).
- The stringr package.
My goal is to extract all the speeches from the speakers of the document we've worked on so far (this one), but I don't care about the speeches from the president.
Here are the steps I will follow:
- Clean the headers and footers on all pages.
- Get the two columns together.
- Find the rows of the speakers.
- Extract the correct rows.
I will use regular expressions (regex) regularly in the code. If you have absolutely no knowledge of them, I recommend you follow a tutorial, because they are essential as soon as you start managing text data.
If you have some basic knowledge, that should be enough. I'm not a big expert either.
1. Clean the headers and footers on all pages.
Notice how each page contains text at the top and at the bottom that will interfere with our extraction.
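Here is a rough sketch of this cleaning step on made-up pages. The number of header and footer lines (2 and 1) is an assumption for the example; you need to check the real counts in your own document. I also append a "page" marker row after each page, so that page boundaries can still be found later.

```r
# Made-up pages standing in for the output of pdf_text(): one string per page
pages <- c(
  "HEADER A\nHEADER B\nbody line 1\nbody line 2\nFOOTER",
  "HEADER A\nHEADER B\nbody line 3\nFOOTER"
)

# Assumed layout: 2 header lines and 1 footer line on every page
n_header <- 2
n_footer <- 1

clean_page <- function(page, n_header, n_footer) {
  rows <- strsplit(page, "\n", fixed = TRUE)[[1]]
  keep <- seq_along(rows)
  # Keep only the rows between the header and the footer
  keep <- keep[keep > n_header & keep <= length(rows) - n_footer]
  # Add a marker row so we know where each page ends
  c(rows[keep], "page")
}

cleaned <- unlist(lapply(pages, clean_page,
                         n_header = n_header, n_footer = n_footer))
print(cleaned)
```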
Now, our document is a bit cleaner. Next step is to do something about the two columns, which is super annoying.
2. Get the two columns together.
My idea (there might be better ones) is to use the str_split function to split the rows every time two spaces appear (i.e. it's not a normal sentence).
Then, because sometimes there are multiple spaces together at the beginning of the rows, I detect where there is text, where there is not, and I pick the elements with text.
It's a bit arbitrary, you'll see, but it works:
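A sketch of this idea on made-up rows. To keep it free of dependencies I use base strsplit here instead of stringr's str_split, and the helper split_columns is a name I introduce for the example. The rule is as arbitrary as announced: a row that starts with spaces is treated as right-column only, and justified text with double spaces inside a column would fool it.

```r
# Illustrative helper: split one row into its left and right column text
split_columns <- function(line) {
  empty <- c(left = NA_character_, right = NA_character_)
  if (is.na(line) || !nzchar(trimws(line))) return(empty)
  # Split wherever two or more spaces appear in a row
  parts <- strsplit(line, " {2,}")[[1]]
  if (parts[1] == "") {
    # Row starts with spaces: only the right column has text
    c(left = NA_character_, right = paste(parts[-1], collapse = " "))
  } else {
    right <- if (length(parts) > 1) paste(parts[-1], collapse = " ") else NA_character_
    c(left = parts[1], right = right)
  }
}

# Made-up rows from a two-column page
rows <- c(
  "left col line 1      right col line 1",
  "left col line 2",
  "                     right col line 2"
)

# One row per input line, first column = left text, second column = right text
cols <- t(vapply(rows, split_columns, character(2), USE.NAMES = FALSE))
print(cols)
```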
Now, let's put it together, thanks to the 'page' marker that we added earlier:
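A minimal sketch of the recombination, on made-up data. I assume the rows have already been split into a left and a right vector, and that each page ends with a row whose left value is the "page" marker. For every page, the left column is read first, then the right column.

```r
# Made-up columns; a "page" marker row closes each page
left  <- c("L1 p1", "L2 p1", "page", "L1 p2", "page")
right <- c("R1 p1", NA,      NA,     "R1 p2", NA)

# Locate the marker rows and number the pages
is_marker <- !is.na(left) & left == "page"
page_id <- cumsum(is_marker) - is_marker + 1

# For each page: left column top to bottom, then right column
ordered <- unlist(lapply(split(seq_along(left), page_id), function(idx) {
  idx <- idx[!is_marker[idx]]   # drop the marker row itself
  l <- left[idx]
  r <- right[idx]
  c(l[!is.na(l)], r[!is.na(r)])
}), use.names = FALSE)

print(ordered)
```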
Now that we have a nice clean vector of all text lines in the right order, we can start extracting the speeches.
3. Find the rows of the speakers
This is where you must look into the document to spot some patterns that would help us detect where the speeches start and end.
It's actually fairly easy since all speakers are introduced with 'Mr.' or 'Mrs.', and the president is always called 'The President:' or 'The Acting President:'.
Let's get these rows:
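A sketch with grep on made-up lines (the names and countries are invented, and the exact patterns are my guess at what fits the document):

```r
# Made-up lines imitating the structure of the record
doc_lines <- c(
  "The President: I call on the next speaker.",
  "Mr. Alpha (Country A): Thank you.",
  "We welcome the report.",
  "The Acting President: I thank the representative.",
  "Mrs. Beta (Country B): My delegation agrees.",
  "The President: The meeting is adjourned."
)

# Rows where a speaker starts: the line begins with "Mr. " or "Mrs. "
speaker_rows <- grep("^(Mr|Mrs)\\. ", doc_lines)

# Rows where the president (or acting president) takes the floor
president_rows <- grep("^The (Acting )?President:", doc_lines)

print(speaker_rows)
print(president_rows)
```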
Now it's easy. We know where the speeches start, and they always end with someone else speaking (whether another speaker or the president).
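Here is one way to sketch that extraction, again on made-up lines: each speech runs from its speaker row to the row just before the next turn, whoever takes it.

```r
# Made-up lines imitating the structure of the record
doc_lines <- c(
  "The President: I call on the next speaker.",
  "Mr. Alpha (Country A): Thank you.",
  "We welcome the report.",
  "The Acting President: I thank the representative.",
  "Mrs. Beta (Country B): My delegation agrees.",
  "The President: The meeting is adjourned."
)

speaker_rows <- grep("^(Mr|Mrs)\\. ", doc_lines)
president_rows <- grep("^The (Acting )?President:", doc_lines)

# Every row where somebody starts talking, in document order
turn_rows <- sort(c(speaker_rows, president_rows))

# A turn ends on the row before the next turn (or on the last row)
turn_ends <- c(turn_rows[-1] - 1, length(doc_lines))

# Keep only the turns of the speakers, not those of the president
speeches <- lapply(speaker_rows, function(start) {
  end <- turn_ends[match(start, turn_rows)]
  paste(doc_lines[start:end], collapse = " ")
})

print(speeches)
```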
Finally, we get all the speeches in a list. We can now analyze what each country representative talks about, how this evolves over more documents, over the years, depending on the topic discussed, etc.
Now, one could argue that for one document, it would be easier to extract it in a semi-manual way (by specifying the row numbers manually, for example). This is true.
But the idea here is to replicate this same process over hundreds, or even thousands, of such documents.
This is where the fun begins, as they will all have their specificities, the format might evolve, sometimes stuff is misspelled, etc. In fact, even with this example, the extraction is not perfect! You can try to improve it if you want.