Full text search using PowerShell, Everything, and Lucene

Searching for files is something everyone does on a very regular basis. While Windows changes the way this is done with every new operating system, the built-in functionality is still far from sufficient. Therefore, I’m always looking for ways to improve it (you can also find several blog posts related to file searches around here). When it comes to searching for files by file name or path, I’m pretty happy with the performance of Everything. When it comes to searching for files by their content (aka full-text search), there is still room for improvement in my opinion.
Recently I’ve been watching the session recordings from the PowerShell Conference Europe 2016 (I can highly recommend anyone that is interested in PowerShell to watch those).

In one of the videos, Bruce Payette talks about how to use Lucene.net through PowerShell. Subsequently, Doug Finke picked up the topic and wrapped all of it into a GUI. Lucene is basically the gold standard when it comes to full-text search.

Naturally I also wanted to see how Lucene could help to improve the Windows search capabilities further. My goal was to put it to a test and potentially further improve the implementations in order to be able to index and query text based files on my entire drive.
Using Bruce’s and Doug’s implementation, the search worked almost instantaneously, even against a huge volume of indexed files. Only the creation of the index takes quite some time, since the enumeration of the files to be indexed is based on either Get-ChildItem or System.IO.Directory.EnumerateFiles.
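
To get a feeling for the enumeration cost mentioned above, the two approaches can be timed against each other. This is only a minimal sketch: the helper name Measure-FileEnumeration is made up for this illustration, and the numbers will vary a lot between machines and folders.

```powershell
# Hypothetical helper (not part of any module mentioned in this post):
# times both enumeration approaches for the same folder and filter.
function Measure-FileEnumeration {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [string]$Filter = '*.ps1'
    )
    $watch = [System.Diagnostics.Stopwatch]::StartNew()
    # Get-ChildItem skips hidden files unless -Force is used
    $viaCmdlet = @(Get-ChildItem -Path $Path -Filter $Filter -File -Recurse -ErrorAction SilentlyContinue)
    $cmdletMs = $watch.Elapsed.TotalMilliseconds

    $watch.Restart()
    # EnumerateFiles returns plain path strings; it throws if a subfolder cannot be accessed
    $viaDotNet = @([System.IO.Directory]::EnumerateFiles($Path, $Filter, [System.IO.SearchOption]::AllDirectories))
    $dotNetMs = $watch.Elapsed.TotalMilliseconds

    [pscustomobject]@{
        GetChildItemCount   = $viaCmdlet.Count
        GetChildItemMs      = $cmdletMs
        EnumerateFilesCount = $viaDotNet.Count
        EnumerateFilesMs    = $dotNetMs
    }
}

# Usage: Measure-FileEnumeration -Path 'c:\scripts' -Filter '*.ps1'
```

The counts can differ slightly on real folders because the two approaches treat hidden files and inaccessible directories differently.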

I’ve refactored the implementation into a new module (SearchLucene.psm1), in which the file enumeration is based on the Everything command-line interface, and made several additional changes. As a result, creating the index for all .txt, .ps1, and .psm1 files on my c: drive (SSD) now takes less than a minute.
Usage:
Prerequisites:

  • SearchLucene.psm1 module installed (the example assumes that you have put the downloaded files into a folder called ‘SearchLucene’ that resides within one of the $env:PSModulePath folders)
  • Everything command-line interface installed (requires the GUI version to be installed)
Import-Module SearchLucene
#Create the index on disk within the $env:TEMP\test folder and index all .ps1 and .psm1 files on the c: drive
<#
default values for each parameter are:
- DirectoryToBeIndexed = 'c:\',
- Include = @('*.ps1','*.psm1','*.txt*')
- IndexDirectory = "$env:temp\luceneIndex"
#>
New-LuceneIndex -DirectoryToBeIndexed 'c:\' -Include @('*.ps1','*.psm1') -IndexDirectory "$env:TEMP\test"

#Search all indexed .ps1 files for the word 'kw2016'
Find-FileLucene 'kw2016' -Include '.ps1'
#outputs a list of file paths that include the word 'kw2016'

#Search all indexed .ps1 files for the word 'test' and output the matching line and line number for each match found within the file
Find-FileLucene 'test' -Include '.ps1' -Detailed

#Same as above but output the result in a table grouped by folder
Find-FileLucene 'kw2016' -Include '.ps1' -Detailed | 
	Sort-Object {Split-Path $_.Path} | 
	Format-Table -GroupBy {Split-Path $_.Path}

This is just a small example of how Lucene.net can be used to implement full-text search. The solution could be further improved by including other file types, or by re-creating or updating the index on a schedule or whenever files are modified.
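
As a rough idea for the schedule-based re-creation, a scheduled script could first check how old the index is. The sketch below rests on assumptions: Test-IndexStale is a hypothetical helper, and it assumes that the index files sit directly inside the index folder and that their last write time reflects the last index build.

```powershell
# Hypothetical helper: returns $true if the index folder is missing, empty,
# or if its newest file is older than the allowed age.
function Test-IndexStale {
    param(
        [Parameter(Mandatory = $true)][string]$IndexDirectory,
        [timespan]$MaxAge = [timespan]::FromHours(24)
    )
    if (-not (Test-Path -Path $IndexDirectory -PathType Container)) { return $true }

    $newest = Get-ChildItem -Path $IndexDirectory -File |
        Sort-Object LastWriteTime -Descending |
        Select-Object -First 1
    if (-not $newest) { return $true }

    return ((Get-Date) - $newest.LastWriteTime) -gt $MaxAge
}

# Usage with the New-LuceneIndex call shown above (e.g. from a scheduled task):
# if (Test-IndexStale -IndexDirectory "$env:TEMP\test") {
#     New-LuceneIndex -DirectoryToBeIndexed 'c:\' -Include @('*.ps1','*.psm1') -IndexDirectory "$env:TEMP\test"
# }
```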



Photo Credit: Cho Shane via Compfight cc

Search file content by keyword using Everything + PowerShell + GUI

Even with Windows 10, MS still hasn’t managed to include a proper built-in file search. When it comes to searching for files, I definitely prefer the excellent Everything search engine (see also my post on a PowerShell wrapper around the Everything command line). But quite frequently I also need to search for keywords/patterns within files. PowerShell’s Get-ChildItem and Select-String can certainly do this together:

#search through all .ps(m)1 files for instances of the word 'mySearchString'
$path = 'c:\scripts\powershell'
Get-ChildItem $path -Include ("*.ps1","*.psm1") -Recurse |
     Select-String 'mySearchString' | select Path, Line, LineNumber

While this does the job, it doesn’t follow my preferred workflow and is not very quick when run against a large set of files. I would prefer to search and drill down a list of files within a graphical user interface, just like Everything, then search through the filtered list of files using keyword(s)/pattern(s) and get the results back within a reasonable time frame.
Say hello to “File Searcher” (I didn’t spend any time thinking about a catchy name):
The three text boxes at the top of the UI can be used to:

  1. Search for files using Everything command-line (es.exe)
  2. Search within the list of files for content by keyword (using a replacement for Select-String; more on that below)
  3. Filter the results by keywords (across all columns). This can be done against the list of files and against the list of results (Path, Line, LineNumber)

Let’s first look at two use cases.
1. Assuming we want to search for some PowerShell files starting with “Posh-” across the whole hard drive:

  • After importing the module (Import-Module $path\FileSearcher.psm1), files can be searched using the textbox at the top of the window
  • Entering ‘posh-*.ps1’ as the search term and hitting Enter will get us what we want
  • On my machine this results in quite a long list. I can scroll through the list to see whether I really want to search through all those files or drill it down further, either by refining the initial search or by using the ‘filter results’ textbox.
  • For the example’s sake let’s assume I’d like to filter the result list to show only those entries that contain the word ‘string’ (within the full path)
  • Now I would like to search those files for instances of the word ‘select’. Entering the keyword into the 3rd text-box filters the results as I type.
  • The result is a list of ‘Path, Line, LineNumber’ results that can be further filtered by using the ‘filter results’ text-box again
  • Double-clicking one of the entries will open the file in Notepad++ (provided it is installed) and put the cursor on the respective line. (This works only if Notepad++ is not already open.)
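
For reference, jumping to a line in Notepad++ works through its -n command-line argument. The following is a sketch of how such a call could be put together, not the code used in the module: the helper name Get-NotepadPlusPlusArgument and the install path in the usage comment are assumptions.

```powershell
# Hypothetical helper: builds the argument list for opening a file in
# Notepad++ with the cursor on a given line.
function Get-NotepadPlusPlusArgument {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [Parameter(Mandatory = $true)][int]$LineNumber
    )
    # -n<line> sets the cursor line; the path is quoted in case it contains spaces
    "-n$LineNumber", ('"{0}"' -f $Path)
}

# Usage (assumes a default Notepad++ installation folder):
# $npp = Resolve-Path 'C:\Program Files*\Notepad++\notepad++.exe' | Select-Object -First 1
# Start-Process -FilePath $npp.Path -ArgumentList (Get-NotepadPlusPlusArgument -Path 'c:\scripts\test.ps1' -LineNumber 12)
```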

2. A second use case is a situation where I want to “pre-populate” the list of files via the command line instead of using the GUI. Here is how to do that:

  • Pipe a list of files into the FileSearcher function:
    Import-Module $path\FileSearcher.psm1
    $path = 'c:\scripts\powershell'
    Get-ChildItem $path -Include ("*.ps1","*.psm1") -Recurse | FileSearcher
    
  • Use the UI to further refine and/or search the list of files for contents by keyword

The content search functionality is realized through a custom cmdlet (Search-FileContent) implemented in F#. It is based on the solution provided in this blog post; my only change to the original is that it now accepts an array of strings for the full paths. Through the use of parallel asynchronous tasks, it is significantly faster than Select-String.
The UI also supports some options:

  • For file search (through Everything) the “no Recurse” option is applied if the first search term is a path (using the parents:DEPTH option which requires an up-to-date Everything version) e.g. the search term ‘c:\scripts .ps1’ with the option enabled would only search for .ps1 files within the c:\scripts directory.
  • The content search offers options similar to the Select-String switches to treat the keyword as plain text instead of a regular expression (SimpleMatch) and/or to do a case-sensitive search.
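
To illustrate what these two options mean, here is how the corresponding Select-String switches behave. The sketch uses Select-String itself (not Search-FileContent) against a throwaway demo file in the temp folder.

```powershell
# Create a small demo file with three lines
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'select-string-demo.txt'
Set-Content -Path $file -Value 'Get-Item a.b', 'get-item aXb', 'nothing here'

# Default: the pattern is a case-insensitive regular expression,
# so the '.' matches any character (lines 1 and 2 -> 2 hits)
$regexHits = @(Select-String -Path $file -Pattern 'a.b')

# -SimpleMatch: the pattern is taken literally (line 1 only -> 1 hit)
$simpleHits = @(Select-String -Path $file -Pattern 'a.b' -SimpleMatch)

# -CaseSensitive: 'Get-Item' no longer matches 'get-item' (line 1 only -> 1 hit)
$caseHits = @(Select-String -Path $file -Pattern 'Get-Item' -CaseSensitive)

$regexHits.Count, $simpleHits.Count, $caseHits.Count
```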

Dependencies:

  1. Everything command line version (requires the UI version to be installed, too) installed to ‘C:\Program Files*\es\es.exe’
  2. The Search-FileContent cmdlet is implemented via the SearchFileContent.dll which can be downloaded from my GitHub repository and needs to reside in the same folder as the FileSearcher.psm1 file.
  3. Because the Search-FileContent cmdlet is written in F# it requires the FSharp.Core assembly to be present which can be downloaded and installed via the following PowerShell code:
    $webclient = New-Object Net.WebClient
    $url = 'http://download.microsoft.com/download/E/A/3/EA38D9B8-E00F-433F-AAB5-9CDA28BA5E7D/FSharp_Bundle.exe'
    $webclient.DownloadFile($url, "$pwd\FSharp_Bundle.exe")
    .\FSharp_Bundle.exe /install /quiet
    
  4. The ability to open files from the file search content results via double-click with the cursor on the respective line requires Notepad++

The FileSearcher module itself can also be downloaded from my GitHub repository.
Please use the comment function if you have any feedback or suggestions on how to improve the tool.



Photo Credit: Robb North via Compfight cc

Using Everything search command line (es.exe) via PowerShell


Everything by voidtools is a great search utility for Windows. It returns almost instantaneous results for file and folder searches by utilizing the Master File Table(s). There is also a command-line version of Everything (es.exe), and this post is about a wrapper I wrote in PowerShell around es.exe.
The full version including full help (which I’m skipping here to keep it shorter) can be downloaded from my GitHub repository.

function Get-ESSearchResult {
    [CmdletBinding()]
    [Alias("search")]
    Param
    (
        #searchterm
        [Parameter(Mandatory=$true, Position=0)]
        $SearchTerm,
        #openitem
        [switch]$OpenItem,
        [switch]$CopyFullPath,
        [switch]$OpenFolder,
        [switch]$AsObject
    )
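    $esPath = 'C:\Program Files*\es\es.exe'
    # Resolve the wildcard path once; the result is empty if es.exe is not installed
    $esExe = Resolve-Path $esPath -ErrorAction SilentlyContinue | Select-Object -First 1
    if (-not $esExe){
        Write-Warning "Everything command line es.exe could not be found on the system. Please download and install it via http://www.voidtools.com/es.zip"
        # return instead of exit so that the calling session stays open
        return
    }
    $result = & $esExe.Path $SearchTerm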
    if($result.Count -gt 1){
        $result = $result | Out-GridView -PassThru
    }
    foreach($record in $result){
        switch ($PSBoundParameters){
	        { $_.ContainsKey("CopyFullPath") } { $record | clip }
	        { $_.ContainsKey("OpenItem") }     { if (Test-Path $record -PathType Leaf) {  & "$record" } }
	        { $_.ContainsKey("OpenFolder") }   {  & "explorer.exe" /select,"$(Split-Path $record)" }
	        { $_.ContainsKey("AsObject") }     { $record | Get-ItemProperty }
	        default                            { $record | Get-ItemProperty | 
                                                    select Name,DirectoryName,@{Name="Size";Expression={$_.Length | Get-FileSize }},LastWriteTime
                                               }
        }
    }
}

The function contains a call to “Get-FileSize”, a helper filter that returns the file size of the selected items in a readable format:

filter Get-FileSize {
	"{0:N2} {1}" -f $(
	if ($_ -lt 1kb) { $_, 'Bytes' }
	elseif ($_ -lt 1mb) { ($_/1kb), 'KB' }
	elseif ($_ -lt 1gb) { ($_/1mb), 'MB' }
	elseif ($_ -lt 1tb) { ($_/1gb), 'GB' }
	elseif ($_ -lt 1pb) { ($_/1tb), 'TB' }
	else { ($_/1pb), 'PB' }
	)
}

How does it work? The Get-ESSearchResult function (alias search) searches for all items containing the search term (SearchTerm is the only mandatory parameter). If there are multiple search results, they are piped to Out-GridView with the -PassThru option enabled, so that the results can be seen in a GUI and one or multiple items can be selected. By default (no switches turned on) the selected item(s) are converted to FileSystemInfo objects and their Name, DirectoryName, Size and LastWriteTime are output. The resulting objects can be used for further processing (copying, deleting, ...).

The switch parameters add the following features and can be used in any combination:

  • -OpenItem : Invoke the selected item(s) (only applies to files not folders)
  • -CopyFullPath : Copy the full Path of the selected item to the clipboard
  • -OpenFolder : Opens the folder(s) that contain(s) the selected item(s) in windows explorer
  • -AsObject : Similar to default output but the full FileSystemInfo objects related to the selected item(s) are output
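
The “any combination” behaviour comes from how the switch statement is used in the function: every condition that matches is executed, and the default branch only runs if none of them matches. A stripped-down sketch of that pattern, with a made-up function name and made-up switch names:

```powershell
# Hypothetical demo function that mirrors the switch pattern used in Get-ESSearchResult
function Test-SwitchCombination {
    [CmdletBinding()]
    param(
        [switch]$First,
        [switch]$Second
    )
    # $PSBoundParameters is a dictionary, so switch evaluates it as one single item
    switch ($PSBoundParameters) {
        { $_.ContainsKey('First') }  { 'first' }
        { $_.ContainsKey('Second') } { 'second' }
        default                      { 'default' }
    }
}

Test-SwitchCombination                  # only the default branch runs
Test-SwitchCombination -First -Second   # both matching branches run, default is skipped
```

One thing to keep in mind with this pattern: ContainsKey only checks whether the switch was specified at all, so -First:$false would still trigger the branch.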

I hope that the function can also help some of you to find your files and folders faster from the commandline.
I’ve written another blog post in relation to Everything and PowerShell:
Search file content by keyword using Everything + PowerShell + GUI



photo credit: 983 Foggy Day via photopin (license)