Full text search using PowerShell, Everything, and Lucene

Searching for files is something everyone does on a very regular basis. While Windows changes the way this is done with every new operating system, the built-in functionality is still far from sufficient. Therefore, I’m always looking for ways to improve it (you can find several other posts about file searches on this blog). For searching by file name or path, I’m pretty happy with the performance of Everything. For searching by file content (aka full-text search), there is still room for improvement in my opinion.
Recently I’ve been watching the session recordings from the PowerShell Conference Europe 2016 (I highly recommend them to anyone interested in PowerShell).

In one of the videos, Bruce Payette talks about how to use Lucene.net through PowerShell. Doug Finke subsequently picked up the topic and wrapped all of it into a GUI. Lucene is basically the gold standard when it comes to full-text search.

Naturally, I also wanted to see how Lucene could help improve the Windows search capabilities. My goal was to put it to the test and, where possible, improve the implementations so that I could index and query text-based files on my entire drive.
Using Bruce’s and Doug’s implementation, searching was almost instantaneous, even against a huge volume of indexed files. Only the creation of the index takes quite some time, since the enumeration of the files to be indexed is based on either Get-ChildItem or System.IO.Directory.EnumerateFiles.
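
To illustrate the two enumeration approaches mentioned above, here is a minimal sketch that runs both against a small throwaway folder. The folder name EnumDemo and the sample files are made up for the demo; this is not the code from Bruce’s or Doug’s implementation.

```powershell
# Build a tiny sample tree in the temp folder so the sketch is safe to run
$root = Join-Path ([System.IO.Path]::GetTempPath()) 'EnumDemo'   # hypothetical demo folder
$sub  = Join-Path $root 'sub'
New-Item -Path $sub -ItemType Directory -Force | Out-Null
Set-Content -Path (Join-Path $root 'a.ps1') -Value '# demo'
Set-Content -Path (Join-Path $sub  'b.ps1') -Value '# demo'
Set-Content -Path (Join-Path $root 'c.txt') -Value 'demo'

# Approach 1: Get-ChildItem (returns FileInfo objects, we only keep the path)
$viaCmdlet = @(Get-ChildItem -Path $root -Recurse -File -Filter '*.ps1' |
    Select-Object -ExpandProperty FullName)

# Approach 2: System.IO.Directory.EnumerateFiles (returns plain path strings)
$viaDotNet = @([System.IO.Directory]::EnumerateFiles(
    $root, '*.ps1', [System.IO.SearchOption]::AllDirectories))

# Both approaches should find the same two .ps1 files
$viaCmdlet | Sort-Object
$viaDotNet | Sort-Object
```

Both calls have to walk the whole directory tree, which is why they get slow on an entire drive. Note also that on a full drive EnumerateFiles stops with an exception at the first folder it is not allowed to read, whereas Get-ChildItem reports an error and carries on.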

I’ve refactored the implementation into a new module (SearchLucene.psm1), in which I based the file enumeration on the Everything command-line interface and made several additional changes. As a result, creating the index for all .txt, .ps1, and .psm1 files on my c: drive (SSD) now takes less than a minute.
Usage:
Prerequisites:

  • SearchLucene.psm1 module installed (the example assumes that you have put the downloaded files into a folder called ‘SearchLucene’ that resides within one of the $env:PSModulePath folders)
  • Everything command-line interface installed (Requires the GUI version to be installed)
Import-Module SearchLucene
#Create the index on disk within the $env:TEMP\test folder and index all .ps1 and .psm1 files on the c: drive
<#
default values for each parameter are:
- DirectoryToBeIndexed = 'c:\',
- Include = @('*.ps1','*.psm1','*.txt*')
- IndexDirectory = "$env:temp\luceneIndex"
#>
New-LuceneIndex -DirectoryToBeIndexed 'c:\' -Include @('*.ps1','*.psm1') -IndexDirectory "$env:TEMP\test"

#Search all indexed .ps1 files for the word 'kw2016'
Find-FileLucene 'kw2016' -Include '.ps1'
#outputs a list of file paths that include the word 'kw2016'

#Search all indexed .ps1 files for the word 'test' and output the matching line and line number for each match found within the file
Find-FileLucene 'test' -Include '.ps1' -Detailed

#Same as above but output the result in a table grouped by folder
Find-FileLucene 'kw2016' -Include '.ps1' -Detailed | 
	Sort-Object {Split-Path $_.Path} | 
	Format-Table -GroupBy {Split-Path $_.Path}

[Screenshot: SearchLucene]
This is just a small example of how Lucene.net can be used to implement full-text search. The solution could be further improved by including other file types, or by re-creating or updating the index on a schedule or whenever files are modified.



Photo Credit: Cho Shane via Compfight cc

Get help for Windows built-in command-line tools with PowerShell

One of the reasons I like PowerShell is its built-in help system (here is a nice post in case you don’t know how to use PowerShell’s built-in help). E.g.:

Get-Help Get-Command
Get-Help Get-Command -Examples
Get-Help Get-Command -Parameter Name

In fact, once you get comfortable using PowerShell help aka Get-Help, you start missing similar built-in documentation for other tools/scripting languages. Wouldn’t it be nice if one could use Get-Help for Windows command-line tools?

Get-Help chkdsk
Get-Help chkdsk -Examples
Get-Help chkdsk -Parameter c

[Screenshot: Get-LegacyHelp]
Say ‘Hello’ to Get-LegacyHelp! With Get-LegacyHelp you can retrieve help for built-in Windows command-line tools in much the same way as Get-Help works for PowerShell cmdlets.
How does it work? Importing the module (Get-LegacyHelp.psm1) and running Get-LegacyHelp (alias glh) the first time will perform the following steps:

  • Download the Windows command line reference help file WinCmdRef.chm
  • Download and extract the HTML Agility Pack DLL
  • Decompile the .chm into separate .html files using hh.exe
  • Rename the html files and extract the information (using HTML Agility Pack) into PSObject format
  • Export the entire object to disk using Export-CliXml

Afterwards (and for any subsequent invocation):

  • The XML is imported,
  • the relevant information is filtered,
  • and the data is displayed in a format similar to the Get-Help output.
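
The caching part of these steps (export once, import and filter on every later call) can be sketched roughly as follows. This is only an illustration of the pattern, not the code from the module: the cache file name, the property names, and the placeholder texts are all made up for the demo.

```powershell
# Hypothetical cache location and sample data, for illustration only
$cachePath = Join-Path ([System.IO.Path]::GetTempPath()) 'LegacyHelpDemo.xml'
$helpData = @(
    [pscustomobject]@{ Name = 'chkdsk'; Synopsis = 'placeholder synopsis for chkdsk' }
    [pscustomobject]@{ Name = 'ping';   Synopsis = 'placeholder synopsis for ping' }
)

# First run: persist the parsed objects to disk
$helpData | Export-Clixml -Path $cachePath

# Any subsequent run: import the XML, filter by command name, display
$cached = Import-Clixml -Path $cachePath
$entry  = $cached | Where-Object { $_.Name -eq 'chkdsk' }
$entry | Format-List Name, Synopsis
```

Export-Clixml keeps the property structure of the objects, so the imported data can be filtered with Where-Object just like the original objects, without parsing the HTML again.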

Get-LegacyHelp supports the -Parameter, -Full, and -Examples parameters:
[Screenshot: Get-LegacyHelp -Examples]

[Screenshot: Get-LegacyHelp -Parameter]

The module can be downloaded from my GitHub repository. It’s not perfect, since the information is not consistently structured throughout the .chm file. If you want to improve it, please feel free to fork and share.



Photo Credit: Bruno Zaffoni via Compfight cc

Reporting against Pester test results

Pester is (for very good reasons) getting more and more popular. If you don’t know Pester yet, I highly recommend that you start using it. Here are some good resources to learn about the framework:

In this post, I assume that you already have some experience using Pester. Most of the articles and videos about Pester I’ve seen so far do not go into much detail about reporting on test results. Let’s first see what the result could look like:

[Screenshot: ReportUnit HTML report]
The screenshot above shows the output from ReportUnit, which can take the Pester NUnit XML output and turn it into a very nice HTML report.
Ok, having seen what could be done, we take a step back and see what is possible using the built-in Pester capabilities. Let’s create some dummy functions and tests first:

#create a working folder for the dummy functions and their tests
$tempFolder = New-Item "$env:Temp\PesterTest" -ItemType Directory -force
foreach ($num in (1..10)){
    #write the function under test (TestN.ps1)
    @"
function Test$num {
    $num
}
"@ | Set-Content -Path (Join-Path $tempFolder "Test$num.ps1")
    #write the matching test file (TestN.Tests.ps1); the test for Test8 deliberately expects 9
    @"
`$here = Split-Path -Parent `$MyInvocation.MyCommand.Path
`$sut = (Split-Path -Leaf `$MyInvocation.MyCommand.Path) -replace '\.Tests\.', '.'
. "`$here\`$sut"
Describe "Test$num" {
    It "Should output $num" {
        $(if ($Num -eq 8){
        "Test$num | Should Be 9"
        }else{
        "Test$num | Should Be $num"
        })
    }
}
"@ | Set-Content -Path (Join-Path $tempFolder "Test$num.Tests.ps1")
}

With a few lines of code we created a folder (PesterTest) within the temp directory that contains 10 script files (Test1.ps1 – Test10.ps1), each including a simple function that outputs the number corresponding to the script number. We also created a very basic test for each script file that checks whether the function’s output is correct. For good measure, I’ve also planted a bug: the test for Test8 expects 9 instead of 8, so it fails.
Running Invoke-Pester against the folder without any additional arguments results in the default console output for Pester:

[Screenshot: Pester console output]
While this looks nice, it’s not good enough if you want to run the tests unattended/automated or if your test suite contains a very long list of tests. With the ‘-PassThru’ switch parameter, Pester returns a rich object in addition to the console output. It contains the detailed test results along with a lot of contextual information (error message, environment, stack trace, …):

$testResults = Invoke-Pester -PassThru
#display overall test-suite results
$testResults
#display specific tests within test suite
$testResults.TestResult

[Screenshot: Pester result object]
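
For unattended runs, the returned object can be filtered down to the failures. The following is a minimal sketch that assumes Pester 3.x (the legacy “Should Be” syntax used in this post) and the result properties FailedCount, TotalCount, and TestResult as that version exposes them; the demo folder and test file are made up for the example.

```powershell
# Assumes Pester 3.x is installed; newer major versions use a different syntax
Import-Module Pester

# Create a throwaway test file with one passing and one failing test
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) 'PesterPassThruDemo'   # hypothetical folder
New-Item -Path $demoDir -ItemType Directory -Force | Out-Null
$demoTest = Join-Path $demoDir 'Demo.Tests.ps1'
@'
Describe "Demo" {
    It "passes" { 1 | Should Be 1 }
    It "fails"  { 1 | Should Be 2 }
}
'@ | Set-Content -Path $demoTest

# Run the tests and capture the result object
$results = Invoke-Pester -Script $demoTest -PassThru

# Overall counters for the test run
'{0} of {1} tests failed' -f $results.FailedCount, $results.TotalCount

# Keep only the failed tests and show where and why they failed
$failedTests = @($results.TestResult | Where-Object { -not $_.Passed })
$failedTests | Select-Object Describe, Name, FailureMessage | Format-List
```

In an automated job, a non-zero FailedCount is the natural signal to mark the run as failed.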
Pester can do even better: with the ‘OutputFile’ and ‘OutputFormat’ parameters, the result is written to an XML file in NUnit-compatible format. The .xml file can be imported into tools like TeamCity in order to view the test results in a human-readable way. For people without access to full-fledged development tools, there is ReportUnit, an open-source command-line tool that automatically transforms the XML into a nice HTML report (see the screenshot at the top of the post). Let’s use PowerShell to download and extract ReportUnit.exe and run it against an output file generated by Pester:

#run the test-suite and generate the NUnit output file
Push-Location $tempFolder
Invoke-Pester -OutputFile report.xml -OutputFormat NUnitXml

#download and extract ReportUnit.exe
$url = 'http://relevantcodes.com/Tools/ReportUnit/reportunit-1.2.zip'
$fullPath = Join-Path $tempFolder $url.Split("/")[-1]
(New-Object Net.WebClient).DownloadFile($url,$fullPath)
(New-Object -ComObject Shell.Application).Namespace($tempFolder.FullName).CopyHere((New-Object -ComObject Shell.Application).Namespace($fullPath).Items(),16)
Remove-Item $fullPath

#run reportunit against report.xml and display result in browser
& .\reportunit.exe report.xml
Invoke-Item report.html
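
If you only need a quick summary, the XML can also be read directly from PowerShell. The sketch below works on a small hand-written sample that imitates the NUnit 2.5 layout; the element and attribute names (test-results, test-case, success, failure/message) are my assumption about that format, and the sample content is made up rather than taken from a real report.xml.

```powershell
# Hand-written sample in the assumed NUnit 2.5 layout (not real Pester output)
[xml]$report = @'
<test-results name="Pester" total="2" errors="0" failures="1">
  <test-suite name="Pester" result="Failure">
    <results>
      <test-case name="Test7.Should output 7" result="Success" success="True" />
      <test-case name="Test8.Should output 8" result="Failure" success="False">
        <failure><message>sample failure message</message></failure>
      </test-case>
    </results>
  </test-suite>
</test-results>
'@

# Collect every test case regardless of how deeply the suites are nested
$cases = @(Select-Xml -Xml $report -XPath '//test-case' | ForEach-Object { $_.Node })

# Turn the failed cases into simple objects (name plus failure message)
$failedCases = @($cases |
    Where-Object { $_.GetAttribute('success') -eq 'False' } |
    ForEach-Object {
        [pscustomobject]@{
            Name    = $_.GetAttribute('name')
            Message = $_.SelectSingleNode('failure/message').InnerText
        }
    })

$failedCases | Format-List
```

For a real run, replace the here-string with [xml](Get-Content report.xml) and check the element names against your file first.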

Happy testing!



Photo Credit: Photosightfaces via Compfight cc

PowerShell tricks – Open a dialog as topmost window

Windows.Forms provides easy access to several built-in dialogs (see MSDN: Dialog-Box Controls and Components). Here is a usage example that shows a “FolderBrowser” dialog:

Add-Type -AssemblyName System.Windows.Forms
$FolderBrowser = New-Object System.Windows.Forms.FolderBrowserDialog
$FolderBrowser.Description = 'Select the folder containing the data'
$result = $FolderBrowser.ShowDialog()
if ($result -eq [Windows.Forms.DialogResult]::OK){
    $FolderBrowser.SelectedPath
}
else {
    exit
}

While this works as expected, the dialog won’t show up as the topmost window. This could lead to situations where users of your script miss the dialog or simply complain because they have to switch windows. Even though there is no built-in property to set the dialog as the topmost window, the same can be achieved using the second overload of the ShowDialog method (MSDN: ShowDialog method). This overload expects a parameter that specifies the owner (parent) window of the dialog. Since the owning window will not be used after the dialog has been closed, we can just create a new form on the fly within the method call:

Add-Type -AssemblyName System.Windows.Forms
$FolderBrowser = New-Object System.Windows.Forms.FolderBrowserDialog
$FolderBrowser.Description = 'Select the folder containing the data'
$result = $FolderBrowser.ShowDialog((New-Object System.Windows.Forms.Form -Property @{TopMost = $true }))
if ($result -eq [Windows.Forms.DialogResult]::OK){
    $FolderBrowser.SelectedPath
}
else {
    exit
}
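
If you need the dialog in more than one place, the same trick can be wrapped into a small helper. This is just a sketch: the function name Show-TopMostFolderDialog and its parameter are made up, and it is Windows-only. Unlike the inline version, it also disposes the temporary owner form once the dialog has been closed.

```powershell
Add-Type -AssemblyName System.Windows.Forms

# Hypothetical helper: shows the folder browser on top of all other windows
# and returns the selected path (or nothing if the user cancels)
function Show-TopMostFolderDialog {
    param(
        [string]$Description = 'Select a folder'
    )

    # Invisible owner form whose only job is to be the topmost window
    $ownerForm = New-Object System.Windows.Forms.Form -Property @{ TopMost = $true }
    $dialog    = New-Object System.Windows.Forms.FolderBrowserDialog
    $dialog.Description = $Description

    try {
        if ($dialog.ShowDialog($ownerForm) -eq [System.Windows.Forms.DialogResult]::OK) {
            $dialog.SelectedPath
        }
    }
    finally {
        # Release the native window handles of both objects
        $dialog.Dispose()
        $ownerForm.Dispose()
    }
}
```

A call such as $folder = Show-TopMostFolderDialog -Description 'Select the folder containing the data' then returns either the chosen path or nothing, which the calling script can check before continuing.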



Photo Credit: Infomastern via Compfight cc