Mark Minasi's Tech Forum
wkasdo

Administrator
Posts: 241
Reply with quote  #1 
Hi,

For an Azure provisioning activity I need to run a large number of jobs (about 200). These can all run in parallel, but obviously not all of them at the same time.

The parallel part is easy: the PowerShell jobs feature fits the bill just fine. The problem is how to throttle them reliably and, preferably, get results back as the jobs finish.

I did some research and found .NET runspaces and PowerShell workflows as alternatives. Neither is as generic as I would like, so I ended up with this: http://blogs.msdn.com/b/powershell/archive/2011/04/04/scaling-and-queuing-powershell-background-jobs.aspx.

After some fiddling I have a working prototype that seems to do what I want, but before I build on it, I thought I'd check with the forum whether there are better alternatives. The requirements:

- parallel jobs
- throttling to a maximum concurrency
- get results from jobs the moment they finish
- minimum overhead and complexity.
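For reference, the queue-and-throttle pattern from that MSDN post boils down to something like this. This is only a sketch; the variable names and the placeholder script block are illustrative, not taken from the post:

```powershell
# Throttled background jobs: keep at most $maxConcurrent jobs running,
# and harvest results as soon as each job finishes.
$maxConcurrent = 8
$workItems     = 1..200   # stand-ins for the ~200 provisioning tasks

foreach ($item in $workItems) {
    # wait until a slot frees up
    while ((Get-Job -State Running).Count -ge $maxConcurrent) {
        # harvest any finished jobs while we wait, so results come in early
        Get-Job -State Completed | ForEach-Object {
            Receive-Job $_
            Remove-Job $_
        }
        Start-Sleep -Milliseconds 200
    }
    # placeholder work; the real script block would do the provisioning
    Start-Job -ScriptBlock { param($n) "result $n" } -ArgumentList $item | Out-Null
}

# drain whatever is still running
Get-Job | Wait-Job | Receive-Job
Get-Job | Remove-Job
```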

Thanks!



__________________
[MSFT]; Blog: https://blogs.technet.microsoft.com/389thoughts/
Mark Minasi

Humble Proprietor
Posts: 175
Reply with quote  #2 
Sounds neat!  I don't have an alternative, though.
JeffHicks

New Friend (or an Old Friend who Built a New Account)
Posts: 35
Reply with quote  #3 
Personally, I would use a workflow that runs locally. But the solution you found would work just as well. Of course, you've now given me a new shiny ball to chase.
__________________
Jeff Hicks Author ~ Trainer ~ Guru
Cloud and Datacenter Management MVP



JeffHicks

New Friend (or an Old Friend who Built a New Account)
Posts: 35
Reply with quote  #4 
So I spent some time seeing how you might use a workflow. I started with a list of PowerShell commands like this, stored in a text file:
#mycmds.txt
#list of commands to run

Get-Service -computer chi-dc01,chi-dc02,chi-dc04 | where {$_.status -eq 'running'} | export-clixml c:\work\dcsvc.xml
get-ciminstance win32_product | export-clixml c:\work\myprod.xml
$p = get-process -computername chi-hvr1,chi-hvr2
$w = dir c:\windows -file -recurse -erroraction silentlycontinue ; $w.count
get-eventlog -list -computer chi-core01 | Select * -ExcludeProperty entries | export-clixml c:\work\core01-logs.xml
get-eventlog -list -computer chi-core02 |Select * -ExcludeProperty entries |  export-clixml c:\work\core02-logs.xml
get-eventlog -list -computer chi-fp02 | Select * -ExcludeProperty entries | export-clixml c:\work\fp02-logs.xml
get-eventlog -list -computer chi-hvr1 | Select * -ExcludeProperty entries | export-clixml c:\work\hvr1-logs.xml
get-eventlog -list -computer chi-hvr2 | Select * -ExcludeProperty entries | export-clixml c:\work\hvr2-logs.xml

Each line is a PowerShell expression and is intended to be as self-contained as possible. I could also have specified a script file. The intention is that these are all of the commands that need to be run, and I don't care what order they run in. I then created this workflow:


Workflow DemoParallelJob {
    Param(
        [Parameter(Position=0,Mandatory,HelpMessage = "Enter the path to your list of commands")]
        [ValidateNotNullOrEmpty()]
        [string]$InputList,
        [int]$Throttle = 4
    )
    Write-Verbose "[$(Get-Date -Format T)] Starting $WorkflowCommandName"
    Write-Verbose "[$(Get-Date -Format T)] Using commands from $InputList"
    # filter out comments and blank lines
    $myCommands = Get-Content -Path $InputList | Where-Object {$_ -AND $_ -notmatch "^#"}
    foreach -parallel -throttlelimit $Throttle ($command in $myCommands) {
        Write-Verbose "[$(Get-Date -Format T)] START: $command"
        Invoke-Expression -Command $command
        Write-Verbose "[$(Get-Date -Format T)] END: $command"
    }
    Write-Verbose "[$(Get-Date -Format T)] Completed $WorkflowCommandName"
} #end workflow



The workflow is intended to be run locally.


DemoParallelJob c:\scripts\mycmds.txt -Verbose



The commands are processed in the foreach block in parallel and throttled; I added a parameter so you can adjust the throttle limit. As commands complete, new ones are started, up to the throttle limit.


VERBOSE: [localhost]:[10:59:29 AM] Starting DemoParallelJob
VERBOSE: [localhost]:[10:59:29 AM] Using commands from c:\scripts\mycmds.txt
VERBOSE: [localhost]:[10:59:29 AM] START: $w = dir c:\windows -file -recurse -erroraction silentlycontinue ; $w.count
VERBOSE: [localhost]:[10:59:29 AM] START: $p = get-process -computername chi-hvr1,chi-hvr2
VERBOSE: [localhost]:[10:59:29 AM] START: get-ciminstance win32_product | export-clixml c:\work\myprod.xml
VERBOSE: [localhost]:[10:59:29 AM] START: Get-Service -computer chi-dc01,chi-dc02,chi-dc04 | where {$_.status -eq 'running'} | export-clixml c:\work\dcsvc.xml
VERBOSE: [localhost]:[10:59:29 AM] END: $p = get-process -computername chi-hvr1,chi-hvr2
VERBOSE: [localhost]:[10:59:29 AM] START: get-eventlog -list -computer chi-core01 | Select * -ExcludeProperty entries | export-clixml c:\work\core01-logs.xml
VERBOSE: [localhost]:[10:59:29 AM] END: get-eventlog -list -computer chi-core01 | Select * -ExcludeProperty entries | export-clixml c:\work\core01-logs.xml
VERBOSE: [localhost]:[10:59:30 AM] START: get-eventlog -list -computer chi-core02 | Select * -ExcludeProperty entries | export-clixml c:\work\core02-logs.xml
VERBOSE: [localhost]:[10:59:30 AM] END: get-eventlog -list -computer chi-core02 | Select * -ExcludeProperty entries | export-clixml c:\work\core02-logs.xml
VERBOSE: [localhost]:[10:59:30 AM] START: get-eventlog -list -computer chi-fp02 | Select * -ExcludeProperty entries | export-clixml c:\work\fp02-logs.xml
VERBOSE: [localhost]:[10:59:31 AM] END: get-eventlog -list -computer chi-fp02 | Select * -ExcludeProperty entries | export-clixml c:\work\fp02-logs.xml
VERBOSE: [localhost]:[10:59:31 AM] START: get-eventlog -list -computer chi-hvr1 | Select * -ExcludeProperty entries | export-clixml c:\work\hvr1-logs.xml
VERBOSE: [localhost]:[10:59:31 AM] END: get-eventlog -list -computer chi-hvr1 | Select * -ExcludeProperty entries | export-clixml c:\work\hvr1-logs.xml
VERBOSE: [localhost]:[10:59:31 AM] START: get-eventlog -list -computer chi-hvr2 | Select * -ExcludeProperty entries | export-clixml c:\work\hvr2-logs.xml
VERBOSE: [localhost]:[10:59:31 AM] END: get-eventlog -list -computer chi-hvr2 | Select * -ExcludeProperty entries | export-clixml c:\work\hvr2-logs.xml
VERBOSE: [localhost]:[10:59:39 AM] END: Get-Service -computer chi-dc01,chi-dc02,chi-dc04 | where {$_.status -eq 'running'} | export-clixml c:\work\dcsvc.xml
VERBOSE: [localhost]:[11:00:29 AM] END: get-ciminstance win32_product | export-clixml c:\work\myprod.xml
162588
VERBOSE: [localhost]:[11:00:37 AM] END: $w = dir c:\windows -file -recurse -erroraction silentlycontinue ; $w.count
VERBOSE: [localhost]:[11:00:37 AM] Completed DemoParallelJob


If you had steps that needed to be done in a certain order, you could probably group them into separate lists and then run the workflow several times.
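The grouping idea might look like this; the stage file names are made up for illustration:

```powershell
# Run each stage's command list to completion before starting the next,
# so ordering between stages is preserved while each stage runs in parallel.
# File names are hypothetical.
$stages = 'c:\scripts\stage1.txt', 'c:\scripts\stage2.txt', 'c:\scripts\stage3.txt'

foreach ($stage in $stages) {
    DemoParallelJob -InputList $stage -Throttle 4 -Verbose
}
```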

__________________
Jeff Hicks Author ~ Trainer ~ Guru
Cloud and Datacenter Management MVP



wkasdo

Administrator
Posts: 241
Reply with quote  #5 
Hi Jeff,

Agree, simple and effective. The thing that scared me off is this blog post by Richard Siddaway: http://blogs.technet.com/b/heyscriptingguy/archive/2013/01/02/powershell-workflows-restrictions.aspx. Workflows get translated to XAML, which restricts the PowerShell constructs you can use. For my project I will need to fire off a number of complex PowerShell scripts that I don't fully control. Would those fail?

__________________
[MSFT]; Blog: https://blogs.technet.microsoft.com/389thoughts/
JeffHicks

New Friend (or an Old Friend who Built a New Account)
Posts: 35
Reply with quote  #6 
I don't know about the scripts. One thing you could try is to run them in an InlineScript in the command. Anything you run in an InlineScript needs to be self-contained and shouldn't reference variables or other parts of the workflow. There are ways to do it if pressed, but I try to avoid them. I should re-test my workflow with a few scripts and see what happens.
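For what it's worth, the usual way to reach a workflow variable from inside an InlineScript is the $using: scope modifier. A minimal sketch (the variable and script path are made up):

```powershell
Workflow Demo {
    $path = 'c:\scripts\test.ps1'   # a workflow-level variable
    InlineScript {
        # InlineScript runs in a separate, unrestricted PowerShell session;
        # $using: copies the workflow variable into that session.
        & $using:path
    }
}
```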
__________________
Jeff Hicks Author ~ Trainer ~ Guru
Cloud and Datacenter Management MVP



JeffHicks

New Friend (or an Old Friend who Built a New Account)
Posts: 35
Reply with quote  #7 
I tested with some .ps1 files in my command-list text file and didn't have problems. In the scripts I also did things like importing modules and querying remote computers. I even had code in my scripts that invoked .NET methods, something you can't do directly in a workflow.
__________________
Jeff Hicks Author ~ Trainer ~ Guru
Cloud and Datacenter Management MVP



wkasdo

Administrator
Posts: 241
Reply with quote  #8 
OK,  you convinced me. I'll check it out [smile]
__________________
[MSFT]; Blog: https://blogs.technet.microsoft.com/389thoughts/
wkasdo

Administrator
Posts: 241
Reply with quote  #9 
This is what I ended up with:

# experiment workflow. Max parallel is silently limited to 16 (on Win10)
#
Workflow DemoParallelJob {
    Param(
        [Parameter()]
        [int]$Throttle = 4
    )
    # build ten self-contained command strings; each runs an external script
    $myCommands = 1..10 | ForEach-Object {
@"
        Write-Host "starting job $_, starting sleep." -ForegroundColor Cyan
        Start-Sleep -Seconds (3 + $_)
        .\test.ps1 -nr $_
        Write-Host "ending job   $_." -ForegroundColor Yellow
"@
    }
    foreach -parallel -throttlelimit $Throttle ($command in $myCommands) {
        # turn the string into a script block and run it with the call
        # operator (Invoke-Expression expects a string, not a script block)
        $sb = [scriptblock]::Create($command)
        & $sb
    }
}


Findings:
- works with external scripts and commands (test.ps1)
- concurrency is silently capped at 16 on Win10; there are reports of lower caps (5) on earlier systems.

So it seems to do what I need (handle external stuff concurrently). Thanks!


__________________
[MSFT]; Blog: https://blogs.technet.microsoft.com/389thoughts/
JeffHicks

New Friend (or an Old Friend who Built a New Account)
Posts: 35
Reply with quote  #10 
Good to hear.
__________________
Jeff Hicks Author ~ Trainer ~ Guru
Cloud and Datacenter Management MVP



JeffWouters

Still Checking the Forum Out
Posts: 1
Reply with quote  #11 

You could take a look at the script from Chrissy LeMaire.

She's done an amazing job with runspaces and jobs, writing 1,000,000+ rows per minute to a SQL database.

Should give you a powerhouse of executing code in parallel :-)

But... looking at the code you ended up with, Chrissy's stuff may be a bit overkill in this case :-P

jsclmedave

Administrator
Posts: 501
Reply with quote  #12 
Jeff is right.  For SQL BulkCopy her scripts are fantastic!

I need to email her the answer to our situation, which was resolved when we defined the input for a couple of specific rows as an INT instead of a String. This was due to the SQL table being created by a third-party product that we had little insight into.

Her scripts are definitely worth checking out! 

https://blog.netnerds.net/author/chrissy/

https://mvp.microsoft.com/en-us/PublicProfile/5001321?fullName=Chrissy%20%20LeMaire





__________________
Tim Bolton @jsclmedave
Email: [string](0..20|%{[char][int](32+('527377347976847978324785847679797514357977').substring(($_*2),2))}) -replace ' '  

New to the forum? Please Read this
Chrissy LeMaire

Still Checking the Forum Out
Posts: 1
Reply with quote  #13 
Aw shux, thanks guys! That reminds me, I should blog about the module and runspaces [wink] I'll do that and update this post.

I prefer runspaces over jobs because they don't launch new powershell.exe instances, and they're so much faster. In the meantime, I thought The Surly Admin had an easy-to-understand breakdown. Use MaxThreads to throttle.
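A bare-bones version of the runspace-pool pattern looks roughly like this; the max-threads cap is the throttle, and the script block and item count here are illustrative:

```powershell
# Throttled runspace pool: the pool's max size caps concurrency, and all
# runspaces share one process instead of spawning powershell.exe per job.
$maxThreads = 8
$pool = [runspacefactory]::CreateRunspacePool(1, $maxThreads)
$pool.Open()

# queue up the work; BeginInvoke returns immediately
$handles = foreach ($n in 1..50) {
    $ps = [powershell]::Create().AddScript({ param($i) "item $i done" }).AddArgument($n)
    $ps.RunspacePool = $pool
    [pscustomobject]@{ PowerShell = $ps; Handle = $ps.BeginInvoke() }
}

# collect results as each runspace completes, then clean up
foreach ($h in $handles) {
    $h.PowerShell.EndInvoke($h.Handle)
    $h.PowerShell.Dispose()
}
$pool.Close()
```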

My module, SqlImportSpeedtest, which I wrote as a proof of concept for SQL/PowerShell/runspace speed, launches 500 runspaces for the smaller samples and 12,500 for the larger one.

The performance blazes even with a crazy number of runspaces. 25,000,000 real-world CSV rows, which generate 12,500 runspaces, can be imported to SQL Server in less than 2 minutes. 

jsclmedave

Administrator
Posts: 501
Reply with quote  #14 
Quote:
Originally Posted by Chrissy LeMaire
Aw shux, thanks guys! [...]



Very Cool!

Welcome to the Forum Chrissy!



__________________
Tim Bolton @jsclmedave
Email: [string](0..20|%{[char][int](32+('527377347976847978324785847679797514357977').substring(($_*2),2))}) -replace ' '  

New to the forum? Please Read this