07-21-2023, 07:18 PM
**Problem:** I have huge amount of sql queries (around 10k-20k) and I want to run them asynchronous in 50 (or more) threads.
I wrote a powershell script for this job, but it is very slow (It took about 20 hours to execute all). **Desired result is 3-4 hours max.**
**Question:** How can I optimize this powershell script? Should I reconsider and use another technology like `python` or `c#`?
I think it's powershell issue, because when I check with `whoisactive` the queries are executing fast. Creating, exiting and unloading jobs takes a lot of time, because for each thread is created separate PS instances.
**My code:**
$NumberOfParallerThreads = 50;
$Arr_AllQueries = @('Exec [mystoredproc] @param1=1, @param2=2',
'Exec [mystoredproc] @param1=11, @param2=22',
'Exec [mystoredproc] @param1=111, @param2=222')
#Creating the batches
$counter = [pscustomobject] @{ Value = 0 };
$Batches_AllQueries = $Arr_AllQueries | Group-Object -Property {
[math]::Floor($counter.Value++ / $NumberOfParallerThreads)
};
forEach ($item in $Batches_AllQueries) {
$tmpBatch = $item.Group;
$tmpBatch | % {
$ScriptBlock = {
# accept the loop variable across the job-context barrier
param($query)
# Execute a command
Try
{
Write-Host "[processing '$query']"
$objConnection = New-Object System.Data.SqlClient.SqlConnection;
$objConnection.ConnectionString = 'Data Source=...';
$ObjCmd = New-Object System.Data.SqlClient.SqlCommand;
$ObjCmd.CommandText = $query;
$ObjCmd.Connection = $objConnection;
$ObjCmd.CommandTimeout = 0;
$objAdapter = New-Object System.Data.SqlClient.SqlDataAdapter;
$objAdapter.SelectCommand = $ObjCmd;
$objDataTable = New-Object System.Data.DataTable;
$objAdapter.Fill($objDataTable) | Out-Null;
$objConnection.Close();
$objConnection = $null;
}
Catch
{
$ErrorMessage = $_.Exception.Message
$FailedItem = $_.Exception.ItemName
Write-Host "[Error processing: $($query)]" -BackgroundColor Red;
Write-Host $ErrorMessage
}
}
# pass the loop variable across the job-context barrier
Start-Job $ScriptBlock -ArgumentList $_ | Out-Null
}
# Wait for all to complete
While (Get-Job -State "Running") { Start-Sleep 2 }
# Display output from all jobs
Get-Job | Receive-Job | Out-Null
# Cleanup
Remove-Job *
}
***UPDATE*:**
**Resources:** The DB server is on a remote machine with:
- 24GB RAM,
- 8 cores,
- 500GB Storage,
- SQL Server 2016
We want to use the maximum cpu power.
**Framework limitation:** The only limitation is ***not** to use SQL Server* to execute the queries. The requests should come from outside source like: Powershell, C#, Python, etc.
I wrote a powershell script for this job, but it is very slow (It took about 20 hours to execute all). **Desired result is 3-4 hours max.**
**Question:** How can I optimize this powershell script? Should I reconsider and use another technology like `python` or `c#`?
I think it's powershell issue, because when I check with `whoisactive` the queries are executing fast. Creating, exiting and unloading jobs takes a lot of time, because for each thread is created separate PS instances.
**My code:**
$NumberOfParallerThreads = 50;
$Arr_AllQueries = @('Exec [mystoredproc] @param1=1, @param2=2',
'Exec [mystoredproc] @param1=11, @param2=22',
'Exec [mystoredproc] @param1=111, @param2=222')
#Creating the batches
$counter = [pscustomobject] @{ Value = 0 };
$Batches_AllQueries = $Arr_AllQueries | Group-Object -Property {
[math]::Floor($counter.Value++ / $NumberOfParallerThreads)
};
forEach ($item in $Batches_AllQueries) {
$tmpBatch = $item.Group;
$tmpBatch | % {
$ScriptBlock = {
# accept the loop variable across the job-context barrier
param($query)
# Execute a command
Try
{
Write-Host "[processing '$query']"
$objConnection = New-Object System.Data.SqlClient.SqlConnection;
$objConnection.ConnectionString = 'Data Source=...';
$ObjCmd = New-Object System.Data.SqlClient.SqlCommand;
$ObjCmd.CommandText = $query;
$ObjCmd.Connection = $objConnection;
$ObjCmd.CommandTimeout = 0;
$objAdapter = New-Object System.Data.SqlClient.SqlDataAdapter;
$objAdapter.SelectCommand = $ObjCmd;
$objDataTable = New-Object System.Data.DataTable;
$objAdapter.Fill($objDataTable) | Out-Null;
$objConnection.Close();
$objConnection = $null;
}
Catch
{
$ErrorMessage = $_.Exception.Message
$FailedItem = $_.Exception.ItemName
Write-Host "[Error processing: $($query)]" -BackgroundColor Red;
Write-Host $ErrorMessage
}
}
# pass the loop variable across the job-context barrier
Start-Job $ScriptBlock -ArgumentList $_ | Out-Null
}
# Wait for all to complete
While (Get-Job -State "Running") { Start-Sleep 2 }
# Display output from all jobs
Get-Job | Receive-Job | Out-Null
# Cleanup
Remove-Job *
}
***UPDATE*:**
**Resources:** The DB server is on a remote machine with:
- 24GB RAM,
- 8 cores,
- 500GB Storage,
- SQL Server 2016
We want to use the maximum cpu power.
**Framework limitation:** The only limitation is ***not** to use SQL Server* to execute the queries. The requests should come from outside source like: Powershell, C#, Python, etc.