I am familiar with the C# SqlBulkCopy class, where you can call the ‘WriteToServer’ method, passing in a DataTable.
My question is: what underlying mechanism in SQL Server is used to bulk insert that data?
The reason I ask is that the BULK INSERT statement described in the MSDN T-SQL help requires a data file to import. Does SqlBulkCopy create a data file?
I would like to understand this so I can work out whether I can use the bulk insert functionality in SQL Server.
If I write a SQL statement that prepares all the rows to insert into a particular table (thousands of rows), can I bulk insert them into the destination table? This is roughly how I am doing it now:
INSERT INTO sync_filters (table_name, device_id, road_id, contract_id) SELECT * FROM dbo.sync_contract_filters (@device_id)
Here dbo.sync_contract_filters is a table-valued function that generates all the rows to insert. Can this be bulk inserted?
SqlBulkCopy does not create a data file. It streams the data directly from the .NET DataTable object to the server over the available communication protocol (Named Pipes, TCP/IP, etc.) and inserts it into the destination table in bulk, using the same technique BCP uses.
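For context, this is the usage pattern being described; a minimal sketch in which the connection string, table and column names are made up to mirror the question's sync_filters table:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class BulkCopyExample
{
    static void Main()
    {
        // Build a DataTable in memory; the columns here are invented.
        var table = new DataTable();
        table.Columns.Add("table_name", typeof(string));
        table.Columns.Add("device_id", typeof(int));
        table.Rows.Add("contracts", 42);

        // Placeholder connection string; point this at your own server.
        using (var connection = new SqlConnection("Server=.;Database=MyDb;Integrated Security=true"))
        {
            connection.Open();
            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                bulkCopy.DestinationTableName = "dbo.sync_filters";
                // No intermediate file: the rows are streamed over
                // the TDS connection itself.
                bulkCopy.WriteToServer(table);
            }
        }
    }
}
```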
It took 7 years, but we finally have an answer…
Expounding upon Sam Anwar’s answer, I can confirm it converts the data to a raw byte stream and writes it to SQL Server as if it were streaming in from a file. How it tricks SQL Server into thinking it’s reading a file is beyond me.
I wanted to do a bulk insert from inside a query, to speed up a slow clustered index insert. Upon finding your post here, somehow I became disturbingly intrigued, so I spent the past several hours studying it.
The execution path that actually writes data to the server seems to be:
1. Your code calls System.Data.SqlClient.SqlBulkCopy.WriteToServer()
2. which calls WriteRowSourceToServerAsync()
3. which calls WriteRowSourceToServerCommon() to map the columns and WriteToServerInternalAsync() to write the data
4. which calls WriteToServerInternalRestContinuedAsync()
5. which calls AnalyzeTargetAndCreateUpdateBulkCommand() (this is the answer; see the discussion of it after this list) and CopyBatchesAsync()
6. which (CopyBatchesAsync) calls SubmitBulkUpdateCommand()
— inside System.Data.SqlClient.TdsParser:
7. which calls System.Data.SqlClient.TdsParser.TdsExecuteSQLBatch()
8. which calls WriteString() or similar methods to convert the data into a byte array
9. which calls WriteByteArray()
10. which calls WritePacket()
11. which calls WriteSni()
12. which calls SNIWritePacket()
— inside System.Data.SqlClient.SNINativeMethodWrapper:
13. which calls System.Data.SqlClient.SNINativeMethodWrapper.SNIWritePacket()
14. which makes an extern call to SNIWriteAsyncWrapper() or SNIWriteSyncOverAsync()
Now here’s where it gets tricky. I think this follows, but how I got there is a bit hacky. I opened the file properties on my copy of sni.dll, went to the details tab, and inside the Product Version property I found a reference to a “commit hash” of d0d5c7b49271cadb6d97de26d8e623e98abdc8db.
So I googled that hash, and via this NuGet search I found this NuGet package, whose title includes “System.Data.SqlClient.sni”, implying the namespace System.Data.SqlClient.SNI, which I found here. But that code doesn’t have the right methods and doesn’t actually seem to communicate with a server.
So this is where I ran out of know-how; this is as deep as I could get before it goes into native code I can’t find anywhere. And although I’m not sure what all that other noise up above was…
- Remember that step 4 (WriteToServerInternalRestContinuedAsync()) also calls AnalyzeTargetAndCreateUpdateBulkCommand()
- which concatenates a SQL query inside a StringBuilder named updateBulkCommandText (line 544 in that last link).
TL;DR: Ultimately it appears it just executes an INSERT BULK statement (which does not require a file), and does not actually use BULK INSERT (which does). Note that these two commands look very similar but are distinct.
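To make the distinction concrete (the file path and column list are made up, and the INSERT BULK shape is roughly what a profiler trace shows, since it is not a statement you are meant to write by hand):

```sql
-- BULK INSERT: a T-SQL statement you write yourself; it reads from a file
-- that must be accessible to the server.
BULK INSERT dbo.sync_filters
FROM 'C:\data\filters.dat'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- INSERT BULK: emitted on the wire by SqlBulkCopy/BCP. It only declares the
-- target table and its column layout; the actual rows follow as a binary
-- stream inside the same TDS conversation, so no file is involved.
INSERT BULK dbo.sync_filters (table_name NVARCHAR(50), device_id INT)
WITH (TABLOCK);
```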
An important note in the Microsoft docs:
Used by external tools to upload a binary data stream. This option is not intended for use with tools such as SQL Server Management Studio, SQLCMD, OSQL, or data access application programming interfaces such as SQL Server Native Client.
Which I interpret as “use at your own risk and don’t expect help”. Which is almost as good as a green light, in all fairness.
SqlBulkCopy can use a DataTable, an IDataReader or an array of DataRow objects as a data source; take a look at the WriteToServer overloads of the class. I’ve found it to be quite a useful mechanism for getting data into SQL Server. I’ve used it in conjunction with CSVReader previously; the previous link gives you an idea of how it works with a class that implements IDataReader.
You might need to play around with the batch sizes to ensure you get the performance you are expecting.
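For example, batch size and progress reporting are exposed directly on the class; the values below are placeholders to tune for your workload, and GetRows() stands in for whatever builds your source DataTable:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class BatchSizeExample
{
    static DataTable GetRows()
    {
        // Stand-in for your real data source.
        var table = new DataTable();
        table.Columns.Add("table_name", typeof(string));
        table.Rows.Add("contracts");
        return table;
    }

    static void Main()
    {
        // Placeholder connection string; point this at your own server.
        using (var bulkCopy = new SqlBulkCopy("Server=.;Database=MyDb;Integrated Security=true"))
        {
            bulkCopy.DestinationTableName = "dbo.sync_filters";
            bulkCopy.BatchSize = 5000;       // rows per batch; 0 sends everything in one batch
            bulkCopy.BulkCopyTimeout = 60;   // seconds before the operation times out
            bulkCopy.NotifyAfter = 10000;    // raise SqlRowsCopied every N rows
            bulkCopy.SqlRowsCopied += (sender, e) =>
                Console.WriteLine($"{e.RowsCopied} rows copied so far");
            bulkCopy.WriteToServer(GetRows());
        }
    }
}
```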
The Data Loading Performance Guide on MSDN is quite a useful resource if you are looking to get the best performance out of writing large amounts of data quickly. It focuses more on things like BCP and BULK INSERT, but it touches on SqlBulkCopy and provides quite a lot of food for thought (maybe a little too much, but it is a useful reference at least).