Distribution Statistics for Linked Servers

More
1 week 2 days ago #14428 by csmith
SQL Server uses statistics on database tables in the query optimizer to find the best plan. Those statistics are used to calculate an estimated count of rows from a table, taking into account filter parameters. For example, when joining table A with 5 rows with table B with 1,000,000 rows, it would make sense to use the 5 rows to seek to records among the 1,000,000 rather than loading the entire million record table.

From what I am seeing in pgnprps (PostgreSQL OLE DB driver, Server Edition) is that all remote tables have an estimated rowcount of 1, regardless of how many rows are actually in the table.

OLE DB providers can provide statistics that SQL Server will use to as described in Distribution Statistics Requirements for OLE DB Providers .

PostgreSQL provides statistics as described in 14.2. Statistics Used by the Planner , specifically the pg_class and pg_stats tables.

It would be really great if your PostgreSQL OLEDB driver could pass these statistics along to SQL Server to improve the ability of SQL Server's query optimizer to make intelligent decisions about how to pull data. I think it would improve the performance of many queries involving linked tables.

Estimated rowcounts of one for all tables is causing some bad plans when SQL Server is surprised to get millions of rows when it thought there would only be one. Even a bigger default constant of 1000 would be helpful.

Please Log in or Create an account to join the conversation.

More
1 week 2 days ago #14429 by admin
This is great suggestion! We will try to incorporate the statistics in the upcoming build.

Please Log in or Create an account to join the conversation.

Time to create page: 0.122 seconds
Powered by Kunena Forum