SQL BI Manager Roadmap

Polybase and Time Travel
(Temporal Tables)
Stan Geiger
#492 | Phoenix 2016
Platinum Level Sponsors
Gold Level Sponsors
Venue Sponsor
Key Note Sponsor
Pre Conference Sponsor
Silver Level Sponsors
Bronze Level Sponsors
About Me
• Sr. Product Manager with Idera
• Performance Monitoring of Microsoft BI stack
• Backup and Recovery of Microsoft SQL Server
•
•
•
•
Geek Sync Presenter
Blog Contributor
HSSUG presenter
Over 25 years experience
•
•
•
•
BI, Data Architect
DBA
.Net Developer
Data Analyst
Where in the world are we?
BI and analytics
Data warehouse
ETL
… data warehousing has reached
the most significant tipping point
since its inception. The biggest,
possibly most elaborate data
management system in IT is
changing.
– Gartner, “The State of Data Warehousing in 2012”
Data sources
5
The Cool Kid’s Data Warehouse
6
The Data Warehouse of the Future?
•
•
•
•
Diverse Big Data
Workload Centric Approach
Data stored on multiple platforms
Physically distributed data warehouse
•
•
•
•
data warehouse appliances
columnar RDBMSs
NoSQL databases
MapReduce tools, and HDFS.
7
The Data Warehouse of the Future…Its
Here!
8
SQL Server Technology Drivers
PolyBase
JSON Data
Temporal Tables
In Memory Table
ColumnStore Index
PolyBase
10
PolyBase
• Use T-SQL to store data in SQL Server
from Hadoop or Azure as tables.
• Knowledge of Hadoop or Azure is not
required to use.
• Pushes computation to where data
resides
• Export relational data into Hadoop or
11
PolyBase - External Tables, Data Sources & File
Formats
Data Scientists,
BI Users, DB Admins
Your
Apps
PowerPivot
PowerView
12
SQL Server w/
PolyBase
External Table
External
Data Source
Relational DW
Hadoop
External File
Format
PolyBase
Split-Based Query
Processing
Web
Apps
Social
Apps
Mobile
Apps
Sensor
&RFID
PolyBase Scenarios
• Querying
• Run T-SQL over HDFS
• Combine data from different Hadoop clusters
• Join relational with non-relational data
• ETL
• Subset of Hadoop in Columnar Format
• Enable data aging scenarios to more economic storage
• Allows building of multi-temperate DW platforms
• SQL Server acts as hot query engine processing most
recent data sets
• Aged data immediately accessible via external tables
• No need to groom data
• Hybrid (Azure Integration)
• Mesh-up on-premise and cloud apps
• Bridge between on-premise and Azure
13
PolyBase
1. Create external data source (Hadoop).
2. Create external file format (delimited text file).
3. Create external table pointing to file stored in Hadoop.
CREATE EXTERNAL DATA SOURCE hdp2 with (
TYPE = HADOOP,
LOCATION ='hdfs://10.xxx.xx.xxx:xxxx',
RESOURCE_MANAGER_LOCATION='10.xxx.xx.xxx:xxxx')
14
CREATE EXTERNAL FILE FORMAT ff2 WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (FIELD_TERMINATOR ='|',
USE_TYPE_DEFAULT = TRUE)
CREATE EXTERNAL TABLE [dbo].[CarSensor_Data] (
[SensorKey] int NOT NULL,
[CustomerKey] int NOT NULL,
[GeographyKey] int NULL,
[Speed] float NOT NULL,
[YearMeasured] int NOT NULL
)
WITH (LOCATION='/Demo/car_sensordata.tbl',
DATA_SOURCE = hdp2,
FILE_FORMAT = ff2,
REJECT_TYPE = VALUE,
REJECT_VALUE = 0
PolyBase - Ad-Hoc Query joining relational with Hadoop data
Who drives faster than 35 Miles > joining structured
customer data stored in SQL Server with sensor data
SELECT DISTINCT
Insured_Customers.FirstName,
Insured_Customers.LastName,
Insured_Customers.YearlyIncome,
Insured_Customers.MaritalStatus
into Fast_Customers
from Insured_Customers INNER JOIN
(
select * from CarSensor_Data where Speed > 35
) as SensorD
ON Insured_Customers.CustomerKey = SensorD.CustomerKey
ORDER BY YearlyIncome
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FastCustomers ON
Fast_Customers;
15
CHANGE THE CONFIG FILE!!!!!!!!!!!!!!!!!!
The user that is used to access Hadoop has to
be configured.
This block will need to be inserted into the DWEngineService.exe.config file in the
applicationSettings block which should already exist.
<applicationSettings>
<Microsoft.SqlServer.DataWarehouse.Configuration.Properties.Hadoop>
<setting name="HadoopUserName" serializeAs="String">
<value>Anything besides PDW_User</value>
</setting>
</Microsoft.SqlServer.DataWarehouse.Configuration.Properties.Hadoop>
</applicationSettings>
16
Demo
Temporal Tables
• Temporal Table is really two tables.
•
•
Data Table
Historical Table (PERIOD)
• A temporal table can be defined as a table for which
PERIOD definition exists comprising of system columns
• Slowly Changing Dimension
• Data Table is Type 1
• Historical Table is Type 2
• Recover accidental data changes
18
TEMPORAL TABLES
 Slowly Changing Dimensions
• Type 1
• Records will be updated with no history log
• Type 2
• Keep history with a From Date and To Date
• Set existing record as obsolete (with setting
the To Date)
• Create a new record (with new From Date)
• Type 3
• Limited History
• Type 4
• Similar to Type 2 but broken out as 2 tables.
Temporal Tables
• Requirements/Limitations
• Primary Key
• Two columns (start and end date as
datetime2)
• In-Memory tables cannot be used
• INSERT and UPDATE not allowed on
SYSTEM_TIME period columns
• History Table data cannot be changed.
• Regular queries only affect data in the
current table.
20
Temporal Tables
Example:
CREATE TABLE dbo.TestTemporal
(ID int primary key,
A int,
B int,
C AS A*B,
SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
) WITH (SYSTEM_VERSIONING = ON);
21
TEMPORAL TABLES
 Creating through SSMS
22
Temporal Tables
• The SELECT statement FROM <table> claus
e has a new clause FOR
SYSTEM_TIME with four temporal-specific
sub-clauses to query data across the current
and history tables.
23
• Point in time: AS OF <date_time>
• Exclusive bounds: FROM <start_date_time> TO
<end_date_time>
• Inclusive lower bound, exclusive upper bound:
BETWEEN <start_date_time> AND
<end_date_time>
• Inclusive bounds: CONTAINED IN
(<start_date_time> , <end_date_time>)
Temporal Tables
• For example, if you want to look at the
values active for customer 27 on the first
of the year:
… FROM Customer FOR SYSTEM_TIME AS OF '2015-1-1'
WHERE CustomerID = 27
• If instead you want to see every version of
the users records for that day you could
write:
… FROM Customer FOR SYSTEM_TIME BETWEEN '2015-1-1'
AND '2015-1-2'WHERE CustomerID = 27
24
Demo
www.idera.com
Try any of our tools for free!
Twitter: @MSBI_Stan
Email: stan.geiger@idera.com