What is a Data Governance Policy? A data governance policy defines how an organization collects, stores, accesses, and maintains its data. Because data is now a core enterprise asset, ensuring it is properly maintained and controlled is critical. When creating a data governance program, there are at least four core areas to consider: Data … [Read more…]
Text Mining Loch Ness Monster Sightings
Text Mining with RapidMiner for Loch Ness Monster Sightings Text mining involves extracting root words from text. In this example, I pulled all of the Loch Ness Monster sightings from 2000 to 2015 from the Official Loch Ness Monster Website into an Excel spreadsheet. Then, using the Text Processing extension, I processed the … [Read more…]
What is RapidMiner
RapidMiner Overview If you are searching for a data mining solution, be sure to look into RapidMiner. RapidMiner is open-source predictive analytics software that provides great out-of-the-box support for getting started with data mining in your organization. A free desktop version is available to get you started. The basic … [Read more…]
Azure Data Catalog
Azure Data Catalog Azure Data Catalog is now available in public preview. The Data Catalog provides an enterprise data repository that enables self-service data discovery for end users. It assists IT and business users by providing a collaborative solution for publishing documented data sets. Users can access the data via an Excel … [Read more…]
SQL Server 2016 Stretch Database
SQL Server 2016 Stretch Database One of the features in SQL Server 2016 that you will want to explore is the ability to store your historical data in the Azure cloud. With the Stretch Database feature, historical data can migrate silently to the cloud without changes to your existing application code. Stretch Database … [Read more…]
Machine Learning Data Sets
Machine Learning Data Sets When starting out with data mining and machine learning, you will need access to sample data in order to learn the technology. Working with data from outside your corporate data sets often makes it easier to learn. There is a large collection of sample data sets available for use … [Read more…]
SQL Server PolyBase (2016)
What is SQL Server PolyBase? Coming in SQL Server 2016 is the ability to query relational tables as well as data stored in Hadoop or Azure Blob Storage. This technology will allow you to continue to leverage SQL Server for your data warehousing and access data stored in different formats and technologies to combine data … [Read more…]
HASSUG BI Developer Presentation on 7/14/2015
I presented on how to become a BI developer at the Houston Area SQL Server User Group (HASSUG). This session resulted from feedback we received in the user group about the differences between database developers and BI developers. If you are interested in seeing the presentation content, it is uploaded here: Becoming a bi developer from … [Read more…]
Leveraging SQL Server Database Schema
Leveraging SQL Server Database Schema Data warehouses typically pull data from various sources and combine the data into a common repository. Keeping up with the source systems and what data is being pulled from each system can be challenging. One method that can be used to easily identify the source and type of data in … [Read more…]
CDO Summit Toronto June 4th
I am presenting the lunch keynote at the Toronto Chief Data Officer Summit on June 4th. My session is on how to leverage predictive analytics to reduce customer churn. I will go over how I used a decision tree model to create a customer risk score for active customers. I will then discuss the challenges and … [Read more…]