A friend forwarded me a link yesterday to something pretty amazing called BayesDB. Basically, it's a database architecture that detects predictive relationships between variables. Here's how the guys who developed it at MIT describe their creation:
BayesDB, a Bayesian database table, lets users query the probable implications of their data as easily as a SQL database lets them query the data itself. Using the built-in Bayesian Query Language (BQL), users with no statistics training can solve basic data science problems, such as detecting predictive relationships between variables, inferring missing values, simulating probable observations, and identifying statistically similar database entries.
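To make those four query types a little more concrete, here's a rough plain-Python sketch of what they do, using made-up analyst data. This is not BayesDB's actual BQL or API, just pandas/numpy approximations of the same ideas.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "experience": rng.normal(5, 2, n),                  # years covering a stock (made up)
    "coverage": rng.integers(1, 20, n).astype(float),   # tickers followed (made up)
})
df["accuracy"] = 0.5 * df["experience"] + rng.normal(0, 1, n)  # hidden relationship

# 1. Detect predictive relationships: pairwise correlations as a crude stand-in
#    for BQL's dependence estimates.
print(df.corr())

# 2. Simulate probable observations: draw new rows from a fitted Gaussian model.
sim = rng.multivariate_normal(df.mean(), df.cov(), size=5)
print(pd.DataFrame(sim, columns=df.columns))

# 3. Identify statistically similar entries: nearest rows in standardized space.
z = (df - df.mean()) / df.std()
dist = ((z - z.iloc[0]) ** 2).sum(axis=1)
print("rows most similar to row 0:", list(dist.drop(index=0).nsmallest(3).index))

# 4. Infer missing values: fill a hole with the mean of the most similar rows.
df.loc[0, "accuracy"] = np.nan
neighbors = dist.drop(index=0).nsmallest(10).index
print("inferred accuracy for row 0:", df.loc[neighbors, "accuracy"].mean())
```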
This is another huge step in the process of allowing regular developers and entrepreneurs to use data science to improve their products and understand relationships in their data. It sits along a very pronounced curve of infrastructure becoming more pre-packaged and cheaper, allowing more people to succeed at building things, which is why you are currently seeing a technology BOOM.
At Estimize, we do a lot of quant work to understand the relationship between an analyst's attributes and the accuracy of their estimates: which attributes give us better confidence that an analyst will be more accurate in the future. That work is extremely important because it gives us confidence that, with an open community, we can identify the analysts who deserve to be weighted more highly than the rest, regardless of their biographical background (unless that is a correlated factor).
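As a hedged sketch of the basic idea (not Estimize's actual model), here's what weighting analysts by a track-record attribute looks like; the field names and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical history of each analyst's past errors and current estimates.
history = pd.DataFrame({
    "analyst": ["a", "a", "b", "b", "c"],
    "error":   [0.10, 0.20, 0.05, 0.07, 0.50],   # |estimate - actual| on past quarters
})
estimates = pd.DataFrame({
    "analyst":  ["a", "b", "c"],
    "estimate": [1.02, 1.05, 0.90],              # current-quarter estimates
})

# Attribute -> weight: analysts with smaller past errors count for more.
avg_error = history.groupby("analyst")["error"].mean()
weights = 1.0 / avg_error
weights /= weights.sum()

consensus = (estimates.set_index("analyst")["estimate"] * weights).sum()
print(f"weighted consensus: {consensus:.3f}")
```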
But there are issues I want to bring up regarding BayesDB and its use.
When doing good data science, or quant finance, two things are extremely important to keep in mind. First, you need to start out with a hypothesis; you should not just throw all of your attributes into a database like BayesDB and let it spit out the correlated factors. Data science is called data science because it's supposed to be science, which means hypothesis, test, measure, conclusion.
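A minimal sketch of that loop, again with made-up analyst data: state one relationship up front and test only that, instead of letting a tool surface every correlation in the table.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
experience = rng.normal(5, 2, 300)                  # hypothesized driver (made up)
accuracy = 0.3 * experience + rng.normal(0, 1, 300)

# Hypothesis: analyst experience is positively related to estimate accuracy.
r, p = stats.pearsonr(experience, accuracy)
print(f"r = {r:.2f}, p = {p:.4f}")
# Conclude: accept the relationship only if the pre-stated test supports it.
```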
And second, you need to do in-sample and out-of-sample testing to make sure that you are not curve fitting or data snooping. Certain factors may be correlated over the course of the whole data set, but what if those correlations changed throughout the history of that time series? Do you have any confidence that they won't change going forward? You need to take a portion of the time series, put it through BayesDB, then take the other portion of the series, put it through BayesDB, and see if the correlations hold. It's always good to split this up two ways as well: take the first half of the time series and split it from the second half, and then also do a cross-section of data from the whole time series.
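A rough sketch of those two splits on a toy time series: a chronological split (first half vs. second half) and a cross-sectional split (random rows drawn from the whole period), checking whether the correlation holds in each piece.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 1000
df = pd.DataFrame({"x": rng.normal(size=n)})
df["y"] = 0.4 * df["x"] + rng.normal(size=n)
df.index.name = "t"   # pretend the index is time order

# Chronological in/out-of-sample split.
first, second = df.iloc[: n // 2], df.iloc[n // 2 :]
print("first half r:", first["x"].corr(first["y"]))
print("second half r:", second["x"].corr(second["y"]))

# Cross-sectional split: random rows from across the whole period.
in_sample = df.sample(frac=0.5, random_state=0)
out_sample = df.drop(in_sample.index)
print("cross-section in r:", in_sample["x"].corr(in_sample["y"]))
print("cross-section out r:", out_sample["x"].corr(out_sample["y"]))
# If the in-sample relationship disappears out of sample, it was probably noise.
```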
Without having a hypothesis regarding correlated factors and why they are correlated, as well as doing in-sample and out-of-sample testing to make sure you aren't data snooping, BayesDB is a dangerous tool that can lead to bad conclusions. I hope that at some point they are able to build in the ability to do in- and out-of-sample testing automatically, without having to load two different sets and compare.
But let's just marvel at how awesome this thing is to begin with. Hopefully it's another major step on the way to a smarter, more predictable world.
Full Disclosure: Nothing on this site should ever be considered to be advice, research or an invitation to buy or sell any securities, please see the Disclaimer page for a full disclaimer.
