SQL Weirdness
In todays 'WAT?' SQL story, this query:
SELECT DISTINCT EXTRACT(YEAR FROM date) AS Year FROM Documents ORDER BY Year;
takes about half a second and returns a list of years.
This query:
SELECT (SELECT Documents.publication FROM Documents WHERE id=WordFrequencies.doc) AS publication,(SELECT Documents.title FROM Documents WHERE id=WordFrequencies.doc) AS headline,(SELECT Documents.sentiment FROM Documents WHERE id=WordFrequencies.doc) AS articlesentiment,doc,word,count,(SELECT Words.sentiment FROM Words WHERE Words.word=WordFrequencies.word) AS wordsentiment,date FROM WordFrequencies WHERE doc IN (SELECT doc FROM Sentences WHERE YEAR(date) IN (2000, 2003, 2004, 2005,2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017) AND raw REGEXP 'trump') AND partofspeech='ADJ';
takes a bit less than 15 seconds, the list of years is the output from the first query.
But this query never seems to end:
SELECT (SELECT Documents.publication FROM Documents WHERE id=WordFrequencies.doc) AS publication,(SELECT Documents.title FROM Documents WHERE id=WordFrequencies.doc) AS headline,(SELECT Documents.sentiment FROM Documents WHERE id=WordFrequencies.doc) AS articlesentiment,doc,word,count,(SELECT Words.sentiment FROM Words WHERE Words.word=WordFrequencies.word) AS wordsentiment,date FROM WordFrequencies WHERE doc IN (SELECT doc FROM Sentences WHERE YEAR(date) IN (SELECT DISTINCT EXTRACT(YEAR FROM date) AS Year FROM Documents ORDER BY Year) AND raw REGEXP 'trump') AND partofspeech='ADJ';
I suspect that there is some bit of optimisation that I am unknowingly expecting mysql to have that it doesn't have.
It is easy enough to have a simple python script to do the first query and then insert the result into the second one, but I am very confused about why it is necessary.
re: SQL Weirdness
@inmysocks is there a reason you're not using JOINs at all?
re: SQL Weirdness
@zatnosk mainly because I am not very good with SQL.
re: SQL Weirdness
re: SQL Weirdness
@inmysocks my best tip is to put as many restrictions in the ON clauses as possible - although I'm not 100% on where the optimizations happen best.