SQL Weirdness 

In todays 'WAT?' SQL story, this query:

SELECT DISTINCT EXTRACT(YEAR FROM date) AS Year FROM Documents ORDER BY Year;

takes about half a second and returns a list of years.

This query:

SELECT (SELECT Documents.publication FROM Documents WHERE id=WordFrequencies.doc) AS publication,(SELECT Documents.title FROM Documents WHERE id=WordFrequencies.doc) AS headline,(SELECT Documents.sentiment FROM Documents WHERE id=WordFrequencies.doc) AS articlesentiment,doc,word,count,(SELECT Words.sentiment FROM Words WHERE Words.word=WordFrequencies.word) AS wordsentiment,date FROM WordFrequencies WHERE doc IN (SELECT doc FROM Sentences WHERE YEAR(date) IN (2000, 2003, 2004, 2005,2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017) AND raw REGEXP 'trump') AND partofspeech='ADJ';

takes a bit less than 15 seconds, the list of years is the output from the first query.

But this query never seems to end:

SELECT (SELECT Documents.publication FROM Documents WHERE id=WordFrequencies.doc) AS publication,(SELECT Documents.title FROM Documents WHERE id=WordFrequencies.doc) AS headline,(SELECT Documents.sentiment FROM Documents WHERE id=WordFrequencies.doc) AS articlesentiment,doc,word,count,(SELECT Words.sentiment FROM Words WHERE Words.word=WordFrequencies.word) AS wordsentiment,date FROM WordFrequencies WHERE doc IN (SELECT doc FROM Sentences WHERE YEAR(date) IN (SELECT DISTINCT EXTRACT(YEAR FROM date) AS Year FROM Documents ORDER BY Year) AND raw REGEXP 'trump') AND partofspeech='ADJ';

I suspect that there is some bit of optimisation that I am unknowingly expecting mysql to have that it doesn't have.

It is easy enough to have a simple python script to do the first query and then insert the result into the second one, but I am very confused about why it is necessary.

re: SQL Weirdness 

I suspect that the sub-query in a sub-query may be doing it.

re: SQL Weirdness 

of course, now that I look at it, I could just leave out the date selection in the subquery because at the moment I am checking all of the years represented anyway so I don't need to filter them...

re: SQL Weirdness 

Don't worry everyone, I am a professional....

re: SQL Weirdness 

@inmysocks is there a reason you're not using JOINs at all?

re: SQL Weirdness 

@zatnosk mainly because I am not very good with SQL.

re: SQL Weirdness 

@inmysocks I skipped the year-check, but otherwise I think it should select the same data based on joining tables together.
The fanciest part is the INNER JOIN that's only used to filter the primary table.

Now that I think of it, that INNER JOIN might explode your result rows... but the other stuff should be faster than multiple subqueries selecting from the same table

Follow

re: SQL Weirdness 

@zatnosk it went from 15 seconds to 0.006 seconds.

I just have to be careful about the table sizes with the joins.

re: SQL Weirdness 

@inmysocks my best tip is to put as many restrictions in the ON clauses as possible - although I'm not 100% on where the optimizations happen best.

Sign in to participate in the conversation
Awoo Space

Awoo.space is a Mastodon instance where members can rely on a team of moderators to help resolve conflict, and limits federation with other instances using a specific access list to minimize abuse.

While mature content is allowed here, we strongly believe in being able to choose to engage with content on your own terms, so please make sure to put mature and potentially sensitive content behind the CW feature with enough description that people know what it's about.

Before signing up, please read our community guidelines. While it's a very broad swath of topics it covers, please do your best! We believe that as long as you're putting forth genuine effort to limit harm you might cause – even if you haven't read the document – you'll be okay!