Nazis make peaceful interaction impossible, on the basis that they want to eradicate us.
Thus violence against them is inevitable.
I'm so glad that we agree. :)
re: SQL Weirdness
of course, now that I look at it, I could just leave out the date selection in the subquery because at the moment I am checking all of the years represented anyway so I don't need to filter them...
SQL Weirdness
In today's 'WAT?' SQL story, this query:
SELECT DISTINCT EXTRACT(YEAR FROM date) AS Year FROM Documents ORDER BY Year;
takes about half a second and returns a list of years.
This query:
SELECT (SELECT Documents.publication FROM Documents WHERE id=WordFrequencies.doc) AS publication,(SELECT Documents.title FROM Documents WHERE id=WordFrequencies.doc) AS headline,(SELECT Documents.sentiment FROM Documents WHERE id=WordFrequencies.doc) AS articlesentiment,doc,word,count,(SELECT Words.sentiment FROM Words WHERE Words.word=WordFrequencies.word) AS wordsentiment,date FROM WordFrequencies WHERE doc IN (SELECT doc FROM Sentences WHERE YEAR(date) IN (2000, 2003, 2004, 2005,2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017) AND raw REGEXP 'trump') AND partofspeech='ADJ';
takes a bit less than 15 seconds; the list of years is the output from the first query.
But this query never seems to end:
SELECT (SELECT Documents.publication FROM Documents WHERE id=WordFrequencies.doc) AS publication,(SELECT Documents.title FROM Documents WHERE id=WordFrequencies.doc) AS headline,(SELECT Documents.sentiment FROM Documents WHERE id=WordFrequencies.doc) AS articlesentiment,doc,word,count,(SELECT Words.sentiment FROM Words WHERE Words.word=WordFrequencies.word) AS wordsentiment,date FROM WordFrequencies WHERE doc IN (SELECT doc FROM Sentences WHERE YEAR(date) IN (SELECT DISTINCT EXTRACT(YEAR FROM date) AS Year FROM Documents ORDER BY Year) AND raw REGEXP 'trump') AND partofspeech='ADJ';
I suspect that there is some bit of optimisation that I am unknowingly expecting MySQL to have that it doesn't have.
It is easy enough to have a simple Python script do the first query and then insert the result into the second one, but I am very confused about why it is necessary.
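A rough sketch of that workaround, in case it helps: the helper below only builds the subquery text and its parameter list from a list of years, so it runs without a database. The function name `build_sentence_filter` is made up for illustration, and the `%s` placeholder style is an assumption about the MySQL driver in use (PyMySQL and mysql-connector-python both accept it).

```python
def build_sentence_filter(years, pattern):
    """Return (sql, params) for the doc subquery, with one placeholder per year.

    Hypothetical helper: table and column names are copied from the queries
    in the post; the %s placeholder style is an assumption about the driver.
    """
    if not years:
        raise ValueError("need at least one year")
    # One %s per year, so the driver handles the values instead of string pasting.
    placeholders = ", ".join(["%s"] * len(years))
    sql = (
        "SELECT doc FROM Sentences "
        f"WHERE YEAR(date) IN ({placeholders}) AND raw REGEXP %s"
    )
    return sql, [*years, pattern]


# In the real script these years would be the rows returned by the first query.
years = [2000, 2003, 2004]
sql, params = build_sentence_filter(years, "trump")
print(sql)
print(params)
```

The resulting `sql` and `params` would then be handed to the driver's `cursor.execute(sql, params)`, either on their own or spliced into the bigger query as the `doc IN (...)` part.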
SQL has been momentarily defeated, somehow.
Or perhaps it beat me, considering I have no idea why these queries are finishing very quickly while the others that got the same data took like a minute each the other day.
Either way, I pulled out the data and we can see trends in the data that aren't visible without processing. So SUCCESS!!!
After redesigning the schema to have a bunch of duplicate information in different tables and rebuilding the whole thing overnight, the queries are taking much less time.
Like half a second instead of minutes.
So that was worth it.
I still don't really approve of this 'sql' thing.
semi-related note:
The checks in these two if statements
if index % 500:
    foo
and
if index % 500 == 0:
    foo
are the logical negations of each other, at least in languages where any non-zero number evaluates to true.
So remembering the '== 0' is important.
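A quick way to see the difference, as a minimal sketch in Python (the two function names are made up for illustration):

```python
def fires_without_comparison(index):
    # Truthy whenever the remainder is non-zero, i.e. on every index
    # that is NOT a multiple of 500.
    return bool(index % 500)


def fires_with_comparison(index):
    # True only when index is a multiple of 500 (including 0).
    return index % 500 == 0


for index in (0, 1, 499, 500, 1000):
    print(index, fires_without_comparison(index), fires_with_comparison(index))
```

For every index exactly one of the two is true, so dropping the '== 0' turns "do this every 500th iteration" into "do this on every iteration except each 500th".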
I don't exist!
I may be the same inmysocks you see on mastodon.social.... Maybe.
Whatever pronouns you feel like? I would be amused if you alternated.