Of course, now that I look at it, I could just leave out the date selection in the subquery. At the moment I am checking all of the years represented anyway, so I don't need to filter them...
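
A rough sketch of what that simplification might look like, with the year filter dropped and the per-row scalar subqueries swapped for `LEFT JOIN`s. The joins are an assumption about what could help, not something tested on the real data. The sketch runs against an in-memory SQLite database as a stand-in for MySQL, so `LIKE` replaces `REGEXP`, and the column types and toy rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL database (toy schema and rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Documents (id INTEGER PRIMARY KEY, publication TEXT, title TEXT, sentiment REAL);
    CREATE TABLE Words (word TEXT PRIMARY KEY, sentiment REAL);
    CREATE TABLE WordFrequencies (doc INTEGER, word TEXT, count INTEGER, partofspeech TEXT, date TEXT);
    CREATE TABLE Sentences (doc INTEGER, date TEXT, raw TEXT);
    INSERT INTO Documents VALUES
        (1, 'Paper A', 'Headline one', 0.5),
        (2, 'Paper B', 'Headline two', -0.2);
    INSERT INTO Words VALUES ('big', 0.1);
    INSERT INTO WordFrequencies VALUES
        (1, 'big', 3, 'ADJ', '2016-11-09'),
        (1, 'odd', 1, 'ADJ', '2016-11-09'),
        (1, 'wall', 2, 'NOUN', '2016-11-09'),
        (2, 'big', 1, 'ADJ', '2015-01-01');
    INSERT INTO Sentences VALUES
        (1, '2016-11-09', 'trump said big things'),
        (2, '2015-01-01', 'weather report');
""")

# No year filter at all; LEFT JOINs keep rows whose document or word
# has no match, like the original scalar subqueries (which yield NULL).
QUERY = """
SELECT d.publication, d.title AS headline, d.sentiment AS articlesentiment,
       wf.doc, wf.word, wf.count, w.sentiment AS wordsentiment, wf.date
FROM WordFrequencies AS wf
LEFT JOIN Documents AS d ON d.id = wf.doc
LEFT JOIN Words AS w ON w.word = wf.word
WHERE wf.partofspeech = 'ADJ'
  AND wf.doc IN (SELECT doc FROM Sentences WHERE raw LIKE ?)
ORDER BY wf.doc, wf.word
"""

rows = conn.execute(QUERY, ("%trump%",)).fetchall()
for row in rows:
    print(row)
```

Only the adjective rows of document 1 come back here, since it is the only toy document with a matching sentence.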

**inmysocks (version: awoo)** @inmysocks@awoo.space · May 02, 2019, 09:55

re: SQL Weirdness

I suspect that the sub-query in a sub-query may be doing it.

**inmysocks (version: awoo)** @inmysocks@awoo.space · May 02, 2019, 09:50

SQL Weirdness

In today's 'WAT?' SQL story, this query:

SELECT DISTINCT EXTRACT(YEAR FROM date) AS Year FROM Documents ORDER BY Year;

takes about half a second and returns a list of years.

This query:

SELECT (SELECT Documents.publication FROM Documents WHERE id=WordFrequencies.doc) AS publication,(SELECT Documents.title FROM Documents WHERE id=WordFrequencies.doc) AS headline,(SELECT Documents.sentiment FROM Documents WHERE id=WordFrequencies.doc) AS articlesentiment,doc,word,count,(SELECT Words.sentiment FROM Words WHERE Words.word=WordFrequencies.word) AS wordsentiment,date FROM WordFrequencies WHERE doc IN (SELECT doc FROM Sentences WHERE YEAR(date) IN (2000, 2003, 2004, 2005,2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017) AND raw REGEXP 'trump') AND partofspeech='ADJ';

takes a bit less than 15 seconds. The list of years is the output from the first query.

But this query never seems to end:

SELECT (SELECT Documents.publication FROM Documents WHERE id=WordFrequencies.doc) AS publication,(SELECT Documents.title FROM Documents WHERE id=WordFrequencies.doc) AS headline,(SELECT Documents.sentiment FROM Documents WHERE id=WordFrequencies.doc) AS articlesentiment,doc,word,count,(SELECT Words.sentiment FROM Words WHERE Words.word=WordFrequencies.word) AS wordsentiment,date FROM WordFrequencies WHERE doc IN (SELECT doc FROM Sentences WHERE YEAR(date) IN (SELECT DISTINCT EXTRACT(YEAR FROM date) AS Year FROM Documents ORDER BY Year) AND raw REGEXP 'trump') AND partofspeech='ADJ';

I suspect that there is some bit of optimisation that I am unknowingly expecting MySQL to have that it doesn't have.

It is easy enough to have a simple Python script do the first query and then insert the result into the second one, but I am very confused about why it is necessary.
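
For illustration, the two-step workaround could look roughly like the sketch below. It is not the script actually used here. It runs against an in-memory SQLite database as a stand-in for MySQL, so `strftime('%Y', date)` replaces `YEAR(date)` and `LIKE` replaces `REGEXP`, and the toy rows and the helper names `fetch_years` and `docs_for_years` are made up.

```python
import sqlite3


def fetch_years(conn):
    """Step 1: collect the distinct years present in Documents."""
    rows = conn.execute(
        "SELECT DISTINCT CAST(strftime('%Y', date) AS INTEGER) AS year "
        "FROM Documents ORDER BY year"
    ).fetchall()
    return [row[0] for row in rows]


def docs_for_years(conn, years, pattern):
    """Step 2: pass the years back in as a literal IN list, one placeholder each."""
    if not years:
        return []
    placeholders = ", ".join("?" for _ in years)
    sql = (
        "SELECT DISTINCT doc FROM Sentences "
        f"WHERE CAST(strftime('%Y', date) AS INTEGER) IN ({placeholders}) "
        "AND raw LIKE ? ORDER BY doc"
    )
    rows = conn.execute(sql, [*years, f"%{pattern}%"]).fetchall()
    return [row[0] for row in rows]


# Toy data standing in for the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Documents (id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE Sentences (doc INTEGER, date TEXT, raw TEXT);
    INSERT INTO Documents VALUES
        (1, '2000-03-01'), (2, '2016-11-09'), (3, '2016-12-01');
    INSERT INTO Sentences VALUES
        (1, '2000-03-01', 'nothing relevant here'),
        (2, '2016-11-09', 'trump wins'),
        (3, '2016-12-01', 'trump again');
""")

years = fetch_years(conn)
matching_docs = docs_for_years(conn, years, "trump")
print(years, matching_docs)
```

Using placeholders rather than pasting the years into the SQL string keeps the second query parameterised even though the list length varies.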

**inmysocks (version: awoo)** @inmysocks@awoo.space · May 02, 2019, 08:48

@theoutrider ah, it's Thursday again.

**inmysocks (version: awoo)** @inmysocks@awoo.space · May 01, 2019, 14:51

@ekaitz_zarraga @eider@mastodon.eus don't worry, I will.

**inmysocks (version: awoo)** @inmysocks@awoo.space · May 01, 2019, 14:46

@ekaitz_zarraga yes, I was using that to pull out the data manually instead of automating it, which worked enough to test everything. Thank you!

**inmysocks (version: awoo)** @inmysocks@awoo.space · May 01, 2019, 14:17

SQL has been momentarily defeated, somehow.

Or perhaps it beat me, considering I have no idea why these queries are finishing very quickly while the others that got the same data took about a minute each the other day.

Either way, I pulled out the data and we can see trends in the data that aren't visible without processing. So SUCCESS!!!

**inmysocks (version: awoo)** @inmysocks@awoo.space · May 01, 2019, 10:37

@theoutrider I was going to ask what you are talking about, but then I realised that no explanation could improve upon it.

**inmysocks (version: awoo)** @inmysocks@awoo.space · Apr 26, 2019, 16:57

@suetanvil@mastodon.technology I have started using it as a filesystem and am doing all of the interesting bits in Python.

You give good advice/insight. Thank you.

**inmysocks (version: awoo)** @inmysocks@awoo.space · Apr 26, 2019, 16:55

@ekaitz_zarraga I was trying to use stuff like this:

SELECT DISTINCT EXTRACT(YEAR FROM date) AS Year, (SELECT COUNT(id) FROM Documents WHERE EXTRACT(YEAR FROM date)=Year) FROM Documents WHERE EXTRACT(YEAR FROM date)=Year;

It may be terrible. I am learning just how little I know about SQL queries.

I gave up and started just using it as a data store to pull things from for processing in Python. I have had much more success with that.

**inmysocks (version: awoo)** @inmysocks@awoo.space · Apr 26, 2019, 09:50

So screw doing complex SQL queries. I am going to use Python scripts to combine results for anything more complex than using WHERE clauses on a single table.

**inmysocks (version: awoo)** @inmysocks@awoo.space · Apr 26, 2019, 09:26

SQL is confusing.

SELECT COUNT(id) FROM Documents WHERE EXTRACT(YEAR FROM date)=2000;

runs in 0.03 seconds

SELECT DISTINCT EXTRACT(YEAR FROM date) AS Year FROM Documents ORDER BY Year;

runs in about 0.1 seconds

Every way I have come up with to combine the two takes minutes.
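
One common way to combine the two is a single `GROUP BY`, which counts the documents for every year in one pass instead of re-running the count once per year. The sketch below is only a guess at what would help here and has not been timed on the real table. It uses an in-memory SQLite database as a stand-in for MySQL, so `strftime('%Y', date)` replaces `EXTRACT(YEAR FROM date)`, and the toy rows are made up.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL Documents table (toy rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Documents (id INTEGER PRIMARY KEY, date TEXT);
    INSERT INTO Documents VALUES
        (1, '2000-03-01'),
        (2, '2000-07-15'),
        (3, '2016-11-09');
""")

# One grouped query yields (year, number of documents) pairs.
counts = conn.execute(
    "SELECT CAST(strftime('%Y', date) AS INTEGER) AS year, COUNT(id) "
    "FROM Documents GROUP BY year ORDER BY year"
).fetchall()

for year, n_docs in counts:
    print(year, n_docs)
```

In MySQL the same idea would presumably be written as `SELECT EXTRACT(YEAR FROM date) AS Year, COUNT(id) FROM Documents GROUP BY Year ORDER BY Year;`.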

**inmysocks (version: awoo)** @inmysocks@awoo.space · Apr 26, 2019, 08:51

After redesigning the schema to have a bunch of duplicate information in different tables and rebuilding the whole thing overnight, the queries are taking much less time.
Like half a second instead of minutes.

So that was worth it.

I still don't really approve of this 'sql' thing.

**inmysocks (version: awoo)** @inmysocks@awoo.space · Apr 25, 2019, 19:30

@zatnosk I am moving in that direction.

**inmysocks (version: awoo)** @inmysocks@awoo.space · Apr 25, 2019, 17:50

@zatnosk my old CS professor always said that if you had the same piece of data in two places in your database you were doing something wrong.

But I guess storage is a lot cheaper now than it used to be.

**inmysocks (version: awoo)** @inmysocks@awoo.space · Apr 25, 2019, 17:34

I am being a terrible database person and duplicating data all over the place because these queries are taking way too long.

Who cares if I can have the date for all of these entries in a single place? Storage is cheap and I am impatient.

**inmysocks (version: awoo)** @inmysocks@awoo.space · Apr 25, 2019, 11:55

My finely crafted SQL query has been running for about 3 minutes so far.

My previous record for how long a query would take is 50 seconds.

**inmysocks (version: awoo)** @inmysocks@awoo.space · Apr 25, 2019, 11:54

re: Florian no

@theoutrider I would really rather not consider that.

**inmysocks (version: awoo)** @inmysocks@awoo.space · Apr 23, 2019, 22:26

I made the mistake of asking if anyone wanted to help with finding bugs in Bob.

Next time I ask for something like this from people who use my software, just punch me. It would be less painful.

I don't exist!

I may be the same inmysocks you see on mastodon.social.... Maybe.

Whatever pronouns you feel like? I would be amused if you alternated.

Joined Apr 2017
