Figured I should make a Devski post. It's been quite a while since the last one.
I haven't been any less transparent; I just haven't accomplished much.
My life got really busy about six months ago, between a new job, house church, and tai chi classes.
About a month ago, I finished the code to convert user data to SQL and load it from there. If you remember some instability early in a week last month, that was why. Although there's far less of it, the user data has about twice as many database columns and tables as publications, and, proportionally, about twice as many issues popped up. Unfortunately, I knew I wouldn't have time to fix them, so I ended up reverting the changes.
The biggest issue we saw was that the hubwheel dots didn't match the number of people who had actually shared the post. This was caused by vote data being duplicated in something like four places, and the SQL conversion not saving to all of those places correctly and atomically.
If you're beyond, like, a sophomore computer science student, you know duplicate data is capital-E Evil. This kind of thing is why it takes a considerable effort for me not to criticise the Hacker News source (from which Hubski was forked), its language, and the author of both.
So, I reverted the users-in-SQL work I'd spent several months of free time on, and started removing the duplicate vote data. Fortunately, one of those duplicate places was publications, which we converted to SQL last summer.
I've been at OSCON all week, so I spent most of the last five evenings on that. It's pretty much de-duplicated now. The only duplication left is between 'vote' data and 'shared by' data. But all the 'votes' are in one place, as are all the 'shares', and just this evening I changed publication 'score' to pull from the 'shared' data. Votes and shares aren't quite the same thing, so combining them will be a lot more work, and I don't think it's as big an issue.
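To illustrate why pulling the score from the 'shared' data helps, here's a minimal sketch in Python with SQLite, purely for illustration (Hubski's code is Arc, slowly moving to Racket). The table and function names are hypothetical, and it assumes the score is simply the number of shares, which the real scoring probably isn't. Each share is stored exactly once, and both the score and the 'shared by' list are computed from the same rows, so they can't drift apart the way the duplicated vote data did.

```python
import sqlite3
from typing import List

# Hypothetical, simplified schema: each share is stored exactly once.
# No other table keeps a copy, so there is nothing to fall out of sync.
SCHEMA = """
CREATE TABLE shares (
    publication_id INTEGER NOT NULL,
    username       TEXT    NOT NULL,
    PRIMARY KEY (publication_id, username)  -- one share per user per post
);
"""


def publication_score(conn: sqlite3.Connection, publication_id: int) -> int:
    """Derive the score from the shares table (assumed: score == share count)."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM shares WHERE publication_id = ?",
        (publication_id,),
    ).fetchone()
    return count


def shared_by(conn: sqlite3.Connection, publication_id: int) -> List[str]:
    """Users who shared the post, read from the same rows as the score."""
    rows = conn.execute(
        "SELECT username FROM shares WHERE publication_id = ? ORDER BY username",
        (publication_id,),
    ).fetchall()
    return [username for (username,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO shares (publication_id, username) VALUES (?, ?)",
            [(1, "alice"), (1, "bob"), (2, "alice")],
        )
    print(publication_score(conn, 1), shared_by(conn, 1))  # 2 ['alice', 'bob']
```

The primary key also rejects a second share from the same user, so the count can't be inflated by a duplicate row.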
If you noticed some slowness in feeds earlier this evening (Thursday), that was me. I made the score load from SQL on demand, and pushed it live. Turns out, the score was being loaded far more often than necessary (like so much other data). I saw the slowness, figured out which functions were being called too often, and did some higher-order-function magic to fix it.
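The post doesn't say exactly what the higher-order-function fix was. One common approach is memoization: wrap the expensive loader in a function that remembers its result per key. Here's a minimal Python sketch with hypothetical names (`memoize`, `make_loader`, `render_feed`); the fake loader stands in for a SQL score query and records how often it runs.

```python
from typing import Callable, Dict, Hashable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def memoize(fn: Callable[[K], V]) -> Callable[[K], V]:
    """Wrap fn so it runs at most once per distinct argument."""
    cache: Dict[K, V] = {}

    def wrapper(key: K) -> V:
        if key not in cache:
            cache[key] = fn(key)
        return cache[key]

    return wrapper


def make_loader() -> Tuple[Callable[[int], int], List[int]]:
    """Hypothetical stand-in for a SQL score query that records each call."""
    calls: List[int] = []

    def load_score(publication_id: int) -> int:
        calls.append(publication_id)
        return publication_id * 10  # fake score, in place of a real query

    return load_score, calls


def render_feed(
    publication_ids: List[int], get_score: Callable[[int], int]
) -> List[Tuple[int, int]]:
    """Naive renderer that needs each score twice: once to sort, once to display."""
    ordered = sorted(publication_ids, key=get_score, reverse=True)
    return [(pid, get_score(pid)) for pid in ordered]


if __name__ == "__main__":
    load, calls = make_loader()
    print(render_feed([1, 2, 3], load), len(calls))  # 6 queries
    load, calls = make_loader()
    # A fresh memoized wrapper per render, so scores can't go stale across requests.
    print(render_feed([1, 2, 3], memoize(load)), len(calls))  # 3 queries
```

Python's standard library already offers `functools.lru_cache`; the wrapper is written out by hand here to make the higher-order function visible. A cache that lived for the whole process would also need invalidating whenever new shares arrive, which is why this sketch scopes it to a single render.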
So yeah, lots of vote/share de-duplication. Should be mostly done now.
Next on my list is moving password hashes into SQL and writing API app code to log in and create tokens. API logins will then let us make API endpoints for private user data, e.g. a user's personal feed. So, once logins are in place, we'll be able to add API endpoints for all data, one at a time.
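For the login side, here's a rough sketch of salted password hashes in SQL plus API token issuance, again in Python with SQLite for illustration. It is not Hubski's scheme: the table layout, PBKDF2-HMAC-SHA256, the iteration count, and storing only a SHA-256 digest of each token are all assumptions. Hashes migrated over from the existing Arc data would keep whatever format they already use.

```python
import hashlib
import hmac
import secrets
import sqlite3
from typing import Optional

# Assumed parameters and layout, for illustration only.
ITERATIONS = 200_000
SCHEMA = """
CREATE TABLE passwords (
    username TEXT PRIMARY KEY,
    salt     BLOB NOT NULL,
    hash     BLOB NOT NULL
);
CREATE TABLE tokens (
    token_sha256 TEXT PRIMARY KEY,  -- digest only; the raw token goes to the client
    username     TEXT NOT NULL REFERENCES passwords(username)
);
"""


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted, deliberately slow hash (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def token_digest(token: str) -> str:
    """Store tokens hashed, so a database leak doesn't expose live tokens."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def create_user(conn: sqlite3.Connection, username: str, password: str) -> None:
    salt = secrets.token_bytes(16)
    with conn:
        conn.execute(
            "INSERT INTO passwords (username, salt, hash) VALUES (?, ?, ?)",
            (username, salt, hash_password(password, salt)),
        )


def login(conn: sqlite3.Connection, username: str, password: str) -> Optional[str]:
    """Return a new API token if the password checks out, else None."""
    row = conn.execute(
        "SELECT salt, hash FROM passwords WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return None
    salt, stored = row
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(hash_password(password, salt), stored):
        return None
    token = secrets.token_urlsafe(32)
    with conn:
        conn.execute(
            "INSERT INTO tokens (token_sha256, username) VALUES (?, ?)",
            (token_digest(token), username),
        )
    return token


def user_for_token(conn: sqlite3.Connection, token: str) -> Optional[str]:
    """Resolve an API token to a username, e.g. to serve a personal feed."""
    row = conn.execute(
        "SELECT username FROM tokens WHERE token_sha256 = ?", (token_digest(token),)
    ).fetchone()
    return row[0] if row else None
```

A private endpoint would then call `user_for_token` on the token the client sends, and only return that user's data if it resolves.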
Somewhere in the middle of that, I might try to apply the SQL user data migration again, if I'm at a point where I know I'll have time to fix the issues that pop up.
As always, questions welcome.
I don't say this nearly enough: I appreciate your efforts, rob. Thanks for putting in so much work to keep this place going and improving.
It's always painful having to revert to a previous version when things go badly. Do you have a development environment and test suite in place? If you need help with the SQL, or just a sounding board, I've been developing databases and bespoke systems for a long time. I don't know Arc (and sorry, but I'm not willing to spend time on it, as I don't think it's going anywhere), but I do know Racket quite well.
Development environment, yes. Test suite, no. A test suite and CI would be ideal, but there are more important things that come first. Thanks for the offer. The SQL and schema side isn't complex or difficult, though; the hard part is changing the Arc code to use it. That sounds easy, but there's duplicate data and unnecessary iteration at every turn. We're also slowly moving Arc code to Racket. The API is on GitHub, and we'd certainly welcome any pull requests. Right now, the big thing someone could do would be to add API endpoints and SQL queries or views that return JSON data for all the information on a given webpage, or section of a page.
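As an example of that kind of contribution, here's a hypothetical sketch of a SQL view that gathers what a feed entry displays, plus a function that serializes it to JSON for an endpoint. Python and SQLite stand in for the real stack, and the tables, columns, view, and `feed_json` name are invented for illustration; they are not the schema.sql in the repo.

```python
import json
import sqlite3

# Hypothetical tables and view, for illustration only.
SCHEMA = """
CREATE TABLE publications (
    id     INTEGER PRIMARY KEY,
    title  TEXT NOT NULL,
    author TEXT NOT NULL
);
CREATE TABLE shares (
    publication_id INTEGER NOT NULL REFERENCES publications(id),
    username       TEXT NOT NULL,
    PRIMARY KEY (publication_id, username)
);
-- One row per publication with what a feed entry displays.
-- LEFT JOIN keeps unshared posts, which get a share_count of 0.
CREATE VIEW feed_items AS
    SELECT p.id, p.title, p.author, COUNT(s.username) AS share_count
    FROM publications p
    LEFT JOIN shares s ON s.publication_id = p.id
    GROUP BY p.id, p.title, p.author;
"""


def feed_json(conn: sqlite3.Connection, limit: int = 20) -> str:
    """Serialize the top feed entries as a JSON array for an API endpoint."""
    rows = conn.execute(
        "SELECT id, title, author, share_count FROM feed_items "
        "ORDER BY share_count DESC, id LIMIT ?",
        (limit,),
    ).fetchall()
    return json.dumps(
        [{"id": i, "title": t, "author": a, "shares": n} for (i, t, a, n) in rows]
    )
```

Putting the joins in a view keeps the endpoint code thin: each page or page section gets one view and one small serializer.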
I'll have a look at that GitHub repo over the weekend. Is that schema.sql file current? Are you using MySQL/MariaDB/Postgres... or something else? Edit: OK, it's clear after looking at the source code.
Well... it looks like you've given me (and the rest of Hubski) a few more reasons to buy you beer or whiskey or rye or whatever you drink... when we meet up in a few weeks with ButterflyEffect and determinedkid.