Thought I’d let you all know where I am. The good news is, I have the internal API basically complete (for publications), working, and tested relatively thoroughly. The bad news is, I’m having trouble interfacing it with Arc. Some things are unusably slow; sometimes it doesn’t work at all. Part of the problem is that the main Arc app assumes publication access is cheap, and requests far more data than it needs. That needs to change, but that’s a big project I don’t want to get into yet.
So my plan has changed: the Arc app will call SQL directly.
Does this mean my work was wasted? No. The code to convert files to SQL is still needed, and the internal API work will directly translate to the external API. The SQL → JSON → s‑expression functions I wrote for the API are also useful for reading directly from SQL.
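In case it helps picture those functions, here's a minimal sketch of the SQL → JSON direction in Racket, using the db and json collections; the table and column names are invented for illustration, not the real schema:

    #lang racket
    (require db json)

    ;; Turn one result row into a jsexpr (an immutable hash with symbol keys),
    ;; mapping SQL NULL to JSON null.  Assumes the column values are already
    ;; jsexpr-friendly (strings, integers, reals, booleans).
    (define (row->jsexpr cols row)
      (for/hasheq ([col (in-list cols)]
                   [val (in-vector row)])
        (values (string->symbol col)
                (if (sql-null? val) (json-null) val))))

    ;; Hypothetical example: read some publications and emit a JSON array of objects.
    (define (publications->json conn)
      (define cols '("id" "title" "author"))
      (define rows (query-rows conn "SELECT id, title, author FROM publications"))
      (jsexpr->string (for/list ([r (in-list rows)]) (row->jsexpr cols r))))

And since a jsexpr is ordinary Racket data, converting it on to whatever s-expression shape the Arc side wants is a small step from there.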
Why use an internal API at all? Service-oriented architecture has myriad benefits: primarily scalability, robustness, and portability. I’d be happy to discuss it in more detail, or you can ask Google. So, the current plan is to read from and write to SQL directly, because Arc. Later, when the entire app is converted to Racket, we can re-apply the internal API and work on making everything more service-oriented.
More good news: this probably means the external API will be done faster.
My current plan:
⑴ Implement reading directly from SQL, into publications stored in memory.
• eventually we want all data requests to come straight from the source (SQL/API), adding a caching layer like memcached later if necessary. But for now, because the app requests more data than it needs, continuing to store everything in memory avoids performance issues.
• after the livestream, I now have the main app loading from SQL, but it’s slow, and I don’t know why. Current guesses: my slow network; poor database indices; simply too much data, needing a single bulk query; slow Arc–Racket communication. I’m hoping it’s the first. The last is the worst-case scenario; in that case, this is going to be hard, because Arc itself has no database functions. Bulk query is ugly, but doable (see the sketch after this list). We’ll see.
⑵ Perform the conversion to SQL
• the code for this is done; it just needs to be run on the live data once the app reading from SQL is implemented.
⑶ Add functions to the internal API, for external requests
⑷ Various other things necessary for a complete external API, like login for private mail.
⑸ Fix Arc app to only request the data it needs from SQL, when it needs it.
⑹ Change Arc app to not cache anything in memory.
⑺ Convert Arc app to Racket, one function at a time.
⑻ Change converted main Racket app to use the internal API.
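To make step ⑴ and the bulk-query idea concrete, here's a rough sketch of what "one bulk query per table, assembled into in-memory publications" could look like in Racket with the db collection. The schema, column layout, and publication record here are invented for illustration; the real data spans 13 tables:

    #lang racket
    (require db)

    ;; Hypothetical in-memory record for a publication.
    (struct publication (id title votes) #:transparent)

    ;; One bulk query per table, instead of one query per publication per table.
    (define (load-publications conn)
      (define pubs (make-hash))
      ;; Bulk query 1: the main table.
      (for ([row (in-list (query-rows conn "SELECT id, title FROM publications"))])
        (hash-set! pubs (vector-ref row 0)
                   (publication (vector-ref row 0) (vector-ref row 1) 0)))
      ;; Bulk query 2: a side table, folded into the records already in memory.
      (for ([row (in-list (query-rows conn
                            "SELECT pub_id, count(*) FROM votes GROUP BY pub_id"))])
        (define p (hash-ref pubs (vector-ref row 0) #f))
        (when p
          (hash-set! pubs (vector-ref row 0)
                     (struct-copy publication p [votes (vector-ref row 1)]))))
      pubs)

The real version would repeat the second pattern for each remaining table, but the total stays at one round trip per table.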
Right now, converting the main app to SQL is my #1 priority. That opens countless doors, not least of which is the external API. Steps 3–7 may occur in any order, but I’m likely to prioritise the external API. Everyone wants it, including me.
I was looking at this site last night, and realized: this website is the result of human beings who've dedicated money, skills, time to making it happen. I continued to look at the site, realizing how well designed it is, the nice little user-friendly tricks, all the tiny little details that may seem simple to the user, but I know took creativity and skill to make happen. So: thank you all.
I am really interested in service-oriented architecture. It seems especially useful for my current project's use case: implementing an analytics API for the customer, served via JSON to a mobile app and to a client that pulls JSON from the website, will be really clean. It's crazy how you're doing so much yourself. What do you do at your pay-the-bills job? You seem to be an excellent developer.
Thanks. I'm a software engineer; I currently work in the power engineering industry, doing application- and systems-level development, mostly in C++. I taught myself functional programming. The company I work for is in a city whose talent pool is too small to support using functional languages. They have been moving toward SOA, though.
I can absolutely relate on the small talent pool problem. I'm part of a startup located in a college town. The university is pretty good, and has some pretty good Computer Science/Computer Engineering classes, but most of the students are not going to be great employees from the start. It seems there is a whole trend of universities trying to get their students to be great interviewees so that they can work for the Googles and Facebooks of today. The kids will know the theory of the fundamentals (complexity, general OOP design, information theory), but will lack any skill to work on real projects. Whenever I interview a potential employee who's still a student, I notice the same trend: students aren't really ready to work, but rather ready to go through more training at a company. It's like the 4 years of university are barely enough to get them to write a hello world in Java or a merge sort in Python from memory and fake their way through pseudocode of a Red-Black Tree, and then some big name company will pick them up and finish their training while having them work on a fancy button or keyboard shortcut on a page.
I wish there were two of you so one could keep an adequate log of all the shit you have to do throughout your conversion process. I'm sad to have missed your livestream. Enhancing the performance of your queries is a lot of guesswork without the right tools. I need to hop in feet first in some PostgreSQL but that sweet Cassandra is calling my name. Does the Arc SQL adapter support the use of stored procedures? Have you tried wrapping a view around one of the bulkier tables to filter the results a little and maybe weed out some unnecessary/problem columns? Is your dataset normalized? I know a bunch of questions isn't going to help; but holy snark I love me some DB discussion.
"Is your dataset normalized?"
It's BCNF, if you consider processed data unique, and consider null a value. But it stores processed data, which ought to be computed at runtime: for example, text, md, and searchtext. The application has to be changed to fix that. That will come later.
"weed out some unnecessary/problem columns?"
There are no unnecessary columns at this point; we just converted the data to SQL. I did try removing the massive searchtext table from the query. Didn't help.
"Does the Arc SQL adapter support the use of stored procedures?"
No, and I will avoid them unless absolutely necessary for performance. Code should be in the application.
Right now, it's looking like it just needs bulk loading. It's not the network. When the app starts, we load ~250k publications. With 13 tables, that's over 3 million queries. Bulk loading makes it 14 queries. Even for several hundred megabytes of data, the cost of that many queries far outweighs the cost of the data itself. I've faced this problem professionally before, and seen a 20× speedup in a similar situation. That was in a large OOP system, and it took me 3 months; this will not take 3 days. LISP ≫ C++.
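For a rough sense of scale, here's the back-of-envelope arithmetic in Racket; the 1 ms per-query round trip is an assumed figure for illustration, not a measurement:

    ;; Back-of-envelope only; round-trip-ms is an assumed figure.
    (define publications 250000)
    (define tables 13)
    (define round-trip-ms 1)

    ;; Per-row loading: one query per publication per table.
    (define per-row-queries (* publications tables))    ; 3,250,000 queries
    (/ (* per-row-queries round-trip-ms) 1000.0 60)     ; ≈ 54 minutes of round trips alone

    ;; Bulk loading: one query per table, plus one for the publication list.
    (define bulk-queries (+ tables 1))                  ; 14 queries
    (* bulk-queries round-trip-ms)                      ; 14 ms of round trips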
I can't wait to see how you finalize your approach.
I have one co-worker nodding his head violently at the prospect that business logic should be handled by the application and enhanced by interfaces as needed while another swears by handling everything in the db. I lean toward the latter; but I know nothing. Thanks for the updates, man. Godspeed.
"another swears by handling everything in the db."
Codinghorror summarises my feelings well: "Have you ever worked on a system where someone decreed that all database calls must be Stored Procedures, and SQL is strictly verboten? I have." I also have, and his experience and conclusion both parallel mine.
"main Arc app assumes publication access is cheap, and requests far more data than it needs"
I like how you assume it "assumes" things. It's an endearing behavior programmers have, to see their creation as something that's thinking rather than merely executing lines of code. I'm not saying it as a snide remark, for what is human thinking but the execution of complex search and save algorithms?