Medium Community

Medium publishes lots and lots of articles every day, but the total quality spread across all of them seems to stay constant, and that’s a huge problem for them, or it should be.

First red flag: to contribute as a reader, i.e. to comment on an article, you can’t do it directly on the web, only through their mobile app.

I installed their app some time ago, and I routinely comment on those bits I think I can contribute to. Just yesterday I submitted a one-liner to replace a coding example of 15 lines.

Second red flag: there’s no button anywhere on the app that you can press to add a comment.

In fact, after much swearing, you may accidentally highlight a word of the article, and a little menu pops up with the comment option, just when you thought you had been tricked into installing their otherwise useless app.

Third red flag: you need to highlight the text you want to comment on (which is a nice-to-have feature), but then the author can freely reword that piece of text and your comment vanishes.

After getting a grateful reply from the author of the article about my one-liner, I went back to the published text. The only change was from “this is one of the cleanest methods” to “there are many methods”, with no trace of my comment or code.

So, why did I bother in the first place if my contribution gets so easily lost?

Of course I get why he did that, and even if the net result is that his article remains low quality, that is not my point. What I think is wrong here is the medium of Medium, which simply makes comments hard to submit and ultimately useless.

Software Architecture

I’ve performed many roles in my career, but software architecture is what I like most, and I’m pretty good at it.

My recipe for architecting a solution is this. First I study the current processes: I often need to feel the pain of working with them daily. So I work, learn, and discover improvable areas. Then I envision a new architecture that organically fits every piece together, allows simpler processes, and reuses as many building blocks as possible. Then I design a migration path, a chain of reachable steps to perform one at a time, without disrupting production.

Software architectures don’t last long without alterations, not even the best ones, because companies grow along many dimensions. It’s only a matter of time before those changes go beyond what the architect foresaw. And technical debt ensues.

Software engineers and companies alike need to understand that embracing change (agile, anyone?) includes architecting and re-architecting as often as needed.

Web Duplicates


A logic test from a job selection process:

Web search engines A and B each crawl a random subset of the same size of the Web. Some of the pages crawled are duplicates – exact textual copies of each other at different URLs. Assume that duplicates are distributed uniformly amongst the pages crawled by A and B. Further, assume that a duplicate is a page that has exactly two copies – no pages have more than two copies. A indexes pages without duplicate elimination whereas B indexes only one copy of each duplicate page. The two random subsets have the same size before duplicate elimination. If, 45% of A’s indexed URLs are present in B’s index, while 50% of B’s indexed URLs are present in A’s index, what fraction of the Web consists of pages that do not have a duplicate?

Solution

Let:

  • C_X the number of elements of X
    • X = { 1, 2, 3, 2 } -> C_X = 4
  • D_X the number of duplicate pairs in X (each pair counted once):
    • X = { 1, 2, 3, 2 } -> D_X = 1
  • N_X the number of elements of X without a duplicate in X
    • X = { 1, 2, 3, 2 } -> N_X = 2
  • XY the union of X and Y, keeping repeated elements; D_AB counts only the pages that have one copy in A and the other in B

  • X1 the set X before eliminating duplicates

  • X2 the set X after eliminating duplicates
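The counting definitions above can be sketched in a few lines of Python. This is only an illustration: the helper name `counts` is made up here, and it assumes, as the problem states, that no page has more than two copies.

```python
from collections import Counter

def counts(pages):
    """Return (C_X, D_X, N_X) for a list of page ids (hypothetical helper)."""
    freq = Counter(pages)
    c = len(pages)                                # C_X: every element, copies included
    d = sum(1 for n in freq.values() if n == 2)   # D_X: duplicate pairs, counted once
    n = sum(1 for n in freq.values() if n == 1)   # N_X: elements with no copy in X
    return c, d, n

print(counts([1, 2, 3, 2]))  # (4, 1, 2)
```

With these three numbers, C_X = 2 * D_X + N_X always holds for a single set.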


Data:

  1. Web search engines A and B each crawl a random subset of the same size of the Web
    • C_A1 + C_B1 = C_W
  2. The two random subsets have the same size before duplicate elimination
    • C_A1 = C_B1
  3. Assume that duplicates are distributed uniformly amongst the pages crawled by A and B
    • D_A1 = D_B1
  4. Further, assume that a duplicate is a page that has exactly two copies – no pages have more than two copies
    • C_A1 = 2 * D_A1 + D_AB + N_A1
    • C_B1 = 2 * D_B1 + D_AB + N_B1
  5. A indexes pages without duplicate elimination whereas B indexes only one copy of each duplicate page
    • C_A2 = C_A1
    • C_B2 = C_B1 - D_B1
  6. 45% of A’s indexed URLs are present in B’s index, while 50% of B’s indexed URLs are present in A’s index
    • D_AB (in B2) = 45/100 * C_A2
    • D_AB (in A2) = 50/100 * C_B2
  7. N_AB / C_W = ?


Example (without taking into account Data:6):

  • A1 = 1 1 3 3 5 5 7 9 11 13 15

  • B1 = 2 2 4 4 6 6 7 9 10 12 14

  • A2 = A1

  • B2 = 2 4 6 7 9 10 12 14
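As a sanity check, the example can be run through a short Python sketch that verifies Data:4 and Data:5. It assumes that D_AB means the pages shared between A1 and B1 (7 and 9 here), and that N_A1 and N_B1 exclude those shared pages, which is how Data:4 uses them. The helper name `internal_pairs` is made up for the illustration.

```python
from collections import Counter

A1 = [1, 1, 3, 3, 5, 5, 7, 9, 11, 13, 15]
B1 = [2, 2, 4, 4, 6, 6, 7, 9, 10, 12, 14]

def internal_pairs(pages):
    """Number of duplicate pairs whose two copies are both in `pages`."""
    return sum(1 for n in Counter(pages).values() if n == 2)

cross = set(A1) & set(B1)      # pages with one copy in A1 and one in B1
D_A1 = internal_pairs(A1)
D_B1 = internal_pairs(B1)
D_AB = len(cross)

# Pages with no copy anywhere, neither in their own set nor in the other one.
N_A1 = sum(1 for p, n in Counter(A1).items() if n == 1 and p not in cross)
N_B1 = sum(1 for p, n in Counter(B1).items() if n == 1 and p not in cross)

A2 = list(A1)                  # A keeps every copy
B2 = sorted(set(B1))           # B keeps one copy of each page

assert len(A1) == 2 * D_A1 + D_AB + N_A1   # Data:4
assert len(B1) == 2 * D_B1 + D_AB + N_B1   # Data:4
assert len(B2) == len(B1) - D_B1           # Data:5

print(D_A1, D_B1, D_AB, N_A1, N_B1)  # 3 3 2 3 3
print(B2)                            # [2, 4, 6, 7, 9, 10, 12, 14]
```

In this toy example 2 of A2's 11 URLs appear in B2, and 2 of B2's 8 URLs appear in A2, so it does not reproduce the 45% and 50% figures; it only illustrates the bookkeeping.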


Solution:

  1. N_AB = N_A1 + N_B1, by definition.

  2. N_A1 = C_A1 - 2 * D_A1 - D_AB, because of Data:4.

  3. N_B1 = C_B1 - 2 * D_B1 - D_AB, because of Data:4.

  4. N_B1 = C_A1 - 2 * D_A1 - D_AB, because of Solution:3 + Data:2 + Data:3.

  5. N_AB = 2 * C_A1 - 4 * D_A1 - 2 * D_AB, because of Solution:1 + Solution:2 + Solution:4.

  6. C_B2 = C_A1 - D_A1, because of Data:5 + Data:2 + Data:3.

  7. 45/100 * C_A2 = 50/100 * C_B2, because of Data:6.

  8. 45/100 * C_A1 = 50/100 * ( C_B1 - D_B1 ), because of Solution:7 + Data:5.

  9. 45/100 * C_A1 = 50/100 * ( C_A1 - D_A1 ), because of Solution:8 + Data:2 + Data:3.

  10. D_A1 = 10/100 * C_A1, because of Solution:9.

  11. N_AB = 200/100 * C_A1 - 4 * 10/100 * C_A1 - 2 * 45/100 * C_A1, because of Solution:5 + Solution:10 + Data:6 + Data:5.

  12. N_AB = 70/100 * C_A1, because of Solution:11.

  13. N_AB = 70/100 * 50/100 * C_W, because of Solution:12 + Data:1 + Data:2.

  14. N_AB / C_W = 35%, because of Solution:13.
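The arithmetic of the derivation can be double-checked with exact fractions. The sketch below only replays the steps under the same assumptions as the Data section (in particular Data:1, that A and B together make up the whole Web); it normalises C_A1 to 1, which is a choice made here for convenience.

```python
from fractions import Fraction

C_A1 = Fraction(1)                 # normalised size of A's crawl (assumption)
p_a = Fraction(45, 100)            # share of A's index found in B
p_b = Fraction(50, 100)            # share of B's index found in A

# Solution:9  ->  p_a * C_A1 = p_b * (C_A1 - D_A1)
D_A1 = C_A1 - (p_a / p_b) * C_A1   # Solution:10
D_AB = p_a * C_A1                  # Data:6 with C_A2 = C_A1

N_AB = 2 * C_A1 - 4 * D_A1 - 2 * D_AB   # Solution:5
C_W = 2 * C_A1                          # Data:1 + Data:2

ratio = N_AB / C_W
print(D_A1, N_AB, ratio)  # 1/10 7/10 7/20
```

7/20 is the 35% reached in the last step.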