Monday, 22 December 2014
Sunday, 7 December 2014
Quora haqathon 2014
The Quora haqathon is today, from 11am to 7pm Pacific Standard Time! It features 9 problems, a mix of traditional algorithm tasks, machine learning tasks and systems programming tasks. Link to site.
Ontology
Linearize the tree so that every topic's subtree becomes a contiguous range of questions. Each query then reduces to "among questions q[x...y], how many start with prefix p?". Offline queries + partial sums + trie. Linear time.
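A minimal sketch of the idea above, not the contest submission itself. The input shapes (`children` as a dict, `questions` and `queries` as lists of pairs) and all function names are assumptions made for illustration. A DFS lays the questions out so each topic owns a range [start, end), and a sweep over that order inserts questions into a counting trie; each query is answered as count(prefix) at `end` minus count(prefix) at `start`, which is the partial-sum trick done offline.

```python
from collections import defaultdict

def count_prefix_queries(children, root, questions, queries):
    """children: topic -> list of child topics (hypothetical input shape)
    questions: list of (topic, text); queries: list of (topic, prefix).
    Returns, per query, how many questions in the topic's subtree start with prefix."""
    by_topic = defaultdict(list)
    for topic, text in questions:
        by_topic[topic].append(text)

    # 1. Linearize: iterative DFS so each subtree is the range [start, end) of `order`.
    order, start, end = [], {}, {}
    stack = [(root, False)]
    while stack:
        node, done = stack.pop()
        if done:
            end[node] = len(order)
            continue
        start[node] = len(order)
        order.extend(by_topic[node])
        stack.append((node, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, False))

    # 2. Offline: attach each query to its two range boundaries with a sign.
    events = defaultdict(list)
    for qi, (topic, _prefix) in enumerate(queries):
        events[start[topic]].append((qi, -1))
        events[end[topic]].append((qi, +1))

    # 3. Counting trie: count[n] = number of inserted texts passing through node n.
    trie, count = [{}], [0]

    def insert(text):
        node = 0
        count[0] += 1
        for ch in text:
            nxt = trie[node].get(ch)
            if nxt is None:
                nxt = len(trie)
                trie[node][ch] = nxt
                trie.append({})
                count.append(0)
            node = nxt
            count[node] += 1

    def lookup(prefix):
        node = 0
        for ch in prefix:
            node = trie[node].get(ch)
            if node is None:
                return 0
        return count[node]

    # 4. Sweep: before inserting order[pos], the trie holds exactly order[0:pos].
    answers = [0] * len(queries)
    for pos in range(len(order) + 1):
        for qi, sign in events.get(pos, ()):
            answers[qi] += sign * lookup(queries[qi][1])
        if pos < len(order):
            insert(order[pos])
    return answers
```

Each question is inserted once and each query prefix is looked up twice, so the work is proportional to the total length of the input text.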
Wombats
Maximum closure.
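The note above only names the reduction. As a hedged sketch, here is the standard way to solve a maximum closure instance with a minimum cut: connect the source to every positive-weight node, every negative-weight node to the sink, and give each dependency edge infinite capacity; the answer is the sum of positive weights minus the max flow. The input format (`weights`, `requires`) and the choice of a BFS augmenting-path max flow are assumptions for illustration, not details from the contest problem.

```python
from collections import deque

def max_closure(weights, requires):
    """weights: list of node weights (hypothetical input shape).
    requires: list of (u, v), meaning that picking u forces picking v.
    Returns the maximum total weight of a closed set (the empty set counts)."""
    n = len(weights)
    s, t = n, n + 1
    INF = float("inf")
    cap = [dict() for _ in range(n + 2)]  # residual capacities

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # reverse residual edge

    for i, w in enumerate(weights):
        if w > 0:
            add_edge(s, i, w)      # gain w if i is picked
        elif w < 0:
            add_edge(i, t, -w)     # pay -w if i is picked
    for u, v in requires:
        add_edge(u, v, INF)        # never cut a dependency

    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        # Find the bottleneck, then push it along the path.
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    return sum(w for w in weights if w > 0) - flow
```

For larger inputs a faster max-flow algorithm (for example Dinic's) would be the natural swap; the graph construction stays the same.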
Labeler
Use the training set to estimate \(\text{Pr}[q_i \in t_k \mid w_j \in q_i ] \) for all questions \(q_i\), topics \(t_k\) and words \(w_j\). Improved by using bigrams.
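A rough sketch of how the estimate above could be computed by counting, assuming whitespace tokenization and a training set given as (text, topics) pairs. The function names are hypothetical, and ranking topics by summing the per-word probabilities is an assumption: the note does not say how the per-word estimates were combined.

```python
from collections import Counter, defaultdict

def train(examples):
    """examples: iterable of (question_text, set_of_topics) pairs (assumed format).
    Returns model[w][t], an estimate of Pr[question has topic t | word w in question]."""
    word_count = Counter()
    word_topic_count = defaultdict(Counter)
    for text, topics in examples:
        for w in set(text.lower().split()):  # count each word once per question
            word_count[w] += 1
            for t in topics:
                word_topic_count[w][t] += 1
    return {
        w: {t: c / word_count[w] for t, c in topic_counts.items()}
        for w, topic_counts in word_topic_count.items()
    }

def rank_topics(model, text, top=3):
    """Score each topic by summing its per-word estimates (an assumed combination rule)."""
    scores = Counter()
    for w in set(text.lower().split()):
        for t, p in model.get(w, {}).items():
            scores[t] += p
    return [t for t, _ in scores.most_common(top)]
```

Using bigrams would only change the tokenization step: pairs of adjacent words would be counted in the same way as single words.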
Duplicate
Using \( \text{Pr}[w \in \text{question_text}_i \text{ and } w \notin \text{question_text}_j ]\) as the classification criterion gives 60% accuracy. Also considering \( \frac{\text{view_count}_i }{ \text{view_count}_j } \) improved it to 70%.
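A small sketch of the two features above, under one possible reading of the probability: \(w\) is drawn uniformly from the union of the two questions' words. That reading, the whitespace tokenization and the function name are all assumptions; the note does not give the classifier or the threshold, so only the feature computation is shown.

```python
def duplicate_features(text_i, text_j, views_i, views_j):
    """Returns (fraction of words in i but not in j, view count ratio i / j).
    The fraction is taken over the union of both word sets (an assumed reading)."""
    words_i = set(text_i.lower().split())
    words_j = set(text_j.lower().split())
    union = words_i | words_j
    only_in_i = len(words_i - words_j) / len(union) if union else 0.0
    # Guard against a zero view count on question j.
    ratio = views_i / views_j if views_j else float("inf")
    return only_in_i, ratio
```

A pair of questions with a low first feature in both directions shares most of its vocabulary, which is the signal the criterion relies on.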