@mikiobraun

@mikiobraun Twitter Memorial

19,827 tweets · 2008–2024 · 1046 threads

2014

RT @GuyInYourMFA: My mother doesn't seem to understand that being an experimental novelist IS my "day job."
Is this always the same "Top 15 predictions for Data Science" article? Because anything else would be crazy.
Replying to @harper
cyber attack. On the cloud. Whole new world of mind boggling metaphors.
Replying to @harper
there was a German comedy group who would mispronounce cyber as "saiban". That has kinda stuck in my head ever since.
Replying to @squarecog
@aphyr only this way we'll be able to ensure fair distribution of seats for all.
Replying to @hintjens
of course. The money. Is it that easy to make allies these days? Obviously.
Replying to @squarecog
@aphyr let's disrupt this! This market has been over-regulated for much too long!
RT @seanjtaylor: Today I updated @drewconway's data science Venn diagram for 2014. http://t.co/F3due1yXbx
Replying to @noelwelsh
well, my kids don't do that. Which only means I spend up to 20 minutes each morning getting them out of bed.
And that was the last time I had to get up at 6am for the next two weeks. #yeah 🎄🎅🎆🎍
@semibogan now that you mention it, I've met enough people who believe the solution to organisational or communication issues is technology.
@semibogan yeah. So many good ideas. Brrr. Luckily I haven't met someone like that in person yet.
@semibogan as someone said to me yesterday "Forget Big Data or IoT. If there is one mega-trend, it's the dissolution of privacy"
Replying to @mfcabrera
HU Berlin: Marius Kloft, Potsdam: Tobias Scheffer, Gilles Blanchard.
Replying to @mfcabrera
TU Berlin: Klaus Robert Müller, Klaus Obermayer, Manfred Opper (ML), Volker Markl (IR/databases)
Replying to @munterluggauer
that would be great actually. Still, I've heard of many job interviews for professorships where it was still important.
RT @syhw: @mikiobraun Yes, the chairs announced they'd cut at 8% next year (as a joke) when announcing the NIPS experiment results! :-)
Replying to @syhw
Ah I see. Well, 8% is probably too little, right? In any case, given the personal importance of getting published this is pretty bad.
RT @syhw: @mikiobraun That's not true! If the acceptance rate was at 8%, the reviewers would almost all agree. Papers in the 8-22.5% cut ar…
Replying to @munterluggauer
I guess people who've been to the workshop know that it's a contribution, but I doubt some professor hiring committee will agree
RT @AaaLee: Half the papers at #NIPS2014 would be rejected if the review process were rerun. Awesome explanation by @ecprice http://t.co/f7
RT @vsbuffalo: John von Neumann's quote about the merit of empirical ideas in mathematics is terrific: http://t.co/KRL2Segh31
RT @lawrennd: Markov Chain Montreal Canadiens at #nips2014 Variational Workshop. One of best workshops I've attended! http://t.co/0d1aoe17Mh
Replying to @ChrisDiehl
I didn't attend, those retweets are all I got, too. ;) That Google paper on technical debt of ML looked pretty interesting.
RT @JustinBasilico: Slides from my talk on building machine learning software at #Netflix from the #NIPS2014 SW4ML workshop http://t.co/QT9
RT @jaumebp: "Every single feature in your model is an opportunity for disaster" :p Technical debt of machine learning talk #NIPS2014
RT @jaumebp: Now "Machine Learning: The High-Interest Credit Card of Technical Debt" by Google at the software engineering for ML workshop …
RT @KyleCranmer: Pierre Baldi talking about impressive results of Deep Learning in HEP @DanielWhiteson @PeterSadowski #NIPS2014 http://t.co…
RT @jaumebp: Happy to hear about all the efforts on model interpretability in the facebook machine learning platform #NIPS2014
RT @DeepLearningHub: Neural Networks with explicit memory from Facebook's team of Jason Weston, Sumit Chopra, Antoine Bordes http://t.co/q
RT @suchisaria: Student attendance at #NIPS2014 has almost doubled since 2011 2014: 1017 2013: 784 2012: 621 2011: 573 Via @lawrennd and…
RT @MrChrisJohnson: "This is how you win ML competitions: you take other peoples' work and ensemble them together." Vitaly Kuznetsov #NIPS2
RT @KyleCranmer: Winners and organizers of #HiggsML @kaggle challenge at #NIPS2014 @GaborMelis @TimSalimans @balazskegl http://t.co/kwTuR1k
RT @lzamparo: Hannes Breitschneider's take home message: deep learning on DNA sequences works. #NIPS2014
RT @eelstretching: Andreas Müller actually talking about software engineering at the software engineering workshop. More, please #NIPS2014
RT @AnaderiRu: Moment of glory for winners of Highs Boson challenge at #NIPS2014 http://t.co/vsuew6kMGL
RT @iskander: Last sessions of the #NIPS2014 semantics workshop was awesome (and full of heated controversy). Do recurrent deep models capt…
RT @mohammad_sabah: Awesome experience at the Deep Learning workshop in #NIPS2014, interacting with some of the best ML thought leaders & g…
RT @lzamparo: Ganguli: non-linear forward propagation through deep networks is like chaos in phase space. #NIPS2014
RT @wendykan: "computer vision is solved, right? no, this pic w/ auto labeling, humans get so much more info than labeling" -Olga Russakovs…
RT @_onionesque: "Again please note that the algorithm that we use (REINFORCE) is very old, by Williams, from 1992." #NIPS2014
RT @jaumebp: Ben Hammer talking about the Kaggle platform at the Challenges of Machine Learning workshop #NIPS2014
RT @suchisaria: Only standing room in the ML 4 healthcare/genomics workshop! Big diff from 5 yrs ago. #NIPS2014 #ML4CHG My talk on comp phe…
RT @haldaume3: really liked @jacobeisenstein 's NLP+ML talk at #nips2014 -- rah rah for discourse processing! didn't tweet during it cuz it…
RT @MrChrisJohnson: "Ensembling many models generally improves accuracy significantly but with diminishing returns after 5 to 10 models" Je…
RT @RandomlyWalking: #DeepLearning folks should check our workshop poster on scheduled demonising autoencoders today https://t.co/OqxB7skMV
RT @RandomlyWalking: #NIPS2014 time travellers should check out our poster yesterday on super efficient geometric Hamiltonian Monte Carlo h…
RT @AndrewYNg: Baidu's Bryan Catanzaro, creator of cuDNN speaking #NIPS2014. Fast deep learning+GPU primitives! http://t.co/n2z3YN1dsl
RT @Reza_Zadeh: Jeff Dean telling us how @Google trains their neural networks #NIPS2014 http://t.co/hbGHUQReGL
RT @Reza_Zadeh: Distributed machine learning workshop in full swing! Packed house! #NIPS2014 http://t.co/EbQOhH0OMt
RT @iamed2: Distributed workshop was about 1/2 full for first session. Second session has Deep Learning, so room is packed. #NIPS2014
RT @gzco: Bengio, Ng, Hinton & Blunsom this AM @ #nips2014 deep learning wkshp. Lock the doors for 10 yrs & we'd have Anathem http://t.co/t
RT @mohammad_sabah: #NIPS2014 Awesome talk by Geoffrey Hinton on using specialized NNs for parallel training and smaller models for deploym…
RT @MrChrisJohnson: Its like I'm at a music festival with my favorite bands playing at the same time but I still need to choose a stage. #N
RT @MrChrisJohnson: "For deep representations of longer sentences it doesn't make sense to pack down to a single max value so use k-max poo…
RT @robot_MD: Deep learning workshop has more attendees than NIPS in 2001. Packed room of 600+! #NIPS2014
RT @naga86: Spark 1.1 considerably improves over 1.0 in many ML tasks. #NIPS2014 #DistributedML
RT @lukasvermeer: I don't always take pictures of posters, but when I do, I post them online accompanied by a witty comment. #NIPS2014 http…
RT @t3kcit: Alekh Agarwal presented a faster lasvm - style kernel svm by dropping the bias term. Implemented in vw. #NIPS2014 http://t.co/F
Replying to @RandomlyWalking
only if you manage to do a JavaScript injection in your profile description I guess... ;)
Replying to @RandomlyWalking
also hard to summarize in 140 characters. Is someone working on LaTeX integration in Twitter?
Replying to @superglaze
OK, OK, skip national television, let's directly go international! ;)
RT @nsaphra: In power iteration, project principal components onto a cone & normalize to enforce any convex constraints. #NIPS2014 http://t…
Replying to @RandomlyWalking
interesting. You're saying it takes time to get through a paper before it can be endorsed on Twitter?
Replying to @alung
I only saw one tweet. But during Hopfield's talk there were a dozen or so.
RT @thektokolwiek: Simple, right? #NIPS2014 solved all problems ever :D http://t.co/Nr8xzPmL5L
RT @MrChrisJohnson: RNN model for high-res images adaptively selects regions instead of full image. Scales better than ConvNets #NIPS2014 h…
RT @MrChrisJohnson: Mallows model for ranking keeps popping up at #NIPS2014. Here's the original 1957 paper "Non Null Ranking Models" http:…
RT @RandomlyWalking: Kearns reminisces about time as NIPS program chair 1997 "last PC to receive hard copies in my office" #NIPS2014
RT @thektokolwiek: NIPS history in pictures during the Posner Lecture by Michael Kearns about Games, Networks, and People #NIPS2014
And that concludes my morning #NIPS2014 retweet binge. Also following practically everyone who tweets with that hashtag.
RT @MrChrisJohnson: "Pianos produce consistent spectral and temporal responses from each key but every piano is diff so must be learned uns…
RT @RandomlyWalking: Or if not at #NIPS2014 (or dislike talking to us) see our work on energy disaggregation and prior knowledge here http:…
RT @RandomlyWalking: "The real star of this work is the Gumbel distribution." Maddison et al award winning paper at #NIPS2014
RT @maierhein: John Hopfield #NIPS2014: Great talk without slides. PowerPoint crash in 45min talk on emergent computational dynamics http:/…
RT @lzamparo: Hopfield decides to wing it after repeated PowerPoint failure. Transition was seamless. Impressive. #NIPS2014
RT @RandomlyWalking: Networks to recognize dynamic patterns. Firing "bump" progresses along topology as pattern progresses. Hopfield Posner…
RT @sabraham: Me: conference. Canadian Border Guard: #nips2014 ? Me: yeah CBG: Sorry, we've already hit our nerd quota for the year
RT @johnplattml: Nice #NIPS2014 talk: optimal set in k-arm bandit that fulfills combinatorial constraints, in offline test. http://t.co/VxB
Replying to @RandomlyWalking
I can imagine. It's unclear why this is still an issue in 2014. Keep up the good work. I can only attend from afar...
DHL guy today "So you're Mr. Braun who is ordering all that stuff!" Dude, do you even eCommerce? You cover the last mile. THIS IS YOUR FUTUR
RT @balazskegl: John Carlos Baez: "ML-based and physics-based models run neck and neck in El Nino prediction" #NIPS2014
RT @_onionesque: Pretty much the only reason I still use a G+ account :P (John Baez) just starting to talk! #NIPS2014 http://t.co/MMBO0XYaph
RT @thektokolwiek: Looks like #NIPS2014 changes into a climate conference - now we need some deniers to crush ;)
Replying to @pjozefak
well you can always sell the hope that this time you might move up to the inner circle.
RT @shima__shima: #nips2014 a new exponential growth law of # of attendees http://t.co/Nhlu9Nw1K3
RT @RandomlyWalking: Fragile co-adaptation in deep learning. Sometimes fixing early layers is bad. Yosinski et al. #NIPS2014
RT @_onionesque: Great talk by Ilya Sutskever on the already well known paper on sequence to sequence learning #NIPS2014 http://t.co/VxCH65
RT @philipquick: Attending #NIPS2014 in Montreal. More deep learning papers than you can shake a stick at.
RT @JesseDodge: "The deep learning hypothesis: Anything humans can do in .1 seconds, 10 layer deep neutral networks can do." #NIPS2014 #hed
RT @RandomlyWalking: Fascinating #NIPS2014 invited talk about the grid, power systems, and opportunites for data science by Arunava Majumda…
RT @RandomlyWalking: "People say solar generation isn't predictable. Utility scale solar is very predictable. They put it in the desert." M…
RT @thektokolwiek: Cool talk by Arun Majumbar about shifting paradigms in electricity networks #NIPS2014
RT @gxr: Awesome talk on large-scale maximum inner product search via asymmetric locality sensitive hashing. Simple and much more accurate.…
RT @RandomlyWalking: Yurii Nesterov talking about sparse updating strategy for convex functions with sparse gradients at #NIPS2014
Pleasantly surprised at very effectively organized workgroup discussions at the EU workshop.
Replying to @peter_c_william
sure sounds like a lot. Probably depends on what qualifies as big data related. DBAs? They were probably very inclusive.
At EU workshop on big data skills. Jacques Bughin from McKinsey says in 70% of use cases, big data makes a significant impact.
Replying to @nfusi
always. Have they also tried turning off the air conditioning or raising the temperature?
RT @seanmcgregor: 25.9 percent of accepted papers at #NIPS2014 were not accepted in a controlled experiment. This is near the theoretical m…
RT @acornthea: on ai being dangerous... 'maybe they're expecting too much of us' =) #NIPS2014
RT @mohammad_sabah: #NIPS2014 In 2001, when NIPS moved to Vancouver, there were 600 participants. In 2014, Deep Learning Workshop has over …
RT @evelgab: #NIPS2014 is growing super-exponentially! Over 2200 people this year. #machinelearning for world domination
Replying to @mdreid
what? How? Isn't the urge to tweet so much bigger when it's engaging and interesting??
Man, only about 0.1 tweets per minute from #NIPS2014. Am I following the wrong hashtag or what?
Replying to @pavlobaron
it sure does, doesn't it ;) but there'll also be someone from KIT and a few other research institutions. Still, interesting mix.
RT @lzamparo: @mikiobraun I'm expecting Larry & Sergei to arrive by hydrofoil #NIPS2014
That one time where I worked on a ML lib in JRuby/Java/jblas doesn't seem so crazy given the current Spark Scala/Java/Python/Py4J stack ;)
Turning off all notification sounds for chat and social apps did a lot to improve my peace of mind. The notification LED is more than enough
Not sure I like all the connotations of the term "embarrassingly parallel". Because that's good, right? Because it's simple to scale, right?
A friend of mine admitted he actually believed Docker was software for containers on ships. #whenmarketingistoostrong
. @gschmutz is doing a good job pointing out the configuration and reliability issues frameworks are saving us from having to deal with.
Replying to @purbon
@pautasso Hehe. To be fair, I think the reason was that Javascript runs almost everywhere.
Stream Processing framework for Javascript and node.js by @pautasso. Because why not? ;)
Replying to @louisdorard
not sure. I remember most examples were more on the level of first writing tests for stuff like SVMs. Hm need to check again.
Replying to @louisdorard
well, I once had half a ML library based on JRuby and jblas, too ;)
Interesting talk by @jpcik on RDF stream processing. I find many of my own ideas there, only formalized better ;)
Replying to @louisdorard
for some reason he chose ruby, which no one else uses, though. Also not sure if he used TDD for high level stuff (evaluation)
Replying to @louisdorard
I skimmed it. Looked interesting. He first explains a TDD approach and then implements a number of basic ML algorithms
Hello Bern! Workshop in a lounge of a huge soccer stadium. That's a first. ;) First slide I see is about the Lambda architecture, of course.
Replying to @SwiftOnSecurity
@alansaid aw, c'mon. We giggle. Occasionally. When no one sees it.
Considering putting together a master slide deck of all my realtime data analysis slides so I can just improvise talks in realtime.
Replying to @beaucronin
yup. Kinda reassuring that all the marketing budget and brand value in the world can't make people buy overpriced products.
You know you're getting old when doctors start using the phrase "a person your age" a lot.
Second half of our data science workshop. Great crowd, everyone very focused. #data2day
RT @stereimann: nice #datascience lab by @mikiobraun, Jan Müller & Paul von Bünau @ #data2day: detect handwritten numbers w/ python http://…
. @hintjens on starting an OSS: "make an empty project, write a README with the goal, then tweet that this goal cannot be reached"
@TLDR_App hey guys, what happened to your service? Looks like you stopped at the end of April...
Not sure if I got that right ;) "The notion of Big Data as centralized data is broken" (@hintjens) #data2day
Interesting case study by Volker Janz from @innogames on their event tracking infrastructure. REST + kestrel + Storm + Hadoop/Spark/Hive.
Replying to @eoinhurrell
@UltimateHurl @JanSimonG you mean for dealing with person sensitive data?
Refreshing perspective by @JanSimonG on security and Big Data: become more aware of the sensitivity of the data you're dealing with. #data2day
"Big Data and security actually go well together because they are both datacentric." (@JanSimonG) #data2day
Replying to @pavlobaron
well, my professor at TU has actually started to say that he has been doing Big Data ever since '96. ;)
Robust stability is also a security issue because availability is also a security concern. #data2day
Interesting, @JanSimonG stresses that security is not just about encryption, but first you need to understand what needs to be secure.
Replying to @nraychaudhuri
it does. Smaller than last week, of course, but a very interesting mix.
It seems I confused people in my talk because I would keep calling it "machine learning" instead of "data science." 😜
Good morning TXL 😪. It's kinda scary how alive Berlin is already at 5am. This can't be healthy.
There's nothing like when the different narratives marketing has created suddenly interact in unforeseen ways. And then it all makes sense
Coming back from more industry-related events to university is always quite a culture shock.
@moellus @sofasamurai hey, what could possibly happen? It's not like my phone would be bricked afterwards or anything ;)
So a friend of mine, who has the same phone model, got his Kitkat update two weeks ago. HEY SAMSUNG, WHERE'S MY UPDATE?
Incredible, after six revisions, our fast cross-validation paper has finally been accepted. New personal record!
Replying to @rmetzger_
@vkalavri @ApacheFlink just saying, databricks/Cloudera is huge here. Would definitely help getting the word out.
Seeing this a lot now, first creating models in a notebook and then directly deploying from there. #StrataHadoop
Replying to @rmetzger_
@vkalavri @ApacheFlink have you considered going to Strata in London in May? Deadline for CfP is on Monday!
Pretty impressive set of advanced ML demos by @graphlabteam's Shawn Scully. From image recognition to recommendations. #StrataHadoop
Replying to @rmetzger_
@vkalavri @ApacheFlink I'm always saying "Spark and Flink" in the same sentence ;)
Impressive turnout at my talk. Thanks to all the listeners and to #StrataHadoop for having me!
Words of truth. Raw data may be large, but machine learning often is more compute than data intensive. #StrataHadoop
RT @jnebrera: Hahaha, I fully agree with this slide from @mikiobraun at #StrataHadoop http://t.co/8Ftfe5GXob
RT @bigdata: Standing room only for @mikiobraun streaming analytics talk at #StrataHadoop http://t.co/wfQPQMriuz
Yay, ROC curves in @jrdntgn talk on soccer worldcup predictions at Google. #StrataHadoop
Third day at #StrataHadoop Barcelona. Today will be busy, office hours at 11:50, talk at 13:45 plus track host in the data science track.
Replying to @lzamparo
it actually covers a number of algorithms and then also discusses how to use them properly using the TDD approach.
. @pacoid closes by pointing to approximate algorithms. If you want to learn more, come to my talk tomorrow! #StrataHadoop
Replying to @John4man
@treycausey @seanjtaylor awesome! 😅 do you also have Big Commander Data?
RT @IgorBrigadir: BOW DOWN BEFORE THE ALMIGHTY Z-SCORE! RT @mikiobraun "Whoa, formula!" http://t.co/eO4aZq8FA9
Replying to @alung
it's very cool, but really doesn't reduce the number of moving parts in the tool stack.
Replying to @alung
they've definitely picked up on that customer segment. It's kinda scary to be honest. ;)
Replying to @alung
Hehe, yeah. But it's also interesting how the different products are evolving. Notebooks are HUGE it seems.
Replying to @alung
yeah, my first Strata, but I've been to a bunch of these more industry centric conferences before.
Two open source projects which look interesting I learned about at #StrataHadoop: Zeppelin and ggplot for Python.
It seems databricks and friends think the solution to the lack of viz libs in Java are IPython notebooks and integration as in Spark.
Interesting talk by Rob Smith from IBM on how they're combining notebooks with IPython, Spark, and REST endpoints for interactive analytics.
Great finally meeting @huitseeker, @louisdorard, @sean_r_owen in person, and a few new people like @nraychaudhuri, and @kimknilsson!
Replying to @huitseeker
yes, definitely. I'll arrive in the afternoon today, so it should be no problem.
Replying to @svershin
Yeah, but that is mostly the teaching side of science, not the research side. Well, there are many aspects, need to blog about it.
Replying to @svershin
I mean as a company you can run things differently and change things on that level. But in academia it's a global system.
Replying to @svershin
that's true. Not sure results from business PMP translate to science easily.
BTW I'll be at #strataconf in Barcelona the next three days, so if you want to meet and talk, let me know.
Replying to @Frank_Scholten
thinking of nothing in particular, but e.g. Spark's mllib or Mahout's move towards a matrix algebra DSL.
RT @louisdorard: For those of you who can't be at #papis2014... We're video taping the sessions! http://t.co/fhW9zK8lvY
Replying to @svershin
to me it seems like the structure is more or less fixed. It's also hard to change things individually.
Replying to @svershin
and then there's the whole question of how we can ensure progress on a global level. Is peer review still the best way, and so on
Replying to @svershin
I agree that at the core it's exploratory, but that does not mean there are no decisions to make.
Replying to @svershin
I was thinking more about higher-level decisions. What the next steps are, how to handle multiple projects, etc.
@jonbros usually I don't do that. Maybe it's just a coincidence, it felt like there was a new flood of these notifs... .
What's with all those "yeah, we're using cookies too" notifications? Some change in regulation?
I think the real question is what percentage of Twitter's ad revenue comes directly from NewRelic ;)
RT @briancavalier: TIL: every production system running on Node is using an unsupported version of v8 w/no possibility of bug fixes: https:…
Replying to @eoinhurrell
@UltimateHurl @treycausey but that's not how it is between us guys, right? ;)
@mjmitchell86 hey, still waiting for that inbox invite? The button finally appeared in my app. ;)
Replying to @clairikine
I can't believe he actually used the word pogrom. I wonder what kind of world he lives in.
The EU commission seems to be planning to invite me to a workshop to identify "the big data skills mix". #WHATISHAPPENINGHERE
Replying to @wattersjames
can't say I'm surprised. Although 96% sounds like waaaaay too much.
RT @sarahmei: Academic research on software engineering has been a decade behind current practice since I've been practicing. Some contribu…
The only way out seems to quit this job and start a web comic about academia. Because if I started that there'd be no way back.
Replying to @markusandrezak
@geertbollen yeah. Also no contradiction I guess. After all, end users aren't their real paying customers.
Replying to @markusandrezak
@geertbollen since then my feeling is that they are great engineers but don't get end user products.
Replying to @markusandrezak
@geertbollen it felt like they were mostly concerned with internal dependencies and had no idea about the impact on the user
Replying to @markusandrezak
@geertbollen they sort of burned me when they forcefully integrated everything with G+.
Replying to @fhuszar
my father once gave me a pocket calculator as a birthday present "because you're a scientist".
Pretty interesting talk by @markusandrezak on strategic thinking beyond agile this morning. #gotober
Pretty interesting talk by Chad Fowler talking about how he brought Wunderlist's infrastructure back on track with microservices. #gotober
Replying to @fs111
I must admit I don't even know what the bottleneck is. Num of channels? Num of concurrent users?
Replying to @pavlobaron
I mean when you have some example which requires a cluster to run because it's too large.
Replying to @pavlobaron
oh yeah. But I guess if you want to demo your cool Cloud Big Data Solution you have to. :(
Replying to @fs111
yeah never been there myself but from what I heard Internet is always flawless at CCC.
Replying to @octonion
yeah I can get that. Companies usually mean a lot of infrastructure. Supporting at best, choking at worst.
Replying to @GOTOber
yeah, sorry, didn't mean it was your fault. Not even sure if WiFi was designed for this.
Replying to @octonion
I'm sometimes playing with the thought of moving in the other direction.
Looks like actually getting to #gotober might be a bit of a challenge given the S Bahn strike.
Somehow dragged myself to #gotober's speaker dinner. Now they are making us come to stage individually to say a few words.
Replying to @lojikil
one site also had a confusing sequence of pages: "click here if you really want to unsubscribe" - "here if that was a mistake"
Now that was a needless excursion into sickness... At least it looks as if I'll be able to attend GOTO Berlin tomorrow... .
Unfortunately, illness prevents me from making the trip to London and speaking at codemesh.io today. :(
Just had a boy with a spiderman costume, Darth Maul mask, and a minecraft pick axe come by trick or treating. #mashup
OH: "I'm an engineer. Anyone who can't stand the truth, customers, managers, should be kept as far as possible from me" ;)
How do mathematicians mentally visualize 7 dimensional spaces? Easy, they take an n dimensional space and set n = 7. #scnr
RT @ds_ldn: @mikiobraun beers & random startup chat w/ @quesada on Eur VCs:"Shouldn't be @streamdrill raising many $M in SValley ? Berlin V…
Replying to @HEPfeickert
alright, I settled for some local variety of sugary something. Here's to all submitted proposals!
So that project proposal is on its way. But as ever so often after such a long process, feeling of accomplishment is marginal at best.
Replying to @mlsec
had to look up the exact definition of weasel word. This isn't going well ;)
Replying to @lojikil
do you happen to know someone who is deep enough into this tech he could make that happen?
Replying to @lojikil
GSM text message level integration for notifications, that's my dream. And batteries which last 5 days.
Replying to @lojikil
should also be baked into the cloud and mobile OSs to relieve each and every app from checking its status all the time.
Replying to @lojikil
I think the biggest issue is just how Email is misused as a general purpose notification system today. So much stuff to handle.
Replying to @lojikil
IMHO Google is not really that good at providing a good UX. Thread view was pretty good in Gmail.
Replying to @lojikil
TBH I find it somewhat hard to grasp what is going on. Postpone looks nice, you can also enter a place (eg when I'm home).
Replying to @lojikil
like Gmail on steroids. More automatic labelling. You can pin messages (star?), mark them as done (archive?), postpone.
Garr, so what's exactly the mental model for Google Inbox? Labels? And Pins? And Snoozed? And Done? Is it more like a comb? With whistles?
But then again, what the heck is happening behind the scenes of Google Inbox? Where are my emails? How does this interoperate with Gmail??
First cool feature of Google's Inbox: It suffices to have the invitation in your inbox to activate Inbox.
Replying to @eoinhurrell
@UltimateHurl and don't get me started about professors who have become managers and graduate students doing the ground work.
Replying to @eoinhurrell
@UltimateHurl where in reality, it's almost always a collaborative effort.
Replying to @eoinhurrell
@UltimateHurl IMHO the biggest fault is depicting scientists as megalomaniac loner types who are singlehandedly fighting the establishment.
Replying to @eoinhurrell
@UltimateHurl so basically they're saying "we know it paints the wrong picture, but in a way people get what we're doing & nobody got hurt"
Replying to @eoinhurrell
@UltimateHurl when I talked to other scientists, they said, well at least they give the public a feeling of where their tax money went.
Replying to @eoinhurrell
@UltimateHurl yes. I found the way scientists are depicted in the media, in particular in publications like Wired, somewhat distorting.
Replying to @twiecki
ah, sorry, I think they are all already set. No idea whether there's a mailing list collecting these venues.
Talk schedule for this fall: GOTO Berlin (Nov 7), Strata Barcelona (Nov 21), data2day Karlsruhe (Nov 26-28), DBTA Workshop Bern (Dec 3)
Replying to @drewconway
poor guy. But wait, doesn't the graph show that at the end of day, the amount of travelling was identical?
My daughter's about to master addition with carry for two-digit numbers. Next up: polynomial division over quotient rings. #nerddad
Replying to @beaucronin
@amplab definitely. Even their framework, Flink, looks a lot like Spark, but goes deeper towards query optimization.
Well, before each research project, first organizational issues. It's a German research project after all... #BBDC
IMHO one of the biggest challenges in the BBDC will be to bring the value of noise and approximation to databases.
The Berlin Big Data Center will heavily rely on and work with @ApacheFlink. This should give a huge boost towards scalable data analysis.
Kickoff meeting for the Berlin Big Data Center. This project is going to be interesting, ML, database, and infrastructure people together.
Replying to @JoergM
@thinkberg @pvblivs for a second there I was honestly thinking you wanted to look at pictures from Yosemite park. #nonappleguy
They are so diluting their key brand identity pushing out that many new devices like this!! ;)
Replying to @Bediko
"Procrastination is considered a bad work habit." That has to be the most German of all Wikipedia articles. #gehtjagarnicht
I wonder whether there's something like a lifetime limit on talks you can listen to. Wondering for a friend ;)
Replying to @peter_c_william
yeah. I think using such terms is usually ok, but every once in a while I like to remind myself what it actually means.
How about we drop the "killer" modifier for something more positive. How about "savior" as in "savior app", "savior feature?" ;)
Replying to @mleich
hehe, you can also achieve Zero Inbox that way. "Mark All". "Delete" ;)
A friend of mine checks in to a hotel, asks for Wi-Fi. "What device?" "Well, notebook, tablet, phone, PS Vita." They just give him a router.
Replying to @mhausenblas
hehe. But I honestly thought they were the other way round. Good we sorted that out ;)
Replying to @mhausenblas
yes! ;) although I think we'll meet in Karlsruhe the week before, too.
RT @Quesada: I will be in London 28-30 Ping me if you want to meet for drinks/food/coffee/chat.
If you ever give a talk in an academic setting in Germany, don't be alarmed if people knock on the tables instead of clapping at the end.
Becoming more functional? Yes. More reactive? I thought proactive was preferred over reactive... ;)
Replying to @munterluggauer
I'm beginning to find the term "machine" somewhat archaic I have to say
Replying to @munterluggauer
because it implies your battery is so old it will only last for half a day? ;)
Replying to @munterluggauer
smartphones are just a server operating system on an ARM processor not knowing when and how to stop
Replying to @munterluggauer
I think GSM was a highly engineered thing of beauty with minimal energy requirements whereas
Replying to @munterluggauer
I guess it just does. In retrospect, phones whose battery lasts for a week look like a technical marvel now.
Umlaut inferral would be an awesome feature for touch keyboards. As would be compound-word detection in German.
So if you want to chat with me about real-time big data at #strataconf, come to my "office hours" at 11:50, on Friday Nov 21.
In addition to my presentation at #strataconf in Barcelona, I'll have Office Hours on Friday, Nov 21, 11:50.
Replying to @mluebbecke
and still, my kano feels much more sluggish than what I remember. I blame this on frameworks.
Adello's CTO Uhlig at #DataDays: when you split up the value chain, you can't do end-to-end optimization anymore. I think he's right.
@bkkkk @iamdevloper same can be said about so many areas nowadays. Too many frameworks bullying you around. :(
RT @iamdevloper: Starting a basic website in 2014: 1. Install Node 2. Install Bower 3. Pick CSS framework 4. Pick responsive approach … 4…
Replying to @nikete
so if you choose that path you have to be ready to deal with not scoring too high officially and the pressure involved with that.
Replying to @nikete
or put differently, you're often evaluated in terms of grant $$$s acquired, students who finish masters and Ph.D. and so on.
Replying to @nikete
that is right. Manfred Opper is another example. But that's not the way the game should be played, I guess ;)
Here's @StephanNoller talking about the Targetometer, partly powered by @streamdrill #DataDays
RT @ChrisDiehl: How Big Data is Unfair - An excellent article by @mrtz https://t.co/TyOUFQG3W8 ht @bhpascal
Always put in some travel money for an advisory board. Lets you meet with your buddies (see networking) #ProfTips
Next up, the black art of sustaining your research group: stipends, consulting, and shared positions. #ProfTips
Relentlessly network because often calls will expect groups of researchers to apply with a joint research project. #becomingaprof
Constantly track calls from all the major funding agencies to look for programs which loosely match your research profile. #becomingaprof
Once you have secured a few positions, sustaining them will become a huge responsibility. Not scientifically, but socially. #becomingaprof
First things first: yes, you need a research group to work on many ideas at the same time and to build up your legacy. #becomingaprof
Replying to @IgorCarron
@mdreid Hehe. Sounds worth a try. How would we go about this? I guess a hashtag is the right way. How about #becomingaprof 😉
So far, ello looks mostly like a study in Hipsteresque design and the potential of eventual consistency.
RT @HRFortmann: The insane explosion of the technology Landscape. Those are the warning signs for Media Agencies. #webit http://t.co/DUtt3e
I have to say what I always liked about #datadays are the large number of panel discussions.
Replying to @mdreid
sooo, you say you were into hype cycles before they were mainstream? ;)
Replying to @cowbs
@gridinoc and I bet it's fully configurable via XML! #neversolvethesameproblemtwice
And this is how publicly funded projects end. With a report. Including "III.4. work which didn't lead to results". Really?
Replying to @dosinga
ha. As if the university had money to buy phones with displays for everyone. ;)
Replying to @ian_soboroff
@ChrisDiehl I agree. Somehow money needs to be collected to pay for devs and infrastructure.
Replying to @clairikine
yeah. I usually read it as "we're late because everything took a bit longer". ;)
Replying to @octonion
all in all, I think they went commercial before the technology had a decent UI.
Replying to @octonion
followed by odd behavior trying to "be smart" with hostname resolution and other stuff.
Replying to @octonion
I can't even say what makes it so horrible, but it starts with entirely unhelpful stack traces for faulty config.
Replying to @octonion
and still, half the nodes "forget" their hostname and register with their IP address instead. No idea why.
Replying to @octonion
definitely. I wanted to do it at least once. But I think once is also enough.
It seemed like especially YARN was doing a lot of IP reverse lookups where it should just have stuck with the hostnames in the configs.
Alright, turns out what broke my Hadoop-from-scratch-install was the way /etc/hosts was set up. BUT WHY DO YOU LOOK THERE IN THE FIRST PLACE
Replying to @gridinoc
that London hedge fund again? ;) has been some time. I wonder what happened to them...
Replying to @tyldurd
yeah I mostly worked with the streaming API. 400 kw and capped at a few 100s tps makes for a few GB per day.
I have to admit I'm only at page 46. Everything's cool so far. Although admittedly pretty SVesque.
So my wife made me read The Circle. Because of the danger of social media and all that. #didisaydanger #imeandream
Joys of grant proposal writing, Part 2: Finding out that the system you put your data in last week was the wrong one.
Reading spark sources to find out where it pulls that faulty hostname from. Gotta love open source. #sarcasm
Replying to @DZoneInc
.@DZone I would call it heterogeneous distributed computing, for lack of a better term. #DZBigData
Replying to @DZoneInc
.@DZone I think IoT poses new challenges for distributed comp because it needs to move closer to the devices. #DZBigData
Replying to @deanwampler
@bendzone in a way, I think BD has been commercialized too fast (compared to MySQL or Linux), so you need professional support
Replying to @alung
@alansaid @KirkDBorne @sarveshgupta89 ah, sorry, my mistake. You might also have to set the temp directory to somewhere BIG! ;)
RT @alung: @mikiobraun @alansaid @kirkdborne @sarveshgupta89 sort --parallel=15 | uniq, are we talking big data, or what ?
Replying to @benballjr
.@bendzone setting up Hadoop and friends from the sources can be quite painful. And still you need to know what you want to do. #DZBigData
Replying to @alansaid
.@alansaid @KirkDBorne @sarveshgupta89 oh yes! And 'sort | uniq' for count distinct, of course! ;)
RT @alansaid: @mikiobraun don’t forget ’cut’ and ’uniq’ ;) @KirkDBorne @sarveshgupta89
Replying to @Strategy_Gal
.@BigDataGal @gatr1126 I think tools are not mature enough such that the "what to compute" and the "how" are sufficiently decoupled.
Replying to @KirkDBorne
.@KirkDBorne @sarveshgupta89 ok, grep with 'wc -l', of course ;) #DZBigData
Replying to @Strategy_Gal
@BigDataGal @gatr1126 it seems to be something about abstract concepts versus "getting your hands dirty", unfortunately.
Replying to @Strategy_Gal
@BigDataGal @gatr1126 I agree. Unfortunately in academia, Big Data and machine learning are still somewhat disjoint.
Replying to @sarveshgupta89
.@sarveshgupta89 splunk ("Google for logs") might also be worth looking into.
Replying to @sarveshgupta89
.@sarveshgupta89 Spark itself provides about the same functionality as Ruby or Scala operating on lines of a file, but can scale. #DZBigData
Replying to @HPE_Ezmeral
.@mapr yes, definitely, I think this is the main value big vendors like you and @cloudera provide. #DZBigData
Replying to @benballjr
.@bendzone 1. whether or not you really need big data. 2. That big data will solve your problems out of the box. ;) #DZBigData
.@hrcerqueira I'd like to say that especially the open source stuff gives you engines, gears, and a transmission, but no car. #DZBigData
.@hrcerqueira OTOH, even if you use Hadoop and friends you'll have to do a lot of building yourself. #DZBigData
.@hrcerqueira distributed computing *is* technically challenging, so to use available tech is certainly preferable to starting from scratch
Replying to @SeanGoldbergCS
.@gatr1126 for "Big Knowledge" we'd need to develop matching scalable analysis methods. #DZBigData
Replying to @SeanGoldbergCS
.@gatr1126 IMHO Big Data is currently pretty much focused on scalable computing and storage infrastructure. #DZBigData
Replying to @DZoneAlec
.@DZoneAlec the question is also always how much time for analysis you can endure. For real-time, hundreds of MB might already be too much.
.@cabosworth but in each case it might be quite a challenge to figure out what exactly to do with it. #DZBigData
.@cabosworth I only know the term to stand for the "pools of data" you're not using yet. I guess there's a lot of potential in principle
Replying to @ryandzone
.@ryandzone I usually do my prototyping in Python or R, and then prefer Scala for the real thing, sometimes even for the proto. #DZBigData
Replying to @JPoliachik
.@Jpoliachik not really familiar with it, but I guess stats-based langs are currently more important than symbolic langs. #DZBigData
Replying to @mpron
for real-time, a few thousand events per second can already be challenging, in particular if you are aggregating over long intervals
Replying to @mpron
depends on the data analysis you're interested in. But I'd say anything which doesn't fit on one server.
In case you're wondering, I'll be participating in @DZone's Big data Q&A for the next hour.
RT @DZone: Get your #BigData questions answered by @KirkDBorne @BigDataGal @spyced @deanwampler @mapr @mikiobraun #DZBigData http://t.co/DC
Also, if you order now in Germany, you won't get your unit before Jan 1, 2015. All that predictive modelling, and now this... . :(
Replying to @beaucronin
it has probably something to do with the way the scala console works, but nevertheless...
Replying to @beaucronin
oh yes I agree. Still in this case the launcher got all the errors and still dropped to the console, although nothing worked.
Replying to @cartazio
just too much of the ol' bad Java EE practices lingering. XML config files, log output as status, stacktraces as error messages.
Replying to @cartazio
Yeah, Java's making quite a return to the spotlight riding that Big Data wave. #mixedmetaphors
RT @HEPfeickert: Happy Postdoc Appreciation Week @marktibbetts @AstroKatie @claranellist @sethzenz @ellipsix! Now go do more science. http:…
Nice move Spark shell, first dumping two screens worth of stack traces on me and then going to the prompt as if nothing's wrong.
Final stage of grant proposal writing: reducing a 50p document into the information required in the agency's forms.
Replying to @ayirpelle
yeah, as a developer, I find them indispensable, too, but I wonder whether it's not TMI for almost every end-user.
Replying to @huitseeker
IMHO, too many people just leave the log outputs they used for development in and leave it at that.
Replying to @huitseeker
well I think classic Unix got it right mostly. At least when it comes to the console.
I could write a book about how log output isn't an adequate user facing status indicator.
Ok, it seems HDFS thought the disk was full. But hey, let's pretend it's a network connection/replication issue.
Oh, interesting, single node setup with Hadoop, one node is running, but I cannot copy to HDFS because it does not accept data it seems.
Alright, xth time to recompile Spark to fix some odd runtime invocation problems... . Cutting down on modules to compile... .
W00t, there's an IoT OS called Spark? spark.io So one day you can run Spark spark.apache.org on Spark?
Given all the legal issues around Uber it's quite ironic that the name stems from the German word "über" (over, above).
Man, bare Hadoop is just so... bare. Service silently crashing on startup due to faulty config? Why tell, it's all in the logs!
@zenzenzen a looooot of trust's going in in your feed right now... Some app tweeting on your behalf?
Replying to @roidrage
I once took the train to go from Toronto to Niagara Falls for a day. That was pretty weird. And took forever.
Replying to @eoinhurrell
@UltimateHurl as crazy as it sounds, good point! I probably didn't say clearly enough how this is a "logical" extension of Big Data ;)
Talked about Internet of Things at our univ group retreat. They: "whatever happened to Big Data LOL" WE'RE HYPE CYCLING TOO FAST!!!
RT @streamdrill: RT @thinkberg: Check out @nuggad's targetometer, Hall 8 B021 with some help from @streamdrill http://t.co/Z47m7sazAC #DMEXCO
A friend just pointed out that only Apple can pull it off to market a device with a screen of slightly less than 6in as "6 plus".
So @thinkberg is at dmexco, and I'm somewhere in Brandenburg at my university's group retreat. Quite a contrast.
@muratk3n and it's getting worse each year. But I think it accurately reflects the complexity.
Replying to @markusandrezak
Hehe. It's not like you cannot hot swap in an update of the store. I mean technically... ;)
When I say things like "Big Data" and "Internet of Things" to people outside of my profession, they think I'm joking.
Another point in favor of "thought bubble" instead of "cloud" is that the Bubble is actually filled with information. #ha
Also, how do you store stuff in the cloud? It's just air with suspended liquids floating in the sky.
Maybe we should've called the cloud the "thought bubble" and later just "bubble" for short. As in "Backup your pics in the bubble".
@markus_breuer don't glass and fly? That it's just past half nine? I don't know either.
I was just told to go straight to the brand's website in a clothing store because they don't stock the full catalogue anyway. #web2.0
I don't get smartwatches: (1) I have my phone in my hand all the time anyway. (2) watches need new batteries only once every two years.
Replying to @robanhk
yep. Although that at least related to research in some way. But yeah, you never trained to teach.
Replying to @beaucronin
definitely. Luckily there are still enough who don't lose sight of research.
Same article: "Another 95 percent don't think they ever use cloud computing, even though they're actually doing a lot in the cloud." #yep
Replying to @BenBlack
@superglaze reminds me: a significant number of people think heavy weather affects cloud computing.
The oddest thing about working in academia is that more and more you have to do stuff you have no training in. Like legal, admin, or HR.
My notifications:calls ratio is so high, I have my phone on silent (not even vibrate) most of the time - and frequently miss calls.
Replying to @alung
now if you'd convince yourself you can give even more value to the user by dynamic timelines at half the cost, that's hard to resist
Although delivering only tweets with some form of engagement is an excellent way to cut down on infrastructure load. Or not deliver anything
If Twitter's really going the road of dynamic timeline with selectively distributed tweets, it's time again for a Twitter alternative.
@muratk3n maybe for the generations which haven't seen Star Wars in the first place ;) And by Star Wars I mean Episodes IV-VI ;)
@muratk3n there's just so much of all that complexity you can put into 2h. But I also felt characters were less "cosmic" than in the comics.
@muratk3n that was crucial to make sense of the movie. I guess it does. ("oh look, there's Cosmo!")
@muratk3n a friend and I sort of binge read a lot of that cosmic marvel stuff in the weeks before the movie. We always wondered whether
@albert_swart exactly. And then in my early twenties I learned that you actually get better by practicing. Took a long time ;)
@albert_swart luckily all that music theory was much more interesting than practicing that one piece for weeks. Never looked back.
@albert_swart so when I was sixteen I thought ok let's try jazz, you're only improvising, right? No need to practice.
@albert_swart mostly jazz. Originally I learned classical piano but I never enjoyed practicing pieces.
Wohoo, for the first time I got approving looks after switching to guitar for one song. 🙋🎸
Replying to @roidrage
yeah. Unfortunately, the deadlines themselves are usually not negotiable.
RT @mleich: The proverbial silver bullet in academics is to state that there is no silver bullet. #vldb2014 #phdworkshop
Replying to @markusandrezak
Hehe. Suddenly I feel the urge to set up an appointment with my dentist for my yearly check-up ;)
Replying to @markusandrezak
uh, free OptiClean sample inside. I guess whatever that is, that should be the least of your worries.
RT @yarapavan: Learn the rules like a pro, so you can break them like an artist. PABLO PICASSO via @AdviceToWriters
The only reason I'm still here is that I'm waiting to get to the buffet to make up for the missed lunch.
Replying to @fhueske
but I guess if you're commuting to Mitte that tells a lot about you. ;)
Replying to @fhueske
Hehe. What I find so funny about the U9/S1 is that they sort of originate in the same area.
Replying to @cartazio
nope. I never managed to integrate sparse matrices into jblas. Sparse matrices seemed too specialized in different uses.
Replying to @cartazio
interest in high perf linalg is still strong. I'll be talking about linalg on the JVM at Amazon in Berlin next month.
Replying to @cartazio
oh. Of course. My bad. See, it's already starting with the different communities ;)
Replying to @cartazio
sorry, really wasn't sure what pl was referring to ;) but yeah, any piece of structure to help people get an understanding is good
Replying to @cartazio
not sure about the "nicely" part, but it sure helps to talk to people ;)
Replying to @HEPfeickert
TBH being able to distinguish between what you know and what the audience can possibly know at slide N seems very hard.
Replying to @HEPfeickert
I think I can safely say I've done both. With mixed results ;)
RT @HEPfeickert: @mikiobraun Yes. There are technical presentations, and then there are presentation disasters.
Replying to @HEPfeickert
oh yes. And there are many talks you are expected to attend, and it doesn't always match.
Replying to @HEPfeickert
actually, I was more concerned as a part of the audience because I suffered through so many talks ;)
Replying to @HEPfeickert
and sometimes it becomes so specialized only a handful of people can know what you are talking about.
Replying to @HEPfeickert
yeah, I agree. I think when it gets very technical, that puts a lot of constraints on how much you can adapt.
Replying to @HEPfeickert
so it's really not a good idea to talk to all of them for an extended period of time.
Replying to @HEPfeickert
it seems rather implausible that the audience will have a similar level of background knowledge.
Note to self: next time make sure someone told the speaker the usual length of the seminar talks.. #slideseventyfive #andcounting
Apart from the subject of the general human condition, the very premise of giving a talk seems wrong to me sometimes.
Learning to write papers is hard. But there seems to be no shortcut for the struggle to present some original piece of thought to the world.
Replying to @mrt1nz
thanks for the pointer. It seemed to me like a logical thing to do, but hadn't seen it so much so far.
RT @Kurt_Vonnegut: Another flaw in the human character is that everybody wants to build and nobody wants to do maintenance.
Replying to @dlieuOfTwit
@sscdotopen ah well, time to subscribe to the mailing list, maybe.
Replying to @dlieuOfTwit
@sscdotopen Hm. Now I'm seeing a lot of akka.remote.EndpointAssociationExceptions... .
Replying to @dlieuOfTwit
@sscdotopen alright, got it to work with 1.0.1, too! Thanks! BTW, the tutorial still links to 0.9.1. Maybe you can fix that?
Replying to @dlieuOfTwit
@sscdotopen thanks. I think I used the latest version of Spark (1.0.2). Will try with 1.0.1.
Replying to @heiko_braun
yeah. Well ok, I didn't mean to imply I was particularly good at writing ;) also 140 characters!
Replying to @heiko_braun
erm, no offense, but I meant actual writing skills, like putting down a line of thought such that others can follow.
Put differently, every well written piece of docs means you have to spend less time explaining, and you have more time for coding! ;)
Replying to @dominik
but where are they going? The mysterious land of sunsetted APIs? I bet it's sunny there. Sounds nice.
Replying to @dlieuOfTwit
@sscdotopen it cannot find the SparkContext class. Odd, eh? Anyway, enjoy your vacation!
Replying to @dlieuOfTwit
@sscdotopen no rush, this can wait. I'm probably doing something dumb. Spark starts up, but Mahout's spark-shell crashes b/c
Replying to @ChrisDiehl
but sometimes I wished we had methods which actually understood the data they're working with.
Replying to @Mandar_2812
you mean because that would imply Twitter has entered a steady state? I think you're right.. . Something is wrong.
The downside of knowing too much about Data Science is that one is acutely aware of its short-comings.
Looks like I've hit peak followers. Unfollows and new followers seem to keep the balance right now ;)
RT @octonion: This is known as the Bayes classifier. The major goal of machine learning is to approximate the Bayes classifier as closely a…
RT @octonion: If you knew the conditional probability of class C given information X, P(C|X), you can do no better than guess C with maximu…
Replying to @jaykreps
I see. It's not that easy it seems. Last resort: put a note in the docs, maybe?
Replying to @mhausenblas
@sscdotopen And, did you get it to work? Oddly, Mahout's spark shell complains it cannot initialize the SparkContext???
Replying to @shaunmcgirr
looks like I have to head back into an actual store. Sometimes it pays off to have an actual human look after these things.
It's nearly impossible to buy original smartphone accessories on Amazon.de these days. Nothing but fakes... :(
For example, didn't know about "Tiered Compilation" in Java 7: Best of client + server JIT compile modes.
Totally forgot I owned "Java Performance: The Definitive Guide". Now this is a highly informative - and painful - read. ;)
Replying to @jaykreps
hm. Or maybe $HOME/kafka-logs or something like that. I'd think that /tmp can be expected to be somewhat space constrained.
Replying to @jaykreps
honestly, I probably shouldn't have piped a whole day's worth of tweets into it. After a few GB, my root partition was full.
@muratk3n a professor once told me, getting the right alg was a matter of months, getting it fast could be taken care of "over the weekend".
Oh, cool, Kafka's example config logs all the data to /tmp. Thanks for overrunning my root partition, guys.
Replying to @sscdotopen
hm. It seems to me spark-shell is currently broken, "java.lang.NoSuchMethodError: org.apache.spark.HttpServer.<init>..."
TIL that setting CDPATH breaks virtually all spark/hadoop/mahout/[BIG DATA FRAMEWORK] scripts.
Replying to @meltomene
I'm at the ML chair (Klaus-Robert Müller). All kinds of students do their master here, mostly to get a degree in computer science
I have a master student working on implementing one of my learning algs on Spark/Flink and the impedance mismatch is hilarious. And sad.
Replying to @eoinhurrell
@UltimateHurl ok, maybe it's the talking part. Somehow today ended up being 5h of back-to-back meetings discussing... stuff.. . #cough
Replying to @eoinhurrell
@UltimateHurl preferably the guy who's presenting the topic. >;^)
Replying to @eoinhurrell
@UltimateHurl at least one guy should know what he's talking about, that's my rule ;)
Today I realized that most of research is spent discussing stuff you don't really understand. Ok, make that no-one.
Replying to @izendejas
you mean the quality of the recs? Or really the service as a company?
Replying to @ggmemoryhole
@Laroquod I saw someone say that they probably intended to catch just "T" but didn't check the state of the modifier keys.
TIL not all people consider the fact that Hank Azaria is voicing Apu (and many others) on the Simpsons common knowledge.
Replying to @alung
I liked the idea, too. I had a student who scraped their charts when it was still possible to work on trend detection and stuff.
Replying to @meltomene
I know ;) but it always felt like they stopped evolving further after the acquisition.
Replying to @fs111
someone claimed they don't check for modifiers. So they think they're just catching T. "Just" a bug (hopefully).
Replying to @ggmemoryhole
@Laroquod you're not alone, my friend. We're just not mainstream.
There is a silent storm of keyboard-centric Twitter users who are irritated by Twitter's hijacking of Firefox shortcuts. The rest: What?
Replying to @fs111
of course, these are mostly "power users". No mention of it in the mainstream media! ;)
Replying to @fs111
yeah. I know. Ctrl+n as well it seems. If you search for "Twitter shortcuts" you see all kinds of people complaining about this.
Replying to @superglaze
what an awesome bug. As always, it's an intricate interplay of innocent bystanders.
Everybody is bitching about Yahoo, but anyone remembers last.fm? CBS bought it for $280M. It could've been the Spotify before Spotify!
@treycausey actually, it was based on jblas, so it was more of a Java/JRuby/Fortran combination (although I didn't write any Fortran myself)
@treycausey I was using Java for the more high performance parts, and JRuby for all the syntactic sugar.
@treycausey So I better don't upgrade to 4.4. *If* they ever release that for the S3 >;^)
@treycausey as someone said, they needed to grow so fast, probably not all rockstars by now... .
@treycausey wow. No such problems on my end, luckily. Just this notification counter which wouldn't reset, but that's also ok-ish now.
Still, everything seems very compute oriented, meaning you still need some store backend to store/query results.
Is there some cross-over happening between reactive programming a la akka and "functional streaming" a la scalding?
Replying to @cartazio
alright, starts here in Germany on Aug 28. I'll mark that date down! ;)
Replying to @munterluggauer
still, 160k households a whole Sunday without Internet. In 9 months we'll know more! ;)
I might have been a bit too optimistic about the motivational powers of a whole block having no Internet. #stillunlinked #kabeldeutschland
The only good thing about cable Internet is that it always affects a whole block. So at least there is some incentive to fix it. 😞
RT @kerfors: #BigData #DataScience in 1939 -> "Staff sorting 4M used tickets from #London Underground to analyse line use." http://t.co/fJ3
RT this if you know 2^0... 2^16 by heart. #8bitmind #sixtyfivethousandfivehundredthirtysix #80s
One day I would like to know what makes people so angry they yell out at people walking while looking at their smartphone.
Those poor students who think the right rebuttal will change the reviewers verdict... . Journal, maybe, but conference, never.
RT @pcalcado: Is it me or there’s a pattern: 1) Startup engineer builds some cool infra 2) Starts blogging/presenting a lot 3) Goes on t…
Replying to @Quesada
@pcalcado Let's say I'm solidly at 2) now. Time for the next step ;)
RT @diodesign: TIL in 2014 this is still a thing: Oracle, mate. You made $10.9bn in profit last year. Kick Ask to the curb http://t.co/ONUm
RT @berlinbuzzwords: "Real-time personalization and recommendation with stream mining" - a talk by @mikiobraun at #bbuzz 2014 https://t.co…
Pretty impressed by @dataScienceRet demos. Many of those would have been perfectly fine in-house pilot projects.
Replying to @DRMacIver
actually I wonder whether anyone is NOT using a restricted subset of C++.
Now this looks like the first perfectly fine summer day since the heat wave hit. #berlin ☀🐤😌
Replying to @weballergy
public appreciation of the whole field is improved, though, so I can hardly complain.
Replying to @weballergy
yeah, that's how humans work. Although I always wonder whether that's academically desirable.
Replying to @eoinhurrell
@UltimateHurl yeah. And like in the case of DeepMind even misleadingly as they have allegedly been working more on reinforcement learning.
I'm just imagining pitching a deep learning startup to German VCs. "Have you any market validation for this?" #scnr
For what it's worth he mentioned me by name in his deep learning Ph. D. defense at least ten times, so I guess I can deep learn, too.
Hey, I've a super stealth deep learning researcher sitting next door. DM me if you're interested. Offers starting at $10M please.
Deep learning has got to be the most financially profitable marketing ploy, er, piece of technology out of AI research yet.
I'm not implying he should have known. But it shows how separated Big Data and machine learning research still are.
We had a guy from Gatsby over giving a talk today. He had heard of neither Spark nor Storm. This is reality.
Replying to @cartazio
I mean because if there was it would hardly lead the work life balance charts. 😜
Replying to @cartazio
I think we just found the ultimate proof that there is much less science in data science than the name suggests.
Replying to @DRMacIver
be glad you don't draw comics. I always think they must have the worst ratio.
RT @bigdata: The higher purpose of doodling: nice segment from @CBSSunday. Reminded me of the cool diagrams by @mikiobraun http://t.co/8aV0
Replying to @bigdata
@CBSSunday yeah, it actually does help me think or visualize most of the time.
RT @caseyjohnston: Former FB data scientist says informed consent "will have an incredible chilling effect on social systems research." htt…
@bastianventhur that was back when in-memory computations were just called computations as well, I reckon.
Now only if they could get rid of all that Java-EE-esque frameworking layers upon layers of abstractions in Android...
Anyone still remember the Ackermann function? And its inverse, α? For all practical values of n, α(n) < 5. #nerdfact
One of the perks of living in Berlin is that by the time I dropped off my kids and got to the U-Bahn I already walked like 3km. #huffpuff
Last time till end of August to have to get up at 6am to get my kids to school/kindergarten. #yeah
RT @bigdata: Live from #berlin tomorrow: @cratedata a Super Simple #realtime #bigdata backend, a FREE webcast with @jodok http://t.co/U6H8V
Replying to @jane_fel_reed
oh, that's from last year, @thinkberg accidentally reposted a few links.
TIL public funding assumes you'll spend 18% of your time being unproductive anyway. #probablynotprocrastinatingthough
Those two weeks of summer where you'd really need AC in Berlin, you really need it. Especially during talks. 😰
Replying to @mpompery
@zeit_geist not if you want to Keep It Simple and work without scaling out
Replying to @mpompery
@zeit_geist or just reluctantly use disk when you can't hold all the data in memory anymore. ;)
Replying to @mpompery
@zeit_geist oh yeah. Tell me all about it. But still I think there's a psychological difference when you use caching to speed up disks
In realtime processing, you don't cache in memory to speed up disk access, but use disks to handle data beyond main memory limits.
Fun fact: "L.A." is how German students of math generally refer to linear algebra. #notcityofangels
Replying to @superglaze
I also don't think Satya's sense of style has ever been a topic of discussion either.
Replying to @dominik
hey, ping is an important part of web round trip times, in particular for websites with lots of pics! #notreallyhelping #iknow
Hm. Apparently I had already played around with Go - On November 13, 2009. #fiveyearsalreadyQQQ
@poussevinm but the server which hosts the blog was completely down. Maybe some browsers had cached the page & tracking JavaScript?
Replying to @fhuszar
hehe. It was. It wasn't pretty. Disappearing root partitions and all. Thanks to @thinkberg for fixing it!
Hm. My blog was down last week for a few days and yet Google Analytics claims I had pageviews... #odd #isntit
@addelindh given how strong Bayern Munich is and how many players from the national team play there, it's inevitable.
RT @streamdrill: I’ve added the stock mentions demo from play.streamdrill.com/vis to the Twitter client examples: github.com/thinkberg/stre…
Replying to @Quesada
oh yeah. Love his descriptions. But will our children know the color of a dead channel?
NewRelic's apparently dying to tell me about devops. Every frigging day for the past two weeks. #targetting
As Ulf Brefeld of TU Darmstadt says: I don't believe in collaborative filtering, recommendation is about understanding what the user wants.
They also presented some interesting collaboration between Zalando and TU Darmstadt and Dortmund on sequential recommendation.
Yesterday's recommender stammtisch was pretty good. I definitely need to tell my students to attend such meetups to learn about real-world ML
Except for my talk, actually interesting talks on recsys approaches besides collaborative filtering.
And now for three talks at the recommender meetup at Zalando. I'm number two after the food break. #hungry
RT @felixsalmon: An amazing infographic about what’s at stake in Thursday’s World Cup matches: http://t.co/jsWxY7r3KF ^ @bengreenman
@muratk3n yeah. And I think they had towels in the entrance areas of the houses with which you could dry your head.
RT @MSalt69: And I thought my job title slightly understated my actual role... http://t.co/GyLbBWbxqR
Ever since Amazon.de switched to Hermes, I'm not feeling the love anymore. #whattheheckaretheydoing
Had to wipe my Kindle Fire b/c I forgot my parental ctrls passwd (nice UI, BTW, Bezos). But nothing rocks like a freshly installed Android.
Ok, I meant to unlink my consciousness from my brain 😜. Conscience can get in the way, too, of course.😓
On an unrelated note, if you're the slightest bit into funk, check out Live in San Francisco by @Soulive
TBH, some of these days I wish I could unlink my conscience from my brain while it takes care of some tasks at work.
Important update Re: recommender stammtisch on Thu, you can also come at 6pm and watch the game there (only if you stay for the talks!)
Replying to @lc0d3r
@mhausenblas I don't think they have the facilities for that, unfortunately. But I plan to blog about that.
Replying to @superglaze
you probably didn't expect to write such a headline ever, right? ;)
Alright kids, if you don't like your writing assignments, keep in mind that one day your project proposal will mean real €$¥s.
Alright, getting ready for #GERGHA. First game of Germany I'm going to watch, actually.
Replying to @yoavgo
actually most of the time when it's claimed to be scalable, they also say it's "embarrassingly parallel".
Oh great, Skype is disconnecting all the time. If you ever do a PhD, make sure all of the examiners can be physically present. .
Replying to @markusandrezak
oh yeah. Not belittling the effort. But telling the audience there'll be 60-90mins of questions is just cruel ;)
Alright, attending a Ph. D. defense. We were just told that the procedure will take at least two hours. #whathaveisignedupfor
Unlike what you'd expect, almost all ML papers which have "scalable" in the title have actually not been scaled out using Hadoop et al.
In ML, "scalability" is usually achieved through better algorithms which stream data/use approximations, such that resource usage is bounded
In Big Data, "scalable" usually means being able to parallelize the workload on a cluster, in ML it means "can deal with large data sets".
While giving a session on "Scalable ML" at the @dataScienceRet yesterday, I realized that "scalable" means quite a different thing in ML.
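To illustrate the ML sense of "scalable" from the tweets above, here is a minimal sketch (an illustration, not material from the session) of one-pass stochastic gradient descent for linear regression. Each example is read once and then discarded, and only the weight vector is kept, so memory stays bounded no matter how long the stream is. The function names, the learning rate, and the synthetic data are assumptions.

```python
import random
from typing import Iterable, List, Tuple


def sgd_stream(examples: Iterable[Tuple[List[float], float]], dim: int,
               lr: float = 0.1) -> Tuple[List[float], float]:
    """One pass of SGD on squared loss; memory is O(dim), independent of stream length."""
    w = [0.0] * dim
    b = 0.0
    for x, y in examples:
        # Prediction error for this single example.
        err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
        # Gradient step on 0.5 * err**2; the example is not stored afterwards.
        for i in range(dim):
            w[i] -= lr * err * x[i]
        b -= lr * err
    return w, b


def synthetic_stream(n: int, seed: int = 0):
    """Hypothetical data: yields examples of y = 2*x + 1 one at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield [x], 2.0 * x + 1.0


w, b = sgd_stream(synthetic_stream(5000), dim=1)
print(w, b)
```

Because the data is fed through a generator, the same code works for a stream far larger than main memory.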
Saw one of the new Surface Pro 2 yesterday in a shop. They are incredibly bulky, about 1.3cm deep and weigh 900g. #fail
RT @rbranson: @coda now that we've taken reductionist apps all the way to their logical end, we are now free to truly innovate.
Replying to @contenthunter
then it's a bit surprising that you cannot store a full disk worth of data, because you don't have enough mem for the index.
Replying to @contenthunter
yeah, certainly. I think people are used to indices being on disk, too, only cached in memory.
Replying to @contenthunter
ok, but wouldn't that restrict the amount of data per shard by the available RAM? Or is it more like caching and it just
Replying to @andrew_clegg
OK, it also says they now have a staff of 70 people. So I guess money helps, too.
Replying to @andrew_clegg
Wikipedia confirms. Which is actually pretty cool. It also says the pilot took three months to complete, now it's five days.
Replying to @andrew_clegg
actually it was a sharpie, my smartphone camera, Inkscape (vectorization) and gimp (crop & postproc) before.
Math joke time! How does a mathematician fence in a herd of Zebras? He puts the fence around him and defines himself as outside.
Just in case anyone wonders how I put the hand drawn figures in my talks: current setup is Autodesk SketchBook Pro on a tablet + gimp for crop
Replying to @munterluggauer
nah, I think you can forget about that paper. But for many it's still the starting point.
Every time I read that Map Reduce for Machine Learning on Multicore paper, I'm less impressed.
Nice chatting with @noelwelsh over the #ScalaDays lunch break about Scala, startups, A/B testing, realtime, and German bureaucracy ;)
RT @ManningBooks: Hey #ScalaDays! Save 44% on #scala, #akka, #playframework and #reactive eBooks with code scaladtwcf at http://t.co/OdkWpx
Replying to @dosinga
yeah, certainly. But I heard stuff that implied indices can't even be flushed to disk.
@mattangriffel nice writeup about your time in Berlin! Being from Berlin, I could relate very well ;)
I've been hearing that "Big Data DB X doesn't scale because all the indices need to be in memory" a lot lately. Any experiences with that?
Replying to @ChrisDiehl
@gappy3000 @drewconway hehe, I also often call them frequentian or Bayesianist :D
Replying to @ChrisDiehl
@drewconway I'm not a Frequentist, but I know a Bayesian who'd be very mad I said that ;)
Alright, now that I'm slowly finishing all the stuff I committed to, time to think about what to change in the future. ;)
Hello Heidelberg, today I'll be at the program committee meeting of the upcoming data2day conference.
RT @robotnik: @alexismadrigal I grew up reading 80s cyberpunk: I was promised a crumbling hypercapitalist surveillance dystopia. Guess what?
RT @bigdata: Here's the paper that @mikiobraun mentioned today: count-min sketch for Clustering Massive-Domain Data Streams http://t.co/7pI
Alright, that was my @OReillyWebcasts. Thanks again for listening and hosting the event! /cc @bigdata
Also, the webcast will be done before the world cup opening ceremony. It wasn't planned this way, I'm just lucky. 🏃⚽
Replying to @purbon
yes, I think so. There is a rather heavyweight sign up process, but it seems you can watch them later, too.
Oh really @twitter, now you're making extra round-trips in the app to translate each tweet to German? Using bing? Where do I turn that off?
Looks like spending the 31+°C day in an air-conditioned car on the Autobahn wasn't so bad after all. ☀😰
Author 1 already on vacation, Author 2 flying in this evening, Author 3 embarked to the US to start an internship. NIPS stories... .
Ok, once I let libreoffice produce PPTs instead of PPTXs, everything was reasonably smooth. #butwellitsalso2014
Alright, so let's see how PowerPoint likes my libreoffice slides for the upcoming O'Reilly webcast.
Replying to @cartazio
@mdreid please, I have to know whether you managed to meet or not!! #twittersuspense ;)
Replying to @cartazio
only that programming allows you to write down that plot so that others can use it, too, without all that firsthand experience.
Replying to @cartazio
so I guess in the realm of programming related experiences it's a first, although hard to explain to someone who doesn't code.
Replying to @cartazio
but it's the first time I'm not building the actual framework at the same time.
Replying to @cartazio
even if I start from scratch I tend to put together the same pieces. That's kind of to be expected.
I might have stopped myself from writing that big analytics framework, but my mind has started to form a set of patterns to the same effect.
TIL you can have overloaded constructors in Scala with def this(...). #itsbeenwhatfouryears
RT @t3kcit: #openaccess #fail by @mitpess. Another sign that open access and publishes don't go together so well. http://t.co/Yq1ZR98CIx
Replying to @InkmiHq
@codemonkeyism @thinkberg then they'd already be at the main station, uh, at the airport, uh - Breschniev!
Replying to @roidrage
@codemonkeyism Isn't that the classical "adding people to a late project makes it later" by Fred Brooks?
RT @zaxtax: Any sufficiently advanced data averaging algorithm is indistinguishable from learning.
And now back to academic talks: "And then you see that, well, OK, this might be a bit non-trivial."
In case you're wondering, I can't recommend giving that amount of talks. It probably helps if the material is all already prepared.
Replying to @MLnick
So most of the times I'm actually using a blocking queue with finite capacity so that I see when I'm being too fast on the front.
Replying to @MLnick
The problem with systems like Akka has always been that you don't really see when one of the mailboxes is filling up.
Replying to @MLnick
We've used Akka in the past to decouple different stages of the pipeline, but currently it's all very direct.
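The bounded-queue approach from the replies above can be sketched with Python's standard `queue.Queue`: with a finite `maxsize`, `put()` blocks when the consumer falls behind, so a fast producer is slowed down instead of silently filling an unbounded mailbox. This is a minimal illustration under assumed parameters (a capacity of 4, a simulated 1 ms of work per item, hypothetical `producer`/`consumer` stages); it is not the code from that pipeline.

```python
import queue
import threading
import time

SENTINEL = None  # marks the end of the stream


def producer(q: queue.Queue, n: int, blocked: list) -> None:
    """Push n items; record which puts found the queue full."""
    for i in range(n):
        try:
            q.put_nowait(i)
        except queue.Full:
            blocked.append(i)  # back-pressure is visible: the consumer is behind
            q.put(i)           # now block until there is room
    q.put(SENTINEL)


def consumer(q: queue.Queue, out: list) -> None:
    """Slow consumer: takes items one by one until the sentinel arrives."""
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        time.sleep(0.001)  # simulate per-item work
        out.append(item)


q = queue.Queue(maxsize=4)  # finite capacity is the whole point
out, blocked = [], []
t = threading.Thread(target=consumer, args=(q, out))
t.start()
producer(q, 100, blocked)
t.join()
print(len(out), "items processed;", len(blocked), "puts had to wait")
```

Unlike an unbounded mailbox, the `blocked` list makes it explicit when the front of the pipeline is too fast.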
Replying to @michalrut
yeah. But these are worst case bounds. In reality, the error will likely be smaller.
Replying to @michalrut
yeah, I'll send you the link to the paper later. On the move right now.
Replying to @michalrut
no, the error bounds hold for any error. But it helps if you have a skewed distribution and capture most of the active items.
RT @berlinbuzzwords: "Real-time personalization and recommendation with stream mining" by @mikiobraun, 15:20 to 16:00 at Maschinenhaus. htt…
Replying to @rherbrich
but look what I found ;) bought it used, haven't really played it yet.
Replying to @cartazio
that one was for xing, a German LinkedIn clone, so I guess they want people for social network analysis and similar stuff.
MapR's @mhausenblas on how Apache Spark lets you create batch and speed layer systems in an integrated fashion
Interesting talk by @peterbourgon on how SoundCloud uses CRDTs to count social activity reliably in a distributed fashion.
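For readers unfamiliar with CRDTs, below is a minimal sketch of a grow-only counter (G-Counter), the textbook CRDT for counting. Each replica increments only its own slot, and replicas merge by taking element-wise maxima, so they converge regardless of the order or repetition of merges. This is a generic illustration; the design presented in the talk may differ, and the class and replica names are hypothetical.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: one monotonically increasing slot per replica."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        # A replica only ever touches its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(3)   # e.g. three likes seen by replica a
b.increment(2)   # two likes seen by replica b
a.merge(b)
b.merge(a)
a.merge(b)       # repeated merges are harmless
print(a.value(), b.value())  # 5 5
```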
Lessons learned on technology transfer for TrueSkill for the Xbox classic. #bbuzz @rherbrich
Replying to @DRMacIver
@propensive here in Berlin, the vote on what to do with the Airport Tempelhof was on the same day, boosting turn-out it seems.
Heading out to #bbuzz. Great weather as always. Looking forward to @rherbrich's (Amazon ML, formerly FB, MS research) keynote.
So after I tortured the database guys with SGD last time, now they'll teach us parallel data processing fundamentals
So that concludes my 12h of teaching duties for this week's @dataScienceRet #aaaandimdone
Ok, here's one example: Want to know whether training/test data sets have the same distribution? Try to predict if a point is train/test.
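A minimal sketch of that train/test check, using only the standard library. Label training points 0 and test points 1, fit a simple classifier on half of the pooled data, and measure accuracy on the other half. Accuracy near 0.5 suggests the two sets are hard to tell apart, while accuracy well above 0.5 suggests a distribution shift. The nearest-centroid classifier and the synthetic Gaussian data are assumptions chosen to keep the example short; in principle any classifier works.

```python
import random
from typing import List, Sequence


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_test_distinguishability(train, test, seed: int = 0) -> float:
    """Held-out accuracy of predicting 'is this point from the test set?'."""
    rng = random.Random(seed)
    pooled = [(p, 0) for p in train] + [(p, 1) for p in test]
    rng.shuffle(pooled)
    half = len(pooled) // 2
    fit, held_out = pooled[:half], pooled[half:]
    # Nearest-centroid classifier: one centroid per origin label.
    c0 = centroid([p for p, lab in fit if lab == 0])
    c1 = centroid([p for p, lab in fit if lab == 1])
    correct = 0
    for p, lab in held_out:
        pred = 1 if sq_dist(p, c1) < sq_dist(p, c0) else 0
        correct += int(pred == lab)
    return correct / len(held_out)


rng = random.Random(42)
train = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
same = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
shifted = [[rng.gauss(2, 1), rng.gauss(0, 1)] for _ in range(1000)]
print(train_test_distinguishability(train, same))     # close to 0.5
print(train_test_distinguishability(train, shifted))  # well above 0.5
```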
Replying to @nikete
when I come across such a thing it's often straightforward (my academic speaking) applications of algs which nevertheless work.
Academia systematically promotes a blind spot for simple solutions that just work. #cantpublishthatstuff #isntthatknown
DHL informs me my repaired smartphone has been delivered to one of my neighbors. #gottago #unfortunatelynot
Next week I'll be at @berlinbuzzwords. Holler if you want to chat about realtime streaming analysis.
Replying to @amiorin
Every year, when the good weather starts, I'll be out for a few weeks. :(
Recently learned that non-tech people don't know that Amazon is powering the cloud. So what about the Frequentist / Bayesian divide?
@muratk3n for example, our babysitter doesn't even turn the TV off when she leaves. She just assumes we will be watching, too.
@muratk3n although there still seem to be people who just sit down in front of the TV in the evenings and watch whatever is on. #crazyIknow
Meh. Why is this month so busy. It's always like this. Really gets in the way of appreciating the summer ;)
Which makes me wonder, is the term TV show still a thing? Seems to be a pretty consumption-device-centric notion.
Replying to @syhw
I always thought learning invariances and concepts is an important direction. But yeah... .
Replying to @syhw
Right now it takes a human with experience to know what reps work or not. NFL aside, getting to that for some domains would be great.
Replying to @syhw
yeah, on the contrary. Finding good representations from data is largely unsolved, I would say.
Replying to @syhw
last time I said that in public people were like "but... why are we learning all this stuff??"
Replying to @peter_c_william
.@peter_c_william OK, I should qualify: methods from a set which is above a certain threshold for expressive power.
Also, IMHO supervised learning is all in the representation. Get that right, and it doesn't really matter which method you use to learn.
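A toy illustration of the "it's all in the representation" point (an illustrative example, not the author's). A perceptron cannot fit XOR on the raw inputs because the classes are not linearly separable, but adding the product feature x1*x2 lets the same learner fit it perfectly. The perceptron, the +1/-1 encoding, and the choice of feature are assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# XOR with inputs and labels encoded as +1/-1: the label equals -x1*x2.
DATA: List[Tuple[Tuple[int, int], int]] = [
    ((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), -1),
]


def raw_features(x1: int, x2: int) -> Tuple[float, ...]:
    return (x1, x2)


def with_product(x1: int, x2: int) -> Tuple[float, ...]:
    # Hand-crafted extra feature that makes XOR linearly separable.
    return (x1, x2, x1 * x2)


def score(w: Sequence[float], b: float, f: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, f)) + b


def perceptron_accuracy(features: Callable[[int, int], Sequence[float]],
                        epochs: int = 100) -> float:
    """Train a perceptron with bias on the given features; return training accuracy."""
    w = [0.0] * len(features(0, 0))
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in DATA:
            f = features(x1, x2)
            if y * score(w, b, f) <= 0:  # misclassified: perceptron update
                w = [wi + y * fi for wi, fi in zip(w, f)]
                b += y
    correct = sum(1 for (x1, x2), y in DATA if y * score(w, b, features(x1, x2)) > 0)
    return correct / len(DATA)


print(perceptron_accuracy(raw_features))  # stays below 1.0
print(perceptron_accuracy(with_product))  # 1.0
```

The learner is identical in both calls; only the representation changes.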
Gave my first set of lectures at @dataScienceRet yesterday. Really a nice set of highly motivated, bright people.
RT @MartinSFP: Beats/Apple, Twitch/YouTube, Twitter/SoundCloud. All this unresolved M&A is making me tetchy.
Mysteriously, my Kindle Fire HD sluggishness has significantly improved. I'd like to think my little rant on Twitter did the trick, but...
Got my first set of slides down for tomorrow's stint at @dataScienceRet. Let's see how that pans out.
Replying to @mpompery
@zeit_geist but I agree. It's all pretty useless. Heroes, Rockstars, Killers, etc. None of those sound like good team players.
And almost every time you want to type in something it feels like it first has to load the keyboard and all dictionaries from scratch.
Kindle Fire HD's email program is also a nightmare. Sometimes it just sits there for minutes going blank.
Don't get me wrong, it's a nice piece of hardware, and once you're inside an app, it's mostly ok. But the Amazon interface is superslow.
When reviews mentioned that the Kindle Fire HD sometimes feels a bit sluggish, they weren't exaggerating. It just freezes for minutes.
Amazon's CTO Werner Vogels: "At Amazon we have the Institutional Yes. If you're opposed to an idea, the burden is on you to prove why."
There's a whole social thing going on with people asking you "why haven't you used [trendy but unwieldy framework they don't even know]?"
Some time during the last decade, coding has turned from building cool efficient stuff to plugging one framework into the other.
The #AWSSummit was good, but the venue was pretty awful. Narrow corridors, insufficient space. Didn't scale, unfortunately ;)
Replying to @samklr
IMHO one often loses a lot of performance by just stacking together frameworks. One should consider a more integrated approach.
Approaches to realtime still focus mostly on building infrastructure for the different aspects, leading to pretty bulky systems.
Actually pretty good in-depth overview of Amazon Kinesis by Mark Bate: mostly a distributed transport layer for streaming data. #AWSSummit
Keynote running late. I personally think that's never a good idea. People get itchy, schedule gets corrupted. Aaaaand I need coffee...
Installed some smallish footprint Twitter client on my old phone specially for #AWSSummit. (Twicca, still about 4.8MB)
Today I'll be attending the #AWSSummit in Berlin. Looking forward to the thing, although I'm not sure what to expect. ;)
Replying to @msmeissn
ok, at least more solid than what the Google folks are pushing out I guess... .
Replying to @msmeissn
thanks. Very mildly comforting. Linux kernel and firmware are rock solid, right? ;)
RT @msmeissn: @mikiobraun Google Play Services gets updated, so I guess most of the userland... But well, the kernel and the various firmwa…
RT @msmeissn: @mikiobraun (a) yes (b) your apps get however updated more regulary than once a year (and hopefully get security fixes this w…
Linux gets security updates every few weeks, Android devices get updates once a year, if at all. Should I be concerned?
Looks like it's time again to produce self-consistent artifacts for a project proposal. One of the sources is a table in 2pt Arial.
@moellus @golem yes, it's all quite terrible. One day I'll rage-blog about Android.
@moellus @golem ah, wait, the LTE one has 2GB RAM and does get it after all (all good) (well, once mine has been repaired)
Replying to @debasishg
The Data Science Day will be put up soon. I'll tweet the link when it's up.
Replying to @dnouri
@policecar I have colleagues who work on BCI. All the stuff he said about that was much much too optimistic. "Dropbox your brain"
Take-home message concerning real-time streaming from yesterday's #DSDay: technology space very fragmented, no one size fits all.
Pretty interesting Data Science Day meetup yesterday, now on to my third talk today: large scale learning from an infrastructure perspective
RT @auerbach: Someday, when I retire, I'm going to correct all the folks who have said incorrect things in Internet comment threads.
Replying to @lojikil
yeah, towards the end, broad overgeneralizations were uttered faster than my brain could question.
Ok, at least when he talked about latest advances in brain computer interfaces, I knew he was factually wrong...
Somehow this talk went from CAD for microchip design to full-blown futurism. Maybe being German my skepticism is too strong.
RT @drewconway: Love @mikiobraun's description why approximation is important for streaming data: don't need every point, noise, scaling is…
Replying to @jherritz
@drewconway that's hard to say in general, but will definitely check @MIOsoft out.
"Every customer says they have data. They seldom have useful data, unfortunately" #DSDay
6th Data Science Day about to start, hosted by Zalando. And of course everyone is hiring. ;)
One talk down, two more to go: tomorrow at the 6th Data Science Day, Friday with the people from Volker Markl's group (of @stratosphere_eu)
Enterprise is when the sales agreement is thicker than the technical documentation. >;^)
I suddenly had the revelation that my three year old smartphone under 4.0 feels just as sluggish as my old Nokia E61i. Well done, Google!
Replying to @Shpanier
I should've added that it's gotten quite cold and cloudy the last two days here in Berlin, so having so much sun is a bit surprising
Replying to @Shpanier
ah, you know how academics work, they'll say it has all been done before!
Replying to @kazuhito
hi! You're in Berlin? We had dinner a few years back in Tokyo with Taku Fujita. Do you remember?
RT @berlinbuzzwords: This year we've conducted interviews with our Berlin Buzzwords Speakers. You can find all interviews here: http://t.co…
When Batman stole the Millennium Falcon's hyperdrive and it ended up swallowed by the asteroid worm, only dads laughed. #legomovie
On the other hand, without all the social apps on the phone (only about 300MB for apps), battery life is much better ;)
Although it's also interesting seeing today's apps trying to cope with a 320x480 screen size ;)
Had to hand in my Galaxy S3 today because the power button was broken. Now I'm stuck with 2011 technology for the next month.
Looks like I committed to about eight presentations over the next month and a half. Well...
Google+? Wasn't that this hypothetical social network they cancelled because it would've been crazy to retrofit the whole company around it?
@zenzenzen hard to say without restricting what you can do in map and reduce steps. One question is also what can be parallelized well.
Replying to @fhuszar
their statement reads as if they never really figured out how to make ends meet... :(
There are just too many events which have "Data Science" in their title. I don't even know what's what sometimes ;)
Gaah. My phone broke and my old one barely has enough memory to hold the updates to the Google apps. #firstworldproblems
TBH I never found the claim that the kernel trick is all about efficiency compelling. Learning is costly in num examples, not num dims.
RT @berlinbuzzwords: Enjoy the interviews with our #bbuzz speakers @mikiobraun, @beerops, @jericevans & @peterbourgon: http://t.co/jS4pPy7c
Replying to @viktorklang
I'm a big fan of Scala otherwise! Good thing you fixed that nuisance. (And I'll stop bitching about that, too ;))
Replying to @viktorklang
I know ;) First we waited for all the dependent libs to catch up and then we realized we didn't really need to move on, so yeah
RT @viktorklang: @mikiobraun No, minor releases are both backwards AND forwards Binary Compatible (from 2.10 and forward).
Replying to @viktorklang
so hopefully no libraries with special builds for each minor version of Scala anymore. I'd be very glad to see that!
Replying to @viktorklang
glad to hear that, once we make the transition from 2.9 we should be safer then... .
Replying to @viktorklang
ok, from 2.10 to 2.11 I would sorta expect this, but even from 2.10.1 to 2.10.2 you get these problems.
Given the persisting issue with Scala binary incompatibility across versions, it's wise to stick with Java libraries. #cantberight
Replying to @jane_fel_reed
yes, but it could be that 713 occurred before but has been removed. The proof says that it's better to start with the
Replying to @mpompery
@zeit_geist well, we're happy with 2.9. And use mostly Java libs. So everything's good ;)
Replying to @mpompery
@zeit_geist yeah, but that is a bit too fast. Because you can only upgrade after all your dependencies have upgraded. We're still at 2.9
Replying to @jane_fel_reed
As shown in the paper, this corresponds to a worst case of how often you have seen the elem but missed it, so you get a bound
Replying to @jane_fel_reed
713 is not yet in the table, but it is full. So the elem with the least count is replaced but its count is used to init 713
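The replacement rule described in this exchange (when the table is full, evict the element with the smallest count and let the newcomer inherit that count) appears to match the Space-Saving algorithm of Metwally et al. Below is a minimal, unoptimized sketch. The inherited count is also stored as an error term, so each estimate overcounts the true count by at most that amount. The function name, the table size, and the toy stream are assumptions, and a real implementation would use a faster structure than `min()` over a dict to find the smallest count.

```python
from typing import Dict, Hashable, Iterable, Tuple


def space_saving(stream: Iterable[Hashable], capacity: int) -> Dict[Hashable, Tuple[int, int]]:
    """Approximate heavy hitters; returns {item: (estimated_count, max_overcount)}."""
    counts: Dict[Hashable, int] = {}
    errors: Dict[Hashable, int] = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < capacity:
            counts[item] = 1
            errors[item] = 0
        else:
            # Table full: evict the element with the smallest count ...
            victim = min(counts, key=counts.get)
            min_count = counts.pop(victim)
            errors.pop(victim)
            # ... and initialise the newcomer with that count (+1 for this occurrence).
            # min_count bounds how often the newcomer may have been seen but missed.
            counts[item] = min_count + 1
            errors[item] = min_count
    return {k: (counts[k], errors[k]) for k in counts}


stream = ["a"] * 50 + ["b"] * 30 + list("cdefghij") + ["a"] * 10 + ["713"]
print(space_saving(stream, capacity=3))
```

With a table of size 3, the frequent items `a` and `b` get exact counts (60 and 30), while `713` receives an estimate of 9 with a possible overcount of 8, so its true count lies between 1 and 9. The worst-case bound is loose here, but it tightens for the frequent items that dominate a skewed distribution.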
Scala 2.11 and still every point release breaks most library dependencies. I don't even know why...
According to crunchbase, gnip had about $6.5M in funding so far. Don't know what Twitter paid in the end, but it must've been lucrative.
Replying to @fhuszar
where secondsync is clearly an acqui-hire while gnip seems to be allowed to proceed as before
RT @fhuszar: New wave of social tech acquisitions: klout ⊆ lithium the data republic ⊆ kantar media feedmagnet ⊆ bazaarvoice secondsync ∪ g…
The way Twitter has been forwarding people to Gnip for any form of elevated access I'm really not surprised they eventually bought them.
Replying to @amiorin
@itschrstn @minodes I will! I'll put them somewhere and post the link to the meetup page
We're the first to use technology X, all the others are still stuck with Y. #startuptropes
What we need is something like tvtropes for startups, a collection of story archetypes users are used to. Starting with "X is like Y for Z".
Oops, I might have too many slides for tonight. *flips through slides* Nah, these are _fast_ slides, no need to worry ;)
Replying to @Henrikop
If you just want to share some files I found it to be already pretty mature. Even the apps look nice enough.
That Windows in a VirtualBox has better knowledge of my battery status than xubuntu on the hosting Linux...
And once again, first I thought the results looked odd, but then saw that's just how the data is. #alwaystrustyourdata
Replying to @mdreid
very good, you share a long ASCII code and are good to go. Server not even required, just to make sure files are always online.
Moved from Dropbox to BitTorrent Sync - very nice so far, Mac/Linux/Win + Android app. Add a server and you're good to go.
Replying to @zaxtax
I guess you could translate it as center although that would be Zentrum in German. Mitte is just such an ordinary word.
That there exists a neighborhood which is just called "the Middle" is one thing I love about Berlin.
Replying to @cartazio
doing the whole "I repeat", "Roger that", "can you read me" over radio doesn't translate well to whatsapp 😉
TIL if you hook your thinkpad up to a power supply with insufficient watts it will throttle your CPUs and pretend the batteries are charged.
RT @antoine_roux: @roidrage @codemonkeyism the future of the logging libraries is unfolding now. So much expressiveness!
RT @antoine_roux: @roidrage @codemonkeyism don't tempt me. You know in Java these are valid characters. My Eclipse at the moment... http://…
RT @roidrage: Good heavens, why didn’t I think of this before? Emojis in log files for emotional context! status=😂 or status=😱
Replying to @eoinhurrell
@UltimateHurl I've nothing against the data structure per se, but I think I tend to allow too much stuff on there instead of saying no. ;)
Replying to @chrshmmmr
next possible steps are you come by to chat, give a talk on some previous work, and so on.
Replying to @chrshmmmr
you can send me or my prof Klaus-Robert Müller an informal inquiry.
TIL how acqui-hire visas work: L-1 visas let you transfer if your company HQ is (now) in the US (thanks, @thinkberg via @dosinga)
Replying to @chrshmmmr
we're always looking for new people. It's more the other way round that we're creating positions for good people ;)
I think I finally understood that the problem with those ever-growing to-do lists is simply that you don't really want to do most of what's on them.
That post took quite some time to write. It's a complex field. Tried to break it down to simple decision trees, but it's not that easy.
Sometimes I think ebooks work in spite of the technical shortcomings just because the power of thought is so strong.
I might be misreading this but doesn't the H-1B visa cap make acqui-hires of non-US startups very... impractical?
Replying to @MalteLandwehr
someday the time will come. And then I have to be prepared! ;)
Hehe, Amazon app store has the Dropbox app but claims it's incompatible with Kindle Fire devices. Just let me sideload the apk... there!
Replying to @pavlobaron
our only hope is that once it's ubiquitous, it will lose its appeal. Anyone remember "2000"?
Well, well, why am I not surprised that the Dropbox app apk can be downloaded from the dropbox site. #kindletips
@Knights22 @marsty5 I mean all this theory for distributed systems exists, but probably too hairy for the average app dev.
@Knights22 @marsty5 linkedin pulse didn't let me upvote an article. Bookmarking worked. Pocket is nice. I like it.
@Knights22 @marsty5 I mean local actions which could be stored and transmitted later. Almost no app does that, throws error instead.
Sometimes I think the whole resilient distributed systems theory has been created for mobile devices. And then I realize nobody cares.
Once you have a tablet without 3G you realize how many apps assume they're always connected. HELLO!, you're on a portable device!
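The "store locally, transmit later" behaviour from these tweets can be sketched as an outbox. Actions are queued instead of failing, and a flush sends them in order, stopping at the first network failure and keeping the rest for the next attempt. Everything here (`Outbox`, the `send` callback, treating `ConnectionError` as the offline signal, the fake server) is a hypothetical illustration, not any app's actual API.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class Outbox:
    """Queue user actions while offline and replay them in order once sending works."""

    def __init__(self, send: Callable[[Dict], None]) -> None:
        self.send = send
        self.pending: Deque[Dict] = deque()

    def submit(self, action: Dict) -> None:
        # Never surface an error to the user; just remember the action.
        self.pending.append(action)
        self.flush()

    def flush(self) -> int:
        """Try to send queued actions in order; return how many were delivered."""
        delivered = 0
        while self.pending:
            try:
                self.send(self.pending[0])
            except ConnectionError:
                break  # still offline: keep the rest for later
            self.pending.popleft()
            delivered += 1
        return delivered


online = False
server_log: List[Dict] = []


def fake_send(action: Dict) -> None:
    if not online:
        raise ConnectionError("no network")
    server_log.append(action)


box = Outbox(fake_send)
box.submit({"type": "upvote", "article": 1})
box.submit({"type": "bookmark", "article": 2})
print(len(box.pending), len(server_log))  # 2 0
online = True
print(box.flush(), len(server_log))       # 2 2
```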
Replying to @sw17ch
@s1m0nw tear down and rebuild! Now we know how it should be done! #famouslastwords
Replying to @twiecki
cool! Then happy birthday to you, too, and welcome to the April birthday cluster!
Replying to @ChrisDiehl
@josh_wills thank you. *husky voice* I still remember the days when birthdays were new and fresh...!
Replying to @InkmiHq
@codemonkeyism yeah. I also think such overgeneralizations don't reflect the complexity of real life systems.
Replying to @dberkholz
:D you're so right. Also, no place to put things on top of flat screen TVs!
Replying to @jonoandre
or other low-level langs which compile to assembler. Rust looks interesting, too in that bracket.
Replying to @jonoandre
haven't played around with it enough to have a real opinion there. But I always felt that we need a contender to C...
Replying to @jonoandre
I guess what I'm trying to say is that I find it harder and harder to get excited about new programming languages...
And whenever you start a new programming language, you'll have to start over at least with tools and community from scratch.
That "let's create a new programming language" never gets old, does it? IMHO the language is only one aspect, alongside tools, libraries, and community.
RT @syhw: @mikiobraun My heuristic: pick just enough "PhD" to defend, just enough "relevant" to not despair, and maximise "interesting"! :)
Replying to @syhw
although even if you consider all convex combinations, it still holds that you can't max out all three.
Replying to @padjiman
yeah I know. ;) Just something that popped up in a discussion and then everyone was like "well, that's not that untrue, you know"
@SaturnDE Cancelling an order placed online for in-store pickup only works in the store? That's a bit too cumbersome :(
RT @berlinbuzzwords: Data Stream Mining Hackathon! @retresco and developers from @avantgardelabs and @streamdrill invite you to join! http:…
Riding the bus with @thinkberg while he's fixing some issues with perl on his server remotely via smartphone. #so2014
Trying out Kindle Fire + Autodesk's SketchBook Pro as replacement for my paper & pen + camera + inkscape + gimp diagram workflow.
When it comes to real-time big data, it's much more natural to talk in terms of bytes per second of new data than amount of data on disk.
One thing I don't get is that even discussions of real-time big data start with petabytes stored somewhere. How did they get there?
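A quick back-of-the-envelope conversion makes the bytes-per-second framing concrete: a stream rate turns into stored volume only through elapsed time. The 1 MB/s rate below is an arbitrary example value, not a figure from the tweets.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def stored_bytes(bytes_per_second: float, days: float) -> float:
    """Volume accumulated on disk if every incoming byte is kept."""
    return bytes_per_second * SECONDS_PER_DAY * days


# Example: a stream of 1 MB/s (1e6 bytes per second).
per_day = stored_bytes(1e6, 1)      # 86.4 GB per day
per_year = stored_bytes(1e6, 365)   # about 31.5 TB per year
print(per_day / 1e9, "GB/day;", per_year / 1e12, "TB/year")
```

Conversely, filling a petabyte at 1 MB/s would take 1e9 seconds, roughly 32 years.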
Replying to @DRMacIver
Cool. You already know what you'll be working on? (Or are you allowed to tell? ;))
@muratk3n typical PR dick move, the man just misrepresents and reinterprets reality to fit his agenda.
@moellus I suspect it's the slipstream; it also gets more crooked the closer it is to the tracks!!
If @Pocket ever does the "we're excited to join Big X, looking forward to great new products", I'll be really pissed. It's just so good. ;)
Replying to @roidrage
it'd be interesting to see how hard it will be to bring most of the team to the US, visa issues and all.
Checking your email before you get out of bed was much harder in the desktop era where there were no laptops or smartphones yet.
Replying to @noelwelsh
yeah, man. Especially when it comes to social media Germans are extremely skeptical.
Replying to @noelwelsh
yeah. The worst thing about this is that you keep meeting the wrong people and they're quite discouraging.
User impact, media trends, real-time Twitter interest maps, word correlation clouds. It was all there. Guess marketing is the key.
What sort of bugs me about all this is that we had the tech ready with @twimpact years ago but only met people who said "That's nonsense!"
Somehow, all that online ad technology is really some crazy remix of web technology and marketing speak and concepts.
RT @Chris_iks: Anna Cremers presents new target groups based on data from our partners @Nielsen, #Sinus, #PostDirekt, @streamdrill. #…
Replying to @msmeissn
@thinkberg that way we all get something out of it! Well, us and them. Well... 😰
So apparently one of the collaborators deleted the Dropbox shared folder to "log out". Hilarious. Good thing there is history after all.
Replying to @syhw
yeah, you may convert one, but in this situation I'm in, there's just too many of them 😞
Replying to @syhw
once you get to p-level (professors and postdocs 😉) it's all Word, shared folders, and email.
Dropbox as the collaboration platform for people who don't know git. No history, no conflict resolution, no merges. 😭
Replying to @syhw
@nikete I'm always like "there's got to be a different theory which derives these update equations more easily!"
Replying to @syhw
@nikete I agree ;) Although the length of the expressions you tend to get halfway through the derivation is sometimes mind boggling.
Replying to @syhw
@nikete no offense, but I have yet to meet a Bayesian who only cares about what it looks like on the code level (KRR = GP mean anyone?)
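For the "KRR = GP mean" aside, here is a sketch in notation that is assumed rather than taken from the thread. Let $K$ be the kernel matrix on the training inputs, $\mathbf{k}_*$ the vector of kernel values between a test point $x_*$ and the training inputs, and $\mathbf{y}$ the targets. Kernel ridge regression with regularizer $\lambda$ and the posterior mean of a zero-mean Gaussian process with covariance $k$ and noise variance $\sigma^2$ have the same form, and they coincide when $\lambda = \sigma^2$:

```latex
\begin{align}
  \hat f_{\mathrm{KRR}}(x_*) &= \mathbf{k}_*^\top (K + \lambda I)^{-1} \mathbf{y}
    && \text{kernel ridge regression} \\
  \mathbb{E}[f(x_*) \mid \mathbf{y}] &= \mathbf{k}_*^\top (K + \sigma^2 I)^{-1} \mathbf{y}
    && \text{GP posterior mean} \\
  \lambda = \sigma^2 &\;\Rightarrow\; \hat f_{\mathrm{KRR}}(x_*) = \mathbb{E}[f(x_*) \mid \mathbf{y}]
\end{align}
```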
Replying to @nikete
@syhw so I guess a "proper" Bayesian Naive Bayes would first define some priors on the stuff you need to estimate.
Replying to @nikete
@syhw yeah, INAB* but I guess a true Bayesian would take offense at the way you count occurrences. (*I'm not a Bayesian)
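One common way to make the counting in Naive Bayes more Bayesian is to put a symmetric Dirichlet prior with concentration alpha on each class's word distribution. The posterior mean then simply adds alpha pseudo-counts to every word, which is additive (Laplace) smoothing. A minimal sketch follows. The function name, the toy documents, and alpha = 1 are assumptions for illustration, and a fully Bayesian treatment would use the complete posterior predictive instead of plugging in per-word posterior means.

```python
from collections import Counter
from typing import Dict, List


def word_probs(docs: List[List[str]], vocab: List[str], alpha: float = 1.0) -> Dict[str, float]:
    """Posterior mean of word probabilities under a symmetric Dirichlet(alpha) prior."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts[w] for w in vocab)
    # Each word gets alpha pseudo-counts; alpha = 1 is Laplace smoothing.
    return {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}


vocab = ["data", "big", "bayes"]
class_docs = [["big", "data"], ["big", "big", "data"]]
probs = word_probs(class_docs, vocab)
print(probs)  # "bayes" gets 1/8 instead of the maximum-likelihood 0
```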
RT @GeorgeMonbiot: New research suggests no state of grace: for 2m years humankind has been the natural world’s nemesis. My column: http://…
I vowed not to learn jazz on guitar and now I'm practicing arpeggios and already don't know what to play over a one chord funk song.
Another piece of advice from the same book: "German electricity is among the most reliable in the world, Germans just like candle light".
Advice in a Japanese travel book about Germany: "If someone offers you a drink and you decline, YOU WILL NOT GET A DRINK!"
Replying to @cartazio
today seems to be one of those days where they need a bit of support.
Replying to @jeffbigham
I finished my PhD eventually, but I know exactly what you're talking about.
Replying to @jeffbigham
well, yeah, no judgement on my part. What I meant was they should continue as long as they are still wanting to finish.
Telling PhD students to hang in there, things will eventually sort out as long as you don't give up.
RT @gridinoc: re-decentralise the web, @timberners_lee on walled gardens http://t.co/SvRsrupGzp
Replying to @aCraigPfeifer
because it's really hard to beat in terms of JavaScript complexity...
Replying to @chrshmmmr
Thought about that, too. But even in the same wifi network, smartphones seem so much slower.
Replying to @chrshmmmr
uuuh, finding a whole treasure trove of resources on fast websites that I was unaware of. Thanks for the pointer!
Replying to @chrshmmmr
I see. So it's really just the raw computing power in the rendering engine? How far apart are ARM and Intel? A factor of 10?
Shoehorning a server OS like Linux onto an ARM powered box with lots of gfx hasn't turned out so well in terms of JavaScript perf after all.
Just try opening Google+ in your smartphone browser in desktop mode and you know what I mean.
I find it interesting how you can do almost anything via web on the desktop, but have to use apps on mobile b/c of bandwidth and latency.
@haiqus that's harsh... I think we're out of the woods. Actually, winter was quite Ok this year. Two weeks of about -10°C, that was it.
@haiqus probably need to root the weather first, don't know if an exploit for Chicago already exists ;)
@markus_breuer Hm, time to install 4sq again? I had actually just uninstalled it...
@muratk3n also, I recently looked at the docs and realized they're just man pages. I think they developed the OS on Suns. Now I can tell
@muratk3n I'm not surprised! ;) I try to remember the C compiler I used. Pretty sure it wasn't Aztec or Lattice...
I have to admit I'm still somewhat suspicious of some modern C features because the PD compiler I learned C with on my Amiga lacked them.
RT @torbenbrodt: @mikiobraun will speak about machine learning on streams next month at @bigdatabeers berlin http://t.co/keuracnC60
Pretty sure I heard the Google flu predictions didn't work the second time already a few years ago. Why now?
RT @DRMacIver: Turns out studying probability and thinking you can do statistics is like studying Hilbert spaces and thinking you can do qu…
Still processing the massive interest in data-able (hehe) talent at last week's CeBIT trade fair. Maybe we should've named the co Data Inc.
OH: "We're just setting up a prototype with Storm. In three months we should know more". THREE MONTHS? #realrealtime
Replying to @noelwelsh
@posco hello, how may I help you? ;) Classify what from the tweets?
Replying to @random_walker
. @random_walker that's just like the Netflix competition. Always taken as an example for data challenges but had severe legal problems.
RT @random_walker: Google Flu Trends is everyone's favorite example of benefits of data mining. But its predictions are surprisingly bad ht…
Replying to @alexott_en
those were there, too. In fact both SAP and Telekom had huge parties - invite only unfortunately.
Replying to @clairikine
well they sell food... . Calling it "lunch" is probably a bit of a stretch ;)
Still decompressing from yesterday's CeBIT event. That was an impressive display of boss's bosses and their bosses.
Replying to @aCraigPfeifer
@kristamonster we do have elektronische Datenverarbeitung (EDV, "electronic data processing"), though, but that's more '80s style.
Replying to @MrChrisJohnson
we found that combining that with more batch oriented systems gives you the best of both worlds with little overhead.
Replying to @MrChrisJohnson
currently, we're tracking about 10M user profiles at 20k/s on a single machine.
Replying to @MrChrisJohnson
using that class of algorithms allows you to focus on active users already with one machine.
Replying to @MrChrisJohnson
which I could imagine being relevant for you, too, if you want to base recommendations on realtime user behavior.
Replying to @MrChrisJohnson
yeah, I seem to be focused on ads right now, because we're doing a project for real time ad targeting. ;)
Replying to @MrChrisJohnson
is certainly something which can strengthen Spotify's overall position (although you're probably focusing more on paid users)
Replying to @MrChrisJohnson
yeah, I liked the music => music recommendation. Not that I love ads, but I think having good or even great ad placement..
RT @MrChrisJohnson: @mikiobraun We've had recommendations for a while. One hope with this acquisition is that we can leverage new signals …
Replying to @MrChrisJohnson
I think part of the problem is also that there is really very little diversity in the ads, at least here in Germany. ;)
RT @JesselynRadack: #Snowden is “willing to provide testimony to US Congress on unconstitutional mass #surveillance.” NO ONE HAS ASKED. htt…
Replying to @eoinhurrell
@UltimateHurl @xamat yeah I've suspected that, too. But if people actually get relevant ads, you can charge more and people are happy, too.
Replying to @eoinhurrell
@UltimateHurl @xamat having suggested tracks was probably more of a business decision than something they were actually keen on doing.
RT @andyrtd: "Creating Business Value from Data" - expert panel at CeBIT, Thu, 13.3., 13:30-14:15, Hall 8 #Bitkom #Teradata #CeBIT2014
Replying to @eoinhurrell
@UltimateHurl @xamat oh yeah "you've been listening to Jazz and Funk. How about Peter Maffay's latest album?" Be glad if you don't know him.
Wait, does that mean Spotify didn't have proper recommendations until now? That would at least explain their horrible ad targeting so far.
Replying to @xamat
and it's about time! They have so much potential for ad targeting and recommendation which is still untouched.
RT @berlinbuzzwords: Looking forward to "Real-time personalization and recommendation with stream mining" a talk by @mikiobraun #bbuzz
Replying to @eoinhurrell
@UltimateHurl hehe, well as I said, it probably turned out ok for them moneywise...
@treycausey @UltimateHurl I wish just one of them would say "sorry, we were running out of cash, but we got a good offer for our team"
Replying to @eoinhurrell
@UltimateHurl if only I wasn't suspecting some never intended to run a profitable business in the first place...
Replying to @eoinhurrell
@UltimateHurl well I guess that's just the new economy's way of going bankrupt.
This is why I don't want to try new services anymore. Good that it worked out for you. #grrr #vizify
Twitter analytics really feels like they crunch the numbers once per night tops. Isn't Twitter totally real time? Don't they have Storm?
Finally getting some data into Twitter card analytics. Which is nice. But also so massively non-realtime... . Must be me.
Replying to @chrshmmmr
@stephenroller ah, this is just like talking to my father, can't tell whether he's being serious or laughing on the inside. ;)
Replying to @chrshmmmr
@stephenroller my father's really into esoterics, meaning I've seen my share of stuff. But this... .
Replying to @chrshmmmr
@stephenroller Just took a look at /r/singularity. Holy Maloney. *closes tab*
Replying to @chrshmmmr
if they only could understand what the state of the art in machine learning is...
Holler if you remember coding on screens with a 4:3 aspect ratio. When 1280x1024 was super hi-res. #whatisretina
@moellus @thinkberg @zalez Ok, so first we locate the time machine in the China import hall.
Replying to @chrshmmmr
what I like about that sentence is that it is unarguably true, and yet totally vacuous.
"egrep is the same as grep -E. fgrep is the same as grep -F. rgrep is the same as grep -r" #wtf
Can't even say the reviewers are factually wrong, I just don't agree with their measure of relevance.
Today the mismatch between what I considered interesting solid work and what appears to be publishable showed again. Time to move on.
Replying to @clairikine
ah, sorry the line was "gotta get my groove on"... . Still can't find the song this is from, though...
Replying to @jstanier
what's up with all the rainbows, must be the 5th in my timeline today ;)
RT @snipeyhead: Posts generated by a Markov chain trained on the King James Bible + Structure and Interpretation of Computer Programs http:…
Replying to @mdreid
@ravi_mohan but it gets a bit easier to pick up some things, so there's a chance of an uphill struggle.
I had a dream I lost all my followers. Then my daughter woke me up, saying she had a nightmare. All I could think was "you have no idea". ;)
They really did just about everything which is technically possible and never questioned morals, did they. #gchq #yahoo
Another reason to do the startup in Berlin: we have the fewest holidays in all of Germany. Like today 😓 #koellealaaf
RT @GooglePoetics: maybe if my heart stops beating it won't hurt this much maybe if we never wake up we can see the sky maybe if… (cont) ht…
RT @appcode: Tip of the day: Detecting unreachable code is easy with AppCode. Can your IDE do the same? http://t.co/ap5XXr6W7n
Replying to @alper
or not relying on some superfuzzy fully automated system to "identify" violations. But apparently they don't care enough.
Replying to @alper
Hm. Specifically in the case of YouTube I think they could do a much better job of fighting abuse. Like having an appeal process.
Replying to @alper
you mean because no one would buy music if there was no law protecting it?
Replying to @alper
I don't have a problem with copyright, but with how the music industry is more about rights management and less about art these days.
Replying to @alper
also, the main problem is that Google and GEMA don't find an agreement. GEMA per se is not the sole culprit here IMHO.
Replying to @alper
in the case you mentioned it really seems to be about removal due to copyright claims, not that darned "not available in Germany".
Replying to @alper
to be fair, Google is doing that as a form of preemptive compliance toward copyright holders, not the GEMA.
@junglebarry already back in 2000, I heard people from Vodafone joking about the obscene price per byte people pay for SMS.
Replying to @JobMonsterIT
.@JobMonsterIT indeed. Is that a trick? Whoever cracks the error rate gets the job? 😆
RT @ufried: so true! just way too familiar, seen it way too often. RT: "@SeiryokuZenyo: Too busy to improve? http://t.co/9qnU8TPuUn"
A friend of mine installed Telegram and immediately got a message telling him who in his address book is also using it. Instant uninstall. #privacyfail
Replying to @cartazio
we also wouldn't do projects which are completely unrelated. As always, more traction would be fine, of course.
Replying to @cartazio
it's OK, we've identified a core engine which works for a lot of things, so customization efforts are manageable.
Replying to @cartazio
just saying you might not have that luxury right now in that area because the interesting stuff is pretty rare.
Replying to @cartazio
yeah well ;) I can understand that it's hard to work it out, in particular if you're used to proximity.
Replying to @nikete
I agree. But I also still believe that heat is a more reasonable metric than click-through-rate. ;)
Replying to @mdreid
I tend to think the main advantage of our bodily sensors is that evolution had a long time to figure out what's vital.
RT @mdreid: @mikiobraun I won't believe that until you state precisely how you quantify and measure the very richness of life and run contr…
Replying to @chrshmmmr
not necessarily. Depending on how it goes you can also try to become big (like Cloudera right now) before you exit.
Just saying if you say disruptive technology is the main focus, you should be ready to deal with non-standard teams or locations.
RT @chrshmmmr: @mikiobraun Heyhey, don't forget about vanity metrics to bring back the colour.
Gotta say Microsoft's coffee shop at Unter den Linden is pretty St. Oberholz-y. Except for the heavy M$ branding.
RT @DRMacIver: "What's a data scientist?" "It's someone who has multi-classed in programmer / statistician"
Just remembered how I told someone about git years ago. He: "oh really? What would Linus Torvalds say about it?" - "He wrote it!" ;)
Replying to @alper
@janl I mean, was it a technological problem or more of a social problem to get people to agree on anything?
Replying to @alper
@janl yeah I can imagine ;) maybe you can blog about it sometime, though.
Replying to @alper
@janl I don't care which stack. Open protocols, that's how the Internet was built.
Replying to @Nico
that also let them market themselves as "free SMS". The rest was network effect and luck.
Replying to @Nico
in my opinion the clever thing about WhatsApp was always that the phone number is the ID, so signing up was super simple.
The ability to talk about something one does not fully understand is among the human mind's most impressive and annoying features.
Replying to @InkmiHq
@codemonkeyism at least you'll have it easier finding some risk averse investor.
At least the press can't claim anymore that Facebook is dying because the young people prefer services like WhatsApp. #itsamarketingplay
According to Wikipedia, WhatsApp had 400M active users in Nov 14. So that makes about $50 per user.
RT @shamir: Facebook will use 35% of its cash for the deal. And Sequoia will make 3.5B for $60M investment. Well done! #WhatsApp
Anyone can normalize a vector. The art is to compute the normalization factor from elsewhere and be right (up to machine precision).
People who neither know source revision control systems nor attachments just mail the changed LaTeX code.
Anyone know whether people have considered the limit distribution of recommendation systems? (If one followed all recommendations?)
One thing about getting old is that you one day realize you own stuff which is 10 or 20 years old.
In retrospect, installing Windows in a VM because I started to need it more often for my work was a warning sign.
Replying to @mfcabrera
only recommended books I know, haven't yet read Murphy's book. But feel free to mention it in the comments.
@treycausey right, now I remember someone telling me some people fit models and compare likelihoods and that's it.
Replying to @beaucronin
@gasnerpants sure, sounds very interesting. Mail or the comments on the blog for starters?
Replying to @furukama
@peteskomoroch sometimes one sees stuff one wasn't even aware of. "Ah, I see. But no, that's not how it is done" kind of stuff.
Replying to @furukama
@peteskomoroch not that convinced for MOOCs. Nothing beats someone experienced sitting down with you and looking at your stuff.
Replying to @peteskomoroch
don't really know how to change that except for the hard way, doing solid work.
Replying to @peteskomoroch
but yes, I guess the "let someone have a look at the data and figure out what to best do" might make execs nervous.
RT @peteskomoroch: @mikiobraun I agree, higher leverage tools bring more risk along with ROI. A single analyst or flawed tool can cause maj…
Replying to @peteskomoroch
those are two issues at quite different levels of the hierarchy.
RT @peteskomoroch: @mikiobraun real blockers in data science: 1) skill to identify & solve the right problem 2) executive support to fund &…
Replying to @MassimoMorelli
those are affiliate links to Amazon, so probably you're blocking them ;)
Replying to @cartazio
maybe provide the usual primitive types and then the usual collections with thin layers for each language?
Replying to @cartazio
yeah and somehow get around the temptation to first copy and convert everything into one's own format.
Replying to @cartazio
true. You would need to have access to the in-memory stuff. Unlike in Hadoop. So a file system for in-memory? ;)
Replying to @cartazio
as long as we keep it real, we'll be safe. That's my current working hypothesis ;)
Believe me, I know it's hard, but I'd rather someone just plainly told me "please try my project" than faked objectivity. #datagenda
What's the name for Big/Data/Science related articles whose sole purpose is to push some hidden agenda?
And unlike programming, data analysis doesn't even "crash", you always get out some numbers. IMHO it takes experience to interpret those.
Reminds me of the mid 90s when people were predicting visual programming would enable everyone to write programs.
I don't know whether the word mainstreamification exists, but that's what's happening to data analysis right now.
Replying to @ChrisDiehl
@mdreid Nico mentioned the paper ON-LINE ONE-CLASS SVMs. AN APPLICATION TO SIGNAL SEGMENTATION. By Arthur Gretton.
Okay apparently, it's called tweetception #sigh And given how that one status by @isaach has been referenced, there's no end to it... .
RT @fjsteele: My 7yo daughter captured the essence of programming after an hour with @hopscotch http://t.co/693pMHmZFX
Replying to @mdreid
still better than C++ template compile errors. Although that has also been a while...
Replying to @mdreid
probably got better since I tried it 4 years ago ;) back then it was like "no idea what the type is because you forgot a semicolon".
Replying to @mdreid
yeah. My impression was also that the compiler has a hard time isolating errors because all is just one large expression.
Replying to @ChrisDiehl
@mdreid I'll ask Nico, one of our students (not on Twitter). He's done a lot of work on one class SVMs and friends.
Replying to @chrshmmmr
indeed ;) There also was some article somewhere how Apple does not disclose where they get all their rare elements from.
Replying to @chrshmmmr
hehe. I think what he meant was that CS people like to define all kinds of things even if it wouldn't be necessary.
Math prof once called CS very "definitory": lots of definitions, few theorems. Today the term should probably be "frameworkory."
Replying to @LarsFronius
Zombie without appetite for brains. That sounds bad :( Get well soon!
Replying to @LarsFronius
As long as your appetite still mostly focusses on brains everything's fine ;)
People saying that big data is cheap because tools are open source and free forget that the biggest cost and time factor is people.
Whatever you worry about, be glad you don't have to work in Apple's lithium mines. #firstworldperspectives
Replying to @superglaze
gosh. So first you bet your friend's marriage won't make it and then you start hitting on his wife to nudge the odds?
@haiqus as far as I understood, you quickly get some money for high interest rates and the option to get shares if you default. Pre-IPO...
@haiqus Actually I first had to look up the term to see whether they just made it up or it's really a thing.
Does that mean Klout will stop sending me emails asking me to reauthenticate their Facebook access?
@haiqus at this point it's really hard to tell whether it's bad UI design or they're just gaming their KPIs. ;)
RT @manamica: "The person who says it cannot be done should not interrupt the person who is doing it." - Chinese Proverb
New Twitter app interface layout leads to many accidental favs. I see them all the time. I do them all the time. #accidentalengagement
RT @drewconway: Dear #strataconf attendees: don’t forget to print your Data Science Conference Bingo cards, by @tdhopper https://t.co/6p5DE
RT @nik: streaming map-reduce is basically telling your customers that you want them to stick nails in their head #painful
Anyone still remembers when people were suspecting Stephen Elop is on the short list for the next CEO after the Nokia deal?
Replying to @roidrage
our kids once broke the panel, and there was only honeycomb cardboard inside under a 2mm veneer.
Replying to @pavlobaron
I've grown quite suspicious of all large frameworks. Too often the specific application relevant to you is hard to fit into.
Replying to @pavlobaron
often I think people are just lazy or don't have any real world experience so they say word count!
Replying to @mdreid
you almost got me, but when you mentioned your mail I finally knew you were just joking 😉
Replying to @noelwelsh
somehow I skipped the latest developments in favor of Scala and algorithms. #regretnothing
I hate to admit it but all that time of infrastructure coding in Scala has definitely affected my Python data analytics skills.
When an email account contains >80% spam, mailing lists, and notifications it should be legal to just abandon it.
Returning from an exhausting trip out with the kids because "oh look the sun is shining it's like spring outside" ✅
Replying to @purbon
two years ago they still stacked devs four rows deep in a room with 25m² 😉
@muratk3n a lot of it is explained by the fact that Thrun is not a trained statistician. I've seen lots of sloppy math in ML courses.
I mean unlike the error messages which I get from C++'s templating system. There I always think "well, gcc, you should've known better!"
Is finding Scala's type system error messages actually helpful a Good Thing or just signs of Stockholm syndrome?
Replying to @pavlobaron
I see. Yes that might be true. But the track endpoint is much better than it seems because it doesn't subsample but rate limits.
Hope the Twitter data grant doesn't lead to close control of research through Twitter. Results only valid if you are in, in only if you fit.
Replying to @pavlobaron
but what you said is probably exactly the kind of branding they are trying to achieve. Research only valid with Twitter grant.
Replying to @pavlobaron
also if you track certain keywords you'll get all the tweets unless it's very high volume.
Replying to @pavlobaron
depends on what you want to do. Looking for impact and trends, etc., then it's already enough.
Just start with the public sprinkler feed and store the data somewhere and you'll have more data than you'd ever need in a matter of weeks.
And if I really hungered for some Twitter data, last time we checked we still had a few TB of data lying around somewhere. #toolatetoolittle
Really, Twitter data grant? The amount of flatness this leaves me with is a good indicator how far we've come with @streamdrill.
Replying to @jonoandre
agree, I think the technology behind it is on par now. We'll see how this pans out. Exciting times ;)
Replying to @jonoandre
above all, search results were really bad in the beginning. I think that's what mostly broke them.
Replying to @jonoandre
had to go there to check. Whoa. You're right. What's up with that funny logo anyways...?
Replying to @jonoandre
yeah, that's right. That's from a time when services could have funny names. Like "Yahoo" and "Google" ;)
RT @medriscoll: When companies have a lot of cash, they take its management in-house. The same goes for data. On-premise computing isn't go…
It's almost comical how Facebook's friend recommendation algorithm has no sense of hierarchy. Friend request to your friend's boss? Yes?
Replying to @mdreid
that's exactly the kind of tweet which would cost me dozens of followers normally ;) #approved
Replying to @mdreid
I think the proof works irrespective of whether it's monkeys or highly trained mathematicians. So yes, provided the proof exists.
Replying to @RandomlyWalking
you mean he mistook it for the Bing search bar? ;) #pleasesomeonestopme
Okay, @satyanadella's also talked a lot about Bing. No prophetic powers at work, apparently 😉
Replying to @eoinhurrell
@UltimateHurl @twiecki resistance is futile! Prepare to be assimilated! 😉
Five years ago, @satyanadella's first tweet just said "machine learning". WHAT DOES THAT MEAN???
That moment when you accidentally implemented a standard algorithm just because you did what felt right.
Replying to @chrshmmmr
but that's all stuff you'll experience first hand in a not so far future I fear ;)
Replying to @chrshmmmr
hehe. Yeah. Some people need the push from the deadline to overcome "writer's" block and actually do something.
Replying to @chrshmmmr
I think it's much too often that nothing gets done unless someone imposes a deadline.
Replying to @peter_c_william
Where is the paper draft? - I didn't do it. - This was a test. And you passed.
Replying to @chrshmmmr
those are the good deadlines. The others are just random people requesting random stuff by tomorrow ;)
Replying to @mdreid
@DRMacIver I always dream of a dynamic site based on the same files used for Jekyll. Shouldn't be so hard, right? RIGHT? ;)
RT @jroper: I'll tell you what I do want #chrome, I just want to view the search results. Would that be OK with you? http://t.co/djZ0EREWtc
@thinkberg do you know a place? "@purbon: Any idea where to get an smart card reader (ISO 7816)in Berlin?"
Replying to @PreetamJinka
there are quite a few startups and middle size companies plus soundcloud and a few big ones. But no banks or insurances.
Replying to @erensezener
probably ;) I have to admit the screen on my Galaxy S3 has 306dpi, comes close, and sometimes I just admire the sharpness. ;)
I always used to say I don't understand Retina displays because lack of pixel density was never a deal breaker until... No, still agree.
From a paper draft: "[..] has spurred interest in varying fields, including [..], among others." CAN WE BOLD THAT STATEMENT A BIT, PLEASE.
Anyone still remembers when there were three competing video cassette formats? Anyone still remembers video cassettes? Anyone?
RT @rbranson: @kellabyte @nzkoz let's be honest, pretty much the only platform where threads aren't a total shit show is the JVM.
Replying to @ChrisDiehl
I gather from the time gap in your retweets that you managed to get some more hours of sleep? ;)
"But the smartphone market is super competitive, and [..] it helps to be all-in when it comes to making mobile devices." Wow. Surprising!
Replying to @ChrisDiehl
no problem, carry on, whatever calms your mind. ;) Also: lots of interesting stuff!
Replying to @ChrisDiehl
oh, wasn't commenting on your retweet streak but on Google selling Motorola to Lenovo ;)
Replying to @eoinhurrell
@UltimateHurl actually I saw a photoshop of a Godzilla-sized Data somewhere a few days ago.
Replying to @ayirpelle
to me the interesting revelation was that once you have a critical mass of talent you can get any price you want.
My 6yo daughter looked at my screen and said "why are you writing so much for your work?" Then I realised that it's really all just text.
Replying to @beaucronin
@johnmyleswhite @syhw @ogrisel I did it. Funny thing, in the end I thought "it was probably worth it". ;)
Replying to @ogrisel
@johnmyleswhite @syhw @beaucronin I will, but the beauty lies in that we can't really know ;)
Replying to @syhw
I wasn't planning to attribute each piece of information to the tweeter, maybe just a line at the bottom saying "thanks to..." OK?
Replying to @johnmyleswhite
@syhw @ogrisel @beaucronin I'm considering putting the pieces of info from our discussion into a blog post. Any objections?
If you introduce an ethics committee into a story about AI, it has to be overthrown later. If not, it shouldn't be there. #chekovsgun
Replying to @benhamner
it's like introducing a gun in a storyline, it has to become significant later on.
Replying to @ogrisel
@syhw @johnmyleswhite @beaucronin that's actually a pretty good and well-researched article I have to say... Thx for sharing!
Mostly wrapping my head around missing unsigned integer types in Java and handwelding correct carry bits out of thin air. 👾🎆 #mindblown
Wrapping my head around bit level arithmetic in twos complement and stuff. In Java. I just loooove my job. 😉 #isthatsarcasm
Replying to @beaucronin
@syhw @ogrisel @johnmyleswhite ah great, first patent portfolios, now people. Where do I sign up? ;)
Replying to @syhw
@ogrisel @johnmyleswhite @beaucronin I see. Who else is part of the team? Who from torch?
Honestly, I'd rather see Google buy some serious UI and enduser product guys than yet another Deep**** corp.
Replying to @beaucronin
@johnmyleswhite which IMHO are true but of a purely theoretical nature. Hope deepmind is not just working in that same direction
Replying to @beaucronin
@johnmyleswhite First serious use of the Brainfuck language in academia... . Apart from that, lots of claims of "universality"
RT @beaucronin: @johnmyleswhite I bet he is. But do these people understand the actual limitations of the tech they’re buying?
RT @ML_Hipster: "Split on income < $50k then… no, wait… test if years employed > 5 & predict 'no loan'… I mean 'loan'. Should I prune?" — I…
Non-empty iterator. Oh really Scala? I've gathered as much. Thanks for being so talkative... ;)
Replying to @aCraigPfeifer
I thought it was implied that we're talking about a method of mine ;)
Showed my 6yo a bunch of posts on vine. Her comment: "why is everyone screaming?" and you know what? It's true.
RT @stratosphere_eu: Stratosphere at the #CeBIT 2014 in Hanover. Meet the developers in Hall 9, Stand E44 and learn more http://t.co/77MzNx
Replying to @JoergM
@zynisch absolutely! Rockstar ninja killers, ideally. Bestest of the Best!
Forget about Big Data. The only thing that helps against your cold feet are Big Socks. #winter #berlin
LinkedIn claims @thinkberg endorsed me for... wait for it... PostgreSQL but he denies any involvement. What is going on??
Replying to @izendejas
@matei_zaharia I think that is the idea. If you know about iterations, you can even decide automatically what to cache.
Replying to @izendejas
not sure what the benefit is, but others like stratosphere have such ops, which they claim can be optimized more.
Replying to @izendejas
yes, you can do iterations by iteratively running things on spark, but there is no iterate() method.
Replying to @tyldurd
what really got me was how some places are still ok and some are really really slippy.
Oh great, first subzero temps of the year, 3 minutes of light drizzle and everything's covered in ice. Still want to swap, @haiqus? 😉
RT @alung: @mikiobraun @ogrisel compared shark with hiveª, yields great speed improvement* ** ª 0.9 * for not too complex queries ** with e…
Replying to @ogrisel
thanks. Yeah I'm probably not database guy enough to fully appreciate Spark but I'll keep an eye on it.
Mike Olson (CEO of Cloudera): Pure open source + services biz model won't cut it, you need open source platform + proprietary tools.
On the other hand, I find the bandwidth involved in providing HD video on demand for millions of people just insane.
Broadcast cable TV is so odd. I pay 15€ or so per month and can't even watch what I want when I want to.
@haiqus most people I talked to (well complained about the weather) said they'd prefer a dry -6C to the rain.
Replying to @ChrisDiehl
but you're totally right. You need to be able to react in real-time. Humans often cannot. But machines can.
@haiqus which seems to be something entirely different... . Ok. What next? File a feature request? ;)
@haiqus Hm. It seems the set of endorsement categories is fixed. CATSWeb is the closest thing there is... .
After a big refactoring, the code surprisingly compiles but data disappears mysteriously. So much for the benefit of type systems.
Replying to @mdreid
and we're just getting started with the winter. -10°C is not uncommon for Berlin.
Horrible weather, 4°C, something between cold rain and soft snow. And wind. Ugh. #berlin
Replying to @aCraigPfeifer
@noelwelsh @chrshmmmr I'm thinking Web video mini episodes of 2 to 3 minutes each. Need to find some media interns with Macs
Replying to @aCraigPfeifer
@noelwelsh @AmazonVideo @Betas very well. Let's draft some episode and do a bit of storyboarding. Footage can be reused.
Replying to @aCraigPfeifer
@noelwelsh that's basically it. Subtitle: "but the emotions are real." ;)
Replying to @aCraigPfeifer
@noelwelsh I always dreamed of a research TV show. You'd only see people hacking away at keyboards, sometimes yelling out.
Replying to @noelwelsh
I'm looking for something more established which will still be around in the next 15 years or so.
Replying to @aCraigPfeifer
anticlimactic at best. Last time it was more like "why is my battery dying so fast now???". *turns Google Now off*
Replying to @aCraigPfeifer
Deep down I believe that one should never count on getting an update, but in reality I can't help it ;)
Replying to @aCraigPfeifer
Samsung Galaxy S3, T-Mobile/Germany. KitKat is still only a rumor, though.
Ok, I just got Android 4.3 in late December and now they're claiming KitKat will roll out in April. #likeIgiveaf
Which gets me thinking... What kind of blackmail material has Google found on its servers to pay only $3bn for a company worth $230bn. #joke
We don't have Nest in Germany. So half the time I'm reading "Google acquired Nestlé". 😉
Waiting outside for the download to finish is the new lemme just finish this cigarette.
Immutability on the interface level does not mean you should use only immutable data structures in the implementation.
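A small sketch of the distinction: the function below only ever hands out an immutable tuple, but internally it builds the result in a local mutable list, which never escapes the call. The function name `prefix_sums` is a hypothetical example, not something from the original.

```python
from typing import Sequence, Tuple


def prefix_sums(values: Sequence[int]) -> Tuple[int, ...]:
    """Return the running totals of `values` as an immutable tuple."""
    acc = []  # mutable, but local to this call and never shared
    total = 0
    for v in values:
        total += v
        acc.append(total)  # cheap in-place append
    return tuple(acc)  # freeze the result before handing it out


print(prefix_sums([1, 2, 3]))  # (1, 3, 6)
```

Callers get the guarantees of an immutable value, while the implementation avoids rebuilding a new tuple on every step.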
Because, let's face it, the most amazing thing about Java is that it's kinda messy but we've still built whole industries on it.
Replying to @superglaze
gosh. What happened to listening and hugs? Now empathy is just about sharing knowledge.
At least that's what I keep telling people but interest in unchecked meteorological fact seems less widespread than I assumed. ;)
Need to check whether current harsh winter in eastern US and relatively mild weather in Europe are caused by the same low pressure system.
Replying to @sscdotopen
@iamuce yeah, would like to take a look. You can also share privately, if you prefer ;)
Replying to @sscdotopen
any API pointers? Anything connected to the RDDs looks just like Scala collections (probably on purpose, though)
Replying to @jasobrown
my brain is just sufficiently working to catch up with Twitter for a few minutes that early in the morning ;)
Replying to @jasobrown
I see ;) So you're saying you get up even an hour before that to hack on side projects? I'm impressed. O_o
Replying to @jasobrown
lucky you, I have to get up that early just to get my kids to school & kindergarten ;)
Replying to @nikete
interesting, although they only study one data set. Wonder how results hold up over a varying range of difficulties.
RT @SZ: "#Pofalla declares train delays over": former head of the Chancellery moves to @DB_Bahn, and the internet is raging http://t.co/GPL6QJ7t
Replying to @superglaze
indeed. I have vague recollections of my mother using a eurocheque to settle some bills. Must have been in the late 80s.
RT @drewconway: “Data science is a buzzword…in particular to re-brand existing competitive intell and business analytics approaches.” http:…
RT @jeffjarvis: A key point from @ioerror's #30c3 talk: the internet is under martial law. http://t.co/HBQk9KAVy1