ML in Limited Medical Data Sets: Doing Precision Guesswork – Jiří Mekyska
ML in limited medical datasets usually means doing precision guesswork on unreliable data provided by those with high expectations. The first part of this talk will focus on issues that data scientists and engineers have to address when working with this kind of data (e.g. unreliable labels, the effect of covariates, the necessity of clinical interpretability, difficulties with fusing multiple datasets). Then, some common ML approaches in this field will be described (yes, sometimes you have to go from XGBoost to logistic regression). Finally, some specific ML-based research studies in the field of neurology and psychology will be presented (diagnosis and prediction of motor/non-motor deficits in Parkinson’s disease, assessment of graphomotor disabilities in children with developmental dysgraphia).
Jiri Mekyska is the head of the BDALab (Brain Diseases Analysis Laboratory) at the Brno University of Technology, where he leads a group of data scientists and biomedical engineers. His research focuses on non-invasive quantitative analysis of neurodegenerative and neurodevelopmental disorders based on speech and handwriting processing. In cooperation with neurologists and psychologists from different countries, he develops diagnostic and monitoring systems focused on Parkinson’s disease and developmental dysgraphia.
How to do ML if you have lots of Google’s GPUs – Vladimir Macko
The journey of a student from UK Bratislava to doing research at Google New York as an AI resident. What is the AI residency, what is AutoML, what is architecture search, how can one make use of tons of GPUs, why does AdaNet have such an amazing GIF, what does it mean to play at state-of-the-art levels of accuracy in image classification, and what is, so far, the best indicator for getting into the AI residency?
Vladimir „Vlejd“ Macko is a graduate of UK Bratislava with 4 years of ML startup experience and two internships at Google. He spent his last year at Google Research in New York as an AI resident. He worked on architecture search in the AdaNet AutoML team, and on combinatorial optimization for mixed integer program solving in collaboration with DeepMind.
Let There be Light… Anomaly Detection in Network Traffic – Petr Chmelař
Finding a needle in a haystack is easy compared to finding a security or traffic anomaly in a network. Some say it cannot be done. Or can it?
- What was in the beginning?
- What information is in network traffic data?
- Anomaly Detection: If it works, why? If not, why?
- How do you train on 100 TB/day for a month in an hour?
- Is 1 bite in 1,048,576 exciting enough?
- Do you really know all the admins in your network?
- And on that bombshell, how did we create GREYCORTEX?
Does this sound boring? Don’t worry, there will be free beer at the meetup!
Petr Chmelar, the Chief Technology Officer of GREYCORTEX, has 15 years of experience in advanced data mining, machine learning, and artificial intelligence. Starting at Brno University of Technology, he achieved multiple successes in US-based NIST challenges (TRECVID and AVSS Challenges between 2008 and 2012) and acted as a yellow-teamer in the Crossed Swords NATO CCDCOE cyberdefense exercise. He is a co-founder of GREYCORTEX and the mastermind behind its technology.
EdgeAI: running ML models on embedded HW – David Filip
EdgeAI means that ML models run locally on a hardware device, using data generated by the device itself. With readily available platforms, you can easily create smart devices that are able to interact with the outside world.
In this talk, we will look at embedded platforms that can run your ML models. We will also talk about sensors and how to connect their data signals to the model itself. In the last part of the talk, I will take a few hardware components and we will build something interesting. You might even see me burn some expensive hardware.
David Filip is a software developer. For the past two decades, he has kept failing to build his own startups as well as helping other companies build new software, ranging from VC-funded Silicon Valley consumer-grade software to laboratory instruments. Most of his work is about interfacing with hardware or printing stuff on paper. Nobody ever wanted to learn about the latter.
Machine Learning Solutions for Forensic Investigations – Kateřina Veselovská
E-mails, Facebook statuses, WhatsApp messages… We all leave textual traces all over the place every day. Text analysis, including authorship identification and anomaly detection, has been a critical competency within forensic investigations for a long time. However, the volume of analyzed text data is growing rapidly and multi-language litigations are becoming more and more common. Traditional approaches are becoming costly and less practical, and it is getting difficult to find a “smoking gun”. To discover patterns and trends in the data, it is no longer possible to rely on manual analysis alone. This talk introduces the basics of forensic linguistics and the state-of-the-art ML methods and tools for automated unstructured content analysis within electronic discovery and crime detection as such. Real-life examples and a demo are included.
Kateřina is a data scientist with a natural language processing background, focusing on semantic analysis of textual data. Having previously worked as a product developer and business consultant specializing in text analytics in the big data domain, she is now involved in forensic data analysis at Deloitte. Kateřina got her PhD in computational linguistics at MFF UK Prague. Her research mainly concerns sentiment analysis and information extraction. She gives lectures on Linguistic Applications at Charles University in Prague and Palacký University Olomouc.
Challenges of practical NLP – Jiří Hana, Rado Klíč
Customers don’t care about your F-measure.
„In theory, there is no difference between theory and practice. But, in practice, there is.“ (source unclear)
We will talk about our experience with NLP in business as opposed to academic settings. Using examples like sentiment detection or football article generation, we will illustrate various challenges such as lack of data or very specific customer requirements.
Jirka Hana is a co-founder of Geneea. He has a PhD in computational linguistics and teaches at MFF UK, so he has ample experience from academia as well.
Rado Klíč is a software engineer at Geneea and Jirka’s former student. He also worked as a research programmer at Seznam and as a software engineer at Amazon.
Statistical vs. Deep Learning Methods for Time Series Forecasting – Petr Simecek
Who got it wrong?
In recent years, Deep Learning (DL) has revolutionized many fields such as image analysis, speech recognition, and natural language processing. However, Time Series Analysis is still dominated by classical statistical methods. In a recent comparison of statistical (Stat) and machine learning (ML) forecasting methods, Prof. Makridakis, one of the leading authorities in this field, even claimed: „The forecasting accuracy of the best ML method was lower than the worst of Stat ones while half the ML methods were less accurate than a random walk“ (Makridakis et al., PLOS One, 2018).
In this talk, we start with a high-level introduction to time series forecasting. Next, we get an overview of the results of the M1–M4 competitions and of publicly available datasets on Kaggle. We propose an explanation of why DL forecasting methods are superior on some data, while on other datasets they cannot compete with Stat methods. Finally, we look at what can help you choose between them in an era of automatically generated time series all around.
Petr Simecek recently moved to Brno and joined Central European AI (CEAI) as a Machine Learning Engineer. Before that, he worked in the US as a Data Scientist for Google and The Jackson Laboratory. Over his career, he went from theoretical concepts (PhD on structures of conditional independence at MFF UK) through applied statistics (genetic studies on mice at IMG AV CR & JAX) to rather practical Time Series analysis.
As a former Software Carpentry instructor, he believes in keeping the doors to Data Science wide open, helping others learn R & Python, and he is looking for more contributors to the Daily Python Tip Twitter account.
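The “random walk” in the Makridakis quote is the naive forecast. It repeats the last observed value for every future step, and any serious method should at least beat it. Below is a minimal Python sketch of that baseline. It also includes a historical-mean baseline and the symmetric MAPE (sMAPE) error metric. The series values and function names are invented for illustration. This is not the evaluation code used in the talk or in the M competitions.

```python
from typing import List, Sequence


def naive_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Random-walk (naive) forecast: repeat the last observed value."""
    if not history:
        raise ValueError("history must not be empty")
    return [float(history[-1])] * horizon


def mean_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """A second simple baseline: forecast the historical mean."""
    if not history:
        raise ValueError("history must not be empty")
    mean = sum(history) / len(history)
    return [mean] * horizon


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (0 = perfect)."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("need equally long, non-empty sequences")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        # Treat 0/0 as a perfect match instead of dividing by zero.
        total += 0.0 if denom == 0 else 2.0 * abs(f - a) / denom
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    series = [10, 12, 13, 15, 16, 18, 19, 21]  # made-up trending series
    train, test = series[:6], series[6:]  # hold out the last two points
    for name, method in [("random walk", naive_forecast), ("mean", mean_forecast)]:
        error = smape(test, method(train, len(test)))
        print(f"{name}: sMAPE = {error:.2f}%")
```

The made-up series trends upward, so the random walk is far more accurate than the mean forecast here. On a noisy, mean-reverting series the ranking could flip. That is part of why no single forecasting method wins on every dataset.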
Beyond Embeddings and Towards Artificial General Intelligence – Radim Burget
This talk deals with artificial intelligence and knowledge representation using deep neural networks beyond the traditional approaches. First, it will summarize the developments of recent years, focusing especially on the most significant events of 2018 that have the potential to shape the further development of the field. It will then offer recommendations on how these recent findings can help improve the accuracy of existing neural networks, along with some inspiration from how others succeeded in public competitions. Second, it will summarize some of the currently most successful methods, such as deep convolutional neural networks, their features and their limitations compared to human capabilities. Finally, some advanced neural network architectures for knowledge modelling that have the potential to overcome some of these limitations will be introduced.
Radim Burget is an associate professor at Brno University of Technology and heads the Signal Processing program at SIX Research Centre. He has been involved in artificial intelligence research for many years and has taken part in many research projects. These include projects funded at the European and national levels, as well as privately funded projects with companies such as Honeywell, Mitsubishi Electric, RapidMiner, Webnode and others.
Applied Machine Learning In Medical Imaging: Challenges And Successful Approaches – Rene Donner
Deep learning has had a great impact on Medical Imaging. But what is Medical Imaging? What are common Computer Vision tasks in that domain? We will look at registration, semantic segmentation, retrieval and anatomical structure localization. We will not only encounter DL techniques but also see the application of Random Forests, Random Ferns and Markov Random Fields. Lastly, we will look at publicly available data sets to get you started with Medical Imaging, as well as the practical aspects of working with medical domain experts.
With a background in electrical engineering, René has worked for 8 years at the Medical University Vienna as a researcher in computer vision, focussing on anatomical structure localization and content-based image retrieval. He is now CTO at context flow, applying deep learning to large-scale medical image data and developing smart tools to aid radiologists in their challenging tasks.
Deep Learning for Object Detection in the Real World – Michelangelo Fiore
One of our main areas of focus at KiwiSecurity is the correct detection and tracking of people. Deep learning strategies have provided significant improvements in this area, but they still have several limitations, such as relatively slow inference times. In real-world scenarios, the system needs to work reliably in very complex situations, such as crowded environments where the objects are severely occluded. Also, in many situations, it is necessary to detect and track people simultaneously in multiple scenes on limited hardware. Creating a system that is accurate even in complex scenes, while still performing faster than real time and without heavy memory requirements, is a very difficult task.
In this talk, I will give an overview of the state of the art in object detection, focusing on deep learning strategies, and show how KiwiSecurity is using deep learning for the task of people detection. In particular, I will focus on how real-world constraints have influenced our choice of deep learning frameworks and models, and how we have used model pruning to improve the inference time of our application.
Michelangelo Fiore has been a Computer Vision Developer at KiwiSecurity since June 2017. Previously, he did his PhD studies in Human-Robot Interaction at LAAS/CNRS in Toulouse.
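The abstract does not say which pruning criterion KiwiSecurity used. The sketch below illustrates one common structured variant: rank a convolutional layer's filters by the L1 norm of their weights and keep only the strongest ones. Removing whole filters, rather than individual weights, actually shrinks the layer. That is what typically reduces inference time on ordinary hardware. The function names and the flat-list representation of filters are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Filter = List[float]  # one convolutional filter, flattened to a list of weights


def l1_norm(weights: Sequence[float]) -> float:
    """Sum of absolute weights, used as a rough 'importance' score."""
    return sum(abs(w) for w in weights)


def prune_filters(filters: Sequence[Filter],
                  keep_ratio: float) -> Tuple[List[Filter], List[int]]:
    """Keep the filters with the largest L1 norm and drop the rest.

    Returns the surviving filters in their original order together with
    their indices, so the following layer can be pruned to match.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(filters) * keep_ratio))
    # Rank filter indices from most to least important.
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]), reverse=True)
    kept = sorted(ranked[:n_keep])
    return [list(filters[i]) for i in kept], kept


if __name__ == "__main__":
    layer = [[0.5, -0.5], [0.01, 0.02], [1.0, 0.0], [-0.1, 0.05]]  # toy weights
    pruned, kept = prune_filters(layer, keep_ratio=0.5)
    print("kept filter indices:", kept)
    print("remaining filters:", pruned)
```

In practice, the next layer's input channels must be pruned to match the returned indices. The network is then usually fine-tuned to recover the accuracy lost by pruning.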
Roboauto – Jan Najvárek
Self-driving cars will be our transport in the future. The question is not if, but when. Let’s look at the current progress of the world leaders in autonomous cars, at what we do in the Roboauto team here in the Czech Republic and at our plans for the near future. We will talk about AI progress in the self-driving area, why simulations are important in the field, and the current hardware background of autonomous cars.
Jan Najvarek is a self-driving enthusiast, a co-founder of the Roboauto team/company (creator of the first Czech self-driving prototype) and also a co-founder of Artin (250 software developers).
www.roboauto.cz, Roboauto videos
Pitfalls of ML at (not only) Kiwi.com – Roman Rožnik
No doubt machine learning has been a hot topic in recent years; it seems everybody can easily become a data scientist and do ML within a few lines of code. Reality is much harder. Understanding the problem, preparing the right training data, cleaning it, designing features, the interpretability/complexity of the model, defining the right metrics, looking at false positives/negatives, interpreting ML results or A/B tests: these are topics closely tied to data science that are often overlooked and underrated. I’d like to emphasize that they are very important and that ML itself is just one small piece of the complex data science puzzle. Bringing data science and ML approaches to a crazy company like Kiwi.com is very hard and often frustrating. It costs a lot of blood, toil, tears and sweat, and brings disillusion, sadness and a lot of failures.
Roman has always been the developer most interested in math and algorithms. With a stroke of luck, he became the machine learning guy at Seznam.cz where he introduced ML to the full-text search team. After he fulfilled his mission with Seznam.cz, he decided to bring the ML and data science approach to Kiwi.com.
Deploying text RNNs in a bank – Vlado Boža
One part of the anti-money laundering process is a background check of the client to determine whether the client has been engaged in any risky or illegal activity. This was mostly done by manually searching news databases and reading the returned articles. Vlado will present how Merlon Intelligence helps automate this process with a solution based on recurrent neural networks, and how it has coped with the limited availability of training data.
Vlado Boža is a Lead ML engineer at CEAI, where he focuses on hard problems requiring interesting solutions, especially in the area of NLP. He is a co-founder of Black Swan Rational, a boutique data science consultancy, where he worked on projects like computer vision for detecting clouds in the sky, analysis of electrical smart-meter data and analysis of M&A data. Vlado holds a PhD in Bioinformatics focusing on processing DNA sequencing data. He is a big fan of clever heuristics, probabilistic algorithms and any interesting, efficient algorithms.
Building Safe AI – Andrew Trask
We are experimenting with a new format of moderated live-streamed sessions with top experts from the field, so don’t hesitate: come with friends to Impact Hub to support us, enjoy the talk and rock the afterparty!
In the first half of this talk, I’ll introduce and describe Private Deep Learning, an approach to training neural networks in an encrypted state such that their growing intelligence (and the underlying data) is protected from theft. This will include a description of Federated Learning and Multi-Party Computation.
In the second half of this talk, I’ll discuss the significant impact this technology has when combined with recent advancements in Blockchain and Peer-to-Peer technologies into a new open-source platform called OpenMined. This will include a live demo showing how to train a neural network on a large, distributed, private dataset.
Andrew Trask is a PhD student at the University of Oxford studying Deep Learning. He is also the author of Grokking Deep Learning, an introductory book from Manning Publications that has sold over 6000 copies, an instructor in Udacity’s Deep Learning Nanodegree, and the author of a popular machine learning blog, http://iamtrask.github.io. Previously, Andrew was a researcher and analytics product manager at Digital Reasoning. There, he trained the world’s largest artificial neural network, with over 160 billion parameters. He also helped guide the analytics roadmap for the Synthesys AI platform, which was deployed to many enterprises such as Goldman Sachs, UBS, HCA (the largest hospital network in North America), various members of the Intelligence Community, and the US Military. Andrew lives on a boat in Oxford with his wife Amber and plays the piano in his spare time.
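Multi-Party Computation, mentioned in the first half of this abstract, lets several parties compute on data that none of them can see in full. A basic building block of many MPC protocols is additive secret sharing. A value is split into random shares that reveal it only when all of them are summed. The Python sketch below is a minimal, hypothetical illustration of that idea; the modulus `Q` and the function names are our own choices. It is not OpenMined's implementation. It also omits everything needed for real security, such as secure channels and protection against malicious parties.

```python
import secrets
from typing import List

# Shares live in the integers modulo a large prime; 2**61 - 1 is a Mersenne prime.
Q = 2**61 - 1


def share(secret: int, n_parties: int) -> List[int]:
    """Split `secret` (assumed to be in [0, Q)) into n additive shares mod Q."""
    if n_parties < 2:
        raise ValueError("need at least two parties")
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)  # last share fixes the sum
    return shares


def reconstruct(shares: List[int]) -> int:
    """Recover the secret; any n - 1 of the shares are uniformly random."""
    return sum(shares) % Q


def add_shared(a: List[int], b: List[int]) -> List[int]:
    """Each party adds its own two shares locally, with no communication."""
    return [(x + y) % Q for x, y in zip(a, b)]


if __name__ == "__main__":
    alice_value, bob_value = 25, 17
    a_shares = share(alice_value, 3)
    b_shares = share(bob_value, 3)
    total_shares = add_shared(a_shares, b_shares)
    print("sum revealed only at the end:", reconstruct(total_shares))
```

Addition works locally because each party only adds its own shares. Multiplying two shared values is harder. It needs an extra protocol step (for example, precomputed Beaver triples) that requires communication between the parties.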
Why we need GANs for image manipulation – Michal Hradiš
Image processing certainly did not miss out on the big convolutional network revolution. Networks are at the core of state-of-the-art methods in image deblurring, superresolution, motion estimation, and even in such mundane tasks as image compression. Compared to more traditional approaches, networks can be trained for specific types of images, don’t require a deep understanding of complex mathematics, and can even hallucinate realistic image details.
In this talk, I will show you how efficient image processing networks can be built and trained. I will explain why the hell we need Generative Adversarial Networks and how they relate to human perception. The presented ideas will be demonstrated on real-world image enhancement applications. You’ll get a chance to experiment with them at home using the provided TensorFlow code.
Towards General AI – Tomas Mikolov
I will present CommAI, a project aiming to build the first general AI with human-level communication skills. This includes a novel type of training setup which stresses the importance of unsupervised and incremental learning. I will describe the simplest learning problems defined in this environment, and various attempts to solve these problems using classic machine learning techniques.
Tomas Mikolov has been a research scientist at Facebook AI Research since May 2014. Previously, he was a member of the Google Brain team, where he developed and implemented efficient algorithms for computing distributed representations of words (the word2vec project). Tomas obtained his PhD from Brno University of Technology (Czech Republic) for his work on recurrent neural network based language models (RNNLM). His long-term research goal is to develop intelligent machines capable of learning and communicating with people using natural language.
AI in the Office – Pavel Dvořák & Petr Mejzlík
How much time do you spend searching for information during your workday? Have you ever searched for a document, presentation or diagram containing information you needed to create a report, write documentation or design a user interface according to your company’s standards? Did you always find it immediately, or did you have to go through several folders before you could finally shout “Eureka”? Or have you sometimes even forgotten that a particular document exists and discovered it only after your work was finished? Konica Minolta Laboratory Europe’s vision is to develop an operating system for the workplace of the future, called Cognitive Hub, as a nexus for users’ information flows. With Cognitive Hub, we aim to turn busy workers into effective workers by letting them focus on creative work instead of tedious and boring tasks such as searching for information or for people with the right skill sets. In this talk, we will introduce our overall goal and the role of Machine Learning in it, with a special focus on Computer Vision, and present what we have already done.
Pavel Dvořák works as a Computer Vision Research Specialist for Konica Minolta Laboratory Europe. In 2015, he obtained a PhD degree in Computer Science at Brno University of Technology with a thesis focused on the application of Computer Vision in Medical Imaging. During his doctoral studies, he also worked as a researcher at the Czech Academy of Sciences and spent altogether two years as a visiting researcher at several European universities, e.g. TU Munich, Medical University of Vienna or Vienna University of Technology.
Petr Mejzlík works as a Machine Learning Research Specialist for Konica Minolta Laboratory Europe. He obtained an MSc in Computer Science in 1987 and a PhD (Dr.) in Molecular Biology and Genetics in 1994 from Masaryk University in Brno. He taught and did computational chemistry/biology research at the Faculty of Informatics, Masaryk University, until 2002. Since then, he has participated in technology research and technical software development at Virtual Reality Simulators, ANF Data, FEI, Honeywell, and Kinalisoft.
0-day Malware Detection at Scale – Zdenek Letko
In this talk, we will introduce the key aspects of the global infrastructure built by Wandera to offer organizations a global solution for Enterprise Mobile Security and Data Management. Next, we will discuss the exciting journey towards Machine Learning (ML) based zero-day malware detection in a production environment. We will talk about the usual ML steps, such as data harvesting, feature extraction, classification algorithm optimisation, model training, and evaluation. Having a functioning ML model is great, but how do you use it in production? The remainder of this talk is devoted to answering this question and will focus on model retraining, deployment, monitoring, and solution maintenance. And since we are not superheroes, the talk will be interlaced with lessons learnt, usually discovered the hard way. 😉
Zdenek is a software engineer and data science/machine learning enthusiast. He is currently working for Wandera, helping MI:RIAM to see, understand, and predict Internet traffic and application behaviour.
FlowerChecker: Exciting journey of one ML startup – Ondra Veselý & Jiří Řihák
FlowerChecker, a machine learning startup, was established three years ago by three PhD students with one goal: plant identification.
This story-like talk shows how we used machine learning to validate the initial business idea, and how we struggled trying to use existing image-recognition software. It also covers how and why we collected a dataset for the first commercial machine learning system, which has several interfaces: a mobile app, a Facebook chatbot and a Twitter guerrilla marketing bot. Many colorful graphs included.
The second part of the talk goes more technical: TensorFlow, Inception v3, data preprocessing tricks, performance tuning and debugging. Basically all the struggles we needed to overcome to be able to identify 9000 different plant species.
Ondřej is a developer and data engineer. After brief experience with development for Seznam.cz and AVG Technologies, he founded FlowerChecker, where he plays the CEO role. After stabilising the business, he joined Kiwi.com in its early startup stage to establish the analytics and research teams. Currently, he builds streaming pipelines for business intelligence, leads Czechitas Python courses and consults on R&D projects for the European Commission.
Jirka is a FlowerChecker co-founder responsible for app development and ML in plant identification. He is also finishing his PhD in the Adaptive Learning group, a small but enthusiastic research lab at FI MU focused on applications of ML in education.
Image Search @ Seznam.cz – Lukáš Vrábel
Seznam.cz is, among other things, a major player on the search engine market in the Czech Republic. A few years ago, we switched our image search service from a third-party provider to our own in-house solution. It started as a simple modification of our fulltext search engine. But, over time, it has gradually developed into a standalone system.
This talk will be focused on the evolution of the search, the obstacles we faced and the solutions we implemented. We’ll briefly discuss the models, machine learning techniques and features that are used in the image search pipeline. The focus will mostly be on our investigation into deep learning in order to further improve the relevancy of the system.
Lukáš has industry experience with various machine learning tasks, ranging from NLP through web page analysis to image recognition. Formerly head of the research department at Seznam.cz, he is currently solving the world’s largest industry problems using AI and machine learning at CEAI.
Yes, we can improve it with Evolutionary Computation! – Lukáš Sekanina
In this talk, I will survey the main principles and branches of evolutionary computation (EC) and show typical applications of EC. In particular, EC will be presented in connection with approximate computing. This is a recent approach to developing faster, more energy-efficient, and less complex computer systems in which the correctness requirement can be relaxed to some extent.
Case studies will be focused on evolutionary approximation of digital circuits that are crucial for low power image processing, deep neural networks and other systems on a chip.
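To make the core loop of evolutionary computation concrete, here is a minimal (1 + λ) evolutionary algorithm in Python. Simple strategies of this kind are often used with Cartesian genetic programming for circuit evolution. This sketch, however, only maximizes a toy fitness: the number of 1 bits in a genome, known as OneMax. It does not evolve approximate circuits. All names and parameter values are illustrative assumptions, not the speaker's tools.

```python
import random
from typing import Callable, List

Genome = List[int]


def mutate(parent: Genome, rate: float, rng: random.Random) -> Genome:
    """Return a copy of `parent` with each bit flipped with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in parent]


def one_plus_lambda(fitness: Callable[[Genome], float], length: int,
                    lam: int = 4, generations: int = 500,
                    seed: int = 0) -> Genome:
    """(1 + lambda) evolutionary algorithm that maximizes `fitness`."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(length)]
    best = fitness(parent)
    for _ in range(generations):
        # Create lambda mutants of the current parent.
        children = [mutate(parent, 1.0 / length, rng) for _ in range(lam)]
        top_score, top_child = max(((fitness(c), c) for c in children),
                                   key=lambda pair: pair[0])
        # Accept ties too, so the search can drift across fitness plateaus.
        if top_score >= best:
            parent, best = top_child, top_score
    return parent


if __name__ == "__main__":
    # OneMax: fitness is simply the number of 1 bits.
    solution = one_plus_lambda(sum, length=20)
    print("best genome:", solution, "fitness:", sum(solution))
```

For approximate circuits, the genome would encode a circuit and the fitness would trade off output error against, say, power or area. The mutation-and-selection loop would stay the same.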
Lukáš has (co)authored over 150 papers, mainly on genetic programming, approximate computing, applications of bio-inspired AI and digital circuit design. His research results have been awarded one Gold and two Silver medals at the Humies competition, organized annually at GECCO.
Face detection and verification – Marián Beszédeš
Time Series Predictions using Neural Networks – Rudradeb Mitra
One of the key applications of time series data in AI is Predictive Analytics. Companies are using predictive analytics in various ways, from predicting customer buying behavior to predicting health risks or the future breakdown of trucks.
In this talk, the speaker will explain the concepts of time series databases and predictive analytics. Then he will show through examples how neural networks are used to make time series predictions.
After finishing his Master’s at the University of Cambridge, Rudradeb went on to build 4 startups: two in Silicon Valley, one in the UK and one in Belgium. These days, his focus is on applications of AI and IoT. In his free time, he writes and talks about Artificial Intelligence, IoT and startups.
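Before a neural network can learn from a time series, the series is usually reframed as a supervised problem. Each input is a window of past values, and each target is the value or values that follow. A minimal Python sketch of that step is below. The function name and the `window`/`horizon` parameters are our own illustrative choices. The resulting pairs could feed any regressor, from a small feed-forward network to an RNN.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], window: int,
                 horizon: int = 1) -> Tuple[List[List[float]], List[List[float]]]:
    """Reframe a series as supervised pairs: past `window` values -> next `horizon` values."""
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        inputs.append(list(series[start:start + window]))
        targets.append(list(series[start + window:start + window + horizon]))
    return inputs, targets


if __name__ == "__main__":
    daily_sales = [5, 7, 6, 8, 9, 11, 10, 12]  # made-up values
    X, y = make_windows(daily_sales, window=3, horizon=1)
    for x_row, y_row in zip(X, y):
        print(x_row, "->", y_row)
```

When splitting these pairs into training and test sets, keep the split chronological: train on earlier windows and test on later ones. Shuffling across time leaks future information into training.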
Speech data mining: not yet ready for retirement – Honza Černocký
How is Machine Learning used in Fintech? Vertical AI and its Applications in the Financial Services Industry – Bradford Cross
We will start by mapping the financial services sector into banking, insurance, investments, real estate and consumer financial services and looking at machine learning applications in each.
Then we will contrast modern machine learning approaches against traditional ‘quant finance.’
Finally, we will dive into specific applications like underwriting credit and insurance, risk scoring, product and marketing, financial crimes, real estate, and some more exotic ideas like using satellite imagery to make economic predictions on earth.