Web scraping is a valuable tool for structuring and utilizing the vast amount of data available on the web. It can be employed for research, in the enterprise, or for personal use. The only limit is your creativity (and potentially, a site's terms of use)!
Using Node.js and packages like Cheerio and X-Ray, web scraping is easier than ever. With minimal effort, you can extract nicely structured JavaScript objects and arrays from the content of a site.
This session will include an overview of the basic tools and techniques for scraping in Node.js. I’ll provide helpful tips and cover some of the obstacles and caveats, including the importance of not scraping data that site owners don’t want scraped. There will be code examples, real-life examples, and ideas for other potential use cases.
Tim Beck is a Software Developer with Financial Partners working on single-page web applications. He is experienced with the Microsoft web stack, Node.js, and HTML5 applications using technologies like Angular, Knockout, jQuery, and LESS. He has done a lot of work with performance, tooling, and improving developer quality of life. His recent emphasis has included web scraping, Chrome extensions, machine learning, and natural language processing. In his spare time, Tim is an active game developer, with an emphasis on tool creation.