
Kantu Web Automation Browser
Kantu is a picture-driven web macro recorder. Kantu combines screenshots and image recognition for visual web automation, form filling, web testing and data extraction/web scraping.
*Get data from web pages automatically: Diffbot's computer vision APIs turn the web into your database.
*AUTOMATIC APIs: Extract Automatically Get structured...
Why Diffbot?
We're focused exclusively on getting you better web data. Some of the reasons hundreds of customers make (hundreds of) millions of calls every month:
#The web's best content extractor:
Diffbot works automatically, without rules or training. There's no better way to extract data from web pages. See how Diffbot stacks up to other content extraction methods: feature comparison, text extraction quality shootout.
#Identify pages automatically:
Use the Analyze API to automatically find and extract all products, articles, discussions or images while crawling any site.
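As a rough illustration of what a call to an analyze-style endpoint could look like, the Python sketch below only builds the request URL and sends nothing over the network. The endpoint path, the `token` and `url` parameter names, and the helper name `build_analyze_url` are assumptions made for this example; check them against Diffbot's current documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumption: endpoint path and parameter names are illustrative only.
ANALYZE_ENDPOINT = "https://api.diffbot.com/v3/analyze"


def build_analyze_url(token: str, page_url: str) -> str:
    """Return a GET URL asking the (assumed) Analyze endpoint to process page_url.

    urlencode percent-escapes the target URL so its own '?' and '&'
    characters cannot be confused with the API's query parameters.
    """
    if not token:
        raise ValueError("an API token is required")
    query = urlencode({"token": token, "url": page_url})
    return f"{ANALYZE_ENDPOINT}?{query}"


# Example: build (but do not send) a request for a hypothetical product page.
request_url = build_analyze_url("YOUR_TOKEN", "https://example.com/page?id=7")
print(request_url)
```

The resulting URL can be passed to any HTTP client; the response format is not shown here because it depends on the page type the service detects.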
#Detailed product data:
The Product API automatically returns complete product info, including all pricing data, product IDs, brand and full specifications tables.
#Clean text and HTML:
Articles, discussion threads, product descriptions and image captions are returned in pure text and sanitized HTML.
#Structured search:
Search structured content from any crawl on the fly using our Search API, returning only the matching results.
Plus...
¤ All APIs execute JavaScript so content is parsed like a regular browser.
¤ Works on most non-English pages thanks to visual processing.
¤ Date normalization: datestamps are normalized and presented in RFC 1123 (HTTP/1.1) standard format.
¤ Multipage articles are automatically joined together in a single API response.
¤ Entity extraction: automatic tagging identifies major topics and entities within article text.
¤ Fix any issues in real time with the API Toolkit.
¤ Bulk API allows the extraction of hundreds to hundreds of thousands of pages.
¤ Access Crawlbot and Bulk job data in full JSON or CSV formats.
¤ Optionally crawl using a diverse array of IP addresses.
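
To make the date normalization item concrete, here is a small Python sketch of what an RFC 1123 (HTTP/1.1) datestamp looks like and how a client might read one back. It uses only the standard library. The helper names `to_rfc1123` and `parse_rfc1123`, and the choice to treat naive datetimes as UTC, are assumptions for this example and say nothing about how Diffbot implements normalization internally.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def to_rfc1123(dt: datetime) -> str:
    """Format a datetime as an RFC 1123 date string in GMT."""
    if dt.tzinfo is None:
        # Assumption for this sketch: naive datetimes are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # format_datetime(..., usegmt=True) requires a UTC-aware datetime.
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def parse_rfc1123(text: str) -> datetime:
    """Parse an RFC 1123 date string back into a timezone-aware datetime."""
    return parsedate_to_datetime(text)


stamp = to_rfc1123(datetime(1994, 11, 15, 8, 12, 31))
print(stamp)                 # Tue, 15 Nov 1994 08:12:31 GMT
print(parse_rfc1123(stamp))  # 1994-11-15 08:12:31+00:00
```

Because every datestamp arrives in one fixed format, client code can parse dates from any source page with a single routine instead of per-site date handling.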
web-development api html json extraction data-extraction web-extraction