File processing SDK with support for 600+ formats: Outside In Technology
Introduction
Outside In Technology is a suite of software development kits (SDKs) that provides developers with a comprehensive solution to extract, normalize, scrub, convert and view the contents of 600+ unstructured file formats.
What problem does OIT solve?
Assume you have a business application that deals with a wide range of file formats, and you want to implement search across those files. To do this, you have to extract the text content from each file for indexing.
You have two options:
- Write your own code
- Leverage an SDK that can extract text from files
You will most likely pick option 2, but that does not clear every hurdle. Your users might upload PDF, MS Office, AutoCAD, and other files, and technically it would be a nightmare to maintain an individual SDK for each file format.
This is where the OIT SDK gives you an edge: a single SDK that processes 600+ file formats (refer to the supported file formats).
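To make the search use case concrete, here is a minimal sketch of the indexing step that comes after text extraction. It assumes some function can return the plain text of a file; `extract_text` below is a hypothetical stand-in for that call and is not part of the OIT API.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


def build_index(paths: Iterable[str],
                extract_text: Callable[[str], str]) -> Dict[str, Set[str]]:
    """Map each lowercase word to the set of files that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for path in paths:
        # extract_text is a hypothetical placeholder for whatever SDK call
        # returns the plain text of a file; it is NOT an OIT function.
        text = extract_text(path)
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(path)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the files that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The index itself is format-agnostic; only the extraction call needs to understand PDF, MS Office, AutoCAD, and so on, which is the part a single multi-format SDK is meant to cover.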
What’s the value proposition?
It depends on how many SDKs you want to leverage, but at a high level the points below are always valid for evaluating your ROI:
- Gain Competitive Advantage: Extend your application’s reach and value with the ability to access, transform and control unstructured information
- Reduce time to market: Quickly and easily implement new features with robust and flexible development tools
- Build vs. buy: Reduce the engineering effort to develop and manage a wide and growing range of file formats, at a fraction of the cost.
Understand OIT SDK
Here is a quick summary of each SDK and its potential use cases; for more detail, please refer to the product documentation. Each SDK has its own content-processing functionality, and developers can combine more than one SDK according to their requirements.
List of OIT SDKs
SDK | Capabilities | Common Usage |
---|---|---|
FileID | File ID employs a proprietary algorithm to identify file types without relying on unreliable file extensions or MIME types. | The technology is particularly useful at the start of a workflow when dealing with unknown data, in security applications, and any time the format of a file needs to be quickly and accurately identified. |
Clean Content | Extracts text, metadata, and hidden information. Bursts and reassembles documents. | Avoids accidental exposure of metadata and hidden information that may contain sensitive, proprietary, or confidential data. This information is easily overlooked when files are shared or distributed. |
Content Access | Extracts text and metadata from files and automatically translates the text and properties from multiple possible encodings into a single encoding specified by the developer. | Widely used in search and data forensics applications. |
PDF Export | A cross-platform, application-independent PDF conversion solution that can convert any of the 600+ Outside In supported formats into PDF. | Appropriate for content management, document management, web publishing, or any application that can benefit from application-independent, server-based PDF conversion. |
Image Export | Converts content into TIFF, JPEG, JPEG 2000, BMP, GIF, or PNG. The SDK offers numerous options, including the ability to size the image output from thumbnail to full size and to control image resolution. | Used in imaging, e-discovery, and other applications that require access to static, high-fidelity images of business documents. |
HTML Export | Converts content into HTML, rendering embedded graphics as a choice of GIF, JPEG, or PNG. | Provides browser access to hundreds of file types without plug-ins or other proprietary applications. |
Search Export | Extracts the text and metadata of supported file types and converts them into XML, HTML, or text. | Appropriate for search, forensics, or any application that needs to extract content and convert it into a format conducive to post-processing and analysis. |
XML Export | Converts and normalizes the content of supported file types into XML defined by the “FlexionDoc” schema. | Appropriate for any application that can benefit from normalizing documents into XML that explicitly describes all the elements of the document’s content, structure, properties, and formatting. |
Web View Export | Enables an application to produce high-quality HTML5 renditions of documents created by standard business software. | Can be used with almost any browser-based application to preview documents without a download or any other dependency. |
Viewer Technology | Enables file viewing, printing, and copy/paste functionality for desktop-based applications. | Appropriate for solutions that can benefit from desktop viewing functionality. |
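To illustrate why content-based identification is more reliable than trusting a file extension, here is a minimal sketch that checks a file's leading bytes against a few well-known signatures ("magic numbers"). This is a generic technique shown for illustration only; it is not the proprietary File ID algorithm, and the `SIGNATURES` table, `sniff`, and `identify` are hypothetical names that cover only a handful of formats.

```python
from typing import Optional

# A few well-known leading byte signatures. Illustrative only: a real
# identifier handles hundreds of formats and looks deeper than a prefix.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"PK\x03\x04", "ZIP container (e.g. DOCX, XLSX, PPTX)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy DOC, XLS)"),
]


def sniff(header: bytes) -> Optional[str]:
    """Return a format name if the header starts with a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def identify(path: str) -> Optional[str]:
    """Identify a file from its first bytes, ignoring its extension."""
    with open(path, "rb") as f:
        return sniff(f.read(16))
```

Note that a prefix match alone cannot tell a DOCX from an XLSX, since both are ZIP containers; distinguishing them requires inspecting the content further, which is part of what a dedicated identification SDK is for.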
Hands on Experience (on Windows platform)
Working with PDF Export SDK
- Step 1: Download the PDF Export SDK from the product download page and select the Windows (x86-64) version. You need to be logged in to download the binaries; you can register for free.
- Step 2: Download and install the Visual C++ 2013 Redistributable package. Select the file that matches your operating system; for example, if you are running 64-bit Windows 10, select vcredist_x64.exe
- Step 3: Unzip the downloaded SDK and execute the makedemo application. This sets up everything needed to run the demo sample application.
- Step 4: Navigate to the demo folder and execute the Export.exe application. The sample application window should open.
- Step 5: Select your input file in the Source document: field and provide the output file path where you want to export the result (for example \output\output-file.pdf)
- Step 6: Click the Export button to run the conversion. You can check the progress in the Status field.
- Step 7: Go to the output directory and open the PDF file to verify the conversion.
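If you run many conversions, opening each result by hand gets tedious. The sketch below is a cheap sanity check for Step 7, not a full validation: it only confirms that the output starts with the `%PDF-` header and has an `%%EOF` marker near the end. The function name `looks_like_pdf` and the 1024-byte window are assumptions made for this example.

```python
import os


def looks_like_pdf(path: str, window: int = 1024) -> bool:
    """Cheap check: PDF header at the start, %%EOF marker near the end."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(5)
        # Jump to the last `window` bytes (or the start for small files).
        f.seek(max(0, size - window))
        tail = f.read()
    return head == b"%PDF-" and b"%%EOF" in tail
```

For example, `looks_like_pdf(r"\output\output-file.pdf")` would check the file exported in Step 5. A file that passes can still be damaged, so open a few results in a PDF reader as well.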
Working with WebView Export SDK
- Step 1: Download the Web View Export SDK from the product download page and select the Windows (x86-64) version. You need to be logged in to download the binaries; you can register for free.
- Step 2: Download and install the Visual C++ 2013 Redistributable package. Select the file that matches your operating system; for example, if you are running 64-bit Windows 10, select vcredist_x64.exe
- Step 3: Unzip the downloaded SDK and execute the makedemo application. This sets up everything needed to run the demo sample application.
- Step 4: Download and install Node.js.
- Step 5: Open the Windows Command Prompt (press Windows Key + R and then type cmd)
- Step 6: Go to the folder where you unzipped the Web View Export SDK and navigate to the samplecode > demoserver_nodejs folder
- Step 7: Execute the command node demoserver.js to start the local server
- Step 8: Open the URL http://localhost:8888 in a browser; the demo application should load.
- Step 9: Select the directory that contains your sample file (or use the sample files provided under /sdk/sampefiles)
- Step 10: Click a file in the left-side panel; the converted, web-viewable format (HTML) is shown in the right-side panel.
- Step 11: Try the toolbar to use the highlighter, sticky notes, and other features.
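If the page in Step 8 does not load, it helps to confirm that the demo server is actually listening before troubleshooting the browser. The sketch below uses only the Python standard library; the function name `server_is_up` is an assumption made for this example, and the default URL is the one from Step 8.

```python
import urllib.error
import urllib.request


def server_is_up(url: str = "http://localhost:8888", timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a success or redirect status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

If this returns False, check that node demoserver.js from Step 7 is still running in the Command Prompt window.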
Summary
Outside In Technology is a developer-friendly set of SDKs that can be easily embedded in your application and comes with APIs for C, C++, Java, and other languages to reduce integration effort. To evaluate the product features or the quality of conversion, you can experiment with the sample applications as explained above.