Software Developer - Functional Programming Internships

LexiFi is currently looking for talented master's students with a genuine interest in functional programming for internships taking place at our offices in Boulogne-Billancourt.

Candidates should be confident in their ability to quickly become proficient in writing OCaml code and should have good algorithmic skills. No knowledge of finance is required.

If you wish to apply, please send an e-mail to careers@lexifi.com with your resume attached.

About us

Founded in 2000, LexiFi is a software publisher that builds tools for the pricing and management of derivatives and structured products. Financial institutions and technology firms partner with LexiFi to boost their analytical and processing capabilities for tailored financial products.

At LexiFi, we use OCaml as our primary implementation language. We maintain our own version of the language and contribute actively to the evolution of the official version. LexiFi's full control over its implementation stack enables tremendous flexibility in making its technology available in a wide range of technical environments.

Proposals for internship topics

These proposals are a starting point, and the exact goals will be discussed with the candidate. Note that the first two topics may lead to an open-source release of the code.

A tool to migrate generic float arrays

The OCaml runtime system treats float arrays specially: their elements are stored unboxed, a more efficient representation that is crucial for numerical code. The unfortunate consequence is that code manipulating arrays of arbitrary (polymorphic) element type is slowed down, because it must check at runtime whether it is dealing with a float array. This is why the latest version of OCaml, 4.06, comes with a configure-time flag (-no-flat-float-array) to disable this special treatment.
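As a rough illustration (an experimental sketch, not LexiFi code), the snippet below inspects the runtime tag of a float array to tell which representation the current compiler build uses, and defines a polymorphic accessor of the kind that pays for the special case. It relies on the unsafe Obj module, so it is only meant for experimentation; the function names are hypothetical.

```ocaml
(* Report how the runtime stores a [float array] in the current build.
   With the default configuration, float arrays carry the special tag
   [Obj.double_array_tag] and hold unboxed doubles; when OCaml is
   configured with -no-flat-float-array, they are ordinary blocks of
   boxed floats. *)
let float_arrays_are_flat () : bool =
  let a = [| 1.0; 2.0 |] in
  Obj.tag (Obj.repr a) = Obj.double_array_tag

(* A polymorphic accessor: since ['a] may be [float] at runtime, the
   generic array access compiled here has to check, when flat float
   arrays are enabled, whether [a] is a float array before reading. *)
let first (a : 'a array) : 'a = a.(0)

let () =
  Printf.printf "flat float arrays: %b\n" (float_arrays_are_flat ());
  Printf.printf "%g %s\n" (first [| 3.0 |]) (first [| "x" |])
```

The first line of output depends on how the compiler was configured, which is precisely the switch the migration tool is meant to make painless.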

The purpose of this internship is to develop a tool that facilitates the migration from the current special representation of float arrays to a dedicated float array type that does not rely on generic arrays. The tool will use the typing information produced by the compiler to locate occurrences of float arrays and specialize the code accordingly.
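To make the intended transformation concrete, here is a hand-written before/after sketch with hypothetical function names. As an assumption about the target, it uses the standard library's Float.Array module as the dedicated representation; note that Float.Array only appeared in the standard library after 4.06. The actual tool would perform such rewrites automatically, guided by the types inferred by the compiler.

```ocaml
(* Before: numerical code written against [float array]. If OCaml is
   configured with -no-flat-float-array, this array holds boxed floats
   and the loop loses its efficient layout. *)
let sum_generic (a : float array) : float =
  let acc = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    acc := !acc +. a.(i)
  done;
  !acc

(* After: the same loop rewritten against [Float.Array.t], which keeps
   an unboxed layout in both configurations. *)
let sum_specialized (a : Float.Array.t) : float =
  let acc = ref 0.0 in
  for i = 0 to Float.Array.length a - 1 do
    acc := !acc +. Float.Array.get a i
  done;
  !acc

let () =
  let xs = [ 1.0; 2.5; 3.5 ] in
  Printf.printf "%g %g\n"
    (sum_generic (Array.of_list xs))
    (sum_specialized (Float.Array.of_list xs))
```

The rewrite is mechanical once the element type is known to be float, which is why compiler-produced typing information is the natural input for the tool.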

References

  1. https://github.com/ocaml/ocaml/pull/1294
  2. https://www.lexifi.com/blog/about-unboxed-float-arrays

Experimenting with BuckleScript as a possible alternative to js_of_ocaml

Some of our user interfaces are built with web technologies. They are written in OCaml and compiled to JavaScript with the js_of_ocaml compiler. To interact with JavaScript APIs, we developed gen_js_api, a tool that automatically generates the binding code responsible for converting values between OCaml and JavaScript and for handling JavaScript calling conventions.

BuckleScript, a promising alternative to js_of_ocaml, has recently emerged and is already used in production by the Reason team at Facebook. The first goal of this internship would be to extend gen_js_api to support BuckleScript. This would allow every project using gen_js_api to switch freely from one back-end to the other. In a second stage, the intern will assess how much of our code base would need to be adapted to cope with features that BuckleScript lacks (e.g. marshalling).

References

  1. https://bucklescript.github.io
  2. https://github.com/LexiFi/gen_js_api
  3. http://ocsigen.org/js_of_ocaml

Content extraction in PDF documents

A component of one of our products is dedicated to extracting information from PDF documents. The input of this process is the geometric description of the document provided by the PDF file. LexiFi has already developed geometric algorithms that reorder chunks of text to recover the natural reading order, recognize the logical structure, and extract tabular data from such descriptions. However, these algorithms are highly ad hoc and need frequent adjustments to account for new layouts. The goal of this internship will be to experiment with more generic and reliable approaches to these problems.
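For a sense of what "ad hoc" means here, below is a deliberately naive reading-order heuristic, sketched under stated assumptions: the chunk type is hypothetical, coordinates are assumed to be already flipped so that y grows downward (raw PDF user space has y growing upward), and the page is assumed to have a single column. It is not LexiFi's algorithm.

```ocaml
(* Hypothetical text chunk: position of its top-left corner and its text.
   Assumption: y grows downward (page coordinates already flipped). *)
type chunk = { x : float; y : float; text : string }

(* Naive single-column reading order: sort chunks by y, start a new line
   whenever a chunk is at least [tol] below the previous one, then read
   each line left to right. *)
let reading_order ?(tol = 2.0) (chunks : chunk list) : string list =
  let by_y = List.sort (fun a b -> Float.compare a.y b.y) chunks in
  (* Build lines in reverse order; each line also keeps its chunks in
     reverse order, so [prev] is the most recently added chunk. *)
  let lines =
    List.fold_left
      (fun lines c ->
        match lines with
        | (prev :: _ as line) :: rest when Float.abs (c.y -. prev.y) < tol ->
            (c :: line) :: rest
        | _ -> [ c ] :: lines)
      [] by_y
  in
  lines
  |> List.rev
  |> List.concat_map (fun line ->
         line
         |> List.sort (fun a b -> Float.compare a.x b.x)
         |> List.map (fun c -> c.text))

let () =
  let page =
    [ { x = 50.0; y = 10.0; text = "world" };
      { x = 10.0; y = 10.5; text = "Hello" };
      { x = 10.0; y = 30.0; text = "Second" };
      { x = 60.0; y = 30.2; text = "line" } ]
  in
  (* prints "Hello world Second line" *)
  print_endline (String.concat " " (reading_order page))
```

A fixed tolerance and a single-column assumption break as soon as a page has two columns, rotated text, or tables, which is exactly the brittleness that motivates looking for more generic approaches.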

References

  1. A Table Detection Method for Multipage PDF Documents via Visual Separators and Tabular Structures, J. Fang, L. Gao, K. Bai, R. Qiu, X. Tao, and Z. Tang, 2011
  2. PDF-TREX: An Approach for Recognizing and Extracting Tables from PDF Documents, E. Oro and M. Ruffolo, 2009
  3. Layout and Content Extraction for PDF Documents, H. Chao and J. Fan, Document Analysis Systems, Springer, 2004
  4. https://github.com/johnwhitington/camlpdf