Software Developer - Functional Programming Internships

We are looking for master's student interns with a passion for functional programming (ideally in OCaml) and for beautiful code.

Learn how LexiFi uses OCaml: http://www.lexifi.com/product/technology/ocaml

No prior knowledge of finance is required. If you are interested, please send your resume to careers@lexifi.com.

About LexiFi: LexiFi is a financial software vendor based in Paris (Boulogne-Billancourt). OCaml is our main development language, and we are strongly involved in the evolution of OCaml. We have created approaches and tools, rooted in the theory of programming languages, to radically simplify the development of financial applications. Our products are used by financial institutions and by other software or service providers that embed our technology (and sometimes become OCaml enthusiasts as well). For more information, visit http://www.lexifi.com.

Proposals for internship topics

These proposals are a starting point and the exact goals will be discussed with the candidate. Note that all topics may lead to an open-source release of the code.

A tool to migrate generic float arrays

The OCaml runtime system treats float arrays specially, storing them in a more efficient unboxed representation, which is crucial for numerical code. Unfortunately, this slows down the rest of the code that deals with arrays of arbitrary type, since generic array operations must check at runtime whether they are handling a float array. This is why the latest version of OCaml, 4.06, comes with a configure-time flag to disable this special treatment.

The purpose of this internship is to develop a tool that facilitates migrating code from the old representation of float arrays to a dedicated type that does not rely on generic arrays. The tool will use typing information produced by the compiler to spot occurrences of float arrays and specialize the code as needed.
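As a rough illustration of the kind of rewrite such a tool might perform, the sketch below shows a function typed at `float array` and its specialized counterpart over `Float.Array.t`. Using `Float.Array` as the target is an assumption: it is a dedicated, always-unboxed float array type that was added to the standard library in OCaml 4.08, after the 4.06 release mentioned above. The function names are hypothetical.

```ocaml
(* Before migration: code written against the generic [float array] type.
   Its memory representation depends on the configure-time
   flat float array setting. *)
let sum_generic (a : float array) : float =
  let acc = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    acc := !acc +. a.(i)
  done;
  !acc

(* After migration: the same function specialized to [Float.Array.t],
   which is always stored unboxed, whatever the configure-time flag. *)
let sum_dedicated (a : Float.Array.t) : float =
  let acc = ref 0.0 in
  for i = 0 to Float.Array.length a - 1 do
    acc := !acc +. Float.Array.get a i
  done;
  !acc

let () =
  let xs = [ 1.0; 2.0; 3.5 ] in
  Printf.printf "%g %g\n"
    (sum_generic (Array.of_list xs))
    (sum_dedicated (Float.Array.of_list xs))
```

A real migration would also have to rewrite call sites, such as array literals or `Array` functions applied to float arrays; deciding where this is needed is presumably where the compiler's typing information comes in.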

References

  1. https://github.com/ocaml/ocaml/pull/1294
  2. https://www.lexifi.com/blog/about-unboxed-float-arrays

Experimenting with BuckleScript as a possible alternative to js_of_ocaml

Some of our user interfaces are created using web technologies. Written in OCaml, they are translated into JavaScript using the js_of_ocaml compiler. In order to interact with the JavaScript API, we developed a tool, gen_js_api, that automatically generates the binding code, which takes care of translating values between OCaml and JavaScript and of dealing with JavaScript calling conventions.
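To give an idea of what such binding code does, here is a hand-written sketch of the converters a generator could derive for an OCaml record. It deliberately does not use gen_js_api's actual API: JavaScript values are modeled by a hypothetical `js` variant rather than a real JavaScript runtime, and mapping `None` to `null` is just one possible convention.

```ocaml
(* Hypothetical, simplified model of a JavaScript value; real bindings
   manipulate actual JavaScript values instead. *)
type js =
  | Null
  | Number of float
  | String of string
  | Object of (string * js) list

(* An OCaml record we want to exchange with a JavaScript API. *)
type point = { x : float; y : float; label : string option }

(* OCaml -> JavaScript: the record becomes an object, [None] becomes null. *)
let point_to_js (p : point) : js =
  Object
    [ ("x", Number p.x);
      ("y", Number p.y);
      ("label", (match p.label with None -> Null | Some s -> String s)) ]

(* JavaScript -> OCaml: read the fields back, failing on malformed input. *)
let point_of_js (v : js) : point =
  match v with
  | Object fields ->
      let number k =
        match List.assoc_opt k fields with
        | Some (Number f) -> f
        | _ -> invalid_arg ("point_of_js: missing number field " ^ k)
      in
      let label =
        match List.assoc_opt "label" fields with
        | Some (String s) -> Some s
        | Some Null | None -> None
        | Some _ -> invalid_arg "point_of_js: bad label field"
      in
      { x = number "x"; y = number "y"; label }
  | _ -> invalid_arg "point_of_js: expected an object"
```

Writing and maintaining such converters by hand for every type exchanged with JavaScript is the tedious part that a binding generator takes over.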

Recently, a promising alternative to js_of_ocaml called BuckleScript has emerged; it is already used in production by the Messenger team at Facebook. The first goal of this internship would be to extend gen_js_api to support BuckleScript. This would allow all projects using gen_js_api to switch freely from one back-end to the other. In a later stage, the intern will assess how much of our code base would need to be adapted to cope with features that BuckleScript lacks (e.g. generic marshaling).
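In OCaml, generic marshaling is provided by the standard library's `Marshal` module, which serializes a value of any type without type-specific code. The sketch below shows the kind of code that depends on it; the `trade` type and the helper names are hypothetical. Code like this would have to be rewritten, for instance with type-specific serializers, to run on a back-end that lacks `Marshal`.

```ocaml
(* Hypothetical data type, standing in for any value we want to store
   or send over the network. *)
type trade = { id : int; notional : float; tags : string list }

(* Serialize any value to a byte string and back, with no type-specific
   code. [Marshal.from_string] is not type-safe: the caller must know the
   type of the value that was stored. *)
let serialize (v : 'a) : string = Marshal.to_string v []
let deserialize (s : string) : 'a = Marshal.from_string s 0

let () =
  let t = { id = 42; notional = 1_000_000.0; tags = [ "fx"; "swap" ] } in
  let t' : trade = deserialize (serialize t) in
  assert (t = t')
```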

References

  1. https://bucklescript.github.io
  2. https://github.com/LexiFi/gen_js_api
  3. http://ocsigen.org/js_of_ocaml

Reimplementing LexiFi Runtime Types as a syntactic preprocessor

It is often useful to get access to types at runtime in order to implement generic type-driven operations. A typical example is a generic pretty-printer. Unfortunately, the OCaml compiler does not keep type information at runtime. At LexiFi, we have extended OCaml to support runtime types. This extension has been in use for years and is now a key element of many of our components, such as our automatic GUI framework (which derives GUIs from type definitions) or our high-level database layer (which derives SQL schemas from type definitions and exposes a well-typed interface for queries). This extension is tightly integrated with the OCaml typechecker, which allows the compiler to synthesize the runtime type representations with minimal input from the programmer.
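The following sketch illustrates the general idea of runtime types with a generic pretty-printer, using a GADT as the type representation. It is a toy model with our own hypothetical names (`ty`, `show`), not LexiFi's actual runtime type library: a value of type `a ty` describes the type `a`, and `show` works for any type that has such a description.

```ocaml
(* Toy runtime type representation: a value of type [a ty] describes [a]. *)
type _ ty =
  | TInt : int ty
  | TString : string ty
  | TList : 'a ty -> 'a list ty
  | TPair : 'a ty * 'b ty -> ('a * 'b) ty

(* Generic, type-driven pretty-printer. The locally abstract type [a]
   lets each branch refine the type of [v] from the representation. *)
let rec show : type a. a ty -> a -> string =
 fun t v ->
  match t with
  | TInt -> string_of_int v
  | TString -> Printf.sprintf "%S" v
  | TList elt -> "[" ^ String.concat "; " (List.map (show elt) v) ^ "]"
  | TPair (ta, tb) ->
      let x, y = v in
      "(" ^ show ta x ^ ", " ^ show tb y ^ ")"

let () =
  (* Prints: [(1, "a"); (2, "b")] *)
  print_endline (show (TList (TPair (TInt, TString))) [ (1, "a"); (2, "b") ])
```

In this toy version the representation is passed explicitly at every call; in LexiFi's extension, the compiler synthesizes such representations with minimal input from the programmer.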

The goal of the internship is to develop a PPX syntax extension that synthesizes the runtime representation of types from their syntactic definitions, with a deriving-like approach. Because the PPX extension will not be able to take advantage of the integration with the OCaml typechecker, it will not be as flexible as our modified compiler. However, it will be compatible with the upstream compiler and will allow us to open-source the infrastructure we have developed around runtime types, together with the libraries built upon them.
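A rough sketch of what a deriving-like PPX could produce: from the syntactic declaration of `trade`, it would generate a value such as `trade_ty`, which generic code can then consume, here to derive a simplified SQL schema. The representation type, all names, and the SQL mapping are hypothetical and much simpler than what a real implementation would need; the PPX itself, which would generate `trade_ty` automatically, is not shown.

```ocaml
(* Hypothetical, minimal runtime type representation (not LexiFi's). *)
type _ ty =
  | TInt : int ty
  | TFloat : float ty
  | TString : string ty
  | TRecord : string * 'a field list -> 'a ty

(* A record field: its name, the representation of its type, and a
   getter. The field type ['b] is existential. *)
and 'a field = Field : string * 'b ty * ('a -> 'b) -> 'a field

(* Source declaration: all a PPX would see is this syntax. *)
type trade = { id : int; counterparty : string; notional : float }

(* Code of the kind a deriving-like PPX could generate from it. *)
let trade_ty : trade ty =
  TRecord
    ( "trade",
      [ Field ("id", TInt, (fun t -> t.id));
        Field ("counterparty", TString, (fun t -> t.counterparty));
        Field ("notional", TFloat, (fun t -> t.notional)) ] )

(* A type-driven operation: map field types to SQL column types. *)
let sql_type : type a. a ty -> string = function
  | TInt -> "INTEGER"
  | TFloat -> "REAL"
  | TString -> "TEXT"
  | TRecord _ -> invalid_arg "sql_type: nested records not supported"

(* Derive a CREATE TABLE statement from a record representation. *)
let create_table : type a. a ty -> string = function
  | TRecord (name, fields) ->
      let column : a field -> string = function
        | Field (n, t, _) -> n ^ " " ^ sql_type t
      in
      Printf.sprintf "CREATE TABLE %s (%s)" name
        (String.concat ", " (List.map column fields))
  | _ -> invalid_arg "create_table: expected a record type"
```

Because a PPX only sees syntax, a field whose type is an abbreviation defined elsewhere can typically only be handled by referring, by naming convention, to a value generated for that abbreviation; this is one reason the approach is less flexible than integration with the typechecker.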

A possible further step would be to add hooks to the OCaml compiler that make it possible to achieve the same level of comfort as the current solution (including implicit synthesis of type representations through type inference, and implicit insertion of representation arguments at call sites), not by patching the compiler itself but by loading an extension at compilation time.

References

  1. An overview of LexiFi extensions to OCaml with a focus on runtime types (2017)
  2. Presentation on Runtime Types in OCaml (2011)
  3. Blog post on Dynamic Types (2010)
  4. A Guide to Extension Points in OCaml (2014)
  5. Extension Points - 3 Years Later (2017)

Data extraction from PDF documents

A component of one of our products is dedicated to extracting information from PDF documents. The process starts from the geometric description of the content provided in the PDF file (thanks to the camlpdf library) and then applies both geometric algorithms and parsing combinators to extract the required pieces of information. Examples of geometric problems include: (i) reordering chunks of text to recover the natural reading order; (ii) recognizing the logical structure; (iii) extracting tabular data; (iv) recognizing mathematical formulas. However, these algorithms are rather ad hoc and need frequent adjustments to account for new layouts.

The goal of this internship is to experiment with more generic and reliable approaches to these problems.
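To make the kind of ad hoc geometric heuristic mentioned above concrete, here is a minimal sketch of reading-order recovery. It assumes text chunks have already been extracted with the coordinates of their baseline origin; the `chunk` type, the `reading_order` function, and the default tolerance are hypothetical and unrelated to camlpdf's API.

```ocaml
(* A chunk of text with the position of its baseline origin, in PDF user
   space, where y grows upwards. Hypothetical type, not camlpdf's. *)
type chunk = { text : string; x : float; y : float }

(* Naive heuristic: chunks whose baselines are within [tolerance] of the
   first chunk of a line belong to that line; lines are ordered top to
   bottom and chunks left to right within a line. *)
let reading_order ?(tolerance = 2.0) (chunks : chunk list) : string list =
  (* Sort top to bottom, so that each line is a prefix of what remains. *)
  let by_y = List.sort (fun a b -> compare b.y a.y) chunks in
  let rec split lines = function
    | [] -> List.rev lines
    | c :: rest ->
        let same, others =
          List.partition (fun d -> c.y -. d.y <= tolerance) rest
        in
        split ((c :: same) :: lines) others
  in
  split [] by_y
  |> List.map (fun line ->
         line
         |> List.sort (fun a b -> compare a.x b.x)
         |> List.map (fun c -> c.text)
         |> String.concat " ")
```

Such a heuristic breaks down as soon as the layout has several columns or rotated text, which illustrates why these algorithms need frequent adjustments.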

References

  1. A Table Detection Method for Multipage PDF Documents via Visual Separators and Tabular Structures, J. Fang, L. Gao, K. Bai, R. Qiu, X. Tao, and Z. Tang, 2011
  2. PDF-TREX: An Approach for Recognizing and Extracting Tables from PDF Documents, E. Oro and M. Ruffolo, 2009
  3. Layout and Content Extraction for PDF Documents, H. Chao and J. Fan, Document Analysis Systems (Springer), 2004
  4. https://github.com/johnwhitington/camlpdf