"…tures, deep learning techniques either leverage a pre-trained transforme…"
There are works that use more than one, and even all 3, such as …; there could be more.
"…between these key-and-values is still unaddressed [13]. In detail, on FUNSD [13], BERT…"
This is not true: PICK and SPADE also predict the links. There is also W2NER (https://arxiv.org/pdf/2112.10070.pdf), which directly predicts links between tokens; that one addresses 1D documents, but the extension to 2D is trivial.
Those are the ones I know of from 2 years ago, so there should be more by now.
"…show to achieve high accuracy in existing literature. Following that, we propose a no…"
Please provide some references.
"…two phases: identifying initial programs and iteratively refining progra…"
It could be that I'm fairly unknowledgeable in this field, but I would like a brief definition of a "program" here, i.e., what are its inputs and outputs?
From what is presented in this paragraph, I have the impression that the programs take as input a graph whose vertices are the semantic entities and whose edges represent semantic connections, and then output a subset of the input graph with some edges removed (a rough sketch of this reading is below). Contradictorily, the abstract states that the programs "link semantic entities", which suggests they create links instead of removing them.
Even though this will be described in detail in a later section, I think it's good to present a high-level description here to set up the context for the readers.
If this is obvious to the intended audience (researchers in the field), then please ignore this comment.
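For concreteness, this is a minimal sketch of the signature I currently have in mind while reading; all names are mine and purely hypothetical, not taken from the paper.

    from typing import Hashable, Set, Tuple

    # Hypothetical reading: a synthesized "program" consumes the document graph
    # (vertices = semantic entities, edges = candidate semantic connections)
    # and returns the subset of edges it decides to keep.
    Entity = Hashable
    Edge = Tuple[Entity, Entity]

    def apply_program(entities: Set[Entity], edges: Set[Edge]) -> Set[Edge]:
        def keeps(src: Entity, dst: Entity) -> bool:
            # Placeholder for whatever condition the synthesized program encodes.
            return True
        return {(src, dst) for (src, dst) in edges if keeps(src, dst)}

Even a one-paragraph description at this level would set up the later sections nicely.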
"…first stage, VRDSynth mines the frequented relation between semantically li…"
Did you mean "frequent"? "Frequented" (adj.): (of a place) visited often or habitually.
If this is a term commonly used in the field, please ignore this comment.
"…raining documents and uses these frequented relations to construct initial pr…"
"…construct initial programs that filter out potential pairs of semantically…"
Did you mean "filter" rather than "filter out"? If you have a mix of A and B and you filter A out, then you are keeping B. I suppose here you want to keep the "potential pairs"?
"…ebpages, or on specific formats, our method takes as input a document graph constructed from scanned visually rich documents and synthesizes programs that link semantic entities from scanned documents in diffe…"
This piece confuses me: if the input is a graph, which means there are already links in it, then why do we need to "link semantic entities"?
Is it that you start with a lot of links and your programs then filter them down to only the links between semantic entities?
Or is it that the links in the input graph and the links you are trying to get are of different kinds? If so, please briefly describe the difference (see the sketch below for what I mean by this second reading).
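If the second reading is the intended one, the input and output edge sets would be of different kinds, roughly as follows; again the names are hypothetical and only contrast with the pruning sketch above.

    from typing import Hashable, Set, Tuple

    Entity = Hashable
    Edge = Tuple[Entity, Entity]

    def derive_semantic_links(layout_edges: Set[Edge]) -> Set[Edge]:
        # Input edges encode layout/spatial relations (e.g. left-of, above);
        # the output is a different kind of edge set: the semantic links.
        # Placeholder: e.g. promote certain spatial relations to key-value links.
        return {(src, dst) for (src, dst) in layout_edges}

One sentence stating which of these two pictures applies would clear this up.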
"…bes our domain-specific language. Section 7 shows our experiment settings a…"
Missing Sections 5 and 6?
"…� is an ordered pair of semantic group from D_i, indicating a directi…"
groups
"…ments, we are provided with the grouping of entities G_i = {g_1, g_2 … g_K} and t…"
How is a group of entities defined? Is it a subset of D_i, or a subgraph of {D_i, D_R}?
Also, I'm confused: the previous section talks about links between entities, but now it's links between groups of entities instead.
"…s G_i = {g_1, g_2 … g_K} and the semantic link between L_i existing between elements, where each semantic link (g_j…"
Did you mean "the semantic links L_i between existing elements"?
"…sis can be expressed as follows: ∀((g_1, g_2) ∈ L_i) ∃(P ∈ S) s.t. g_1 ∈ P(g_2) and (P_{g_2}, g_2) ∈ L_i ∀g_2   (1). Intuitively, this means that…"
Please describe what P, S, P(x) and P_x are.
I suppose:
- S: set of all valid programs
- P: a valid program
- P(x): output of P with input x?
- P_x: not sure?
Also, is this describing 2 requirements or 1? It appears to me that the part after "and" is a separate requirement, since it only contains g_2 while the other contains both g_1 and g_2. If so, please separate them into 2 statements (a possible restatement is below) and change the description to plural.
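If my guesses above are right, the two requirements could be stated separately as follows; this is purely my hypothetical reformulation, assuming P(g_2) is the output of P on g_2 and that "P_{g_2}" in Eq. (1) was meant to be P(g_2).

    % (1a) every linked pair is recovered by some program
    \forall (g_1, g_2) \in L_i,\ \exists P \in S \ \text{s.t.}\ g_1 \in P(g_2)
    % (1b) the quantification of P here is still unclear to me
    %      (presumably "for that same P", or "for every P in S")
    \forall g_2,\ (P(g_2), g_2) \in L_i

Numbering the two statements separately would also make the intuition paragraph that follows easier to connect back to the formula.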
"…M_i = {(e_1, e_2)}. In the case of semantic entity grouping, we input pair-wise connections…"
I believe this task has not been defined. So far there has only been the definition of semantic entity linking.
Also, the introduction only mentioned “generate programs that link semantic entities”.
"…layout and textual information, the predicates and conditions PC, such as LblC for label comparisons, FloatC for floating-point comparisons to incorporate layout features, StrC for string comparisons. The label constraints LblC m…"
This sentence reads oddly; it seems to be missing its main verb (the grammatical predicate).
"…D_C is the domain of conditions D_V is the domain of visual elements in a document, (3) D_R is the domain of relat…"
Missing number for this one?
"…operates. Thus, early pruning of possible bindings leads to a better chanc…"
Shouldn't this be "impossible" or "invalid"?
"…ynthesizing redundant programs, we the set of covered specifications…"
Missing verb.
"…e; end if B+*_p \ Cover ≠ ∅ and prec(B+*_p, B−*_p > prec(B+_p, B−_p then Cover ← Cover ∪ B+…"
Missing close parenthesis?
"…t ∈ V_set_return, (b[v0], b[v_ret]) ∈ M), and B−_p d…"
Missing domain for v0 here?
"…s NP* and version spaces VS* ==Cover, PCover, NCover = {}, {}, {}; VS*, PP*, NP* ← {}, {}, {};== foreach B+_p, B−_p, p ∈ VS do i…"
Inconsistent use of "←" and "="? It might be that I misunderstood the notation; if so, please ignore this comment.
"…p; end end end EPCover ← {} ==p_u = Union({PP*} ∪ {NP*})==; foreach B+_p, B−_p, p ∈ VS do…"
Should this be on a new line? Also, should it be "←" instead of "="?
"…reach B+_p, B−_p, p ∈ VS do if B−_p (PCover ∪ NCover) and B+_p \ PCover ≠ ∅ then P…"
Missing "= ∅"?
"…wards values on the down right, ("specific gravity" towards "not determined"), in th…"
The image is too blurry for this one.
"…�_l := Exclude(Union(pp ∈ PP), Union np ∈ NP)) 7 EXPERIMENT We implement VRDSyn…"
Missing open parenthesis.
"…f joining programs. 7.3.1 RQ3.1. Inference time efficiency. We note down the mean and standa…"
It appears to me that VRDSynth is CPU-bound and does not benefit from GPUs. Thus, you might want to make the comparison with more CPU cores. If, in that scenario, VRDSynth performs better than or comparably to LayoutXLM, then you can make some points about the practicality of VRDSynth.
For example:
- 2-4 cores (no GPU) is the common spec for personal laptops. In businesses, document processing tasks like those mentioned would preferably run on edge devices (a small memory footprint is also relevant in this case).
- More cores (no GPU) could be used on on-premises IT infrastructure; not needing a GPU while still delivering similar performance would be more cost-effective.