"…tures, deep learning techniques either leverage a pre-trained transforme…"
There are works that use more than one, and even all 3, such as …; there could be more.
"…between these key-and-values is still unaddressed [13]. In detail, on FUNSD [13], BERT…"
This is not true: PICK and SPADE also predict the links. There is also W2NER (https://arxiv.org/pdf/2112.10070.pdf), which directly predicts links between tokens; that one addresses 1D documents, but the extension to 2D is trivial.
Those are the ones I know of from 2 years ago, so there should be more by now.
"…show to achieve high accuracy in existing literature. Following that, we propose a no…"
Please provide some references.
"…two phases: identifying initial programs and iteratively refining progra…"
It could be that I'm fairly unknowledgeable in this field, but I would like a brief definition of a "program" here, i.e., what are its inputs and outputs?
From what is presented in this paragraph, I have the impression that the programs take as input a graph whose vertices are the semantic entities and whose edges represent semantic connections, and then output a subset of the input graph with some edges removed (a rough sketch of this reading is below). Contradictorily, the abstract states that the programs "link semantic entities", which suggests they create links instead of removing them.
Even though this will be described in detail in a later section, I think it's good to present a high-level description here to set up the context for the readers.
If this is obvious to the intended audience (researchers in the field), then please ignore this comment.
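For concreteness, this is a minimal sketch of the signature I currently have in mind while reading; all names are mine and purely hypothetical, not taken from the paper.

    from typing import Hashable, Set, Tuple

    # Hypothetical reading: a synthesized "program" consumes the document graph
    # (vertices = semantic entities, edges = candidate semantic connections)
    # and returns the subset of edges it decides to keep.
    Entity = Hashable
    Edge = Tuple[Entity, Entity]

    def apply_program(entities: Set[Entity], edges: Set[Edge]) -> Set[Edge]:
        def keeps(src: Entity, dst: Entity) -> bool:
            # Placeholder for whatever condition the synthesized program encodes.
            return True
        return {(src, dst) for (src, dst) in edges if keeps(src, dst)}

Even a one-paragraph description at this level would set up the later sections nicely.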
"…first stage, VRDSynth mines the frequented relation between semantically li…"
Did you mean "frequent"? "Frequented" (adj.): (of a place) visited often or habitually.
If this is a term commonly used in the field, please ignore this comment.
"…raining documents and uses these frequented relations to construct initial pr…"
"…construct initial programs that filter out potential pairs of semantically…"
Did you mean "filter" rather than "filter out"? If you have a mix of A and B and you filter A out, then you are keeping B. I suppose here you want to keep the "potential pairs"?
"…ebpages, or on specific formats, our method takes as input a document graph constructed from scanned visually rich documents and synthesizes programs that link semantic entities from scanned documents in diffe…"
This piece confuses me: if the input is a graph, which means there are already links in it, then why do we need to "link semantic entities"?
Is it that you start with a lot of links and your programs then filter them down to only the links between semantic entities?
Or is it that the links in the input graph and the links you are trying to get are of different kinds? If so, please briefly describe the difference (see the sketch below for what I mean by this second reading).
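If the second reading is the intended one, the input and output edge sets would be of different kinds, roughly as follows; again the names are hypothetical and only contrast with the pruning sketch above.

    from typing import Hashable, Set, Tuple

    Entity = Hashable
    Edge = Tuple[Entity, Entity]

    def derive_semantic_links(layout_edges: Set[Edge]) -> Set[Edge]:
        # Input edges encode layout/spatial relations (e.g. left-of, above);
        # the output is a different kind of edge set: the semantic links.
        # Placeholder: e.g. promote certain spatial relations to key-value links.
        return {(src, dst) for (src, dst) in layout_edges}

One sentence stating which of these two pictures applies would clear this up.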
"…bes our domain-specific language. Section 7 shows our experiment settings a…"
Missing Sections 5 and 6?
"…� is an ordered pair of semantic group from D_i, indicating a directi…"
groups
"…ments, we are provided with the grouping of entities G_i = {g_1, g_2 … g_K} and t…"
How is a group of entities defined? Is it a subset of D_i, or a subgraph of {D_i, D_R}?
Also, I'm confused: the previous section talks about links between entities, but now it's links between groups of entities instead.
"…s G_i = {g_1, g_2 … g_K} and the semantic link between L_i existing between elements, where each semantic link (g_j…"
Did you mean "the semantic links L_i between existing elements"?
"…sis can be expressed as follows: ∀((g_1, g_2) ∈ L_i) ∃(P ∈ S) s.t. g_1 ∈ P(g_2) and (P_{g_2}, g_2) ∈ L_i ∀g_2   (1). Intuitively, this means that…"
Please describe what P, S, P(x) and P_x are.
I suppose:
- S: set of all valid programs
- P: a valid program
- P(x): output of P with input x?
- P_x: not sure?
Also, is this describing 2 requirements or 1? It appears to me that the part after "and" is a separate requirement, since it only contains g_2 while the other contains both g_1 and g_2. If so, please separate them into 2 statements (a possible restatement is below) and change the description to plural.
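If my guesses above are right, the two requirements could be stated separately as follows; this is purely my hypothetical reformulation, assuming P(g_2) is the output of P on g_2 and that "P_{g_2}" in Eq. (1) was meant to be P(g_2).

    % (1a) every linked pair is recovered by some program
    \forall (g_1, g_2) \in L_i,\ \exists P \in S \ \text{s.t.}\ g_1 \in P(g_2)
    % (1b) the quantification of P here is still unclear to me
    %      (presumably "for that same P", or "for every P in S")
    \forall g_2,\ (P(g_2), g_2) \in L_i

Numbering the two statements separately would also make the intuition paragraph that follows easier to connect back to the formula.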
"…M_i = {(e_1, e_2)}. In the case of semantic entity grouping, we input pair-wise connections…"
I believe this task has not been defined. So far there has only been the definition of semantic entity linking.
Also, the introduction only mentioned “generate programs that link semantic entities”.
"…layout and textual information, the predicates and conditions PC, such as LblC for label comparisons, FloatC for floating-point comparisons to incorporate layout features, StrC for string comparisons. The label constraints LblC m…"
This sentence reads oddly; it seems to be missing its main verb (the grammatical predicate).
"…D_C is the domain of conditions D_V is the domain of visual elements in a document, (3) D_R is the domain of relat…"
Missing number for this one?
"…operates. Thus, early pruning of possible bindings leads to a better chanc…"
Shouldn't this be "impossible" or "invalid"?
"…ynthesizing redundant programs, we the set of covered specifications…"
Missing verb.
"…e; end if B+*_p \ Cover ≠ ∅ and prec(B+*_p, B−*_p > prec(B+_p, B−_p then Cover ← Cover ∪ B+…"
Missing close parenthesis?
"…t ∈ V_set_return, (b[v0], b[v_ret]) ∈ M), and B−_p d…"
Missing domain for v0 here?
"…s NP* and version spaces VS* ==Cover, PCover, NCover = {}, {}, {}; VS*, PP*, NP* ← {}, {}, {};== foreach B+_p, B−_p, p ∈ VS do i…"
Inconsistent use of "←" and "="? It might be that I misunderstood the notation; if so, please ignore this comment.
"…p; end end end EPCover ← {} ==p_u = Union({PP*} ∪ {NP*})==; foreach B+_p, B−_p, p ∈ VS do…"
Should this be on a new line? Also, should it be "←" instead of "="?
"…reach B+_p, B−_p, p ∈ VS do if B−_p (PCover ∪ NCover) and B+_p \ PCover ≠ ∅ then P…"
Missing "= ∅"?
"…wards values on the down right, ("specific gravity" towards "not determined"), in th…"
The image is too blurry for this one.
"…�_l := Exclude(Union(pp ∈ PP), Union np ∈ NP)) 7 EXPERIMENT We implement VRDSyn…"
Missing open parenthesis.
"…f joining programs. 7.3.1 RQ3.1. Inference time efficiency. We note down the mean and standa…"
It appears to me that VRDSynth is CPU-bound and does not benefit from GPUs. Thus, you might want to make the comparison with more CPU cores. If, in that scenario, VRDSynth performs better than or comparably to LayoutXLM, then you can make some points about the practicality of VRDSynth.
For example:
- 2-4 cores (no GPU) is the common spec for personal laptops. In businesses, document processing tasks like those mentioned would preferably run on edge devices (a small memory footprint is also relevant in this case).
- More cores (no GPU) could be used on on-premises IT infrastructure; not needing a GPU while still delivering similar performance would be more cost-effective.